Acknowledgment: This article was prepared within the research project “Bayesian simultaneous equations model averaging — theoretical development and R package,” funded by the Polish National Science Centre, decision No. DEC-2021/43/B/HS4/01745.
Abstract. This article introduces the
rmsBMA R package for Bayesian model averaging in settings
with many candidate regressors relative to the available sample size.
The package implements reduced model space BMA by restricting
attention to models with at most \(M\)
regressors. This restriction preserves degrees of freedom for estimation
and inference and concentrates posterior mass on parsimonious
specifications, while still accounting for model uncertainty within the
admissible subset of models.
rmsBMA supports standard prior structures used in the
BMA literature, including Zellner-type \(g\)-priors and alternative model priors
with appropriate truncation under model space reduction. The package
further provides heteroscedasticity-consistent inference, Extreme Bounds
Analysis, jointness measures, and graphical tools for exploring prior
and posterior model probabilities, model size distributions, and
coefficient distributions. In addition, it introduces group
dilution, a non-empirical rule that penalizes models containing
multiple proxies for the same theoretical construct. An empirical
application illustrates the workflow.
The problem of model uncertainty plays a crucial role in scientific research that relies on statistical inference (Leamer 1978). In recent decades, the most prominent solution to this problem has been Bayesian model averaging (BMA) (Kass and Raftery 1995; Raftery 1995; Raftery, Madigan, and Hoeting 1997). This approach has become especially popular in economics (e.g., Liu and Maheu 2009; Ductor and Leiva-Leon 2016; Figini and Giudici 2017; D’Andrea 2022; Horvath, Horvatova, and Siranova 2024; K. Beck and Grabowski 2025); however, it has also been widely adopted in other fields (e.g., Sloughter, Gneiting, and Raftery 2013; Baran and Möller 2015; Aller, Ductor, and Grechyna 2021; Guliyev 2024; Payne, Ray, and Thomann 2024; J. Beck et al. 2025). To this day, BMA remains one of the primary tools for examining the determinants of economic growth (Fernández, Ley, and Steel 2001a, 2001b; Sala-I-Martin, Doppelhofer, and Miller 2004; Eicher, Papageorgiou, and Roehn 2007; Ley and Steel 2012; Moser and Hofmarcher 2014; Arin, Braunfels, and Doppelhofer 2019).
The growing interest in BMA has been fueled by the availability of R
packages such as BMA (Raftery,
Painter, and Volinsky 2005), BAS (Clyde, Ghosh, and Littman 2011),
BMS (Feldkircher and Zeugner
2015), and badp (K. Beck,
Wyszyński, and Dubel 2025), as well as the gretl BMA
package developed by Błażejowski and Kwiatkowski
(2015). Each of these packages provides unique functionalities.
The BMA package enables model averaging for generalized
linear models and survival models, BAS implements efficient
sampling algorithms, BMS offers a comprehensive prior
structure and adaptive sampling procedures, while badp
addresses the issue of simultaneity. The package proposed by Błażejowski and Kwiatkowski (2015) extends BMA
methodology to the gretl environment.
rmsBMA extends this line of software by addressing
issues that remain unexplored in existing implementations. In many
empirical applications, researchers face situations in which the number
of potential determinants is very large relative to the available sample
size. In extreme cases, the number of candidate regressors may even
exceed the number of observations. Such situations frequently arise when
long questionnaires generate rich information on a limited number of
subjects. This setting often produces an extensive list of theory-driven
explanatory variables intended to capture specific dimensions of
behavior. Consequently, researchers aim to extract a limited subset of
regressors that explain variation in the dependent variable in a robust
and economically meaningful way. This objective is consistent with the
principle of model parsimony in applied research. A classical example is
the literature on the money demand function, which considers a broad set
of potential determinants. Nevertheless, a workable theory of money
demand typically emphasizes only a small number of key variables (Judd and Scadding 1982; Laidler 1993).
The challenge posed by a large number of potential regressors
relative to a limited sample size can be addressed within the BMA
framework by restricting the model space. In particular, the researcher
may limit attention to models containing at most \(M\) regressors. This constraint ensures
that a sufficient number of observations remains available for
meaningful statistical inference, while simultaneously focusing the
analysis on the most relevant determinants. rmsBMA
facilitates Bayesian model averaging within a reduced model space and
provides graphical and analytical tools that allow users to fully
exploit this framework.
Another challenge in selecting robust determinants from a large pool of regressors is multicollinearity. In the Bayesian model averaging and model selection literature, this issue is typically addressed through the dilution prior proposed by George (2010). However, this approach is empirical in nature, as it relies on the correlation structure of the regressors and therefore departs from a fully Bayesian treatment of prior specification.
To address this limitation, the rmsBMA package
introduces a non-empirical dilution rule, termed group
dilution. This approach allows users to specify dilution parameters
for groups of variables associated with the same underlying theory. As a
result, prior probabilities can be appropriately adjusted for models
that include multiple proxies capturing the same fundamental
determinant.
Finally, the rmsBMA package allows users to specify a
rich prior structure, compute jointness measures (Doppelhofer and Weeks 2009; Ley and Steel 2007;
Hofmarcher et al. 2018), conduct Extreme Bounds Analysis (EBA), perform
model selection procedures, and generate a wide range of graphical
representations of the results. Consequently, rmsBMA
combines capabilities that are not jointly available in existing R
packages or alternative software environments and further contributes
original methodological innovations.
The remainder of the manuscript is structured as follows. The next section provides a theoretical overview of Bayesian model averaging, with a particular focus on Bayesian estimation, prior structures, jointness measures, and Extreme Bounds Analysis. Data preparation is described next, followed by a discussion of the construction and estimation of the model space. We then present an overview of the rmsBMA functions for performing Bayesian model averaging, computing jointness measures, and presenting the estimation results, and discuss the details of the model prior specifications. The final section concludes.
This section outlines the main theoretical foundations of the package. We first present the model setup, \(g\)-regression, and the calculation of the marginal likelihood, followed by the principal BMA statistics. Subsequent subsections discuss the \(g\)-prior, model prior, and dilution prior specifications, and examine jointness measures. Finally, we introduce Extreme Bounds Analysis.
rmsBMA is designed to accommodate variations of the
standard linear regression model:
\[ y = \alpha l_N + X_j \beta_j + \varepsilon, \]
where \(y = (y_1, y_2, \ldots, y_N)'\) denotes the \(N \times 1\) vector of the dependent variable and \(N\) is the total number of observations. The parameter \(\alpha\) is a scalar intercept term, and \(l_N\) is an \(N \times 1\) vector of ones. The matrix \(X_j\) is an \(N \times k_j\) matrix of regressors included in model \(j\), and \(\beta_j\) is the corresponding \(k_j \times 1\) vector of coefficients. The index \(j\) denotes the model specification, the total number of possible models is \(2^K\) (including the model with no regressors), \(K\) denotes the total number of potential regressors, and \(k_j\) is the number of regressors in model \(j\). The error term \(\varepsilon\) is an \(N \times 1\) vector assumed to follow a normal distribution,
\[ \varepsilon \sim N(0_N, h^{-1} I_N), \]
where \(h = \sigma^{-2}\) is the error precision. As is common in the literature, noninformative priors are imposed on parameters that are common to all models (Koop 2003). The prior on the precision parameter is given by
\[ p(h) \propto \frac{1}{h}, \]
while the prior on the intercept is specified as
\[ p(\alpha) \propto 1. \]
The prior on the coefficient vector \(\beta_j\) is assumed to be normally distributed:
\[ \beta_j \mid h \sim N(\underline{\beta_j}, h^{-1}\underline{V_j}). \]
The prior mean reflects the assumption that regressors have no systematic effect on the dependent variable. Accordingly,
\[ \underline{\beta_j} = 0_{k_j}. \]
The prior covariance matrix \(\underline{V_j}\) is specified using Zellner’s \(g\)-prior (Zellner 1986), such that
\[ \underline{V_j} = \left(g_j X_j^{\top} X_j\right)^{-1}, \]
where \(g_j\) is a hyperparameter chosen by the user. Consequently,
\[ \beta_j \mid h \sim N\!\left(0_{k_j}, h^{-1}\left(g_j X_j^{\top} X_j\right)^{-1}\right). \]
The posterior for \(\beta_j\) follows a multivariate t distribution with mean
\[ E(\beta_j|y,M_j) \equiv \bar{\beta_j} = \bar{V_j}X_j^{\top}y \]
and covariance matrix
\[ Var(\beta_j|y,M_j) = \frac{N\bar{s_{j}^2}}{N-2}\bar{V_j} \]
where \(M_j\) denotes model \(j\) and the number of degrees of freedom equals \(N\), with
\[ \bar{s}_{j}^2 = \frac{\frac{1}{1+g_j}y^{\top}P_{Z_j}y+\frac{g_j}{1+g_j}(y-\bar{y}l_N)^{\top}(y-\bar{y}l_N)}{N} \]
and
\[ P_{Z_j} = I_N - Z_j(Z_j^{\top}Z_j)^{-1}Z_j^{\top} \]
with \(Z_j = (l_N,X_j)\). Within this framework, the marginal likelihood of model \(j\) is given by
\[ p(y|M_j) \propto \left(\frac{g_j}{1+g_j}\right)^{\frac{k_j}{2}}\left[\frac{1}{1+g_j}y^{\top}P_{Z_j}y+\frac{g_j}{1+g_j}(y-\bar{y}l_N)^{\top}(y-\bar{y}l_N)\right]^{-\frac{N-1}{2}}. \]
There are two main extensions to this framework. First, the user may opt out of \(g\)-regression and perform Bayesian model averaging using classical OLS estimates, the so-called Bayesian averaging of classical estimates (BACE) approach proposed by Sala-I-Martin, Doppelhofer, and Miller (2004). Within this approach, the marginal likelihood of a model \(M_j\) may be written as (Leamer 1978):
\[ l(y|M_j) \propto N^{-\frac{k_{j}}{2}}\left(y^{\top}P_{Z_j}y\right)^{-\frac{N}{2}}. \]
The second extension allows the user to utilize a heteroscedasticity-consistent covariance matrix. In this case, the homoscedastic variance expression derived under the normal error assumption is replaced with the HC sandwich estimator of MacKinnon and White (1985),
\[ \widehat{\mathrm{Var}}_{HC}(\hat{\beta}_j) = \frac{N}{N-k_j}(Z_j^{\top} Z_j)^{-1}Z_j^{\top}\mathrm{diag}(\hat{\varepsilon}_j^{2})Z_j(Z_j^{\top} Z_j)^{-1}, \]
where \(\hat{\varepsilon}_j = y - Z_j \hat{\beta}_{OLS,j}\) denotes the vector of OLS residuals. This modification provides inference that remains valid under unknown forms of heteroscedasticity. While posterior means are derived under the \(g\)-prior specification, the robust covariance matrix replaces the homoscedastic variance estimator for inference purposes.
The setup described above can be used to construct the model space for a given set of potential regressors. The total number of possible models equals \(2^K\). However, within the reduced model space approach, the user may restrict attention to models containing at most \(M\) regressors. In this case, the total number of admissible models \((J)\) is given by
\[ J = \sum_{m=0}^{M} \binom{K}{m}. \]
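To make the size of the reduced model space concrete, \(J\) can be computed directly from the formula above. The following is a minimal illustrative sketch in Python (the package itself is implemented in R; the values of \(K\) and \(M\) are arbitrary):

```python
from math import comb

def model_space_size(K, M):
    """Number of admissible models with at most M of K regressors:
    J = sum_{m=0}^{M} C(K, m)."""
    return sum(comb(K, m) for m in range(M + 1))

print(model_space_size(20, 3))   # reduced space: 1 + 20 + 190 + 1140 = 1351
print(model_space_size(20, 20))  # full space: 2**20 = 1048576
```

Even a moderate cap such as \(M = 3\) shrinks the model space by three orders of magnitude here, which is what keeps exhaustive enumeration feasible.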
Given the considered model space, it is possible to perform Bayesian model averaging. The posterior model probability (PMP) of model \(j\) given the data is
\[ P(M_{j}|y)=\frac{P(y|M_{j})P(M_{j})}{\sum_{j=1}^{J}P(y|M_{j})P(M_{j})} \]
where \(P(M_{j})\) denotes prior model probability. In other words, the PMP represents the share of model \(j\) in the total posterior probability mass. Alternatively, for BACE the PMP for model \(j\) is calculated as:
\[ P(M_{j}|y)=\frac{l(y|M_{j})P(M_{j})}{\sum_{j=1}^{J}l(y|M_{j})P(M_{j})}. \]
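The normalization behind the PMP formula can be illustrated with invented numbers: given marginal likelihoods and prior probabilities for a toy space of three models, the PMPs are simply the normalized products (a Python sketch for illustration only):

```python
# Hypothetical marginal likelihoods p(y|M_j) and a uniform model prior P(M_j)
# for J = 3 models; the PMPs are the normalized products.
ml    = [0.8, 0.5, 0.2]
prior = [1/3, 1/3, 1/3]

weights = [m * p for m, p in zip(ml, prior)]
pmp = [w / sum(weights) for w in weights]
print(pmp)  # shares of the total posterior probability mass; they sum to one
```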
With PMPs in hand, it is possible to calculate several useful BMA statistics. Let \(\pi_k\) denote the random variable equal to one if the \(k^{\text{th}}\) regressor is a determinant of the dependent variable. The posterior inclusion probability (PIP) for the regressor is given by:
\[ P(\pi_k = 1 |y) = \sum_{j=1}^{J} \mathbf{1} (k^{\text{th}} \text{ regressor is in model } M_{j}) \cdot P(M_{j}|y), \]
where the indicator function \(\mathbf{1}\) is equal to one if the regressor is part of model \(M_j\) and zero otherwise. In other words, the PIP tells us how likely it is that the given regressor has an impact on the variable of interest.
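The PIP computation can be sketched on a toy model space; the inclusion vectors and PMPs below are hypothetical (Python used only for illustration):

```python
# Each model is an inclusion tuple over K = 3 regressors, mapped to its
# (already normalized) posterior model probability.
models = {
    (1, 0, 0): 0.40,
    (1, 1, 0): 0.35,
    (0, 0, 1): 0.25,
}

def pip(k):
    """PIP of regressor k: sum of PMPs of the models that include it."""
    return sum(p for gamma, p in models.items() if gamma[k] == 1)

print([round(pip(k), 2) for k in range(3)])  # [0.75, 0.35, 0.25]
```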
Another important statistic is the posterior mean (PM) of a given parameter \(\beta_k\). Let \(\pi_{\beta_k}\) denote the random variable equal to one if the given parameter is present in the model, and zero otherwise. The posterior mean of \(\beta_k\) is given by:
\[ \mathbb{E}(\beta_k|y)=\sum_{j=1}^{J}\widehat{\beta}_{k,j} \cdot P(M_{j}, \pi_{\beta_k} = 1 |y), \]
where \(\widehat{\beta}_{k,j}\) is the estimated value of the coefficient \(\beta_k\) in model \(j\). It gives the mean (or expected) value of the parameter taking into account all considered models.
The posterior variance of the parameter \(\beta_k\) is equal to:
\[ \begin{split} Var(\beta_k|y)=\ &\sum_{j=1}^{J} Var(\beta_{k,j}|y,M_j)\cdot P(M_{j},\pi_{\beta_k}=1|y) +\\[1ex] &\sum_{j=1}^{J}\Bigl[\widehat{\beta}_{k,j}-\mathbb{E}(\beta_k|y)\Bigr]^{2}\cdot P(M_{j},\pi_{\beta_k}=1|y) \end{split} \]
where \(Var(\beta_{k,j}|y,M_{j})\) denotes the conditional variance of the coefficient \(\beta\) in model \(M_{j}\) (in other words assuming that the model \(M_j\) is the true model). Posterior standard deviation (PSD) of \(\beta_k\) is then defined as the square root of the variance:
\[ SD(\beta_k | y) = \sqrt{ Var(\beta_k | y) } \]
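The two components of the posterior variance, the model-weighted conditional variances and the dispersion of the model-specific estimates around the BMA mean, can be illustrated with hypothetical values (a Python sketch, not package code):

```python
from math import sqrt

# Hypothetical estimates of beta_k in the two models that include it,
# their conditional variances, and the joint probabilities P(M_j, inclusion | y).
betas = [0.20, 0.35]
vars_ = [0.010, 0.020]
probs = [0.30, 0.25]

pm = sum(b * p for b, p in zip(betas, probs))
var = (sum(v * p for v, p in zip(vars_, probs))                 # within-model part
       + sum((b - pm) ** 2 * p for b, p in zip(betas, probs)))  # between-model part
print(round(pm, 4), round(sqrt(var), 4))
```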
Alternatively, one might be interested in the mean and variance conditional on the inclusion of a given parameter, i.e., assuming that it is definitely part of the model. Note that inclusion of the parameter is determined by the presence of the related regressor. The conditional posterior mean (PMcon) for a parameter \(\beta_k\) is given by:
\[ \mathbb{E}(\beta_k | \pi_{\beta_k}=1,y)=\frac{\mathbb{E}(\beta_k|y)}{P(\pi_{\beta_k} = 1|y)}. \]
Similarly, the conditional variance is:
\[ Var(\beta_k|\pi_{\beta_k}=1,y)=\frac{Var(\beta_k|y)+\mathbb{E}(\beta_k|y)^2}{P(\pi_{\beta_k}=1|y)}-\mathbb{E}(\beta_k|\pi_{\beta_k}=1,y)^2 \]
and so the conditional standard deviation (PSDcon) is:
\[ SD(\beta_k | \pi_{\beta_k}=1,y) = \sqrt{ Var(\beta_k | \pi_{\beta_k}=1,y)} \]
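The conditional moments follow mechanically from the unconditional ones, as the formulas above show; a short Python sketch with invented values:

```python
from math import sqrt

# Hypothetical unconditional posterior moments and the inclusion probability:
pm, var, pip = 0.30, 0.20, 0.60

# Conditional on inclusion (PMcon, PSDcon):
pm_con  = pm / pip
var_con = (var + pm**2) / pip - pm_con**2
psd_con = sqrt(var_con)
print(round(pm_con, 4), round(psd_con, 4))
```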
The next set of statistics that can be calculated within the BMA framework concerns the assessment of the sign of a coefficient associated with a specific regressor. The first measure is the posterior sign certainty (PSC), defined as the posterior probability that a coefficient has the same sign as its posterior mean. PSC is given by
\[ P(\operatorname{sign}(\beta_k) \mid y) = \begin{cases} \displaystyle \sum_{j=1}^{J} P(M_j \mid y)\, CDF\!\left(t_{k,j}; \mathrm{DF}_j\right), & \text{if } \operatorname{sign}\!\left[E(\beta_k \mid y)\right] = 1, \\[12pt] \displaystyle 1 - \sum_{j=1}^{J} P(M_j \mid y)\, CDF\!\left(t_{k,j}; \mathrm{DF}_j\right), & \text{if } \operatorname{sign}\!\left[E(\beta_k \mid y)\right] = -1, \end{cases} \]
where
\[ t_{k,j} \equiv \frac{\hat{\beta}_{k \mid M_j}} {\widehat{\mathrm{SD}}_{k \mid M_j}}, \]
\(\mathrm{DF}_j\) denotes the number of degrees of freedom in model \(j\), and \(\mathrm{CDF}(\cdot;\mathrm{DF}_j)\) denotes the cumulative distribution function of the Student-\(t\) distribution with \(\mathrm{DF}_j\) degrees of freedom.
The second measure is the posterior probability of a positive sign of the coefficient, denoted by \(P(+)\). It is defined as
\[ P(\beta_k > 0 \mid y) = \sum_{j \in \mathcal{M}_k} P(M_j \mid y)\, CDF\!\left( t_{k,j}; \mathrm{DF}_j \right). \]
where \(\mathcal{M}_k\) denotes the set of models that include regressor \(k\). While PSC measures the probability that the coefficient retains the same sign as its posterior mean, \(P(+)\) directly measures the posterior probability that the coefficient is strictly positive.
The BMA statistics allow the assessment of the robustness of the examined regressors. Raftery (1995) classifies the evidence for a variable as weak, positive, strong, or very strong when the posterior inclusion probability (PIP) lies between 0.5 and 0.75, between 0.75 and 0.95, between 0.95 and 0.99, or above 0.99, respectively. Raftery (1995) also refers to a variable as robust when the absolute value of the ratio of the posterior mean (PM) to the posterior standard deviation (PSD) is above 1, indicating that the regressor improves the explanatory power of the regression.
Masanjala and Papageorgiou (2008) propose a more stringent criterion, requiring this statistic to exceed 1.3, while Sala-I-Martin, Doppelhofer, and Miller (2004) argue for a threshold of 2.
In order to obtain the marginal likelihood function given above, the user needs to specify the hyperparameter g in Zellner’s \(g\)-prior. The parameter g controls the prior variance of the regression coefficients and thus determines the degree of shrinkage toward zero. In the parameterization used here, smaller values of g imply weaker shrinkage (i.e., more diffuse priors), while larger values impose stronger shrinkage toward the null model.
The rmsBMA package offers the user the choice between the following conventionally applied options for \(g\):
Unit information prior (Kass and Wasserman 1995)
\[g = \frac{1}{N}\]
Risk inflation criterion (Foster and George 1994)
\[g = \frac{1}{K^{2}}\]
Benchmark prior (Fernández, Ley, and Steel 2001a)
\[g = \frac{1}{\max\!\left(N, K^{2}\right)}\]
Prior that mimics the Hannan-Quinn information criterion (Fernández, Ley, and Steel 2001a)
\[g = \frac{1}{\left(\log N\right)^{3}}\]
Square root of unit information prior (Fernández, Ley, and Steel 2001a)
\[g = \sqrt{\frac{1}{N}}\]
Moreover, the user may specify any strictly positive numerical value for the hyperparameter \(g\).
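The conventional choices above can be written as simple functions of \(N\) and \(K\); the following is an illustrative Python sketch (the function names and the values of \(N\) and \(K\) are ours, not the package's API):

```python
from math import log, sqrt

# Conventional choices for g in this parameterization
# (N observations, K candidate regressors):
def g_uip(N, K):       return 1 / N             # unit information prior
def g_ric(N, K):       return 1 / K**2          # risk inflation criterion
def g_benchmark(N, K): return 1 / max(N, K**2)  # Fernandez-Ley-Steel benchmark
def g_hq(N, K):        return 1 / log(N)**3     # mimics Hannan-Quinn
def g_sqrt_uip(N, K):  return sqrt(1 / N)       # square root of UIP

N, K = 100, 15
for f in (g_uip, g_ric, g_benchmark, g_hq, g_sqrt_uip):
    print(f.__name__, f(N, K))
```

Note that with \(N = 100\) and \(K = 15\) the benchmark prior coincides with the risk inflation criterion, since \(K^2 = 225 > N\).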
To perform BMA, one needs to specify the prior model probability. The package offers two main options. The first is the binomial model prior (Sala-I-Martin, Doppelhofer, and Miller 2004):
\[ P(M_{j})=\left(\frac{EMS}{K}\right)^{k_{j}}\left(1-\frac{EMS}{K}\right)^{K-k_{j}}, \]
where \(EMS\) is the expected model size and \(k_{j}\) is the number of regressors in model \(j\). If \(EMS = \frac{K}{2}\), the binomial model prior simplifies to a uniform model prior with \(P(M_{j}) = \frac{1}{2^K}\) for every \(j\), meaning that all models are assumed to have equal probabilities. The second is the binomial-beta model prior (Ley and Steel 2009), given by:
\[ P(M_{j}) \propto \Gamma(1+k_{j}) \cdot \Gamma(\frac{K-EMS}{EMS}+K-k_{j}). \]
where \(\Gamma\) is the gamma function. In the context of the binomial-beta prior \(EMS = \frac{K}{2}\) corresponds to equal probabilities on model sizes.
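Both priors can be evaluated directly from the formulas above; a Python sketch with hypothetical \(K\) and \(EMS\) (illustration only, not the package's implementation):

```python
from math import gamma

def binomial_prior(k_j, K, EMS):
    """Binomial model prior for a model with k_j of K regressors."""
    theta = EMS / K
    return theta**k_j * (1 - theta)**(K - k_j)

def binomial_beta_kernel(k_j, K, EMS):
    """Binomial-beta prior kernel (Ley and Steel 2009), up to normalization."""
    return gamma(1 + k_j) * gamma((K - EMS) / EMS + K - k_j)

# With EMS = K/2 the binomial prior is uniform: every model gets 1/2**K.
K = 10
print(binomial_prior(3, K, K / 2))   # = (1/2)**10
print(binomial_beta_kernel(3, K, K / 2))
```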
When reduced model space Bayesian model averaging is performed, both the binomial and binomial-beta model priors are truncated. As a result, the prior distribution over model size is restricted to models containing at most \(M\) regressors (where \(M\) is specified by the user), and the prior is renormalized so that the total probability mass over this admissible subset equals one.
In order to account for potential multicollinearity between regressors, one can use the dilution prior introduced by George (2010). The dilution prior involves augmenting the model prior (binomial or binomial-beta) with a function that accounts for multicollinearity:
\[ P_{D}(M_{j}) \propto P(M_{j})|\mathrm{COR}_j|^{\omega} \]
where \(P_{D}(M_{j})\) is the diluted model prior, \(|\mathrm{COR}_j|\) is the determinant of the correlation matrix of regressors in model \(j\), and \(\omega\) is the dilution parameter. The lower the correlation between regressors, the closer \(|\mathrm{COR}_j|\) is to one, resulting in a smaller degree of dilution.
The dilution prior proposed by George
(2010) is empirical in nature, as it relies on the determinant of
the correlation matrix of the regressors, denoted by \(|\mathrm{COR}_j|\), and therefore departs
from a fully Bayesian treatment of prior specification. To address this
limitation, the rmsBMA package introduces a non-empirical
dilution rule, termed group dilution. This approach enables
users to specify dilution parameters for groups of variables
representing the same underlying theoretical construct. Consequently,
prior model probabilities can be systematically adjusted for
specifications that include multiple proxies for a common fundamental
determinant.
As group dilution is introduced here for the first time, it is presented in two stages. First, the case of a single group-specific dilution parameter is considered. Second, the framework is extended to allow for a vector of group-specific dilution parameters.
For the case of a single group-specific dilution parameter, let \(\gamma_j = (\gamma_{j,1}, \gamma_{j,2}, \ldots, \gamma_{j,K}) \in \{0,1\}^K\) denote the indicator vector associated with model \(j\), where \(\gamma_{j,k} = 1\) if regressor \(k\) is included in the model and \(\gamma_{j,k} = 0\) otherwise.
Each regressor \(k\) is assigned to a group \(\mathrm{GR}(k) \in \{0,1,2,\ldots\}\), where \(\mathrm{GR}(k) = 0\) indicates that the regressor does not belong to any group, while positive integers identify groups of related regressors. The grouping structure is specified by the researcher, who assigns regressors associated with the same theoretical construct to common groups labeled \(h=1,2,\ldots\).
Let \(p \in (0,1]\) denote a scalar group-specific dilution parameter that reflects the researcher’s prior belief regarding the extent to which multiple proxies capture the same underlying determinant. For a given model \(j\) and group \(h \ge 1\), define the number of included regressors from group \(h\) as
\[ c_{j,h} = \sum_{k=1}^{K} \gamma_{j,k} \, \mathbb{I}\{ \mathrm{GR}(k) = h \}, \]
where \(\mathbb{I}\{\cdot\}\) denotes the indicator function. The group-specific dilution exponent is defined as
\[ d_{j,h} = \max\{0,\, c_{j,h} - 1\}. \]
Thus, models including zero or one regressor from group \(h\) incur no penalty, while each additional regressor beyond the first contributes one unit to the dilution exponent. The total dilution exponent for model \(j\) is obtained by summing across all groups,
\[ D_j = \sum_{h \ge 1} d_{j,h} = \sum_{h \ge 1} \max\{0,\, c_{j,h} - 1\}. \]
The resulting group-based dilution prior component for model \(j\) is given by
\[ \phi_j \propto p^{D_j}. \]
This construction penalizes models that include multiple regressors from the same group, while allowing regressors from different groups to enter the model independently. Regressors not assigned to any group (\(\mathrm{GR}(k)=0\)) do not affect the group dilution prior. Finally, the diluted model prior is given by:
\[ P_{G}(M_{j}) \propto P(M_{j})p^{D_j}. \]
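The dilution exponent \(D_j\) amounts to counting, per group, how many included regressors exceed one; a Python sketch with a hypothetical grouping and inclusion vector (illustration of the rule, not the package's code):

```python
# GR maps regressor index -> group label (0 = ungrouped); gamma is the
# inclusion vector of a hypothetical model.
GR    = [1, 1, 1, 2, 2, 0]   # three proxies in group 1, two in group 2
gamma = [1, 1, 1, 0, 1, 1]   # model includes regressors 0, 1, 2, 4, 5

def dilution_exponent(gamma, GR):
    """D_j = sum over groups of max(0, c_{j,h} - 1)."""
    counts = {}
    for g_k, inc in zip(GR, gamma):
        if g_k > 0 and inc:
            counts[g_k] = counts.get(g_k, 0) + 1
    return sum(max(0, c - 1) for c in counts.values())

p = 0.5                         # scalar group dilution parameter
D = dilution_exponent(gamma, GR)
print(D, p**D)                  # group 1 contributes 2, group 2 contributes 0
```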
The second case, involving a vector of group-specific dilution parameters, generalizes the previous specification by allowing heterogeneous dilution across groups. This extension acknowledges that the researcher may assign different degrees of prior redundancy to different sets of proxies. Let
\[ \mathbf{p} = (p_h)_{h \ge 1} \]
denote a collection of dilution parameters, where \(p_h \in (0,1]\) is assigned to group \(h\). Using the previously defined quantities \(c_{j,h}\) and \(d_{j,h}\), the group-based dilution prior component for model \(j\) is given by
\[ \phi_j = \prod_{h \ge 1} p_h^{\, d_{j,h}} = \prod_{h \ge 1} p_h^{\, \max\{0,\, c_{j,h} - 1\}}. \]
Thus, each group contributes independently to the overall dilution of model \(j\), with the strength of the penalty governed by its corresponding parameter \(p_h\). Regressors not assigned to any group (\(\mathrm{GR}(k)=0\)) do not affect the dilution prior. When all group-specific parameters are equal, i.e., \(p_h = p\) for all \(h \ge 1\), the dilution prior reduces to
\[ \phi_j = p^{D_j}, \]
which corresponds to the scalar-parameter case.
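With heterogeneous parameters, each group contributes its own factor \(p_h^{d_{j,h}}\) to \(\phi_j\); a short hypothetical Python sketch:

```python
# Hypothetical grouping, inclusion vector, and group-specific parameters p_h.
GR    = [1, 1, 2, 2, 2, 0]
gamma = [1, 1, 1, 1, 0, 1]
p     = {1: 0.5, 2: 0.8}

# Count included regressors per group (group 0 = ungrouped, ignored).
counts = {}
for g_k, inc in zip(GR, gamma):
    if g_k > 0 and inc:
        counts[g_k] = counts.get(g_k, 0) + 1

# phi_j = prod_h p_h ** max(0, c_{j,h} - 1)
phi = 1.0
for h, c in counts.items():
    phi *= p[h] ** max(0, c - 1)
print(phi)  # 0.5**1 * 0.8**1 = 0.4
```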
To determine whether regressors are substitutes or complements, various authors have developed jointness measures. Assuming two different covariates \(a\) and \(b\), let \(P(a\cap b)\) be the posterior probability of the inclusion of both variables, \(P(\overline{a}\cap \overline{b})\) the posterior probability of the exclusion of both variables, and \(P(\overline{a}\cap b)\) and \(P(a\cap \overline{b})\) the posterior probabilities of including each variable separately. The first measure of jointness is simply \(P(a\cap b)\). However, this measure ignores much of the information about the relationships between the regressors. The measure of Doppelhofer and Weeks (2009) is defined as:
\[ J_{DW}=\text{log}\left[\frac{P(a\cap b) \cdot P(\overline{a}\cap \overline{b})}{P(\overline{a}\cap b) \cdot P(a\cap \overline{b})}\right]. \]
If \(J_{DW} < -2\), \(-2 < J_{DW} < -1\), \(-1 < J_{DW} < 1\), \(1 < J_{DW} < 2\), and \(J_{DW} > 2\), the authors classify the regressors as strong substitutes, significant substitutes, not significantly related, significant complements, and strong complements, respectively. The jointness measure proposed by Ley and Steel (2007) is given by:
\[ J_{LS}=\frac{P(a\cap b)}{P(\overline{a}\cap b)+P(a\cap \overline{b})}. \]
The measure takes values in the range \([0, \infty)\), with higher values indicating a stronger complementary relationship.
Finally, the jointness measure of Hofmarcher et al. (2018) is:
\[ J_{HCGHM}=\frac{\left(P(a\cap b)+\rho\right)\left(P(\overline{a}\cap \overline{b})+\rho\right)-\left(P(\overline{a}\cap b)+\rho\right)\left(P(a\cap \overline{b})+\rho\right)}{\left(P(a\cap b)+\rho\right)\left(P(\overline{a}\cap \overline{b})+\rho\right)+\left(P(\overline{a}\cap b)+\rho\right)\left(P(a\cap \overline{b})+\rho\right)+\rho}. \]
Hofmarcher et al. (2018) advocate the use of the Jeffreys (1946) prior, which results in \(\rho=\frac{1}{2}\). The measure takes values from -1 to 1, where values close to -1 indicate substitutes, and those close to 1 complements.
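All three measures are simple functions of the four joint inclusion/exclusion probabilities. A Python sketch with hypothetical probabilities (summing to one); the regularized measure is computed as written above with the Jeffreys-prior value \(\rho = 1/2\):

```python
from math import log

# Hypothetical posterior inclusion/exclusion probabilities for regressors a, b:
p_ab   = 0.30   # both included
p_nanb = 0.40   # both excluded
p_na_b = 0.10   # only b included
p_a_nb = 0.20   # only a included

J_DW = log((p_ab * p_nanb) / (p_na_b * p_a_nb))   # Doppelhofer-Weeks
J_LS = p_ab / (p_na_b + p_a_nb)                   # Ley-Steel

rho = 0.5                                         # Jeffreys prior
num = (p_ab + rho) * (p_nanb + rho) - (p_na_b + rho) * (p_a_nb + rho)
den = (p_ab + rho) * (p_nanb + rho) + (p_na_b + rho) * (p_a_nb + rho) + rho
J_H = num / den                                   # Hofmarcher et al.

print(round(J_DW, 3), round(J_LS, 3), round(J_H, 3))
```

Here \(J_{DW} \approx 1.79\) falls in the "significant complements" band, which is consistent with the other two measures pointing in the same direction.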
Extreme Bounds Analysis (EBA) was developed by Leamer and Leonard (1981) and Leamer (1983, 1985). EBA provides one of the most restrictive tests of robustness for the regressors under examination. The application of this approach led to well-known findings, such as the fragility of growth determinants reported by Levine and Renelt (1992). The restrictiveness of EBA subsequently motivated the development of less stringent variants of the method (Granger and Uhlig 1990; Sala-I-Martin 1997), and ultimately contributed to the emergence of Bayesian model averaging.
However, the restrictiveness of EBA is also one of its main virtues:
variables that pass the EBA test can be regarded as robust determinants
in the strongest possible sense. The rmsBMA package allows
the user to implement a highly restrictive version of EBA to identify
which regressors satisfy this demanding robustness criterion.
The idea of the test involves estimating all considered \(J\) models and identifying the specifications that yield the lowest and highest estimated coefficient values for a given regressor \(k\). The upper extreme bound (\(B_{k,U}\)) for regressor \(k\) is calculated as
\[ B_{k,U} = \beta_{k,\max} + 2 \cdot SE_{k,\max}, \]
and the lower extreme bound (\(B_{k,L}\)) as
\[ B_{k,L} = \beta_{k,\min} - 2 \cdot SE_{k,\min}, \]
where \(\beta_{k,\max}\) and \(\beta_{k,\min}\) denote the highest and lowest estimated values of the coefficient for variable \(k\), respectively, and \(SE_{k,\max}\) and \(SE_{k,\min}\) are the corresponding standard errors.
The EBA test is passed if
\[ \mathrm{sgn}(B_{k,U}) = \mathrm{sgn}(B_{k,L}). \]
Consequently, this test is extremely stringent, as even a single poorly specified model may lead to its failure, a situation analogous to a false negative result. Nevertheless, if a regressor passes this test, its robustness is difficult to question.
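The sign check above reduces to comparing the signs of the two bounds; a Python sketch with hypothetical extreme estimates and standard errors (illustration only):

```python
def eba_passes(beta_min, se_min, beta_max, se_max):
    """EBA sign test for one regressor: both extreme bounds must share a sign.
    A bound exactly at zero is treated as non-positive in this sketch."""
    lower = beta_min - 2 * se_min
    upper = beta_max + 2 * se_max
    return (lower > 0) == (upper > 0)

print(eba_passes(0.15, 0.05, 0.40, 0.08))   # both bounds positive -> passes
print(eba_passes(-0.05, 0.04, 0.30, 0.06))  # lower bound negative -> fails
```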
This section demonstrates how to prepare the data for estimation. The first step involves installing the package and subsequently loading it into the R session.
Throughout the manuscript, we use the data from K. Beck (2020) on the determinants of international trade in the European Union (Trade_data and Trade_data_small). The package includes these datasets together with a detailed description of all variables. Since Trade_data is cross-sectional and the package provides dedicated functionality for panel data analysis, we begin the presentation of the package using panel data on the determinants of international migration (migration_panel) from Afonso, Alves, and Beck (2025). Detailed information on the dataset can be accessed by:
A visual inspection of the dataset can be performed by typing:
## Year_0 Pair_ID Migration MigrationLAG Unempl Earn Tax
## 1 2000 Austria-Belgium 0.2129378562 0.045619177 3.00 1603.506 10.052
## 2 2000 Austria-Czechia 0.2253256590 0.248093396 2.96 16950.652 8.924
## 3 2000 Austria-Denmark 0.0438465263 0.050595928 0.68 3344.318 7.284
## 4 2000 Austria-Estonia 0.0006882383 0.003676347 5.58 17559.100 10.484
## 5 2000 Austria-Finland 0.0186243528 0.014729586 4.04 1384.974 1.312
## 6 2000 Austria-France 0.0997188822 0.043595237 3.46 1112.894 3.586
## 7 2000 Austria-Germany 0.2866766860 0.041541835 4.20 399.480 10.260
## 8 2000 Austria-Greece 0.2938173505 0.022388828 5.58 7028.734 9.238
## 9 2000 Austria-Hungary 0.0530022686 0.172597748 1.00 17712.410 2.918
## 10 2000 Austria-Ireland 0.0236846386 0.118876974 0.70 5127.134 9.300
The first two columns represent the time identifier and the cross-sectional identifier; the cross-sectional unit in this dataset is a country pair. The third column contains the dependent variable, while the remaining columns correspond to the regressors.
The researcher may wish to estimate a fixed effects model or
standardize the data. This can be accomplished using the
data_preparation function. As an illustration, we introduce
both time and cross-sectional fixed effects, along with data
standardization. The user should specify which column indicates time
(time) and which indicates the cross-sectional unit
(id). To include both time and cross-sectional fixed
effects, effect = "twoway" should be set. Finally, to
standardize the data, the user should specify
standardize = TRUE.
data <- data_preparation(migration_panel,
time = "Year_0",
id = "Pair_ID",
fixed_effects = TRUE,
effect = "twoway",
standardize = TRUE)
data[1:10,1:6]
## Migration MigrationLAG Unempl Earn Tax Social
## 1 0.70147677 -0.17353020 0.1761765 0.68749881 0.63980838 -0.06951980
## 2 -0.05598238 -0.22073133 0.2605423 -0.08453441 -0.69734342 -0.91727505
## 3 0.14474255 0.13184524 -0.2108058 -0.25694993 1.67466773 -4.56049857
## 4 0.07771323 0.03533121 0.7190521 0.52405682 -2.27192796 -1.44458719
## 5 0.16120052 0.06954068 0.3339039 0.78919703 -1.08013135 0.05181206
## 6 0.20350209 -0.08221904 -0.1429463 0.28403595 -0.96170452 -0.62389309
## 7 -0.72208939 -1.00970587 0.5576566 0.50115030 1.35152754 0.29407464
## 8 0.12113608 -1.12842641 -2.0228364 -1.30269692 0.14380556 -1.24323369
## 9 -3.32285180 -1.09109771 -0.5867838 -0.82988872 0.32622341 0.58728702
## 10 -0.08231775 0.20050315 -1.0361234 0.69604548 -0.07277945 -1.19482749
As shown above, the data_preparation function removes
the columns corresponding to the time and cross-sectional identifiers.
This produces the data format required by the model_space
function, where the dependent variable is placed in the first column and
the regressors occupy the remaining columns. If the user has panel data
without explicit time and cross-sectional identifiers, the
data_preparation function can be used to introduce fixed
effects and perform standardization.
As mentioned earlier, the main dataset used in this manuscript is a cross-sectional dataset on the determinants of international trade. In this case, the user can still use the data_preparation function to standardize the data.
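For instance, assuming the cross-sectional dataset is stored in the Trade_data object shipped with the package (the object name data_cs is ours), a call along the following lines produces the standardized data previewed below:

```r
# Hypothetical call: standardize the cross-sectional Trade_data;
# no panel identifiers are supplied, so no fixed effects are introduced
data_cs <- data_preparation(Trade_data, standardize = TRUE)
data_cs[1:10, 1:6]
```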
## LNTRADE B LNDGEO L LNRGDPPROD RGDPpcDIFF
## 1 0.685661658 -0.3412902 -0.4526402 4.8914368 0.37588807 -1.15600149
## 2 -0.135645463 -0.3412902 -0.5931909 -0.2038099 -0.33175575 1.80791818
## 3 -1.205596707 -0.3412902 0.4563165 -0.2038099 -1.13129985 -0.22585889
## 4 0.910798215 2.9210424 -2.4681456 -0.2038099 0.16270947 -0.06282635
## 5 0.105898299 -0.3412902 -0.4733266 -0.2038099 0.09997771 -1.35414601
## 6 -0.944259347 -0.3412902 0.1662654 -0.2038099 -1.01660137 0.80007366
## 7 0.001856051 -0.3412902 0.2434943 -0.2038099 0.04544252 -1.24811400
## 8 0.960858672 -0.3412902 -0.2855048 -0.2038099 1.26747625 -1.08291700
## 9 2.020557180 2.9210424 -1.1960182 4.8914368 1.44481745 -1.28218726
## 10 -0.157994767 -0.3412902 0.2164248 -0.2038099 0.26432517 -0.39265022
However, in the remainder of the manuscript, we use the
unstandardized version of Trade_data. For detailed
information on the dataset, type:
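In R, the help page for a dataset shipped with a package is opened with the ? operator:

```r
# Display the documentation entry for the Trade_data dataset
?Trade_data
```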
Before conducting Bayesian model averaging or Extreme Bounds
Analysis, the model space must first be specified. In the
rmsBMA package, the model_space function is
used to construct the model space. The first argument of the function is
df, a data matrix in which the first column contains the
dependent variable and the remaining columns correspond to the
regressors. The second argument, M, determines the maximum
number of variables allowed in the considered models. Consequently, this
parameter governs the size of the model space given the total number of
regressors and enables reduced model space Bayesian model averaging. If
the user does not specify M, it is set equal to the total
number of regressors, and the standard version of Bayesian model
averaging is performed.
The third argument, g, allows the user to specify the
\(g\)-prior. The user may select one of
the predefined options described in g_prior or provide any positive
numerical value. The default option is "UIP" (the unit
information prior). Setting g = "None" instructs the
function to omit \(g\)-regression and
compute results based on Bayesian averaging of classical estimates. When
HC = TRUE, the function replaces the conventional
covariance matrix of the estimators with the
heteroscedasticity-consistent covariance matrix described in equation
hc, and the corresponding robust standard errors are computed from its
diagonal elements.
Suppose we wish to consider a model space that includes all models with at most 10 regressors, estimated using the benchmark prior and the conventional covariance matrix:
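A sketch of such a call, assuming the dataset object is Trade_data and that the benchmark prior is selected via a g option string such as "benchmark" (the exact option strings are listed in the package's g_prior documentation):

```r
# Sketch: all models with at most 10 regressors, benchmark g-prior
# (hypothetical option string), conventional covariance matrix
modelSpace10 <- model_space(df = Trade_data, M = 10,
                            g = "benchmark", HC = FALSE)
```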
Alternatively, the user could consider a model space of identical size, but constructed using Bayesian averaging of classical estimates and the heteroscedasticity-consistent covariance matrix:
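A sketch of this variant, using the documented g = "None" and HC = TRUE options (the object name is ours):

```r
# Bayesian averaging of classical estimates (g = "None") with the
# heteroscedasticity-consistent covariance matrix
modelSpace10HC <- model_space(df = Trade_data, M = 10,
                              g = "None", HC = TRUE)
```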
For the remainder of the manuscript, we use the smaller model space available to the user within the package, constructed using Trade_data_small.
The object modelSpace, constructed according to the
specification of the model_space function described above,
is included in the package. The model_space function
returns a list consisting of five components, comprising the full model
space and additional elements required for subsequent use in the
bma function.
bma function

The bma function enables users to perform Bayesian model
averaging using the object obtained with the model_space
function. The round parameter specifies the decimal place
to which the BMA statistics should be rounded in the results.
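The baseline results discussed below were presumably produced by a call of the following form; the modelSpace = and round = arguments match the dilution examples shown later in the manuscript, so only the absence of further arguments is an assumption:

```r
# Run BMA on the precomputed model space; round statistics to 3 decimals
bma_results <- bma(modelSpace = modelSpace, round = 3)
```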
The bma function returns a list containing 15 elements.
However, most of these elements are only required for other functions.
The main objects of interest are the two tables with the BMA statistics.
The results obtained with the binomial model prior come first in the list.
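Assuming the first list element holds this table, it can be printed with:

```r
# First element: BMA statistics under the binomial model prior
bma_results[[1]]
```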
## PIP PM PSD PMcon PSDcon P(+) PSC
## CONST NA 7.608 0.714 7.608 0.714 1.000 1.000
## B 0.709 0.299 0.230 0.422 0.151 0.707 0.852
## LNDGEO 1.000 -1.234 0.076 -1.234 0.076 0.000 1.000
## L 0.519 0.258 0.288 0.497 0.204 0.515 0.755
## LNRGDPPROD 1.000 0.911 0.019 0.911 0.019 1.000 1.000
## HUMAN 0.052 -0.005 0.051 -0.087 0.205 0.018 0.509
## INFVAR 0.999 -0.049 0.011 -0.049 0.011 0.000 1.000
## ARABLE 0.115 0.001 0.002 0.005 0.004 0.105 0.547
## ARABLEpw 0.113 0.000 0.001 -0.003 0.002 0.011 0.546
## LAND 0.050 0.000 0.000 0.000 0.000 0.019 0.506
## LANDpc 0.984 0.000 0.000 0.000 0.000 0.984 0.992
PIP denotes the posterior inclusion probability, PM the posterior mean, and PSD the posterior standard deviation. PMcon and PSDcon denote the posterior mean and posterior standard deviation, respectively, conditional on the inclusion of the regressor in the model. Users should base their interpretation of the results on conditional BMA statistics only when they believe that certain regressors must be included. PSC is the posterior sign certainty, i.e., the posterior probability that the sign of a regressor coincides with the sign of PM, and P(+) is the posterior probability of a positive sign.
Two variables stand out: LNDGEO and
LNRGDPPROD, representing geographical distance and the
product of real GDPs, respectively. Both exhibit posterior inclusion
probabilities (PIP) approximately equal to one, and the ratio of the
posterior mean to the posterior standard deviation is well above two in
terms of absolute value. The posterior sign certainty (PSC) and \(P(+)\) indicate that the signs of the
coefficients are extremely stable across the model space. These results
demonstrate that the two variables are robust determinants of
international trade and justify their central role in the gravity model
of trade (Tinbergen 1962), which has
become a workhorse framework in the field (Feenstra 2015).
INFVAR and LANDpc, representing the
difference in inflation variability and land per capita,
respectively, are also characterized by high posterior inclusion
probabilities (PIP) and a large absolute ratio of the posterior mean
(PM) to the posterior standard deviation (PSD). However, the reported
values of PM and PSD for LANDpc are both equal to zero.
This results from the difference in measurement units between the
dependent variable and this regressor. To mitigate this issue, the user may standardize the data before constructing the model space with the model_space function. The variable B (common
border) also satisfies the least stringent robustness criterion proposed
by Raftery (1995). The remaining
regressors may be regarded as fragile. The results obtained with the binomial-beta model prior are included as the second object in the bma list.
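Assuming the second list element holds this table, it can be printed with:

```r
# Second element: BMA statistics under the binomial-beta model prior
bma_results[[2]]
```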
## PIP PM PSD PMcon PSDcon P(+) PSC
## CONST NA 7.599 0.720 7.599 0.720 1.000 1.000
## B 0.728 0.302 0.225 0.415 0.152 0.725 0.861
## LNDGEO 1.000 -1.231 0.076 -1.231 0.076 0.000 1.000
## L 0.564 0.277 0.288 0.491 0.204 0.560 0.778
## LNRGDPPROD 1.000 0.911 0.019 0.911 0.019 1.000 1.000
## HUMAN 0.068 -0.006 0.058 -0.089 0.205 0.022 0.511
## INFVAR 0.999 -0.049 0.011 -0.049 0.011 0.000 1.000
## ARABLE 0.149 0.001 0.002 0.005 0.003 0.136 0.561
## ARABLEpw 0.145 0.000 0.001 -0.003 0.002 0.014 0.559
## LAND 0.065 0.000 0.000 0.000 0.000 0.025 0.508
## LANDpc 0.984 0.000 0.000 0.000 0.000 0.984 0.992
The results obtained under the binomial–beta model prior corroborate
those derived from the binomial model prior. The third element of the
bma output list contains the results of the Extreme Bounds
Analysis:
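Assuming the third list element holds the EBA table, it can be printed with:

```r
# Third element: Extreme Bounds Analysis results
bma_results[[3]]
```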
## Variable Lower bound Minimum Mean Maximum Upper bound EBA test %(+)
## 1 CONST -4.695 -2.764 14.627 32.947 35.090 FAIL 74.483
## 2 B 0.011 0.311 1.477 2.720 3.440 PASS 100.000
## 3 LNDGEO -1.815 -1.551 -1.274 -1.043 -0.642 PASS 0.000
## 4 L -0.786 0.311 1.287 3.247 4.341 FAIL 100.000
## 5 LNRGDPPROD 0.834 0.880 0.914 0.967 1.045 PASS 100.000
## 6 HUMAN -2.228 -0.963 -0.108 0.843 1.997 FAIL 32.618
## 7 INFVAR -0.200 -0.134 -0.077 -0.048 -0.027 PASS 0.000
## 8 ARABLE -0.037 -0.019 -0.001 0.012 0.023 FAIL 49.571
## 9 ARABLEpw -0.065 -0.054 -0.026 0.003 0.011 FAIL 13.090
## 10 LAND 0.000 0.000 0.000 0.000 0.000 FAIL 65.880
## 11 LANDpc 0.000 0.000 0.000 0.000 0.000 FAIL 35.193
The columns of the table report the following statistics: the lower bound (Equation lb), the minimum value of the coefficient, the average value of the coefficient, the maximum value of the coefficient, the upper bound (Equation ub), the result of the EBA test (Equation eba_test), and the percentage of positive coefficients across the model space.
Notably, four variables—B, LNDGEO,
LNRGDPPROD, and INFVAR—pass the stringent EBA
criterion. However, the issue related to the measurement units of the
LANDpc regressor reappears. When standardized data are
used, this variable also satisfies the EBA test.
The fourth object in the list is a table containing the prior and posterior expected model sizes for the binomial and binomial-beta model priors.
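Assuming the fourth list element holds this table, it can be printed with:

```r
# Fourth element: prior and posterior expected model sizes
bma_results[[4]]
```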
## Prior model size Posterior model size
## Binomial 5 5.542
## Binomial-beta 5 5.702
The results indicate that, after observing the data, the researcher should expect approximately five to six regressors in the model under both the binomial and the binomial–beta model priors. The posterior expected model size can be computed as the sum of the posterior inclusion probabilities.
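As a quick check, the posterior expected model size under the binomial prior can be recomputed from the PIP column of the first results table (the column name "PIP" and the NA entry for the constant are taken from the output above):

```r
# Sum of posterior inclusion probabilities = posterior expected model size;
# drop the constant's NA entry
sum(bma_results[[1]][, "PIP"], na.rm = TRUE)  # approximately 5.54
```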
The table also illustrates a potential pitfall of using reduced model space Bayesian model averaging. Both the prior and posterior expected model sizes may exceed the size of the largest model considered. If this occurs for the posterior expected model size, the researcher should consider enlarging the maximum model size. However, a high posterior expected model size may simply reflect a high prior expected model size. Consequently, careful experimentation and robustness checks are recommended.
The model_pmp function allows the user to compare prior
and posterior model probabilities over the entire model space in the
form of a graph. The models are ranked from the one with the highest to
the one with the lowest posterior model probability. The function
returns a list with three objects:
The user can retrieve each graph separately from the list; however, the function automatically displays a combined graph.
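A minimal call, assuming the BMA results object is named bma_results as elsewhere in the manuscript:

```r
# Compare prior and posterior model probabilities over the model space
pmp_graphs <- model_pmp(bma_results)
```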
The graphs demonstrate that a great share of the posterior probability mass is concentrated within just a couple of models. To view the results for only the best models, the user can use the top parameter.
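For example, restricting the graphs to a chosen number of top models (the value 20 is illustrative):

```r
# Show only the 20 models with the highest posterior model probability
pmp_top <- model_pmp(bma_results, top = 20)
```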
The last graph is particularly illuminating. It demonstrates that the best three models are associated with vastly higher posterior model probabilities than the rest. It is instructive to inspect those models; they will be identified after implementing model_sizes (through best_models, which is covered in the section below).
The model_sizes function displays prior and posterior
model probabilities on a graph for models of different sizes. Similarly
to the model_pmp function, it returns a list with three
objects:
Again, the user can retrieve each graph separately from the list; however, the function automatically displays a combined graph.
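A minimal call (the signature matches the model_sizes usage in the dilution examples later in the manuscript):

```r
# Prior and posterior probabilities across model sizes
size_graphs <- model_sizes(bma_results)
```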
The graph illustrates that the posterior distribution over model sizes is largely independent of the prior specification, as it appears very similar under both the binomial and the binomial–beta model priors.
The best_models function allows the user to view a
chosen number of the best models in terms of posterior model
probability. The function returns a list containing six objects:
The parameter estimate pertains only to the results that
will be automatically displayed after running the function. The
parameter criterion determines whether the models should be
ranked according to posterior model probabilities calculated using the
binomial (1) or binomial-beta (2) model prior. To obtain the inclusion
array for the 10 best models ranked with the binomial model prior, the
user needs to run:
## 'No. 1' 'No. 2' 'No. 3' 'No. 4' 'No. 5' 'No. 6' 'No. 7' 'No. 8'
## Const 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.00
## B 1.000 1.000 0.000 1.000 0.000 0.000 1.000 1.00
## LNDGEO 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.00
## L 0.000 1.000 1.000 0.000 0.000 1.000 1.000 1.00
## LNRGDPPROD 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.00
## HUMAN 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.00
## INFVAR 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.00
## ARABLE 0.000 0.000 0.000 1.000 0.000 0.000 1.000 0.00
## ARABLEpw 0.000 0.000 0.000 0.000 0.000 1.000 0.000 1.00
## LAND 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.00
## LANDpc 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.00
## PMP 0.310 0.190 0.150 0.038 0.037 0.034 0.032 0.03
## R^2 0.918 0.919 0.918 0.919 0.915 0.918 0.920 0.92
A value of 1 indicates the inclusion of a given regressor in a model,
while the second-to-last row reports the posterior model probability
associated with each specification. The table shows that the first three
models account for 32% of the posterior model probability mass.
To generate a knitr table with the estimation output for
the three best models ranked according to the binomial–beta model prior,
the user should run:
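A sketch, assuming the knitr table occupies an earlier slot of the returned list than the gTree object (which a later example retrieves with [[6]]); the index [[5]] is a guess:

```r
best_3_bb <- best_models(bma_results, criterion = 2, best = 3)
# Hypothetical index: element assumed to contain the knitr table
best_3_bb[[5]]
```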
|  | ‘No. 1’ | ‘No. 2’ | ‘No. 3’ |
|---|---|---|---|
| Const | 7.462 (0.646)*** | 7.363 (0.643)*** | 7.929 (0.608)*** |
| B | 0.456 (0.143)*** | 0.37 (0.147)** | NA |
| LNDGEO | -1.215 (0.067)*** | -1.201 (0.067)*** | -1.289 (0.057)*** |
| L | NA | 0.425 (0.193)** | 0.554 (0.188)*** |
| LNRGDPPROD | 0.911 (0.018)*** | 0.911 (0.018)*** | 0.915 (0.018)*** |
| HUMAN | NA | NA | NA |
| INFVAR | -0.05 (0.011)*** | -0.048 (0.011)*** | -0.048 (0.011)*** |
| ARABLE | NA | NA | NA |
| ARABLEpw | NA | NA | NA |
| LAND | NA | NA | NA |
| LANDpc | NA | NA | NA |
| PMP | 0.258 | 0.19 | 0.125 |
| R^2 | 0.918 | 0.919 | 0.918 |
The table shows that the first three models account for 26% of the posterior model probability mass.
Finally, to obtain a gTree table with estimation output for the top 3 models ranked by the binomial-beta model prior, the user needs to run:
best_3_models <- best_models(bma_results, criterion = 2, best = 3)
grid::grid.draw(best_3_models[[6]])

Within the BMA framework, it is possible to establish the nature of
the relationship between pairs of examined regressors using the
jointness measures. This can be accomplished using the
jointness function. The latest jointness measure,
introduced by Hofmarcher et al. (2018),
has been shown to outperform older alternatives developed by Ley and Steel (2007) and Doppelhofer and Weeks (2009). Therefore, the Hofmarcher et al. (2018) measure is the default option in the jointness function.
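Since the Hofmarcher et al. (2018) measure is the default, the table below can presumably be obtained with a plain call (any further arguments are left at their defaults):

```r
# Hofmarcher et al. (2018) jointness measure (package default)
jointness(bma_results)
```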
## B LNDGEO L LNRGDPPROD HUMAN INFVAR ARABLE ARABLEpw LAND
## B NA 0.418 -0.417 0.418 -0.363 0.418 -0.301 -0.379 -0.368
## LNDGEO 0.456 NA 0.038 1.000 -0.895 0.999 -0.770 -0.774 -0.899
## L -0.291 0.129 NA 0.038 -0.047 0.037 -0.006 0.059 -0.040
## LNRGDPPROD 0.456 1.000 0.129 NA -0.895 0.999 -0.770 -0.774 -0.899
## HUMAN -0.376 -0.865 -0.115 -0.865 NA -0.894 0.667 0.672 0.794
## INFVAR 0.456 0.999 0.128 0.999 -0.864 NA -0.769 -0.773 -0.898
## ARABLE -0.280 -0.703 -0.035 -0.703 0.574 -0.702 NA 0.547 0.670
## ARABLEpw -0.365 -0.710 0.029 -0.710 0.583 -0.709 0.426 NA 0.674
## LAND -0.382 -0.870 -0.108 -0.870 0.735 -0.869 0.577 0.584 NA
The entries above the main diagonal report the results obtained under
the binomial model prior, whereas the entries below the diagonal
correspond to the binomial-beta model prior. Positive values of the
measure indicate a complementary relationship between regressors, while
negative values indicate that they are substitutes. Consistent with the previous findings, LNDGEO and LNRGDPPROD are strong complements. By contrast, LNRGDPPROD and LAND are relatively strong substitutes. Notably, LNDGEO and B are identified as complements, implying that geographical distance and the presence of a common border capture distinct dimensions of the forces behind international trade.
To obtain the results for the Ley and Steel (2007) measure, the user should run:
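A sketch, assuming the measure is selected through a hypothetical measure argument; the actual argument name and option string should be checked in the jointness documentation:

```r
# Hypothetical option string for the Ley and Steel (2007) measure
jointness(bma_results, measure = "LS")
```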
## B LNDGEO L LNRGDPPROD HUMAN INFVAR ARABLE ARABLEpw LAND
## B NA 2.437 0.418 2.437 0.053 2.434 0.122 0.094 0.050
## LNDGEO 2.677 NA 1.078 Inf 0.055 1736.689 0.130 0.127 0.053
## L 0.564 1.295 NA 1.078 0.047 1.077 0.125 0.153 0.048
## LNRGDPPROD 2.677 Inf 1.295 NA 0.055 1736.689 0.130 0.127 0.053
## HUMAN 0.070 0.072 0.065 0.072 NA 0.055 0.027 0.028 0.019
## INFVAR 2.674 1809.490 1.294 1809.490 0.072 NA 0.130 0.127 0.053
## ARABLE 0.168 0.174 0.175 0.174 0.035 0.174 NA 0.037 0.025
## ARABLEpw 0.134 0.170 0.203 0.170 0.038 0.170 0.050 NA 0.024
## LAND 0.067 0.069 0.065 0.069 0.025 0.069 0.032 0.032 NA
For the Doppelhofer and Weeks (2009) measure type:
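Analogously, a sketch with a hypothetical option string for this measure:

```r
# Hypothetical option string for the Doppelhofer and Weeks (2009) measure
jointness(bma_results, measure = "DW")
```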
## B LNDGEO L LNRGDPPROD HUMAN INFVAR ARABLE ARABLEpw LAND
## B NA NaN -1.965 NaN -0.057 0.347 -0.034 -0.669 -0.121
## LNDGEO NaN NA NaN NaN NaN NaN NaN NaN NaN
## L -1.601 NaN NA NaN -0.205 -0.705 0.151 0.645 -0.115
## LNRGDPPROD NaN NaN NaN NA NaN NaN NaN NaN NaN
## HUMAN -0.005 NaN -0.122 NaN NA -0.021 -0.402 -0.333 -0.386
## INFVAR 0.534 NaN -0.414 NaN 0.230 NA 0.170 0.425 -0.101
## ARABLE 0.088 NaN 0.281 NaN -0.417 0.456 NA -0.593 -0.461
## ARABLEpw -0.513 NaN 0.718 NaN -0.330 0.708 -0.625 NA -0.465
## LAND -0.052 NaN -0.033 NaN -0.380 0.149 -0.474 -0.468 NA
A drawback of the measures proposed by Ley and
Steel (2007) and Doppelhofer and Weeks
(2009), as illustrated in the previous two tables, is that they
may yield problematic outputs such as NaN or
Inf. This issue does not arise with the measure introduced
by Hofmarcher et al. (2018). Avoiding such
numerical irregularities constitutes one of several arguments advanced
by Hofmarcher et al. (2018) in favor of
their approach and helps explain why their measure performs more
reliably than the alternatives.
The coef_hist function allows the user to plot the
distribution of estimated coefficients. It returns a list containing a
number of objects equal to the number of regressors plus one (for the
constant). The first object in the list is a graph of the constants,
while the remaining objects are graphs of the coefficients for the other
regressors. The graph for the constants collects coefficients from the
entire model space, whereas the graphs for the other regressors only
collect coefficients from the models that include the given
regressor.
There are two principal approaches to visualizing the posterior
distributions of the coefficients. The first approach relies on a
histogram. The coef_hist function allows the user to
control the binning of the histogram through several arguments
(BW, binW, BN, and
num). To specify 80 bins for each graph, the following code
can be used:
bin_sizes <- matrix(80, nrow = 11, ncol = 1)
coef_plots <- coef_hist(bma_results, BN = 1, num = bin_sizes)
coef_plots[[3]]

The second option allows the user to plot kernel densities.
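The coef_plots2 object used in the next code chunk presumably holds these kernel-density versions; a sketch with a hypothetical density-selection argument (the actual argument should be checked in the coef_hist documentation):

```r
# Hypothetical argument switching from histograms to kernel densities
coef_plots2 <- coef_hist(bma_results, kernel = 1)
```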
The choice of appropriate plotting options is left to the user’s preferences regarding the style of presentation and the size of the model space.
library(gridExtra)
grid.arrange(coef_plots[[3]], coef_plots[[5]], coef_plots2[[3]],
coef_plots2[[5]], nrow = 2, ncol = 2)

One additional option available to the user with the
coef_hist function is weighting coefficients by posterior
model probabilities. This can be accomplished using the
weight parameter, whose default value is set to
NULL. Setting weight to
"binomial" or "beta" results in plotting
histograms based on the binomial or binomial–beta model prior,
respectively. For example, in the case of the binomial–beta model prior,
it can be observed that the value of the posterior mean is mainly driven
by the model characterized by the highest posterior model
probability.
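Using the documented weight options, a weighted version of the histograms can be obtained along these lines (the object name is ours):

```r
# Histograms weighted by posterior model probabilities under the
# binomial-beta model prior
coef_plots_w <- coef_hist(bma_results, weight = "beta")
```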
The posterior_dens function allows the user to plot posterior distributions of coefficients. Similarly to the coef_hist function, posterior_dens returns a list containing a number of objects equal to the number of regressors plus one. The first object in the list is the graph of the posterior distribution for the constant, while the remaining objects are graphs of the posterior distributions for the other regressors. The user needs to
specify whether to plot posterior distributions based on the binomial
(prior = "binomial") or binomial-beta
(prior = "beta") model prior, as well as whether to use
regular (SE = "standard") or robust
(SE = "robust") standard errors used in the calculation of
posterior objects.
distPlots <- posterior_dens(bma_results, prior = "binomial")
grid.arrange(distPlots[[3]], distPlots[[5]], nrow = 2, ncol = 1)

This section provides a more detailed description of the available model prior options. Subsection ems discusses the consequences of changes in the expected model size, while subsection dil describes the dilution priors.
The bma function calculates BMA statistics using both
the binomial and binomial-beta model priors. By default, the
bma function sets the expected model size (EMS) to \(K/2\), where \(K\) denotes the total number of regressors.
The binomial model prior with \(EMS =
K/2\) leads to a uniform model prior, assigning equal
probabilities to all models. In contrast, the binomial-beta model prior
with \(EMS = K/2\) assumes equal
probabilities across all model sizes. However, the user can modify the
prior model specification by changing the EMS parameter.
First, consider the consequences of concentrating prior probability mass on small models by setting \(EMS = 2\).
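Assuming the expected model size is passed through an EMS argument of bma (the parameter is referenced above; its exact spelling in the function signature is an assumption), this amounts to:

```r
# Concentrate prior mass on small models: expected model size of 2
bma_results_ems2 <- bma(modelSpace = modelSpace, round = 3, EMS = 2)
```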
Before turning to the main BMA results, let us focus on the changes in the posterior probability mass with respect to model sizes.
## Prior model size Posterior model size
## Binomial 2 4.871
## Binomial-beta 2 5.385
The results show that decreasing the prior expected model size led to a small decline in the posterior expected model size. The consequences of this change in the prior expected model size are best illustrated using the prior and posterior probability mass over model sizes.
For the dataset considered in the manuscript, the resulting differences in posterior inference are negligible, as the likelihood contribution from the data substantially outweighs the influence of the prior specification.
The corresponding changes in the distribution of posterior probability mass over the model space are likewise negligible.
The principal posterior summary statistics under the binomial model prior exhibit only minimal change.
## PIP PM PSD PMcon PSDcon P(+) PSC
## CONST NA 7.754 0.720 7.754 0.720 1.000 1.000
## B 0.531 0.235 0.246 0.443 0.147 0.530 0.765
## LNDGEO 1.000 -1.257 0.081 -1.257 0.081 0.000 1.000
## L 0.307 0.162 0.267 0.527 0.198 0.306 0.652
## LNRGDPPROD 1.000 0.913 0.018 0.913 0.018 1.000 1.000
## HUMAN 0.015 -0.001 0.027 -0.085 0.206 0.005 0.502
## INFVAR 0.998 -0.050 0.011 -0.050 0.011 0.000 0.999
## ARABLE 0.033 0.000 0.001 0.005 0.004 0.030 0.513
## ARABLEpw 0.032 0.000 0.001 -0.003 0.002 0.003 0.513
## LAND 0.014 0.000 0.000 0.000 0.000 0.006 0.501
## LANDpc 0.941 0.000 0.000 0.000 0.000 0.941 0.970
The results for the binomial-beta model prior are very similar.
## PIP PM PSD PMcon PSDcon P(+) PSC
## CONST NA 7.642 0.718 7.642 0.718 1.000 1.000
## B 0.668 0.285 0.235 0.426 0.151 0.666 0.832
## LNDGEO 1.000 -1.239 0.078 -1.239 0.078 0.000 1.000
## L 0.468 0.235 0.286 0.502 0.203 0.465 0.731
## LNRGDPPROD 1.000 0.912 0.019 0.912 0.019 1.000 1.000
## HUMAN 0.044 -0.004 0.046 -0.087 0.205 0.015 0.507
## INFVAR 0.999 -0.050 0.011 -0.050 0.011 0.000 1.000
## ARABLE 0.096 0.000 0.002 0.005 0.004 0.087 0.539
## ARABLEpw 0.094 0.000 0.001 -0.003 0.002 0.009 0.538
## LAND 0.042 0.000 0.000 0.000 0.000 0.016 0.505
## LANDpc 0.975 0.000 0.000 0.000 0.000 0.975 0.987
It is also very instructive to examine the jointness measures calculated under the new prior specification.
## B LNDGEO L LNRGDPPROD HUMAN INFVAR ARABLE ARABLEpw LAND
## B NA 0.062 -0.459 0.062 -0.059 0.063 -0.055 -0.081 -0.060
## LNDGEO 0.336 NA -0.385 1.000 -0.970 0.996 -0.934 -0.937 -0.972
## L -0.420 -0.064 NA -0.385 0.368 -0.385 0.362 0.385 0.371
## LNRGDPPROD 0.336 1.000 -0.064 NA -0.970 0.996 -0.934 -0.937 -0.972
## HUMAN -0.291 -0.913 0.055 -0.913 NA -0.966 0.905 0.907 0.942
## INFVAR 0.336 0.998 -0.064 0.998 -0.911 NA -0.930 -0.933 -0.967
## ARABLE -0.240 -0.809 0.088 -0.809 0.725 -0.807 NA 0.871 0.906
## ARABLEpw -0.305 -0.813 0.141 -0.813 0.729 -0.811 0.626 NA 0.908
## LAND -0.296 -0.916 0.060 -0.916 0.829 -0.914 0.727 0.731 NA
The jointness measures likewise exhibit only minor changes.
Next, to examine the consequences of concentrating prior probability
mass on larger models, EMS was set to eight, which exceeds
the maximum number of regressors permitted in the model.
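Under the same assumption about the EMS argument of bma, this corresponds to:

```r
# Concentrate prior mass on large models: expected model size of 8
bma_results_ems8 <- bma(modelSpace = modelSpace, round = 3, EMS = 8)
```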
## Prior model size Posterior model size
## Binomial 8 6.336
## Binomial-beta 8 5.825
The posterior model size increased for both the binomial prior and the binomial-beta model prior. The most interesting aspect is the new graphs of prior and posterior probability mass over the model sizes.
In both cases, the posterior probability mass has concentrated near models with a higher number of regressors. This conclusion is further supported by the graphs of posterior model probability across the entire model space.
The Figure demonstrates that the change in the expected model size led to a substantial increase in the posterior model probability for the best models. The increase in the expected model size also influenced the main BMA statistics.
## PIP PM PSD PMcon PSDcon P(+) PSC
## CONST NA 7.540 0.731 7.540 0.731 1.000 1.000
## B 0.824 0.325 0.204 0.394 0.152 0.820 0.908
## LNDGEO 1.000 -1.218 0.073 -1.218 0.073 0.000 1.000
## L 0.747 0.357 0.273 0.478 0.204 0.740 0.867
## LNRGDPPROD 1.000 0.909 0.020 0.909 0.020 1.000 1.000
## HUMAN 0.122 -0.011 0.077 -0.091 0.204 0.040 0.521
## INFVAR 1.000 -0.049 0.011 -0.049 0.011 0.000 1.000
## ARABLE 0.270 0.001 0.003 0.005 0.003 0.248 0.613
## ARABLEpw 0.262 -0.001 0.002 -0.003 0.002 0.024 0.608
## LAND 0.117 0.000 0.000 0.000 0.000 0.045 0.514
## LANDpc 0.993 0.000 0.000 0.000 0.000 0.993 0.997
The PIPs increased considerably; however, the main conclusions from the baseline exercise remain the same. A similar situation arises in the case of the binomial-beta model prior.
## PIP PM PSD PMcon PSDcon P(+) PSC
## CONST NA 7.586 0.722 7.586 0.722 1.000 1.000
## B 0.748 0.307 0.222 0.411 0.152 0.745 0.871
## LNDGEO 1.000 -1.229 0.076 -1.229 0.076 0.000 1.000
## L 0.600 0.293 0.287 0.488 0.204 0.595 0.795
## LNRGDPPROD 1.000 0.910 0.019 0.910 0.019 1.000 1.000
## HUMAN 0.078 -0.007 0.062 -0.089 0.205 0.026 0.513
## INFVAR 1.000 -0.049 0.011 -0.049 0.011 0.000 1.000
## ARABLE 0.171 0.001 0.002 0.005 0.003 0.157 0.571
## ARABLEpw 0.167 -0.001 0.001 -0.003 0.002 0.015 0.568
## LAND 0.075 0.000 0.000 0.000 0.000 0.029 0.509
## LANDpc 0.987 0.000 0.000 0.000 0.000 0.986 0.993
Again, it is instructive to examine the jointness measures.
## B LNDGEO L LNRGDPPROD HUMAN INFVAR ARABLE ARABLEpw LAND
## B NA 0.648 0.088 0.648 -0.496 0.648 -0.302 -0.419 -0.503
## LNDGEO 0.496 NA 0.494 1.000 -0.756 1.000 -0.459 -0.475 -0.765
## L -0.226 0.200 NA 0.494 -0.420 0.493 -0.234 -0.164 -0.411
## LNRGDPPROD 0.496 1.000 0.200 NA -0.756 1.000 -0.459 -0.475 -0.765
## HUMAN -0.401 -0.845 -0.175 -0.845 NA -0.755 0.234 0.256 0.518
## INFVAR 0.495 0.999 0.200 0.999 -0.844 NA -0.459 -0.475 -0.765
## ARABLE -0.285 -0.658 -0.072 -0.658 0.511 -0.657 NA -0.015 0.238
## ARABLEpw -0.377 -0.666 -0.006 -0.666 0.522 -0.665 0.342 NA 0.253
## LAND -0.407 -0.851 -0.167 -0.851 0.695 -0.850 0.514 0.522 NA
One of the main issues associated with identifying robust regressors is multicollinearity. Some regressors may approximate the same underlying factor influencing the dependent variable. Multicollinearity may result from the absence of observable variables associated with a specific theory or from a theory failing to provide a unique candidate for a regressor. Moreover, some regressors may share a common determinant. The dilution prior proposed by George (2010), described in detail in the section below, constitutes one of the available approaches for addressing this issue.
To apply the dilution prior, the user must set
dilution = 1 in the bma function. The user can
also manipulate the dilution parameter \(\omega\). The default option is
dil.Par = 0.5, as recommended by George (2010).
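With the defaults, this amounts to a call of the following form; the dilution = 1 signature matches the dil.Par examples shown in this section:

```r
# George (2010) dilution prior with the default omega (dil.Par = 0.5)
bma_results_dil <- bma(modelSpace = modelSpace, round = 3, dilution = 1)
```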
The effect of implementing the dilution prior is well depicted by the distribution of prior probability mass over the model sizes.
The change in the prior distribution is more visible for the binomial-beta model prior. In panel b, the prior probability mass has decreased for larger models and increased for smaller models. However, this change is not uniform, as models characterized by the highest degree of multicollinearity are subject to the greatest penalty in terms of prior probability mass.
Before moving to the BMA statistics, it is instructive to examine the effect of changing the dil.Par parameter.
bma_results_dil01 <- bma(
modelSpace = modelSpace,
round = 3,
dilution = 1,
dil.Par = 0.1
)
size_graphs_dil01 <- model_sizes(bma_results_dil01)

As we can see, decreasing the value of \(\omega\) diminishes the impact of dilution
on the model prior. Conversely, raising the dil.Par
parameter increases the degree of dilution.
bma_results_dil2 <- bma(
modelSpace = modelSpace,
round = 3,
dilution = 1,
dil.Par = 2
)
size_graphs_dil2 <- model_sizes(bma_results_dil2)

An especially strong impact can be seen for the binomial-beta prior.
However, even after giving such priority to the penalty for multicollinearity, the main BMA statistics remain stable.
## PIP PM PSD PMcon PSDcon P(+) PSC
## CONST NA 7.712 0.721 7.712 0.721 1.000 1.000
## B 0.530 0.225 0.238 0.424 0.151 0.528 0.763
## LNDGEO 1.000 -1.251 0.078 -1.251 0.078 0.000 1.000
## L 0.556 0.289 0.299 0.520 0.202 0.552 0.775
## LNRGDPPROD 1.000 0.912 0.019 0.912 0.019 1.000 1.000
## HUMAN 0.065 -0.005 0.056 -0.084 0.205 0.022 0.510
## INFVAR 0.999 -0.049 0.011 -0.049 0.011 0.000 1.000
## ARABLE 0.131 0.001 0.002 0.005 0.004 0.119 0.554
## ARABLEpw 0.096 0.000 0.001 -0.003 0.002 0.008 0.540
## LAND 0.047 0.000 0.000 0.000 0.000 0.018 0.506
## LANDpc 0.978 0.000 0.000 0.000 0.000 0.978 0.989
As discussed in dilution_prior, the dilution prior proposed by George (2010) is empirical in nature and, as such, departs from a strictly Bayesian framework. To address this concern, the present manuscript proposes a non-empirical alternative termed group dilution.
Within the dataset considered here, three pairs of variables may
plausibly capture the same underlying determinants of international
trade: ARABLE and ARABLEpw, LAND
and LANDpc, as well as B and L.
The first two pairs are self-explanatory, as they represent level and
per-worker/person transformations of the same underlying quantities. In
the case of B and L, countries that share a
common border are more likely to share a common language, suggesting
potential informational overlap between these regressors.
In order to implement group dilution, the user must provide two vectors. The first vector must have a length equal to the number of regressors. In this vector, a value of 0 indicates that a given regressor is not associated with any other regressor, that is, it does not represent the same underlying determinant as any other variable. Positive consecutive natural numbers indicate membership in a group of related regressors, namely those capturing the same underlying determinant. Consequently, for the case outlined in the previous paragraph:
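Following this scheme (group 1 for B and L, group 2 for ARABLE and ARABLEpw, group 3 for LAND and LANDpc, and 0 elsewhere), the first vector is:

```r
# Group memberships for the 10 regressors, in column order
group_vec <- c(1, 0, 1, 0, 0, 0, 2, 2, 3, 3)
```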
The numerical indices can be displayed alongside the regressor names by executing:
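A sketch of such a display, assuming the regressor names can be taken from the columns of Trade_data_small (the exact object holding the names is an assumption):

```r
# Pair each regressor name with its group index; the names are assumed
# to be the non-dependent-variable columns of Trade_data_small
cbind(colnames(Trade_data_small)[-1], group_vec)
```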
## group_vec
## [1,] "B" "1"
## [2,] "LNDGEO" "0"
## [3,] "L" "1"
## [4,] "LNRGDPPROD" "0"
## [5,] "HUMAN" "0"
## [6,] "INFVAR" "0"
## [7,] "ARABLE" "2"
## [8,] "ARABLEpw" "2"
## [9,] "LAND" "3"
## [10,] "LANDpc" "3"
The second vector must have a length equal to the number of groups of
related regressors. Each element of this vector specifies the dilution
parameter associated with the corresponding group. If the researcher
considers it unlikely that B and L represent
the same underlying determinant, the dilution parameter for the group
containing these variables may be set to 0.8. If, by contrast, the
researcher regards the relationship between ARABLE and
ARABLEpw as more probable, and the association between
LAND and LANDpc as even stronger, the dilution
parameters for these groups may be set to 0.6 and 0.4, respectively.
Consequently, the dilution parameter vector is given by:
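Given the dilution parameters discussed above, the vector contains one entry per group, in group order:

```r
# Dilution parameters, one per group:
# group 1 = {B, L} -> 0.8, group 2 = {ARABLE, ARABLEpw} -> 0.6,
# group 3 = {LAND, LANDpc} -> 0.4
par_vec <- c(0.8, 0.6, 0.4)
```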
The results under the binomial model prior with group dilution can be obtained by executing:
bma_results_dil3 <- bma(
  modelSpace = modelSpace,
  Narrative = 1,
  Nar_vec = group_vec,
  p = par_vec,
  round = 3
)
bma_results_dil3[[1]]
##              PIP     PM   PSD  PMcon PSDcon  P(+)   PSC
## CONST         NA  7.622 0.714  7.622  0.714 1.000 1.000
## B          0.693  0.295 0.233  0.426  0.151 0.692 0.845
## LNDGEO     1.000 -1.236 0.076 -1.236  0.076 0.000 1.000
## L          0.491  0.247 0.289  0.503  0.204 0.487 0.742
## LNRGDPPROD 1.000  0.911 0.019  0.911  0.019 1.000 1.000
## HUMAN      0.054 -0.005 0.051 -0.088  0.205 0.018 0.509
## INFVAR     0.999 -0.049 0.011 -0.050  0.011 0.000 1.000
## ARABLE     0.113  0.001 0.002  0.005  0.003 0.102 0.546
## ARABLEpw   0.111  0.000 0.001 -0.003  0.002 0.011 0.545
## LAND       0.022  0.000 0.000  0.000  0.000 0.008 0.502
## LANDpc     0.984  0.000 0.000  0.000  0.000 0.984 0.992
The corresponding results under the binomial-beta model prior with group dilution are:
##              PIP     PM   PSD  PMcon PSDcon  P(+)   PSC
## CONST         NA  7.614 0.720  7.614  0.720 1.000 1.000
## B          0.710  0.298 0.229  0.419  0.151 0.708 0.853
## LNDGEO     1.000 -1.233 0.077 -1.233  0.077 0.000 1.000
## L          0.532  0.265 0.289  0.498  0.204 0.528 0.762
## LNRGDPPROD 1.000  0.911 0.019  0.911  0.019 1.000 1.000
## HUMAN      0.069 -0.006 0.058 -0.089  0.205 0.023 0.512
## INFVAR     0.999 -0.050 0.011 -0.050  0.011 0.000 1.000
## ARABLE     0.144  0.001 0.002  0.005  0.003 0.131 0.559
## ARABLEpw   0.141  0.000 0.001 -0.003  0.002 0.013 0.557
## LAND       0.028  0.000 0.000  0.000  0.000 0.011 0.503
## LANDpc     0.984  0.000 0.000  0.000  0.000 0.983 0.992
As with the dilution prior proposed by George (2010), applying the group dilution prior does not lead to any meaningful changes in the results. This finding suggests that multicollinearity does not constitute a substantive issue in the dataset under consideration.
The manuscript introduces the rmsBMA package, which
implements reduced model space Bayesian model averaging. This framework
enables the user to conduct BMA and perform meaningful inference even in
situations where the number of regressors exceeds the number of
observations. By restricting the model space, a sufficient number of
observations is preserved for reliable statistical inference, while the
analysis remains focused on the most relevant determinants. To the best
of our knowledge, this functionality is currently unique to the
rmsBMA package.
Another distinctive feature of the rmsBMA package is the
newly introduced group dilution prior, which addresses the
challenge of identifying robust determinants in the presence of
multicollinearity. This approach enables users to specify dilution
parameters for groups of variables associated with the same underlying
theoretical construct. As a consequence, prior probabilities can be
systematically adjusted for models that include multiple proxies
capturing the same fundamental determinant. Moreover, the group
dilution prior provides a non-empirical alternative to the dilution
prior proposed by George (2010). The
latter is empirical in nature, as it relies on the correlation structure
of the regressors and therefore departs from a fully Bayesian treatment
of prior specification.
Finally, the rmsBMA package enables users to specify a
flexible class of \(g\)-priors, model
priors, and dilution structures; compute jointness measures (Doppelhofer and Weeks 2009; Ley and Steel 2007;
Hofmarcher et al. 2018); conduct Extreme Bounds Analysis; perform
model selection procedures; and generate a comprehensive range of
graphical outputs. These include visualizations of prior and posterior
probabilities over the model space and model size, as well as
histograms, kernel density estimates, weighted histograms of
coefficients, and posterior density plots.
Consequently, rmsBMA integrates functionalities that are
not jointly available in existing R packages or alternative software
environments, while also introducing original methodological
contributions.
For a detailed review of BMA applications in economics, see (Moral-Benito 2015; Steel 2020).↩︎
For more on the simultaneity problem within the BMA framework, see (Moral-Benito 2012, 2013, 2016; Lenkoski, Eicher, and Raftery 2014; León-González and Montolio 2015; Mirestean and Tsangarides 2016; Chen, Mirestean, and Tsangarides 2018; Moral-Benito, Allison, and Williams 2019).↩︎
Hlavac (2016) developed an R package for EBA.↩︎
This option is chosen by setting the parameter g = "None" in the model_space function, as described in the model_space section.↩︎
This option is chosen by setting the parameter HC = TRUE in the model_space function, as described in the model_space section.↩︎
For an introduction to BMA, see Kass and Raftery (1995), Raftery (1995), Raftery, Madigan, and Hoeting (1997), Doppelhofer and Weeks (2009), Amini and Parmeter (2011), Beck (2017), and Fragoso, Bertoli, and Louzada (2018).↩︎
For a thorough discussion of model priors, see Sala-I-Martin, Doppelhofer, and Miller (2004), Ley and Steel (2009), George (2010), and Eicher, Papageorgiou, and Raftery (2011).↩︎
To learn more about jointness measures, we recommend reading Doppelhofer and Weeks (2009), Ley and Steel (2007), and Hofmarcher et al. (2018), in that order.↩︎
See section joint for the interpretations of jointness measures.↩︎
We compute the measure with the subsetting operator [1:9, 1:9] applied to restrict the output to the first nine rows and columns for presentation purposes.↩︎