plm tries to follow as close as possible the way models are fitted
using lm. This relies on the following steps, using the
formula-data with some modifications:
model.frame by getting the
relevant arguments (formula, data,
subset, weights, na.action and
offset) and the supplementary argument,model.frame the response
y (with pmodel.response) and the model matrix
X (with model.matrix),plm.fit
with X and y as arguments.Panel data has a special structure which is described by an
index argument. This argument can be used in the
pdata.frame function which returns a
pdata.frame object. A pdata.frame can be used
as input to the data argument of plm. If the
data argument of plm is an ordinary
data.frame, the index argument can also be
supplied as an argument of plm. In this case, the
pdata.frame function is called internally to transform the
data.
Next, the formula, which is the first and mandatory
argument of plm is coerced to a Formula
object.
model.frame is then called, but with the
data argument in the first position (a
pdata.frame object) and the formula in the
second position. This unusual order of the arguments enables to use a
specific model.frame.pdata.frame method defined in
plm.
As for the model.frame.formula method, a
data.frame is returned, with a terms
attribute.
Next, the X matrix is extracted using
model.matrix. The usual way to do so is to feed the
function with two arguments, a formula or a
terms object and a data.frame created with
model.frame. lm uses something like
model.matrix(terms(mf), mf) where mf is a
data.frame created with model.frame.
Therefore, model.matrix needs actually one argument and not
two and we therefore wrote a model.matrix.pdata.frame which
does the job ; the method first checks that the argument has a
term attribute, extracts the terms (actually
the formula) and then computes the model’s matrix
X.
The response y is usually extracted using
model.response, with a data.frame created with
model.frame as first argument, but it is not generic. We
therefore created a generic called pmodel.response and
provide a pmodel.response.pdata.frame method. We illustrate
these features using a simplified (in terms of covariates) example with
the SeatBelt data set:
library("plm")
data("SeatBelt", package = "pder")
SeatBelt$occfat <- with(SeatBelt, log(farsocc / (vmtrural + vmturban)))
pSB <- pdata.frame(SeatBelt)We start with an OLS (pooling) specification:
formols <- occfat ~ log(usage) + log(percapin)
mfols <- model.frame(pSB, formols)
Xols <- model.matrix(mfols)
y <- pmodel.response(mfols)
coef(lm.fit(Xols, y))##   (Intercept)    log(usage) log(percapin) 
##     7.4193570     0.1657293    -1.1583712which is equivalent to:
##   (Intercept)    log(usage) log(percapin) 
##     7.4193570     0.1657293    -1.1583712Next, we use an instrumental variables specification. Variable
usage is endogenous and instrumented by three variables
indicating the law context: ds, dp, and
dsp.
The model is described using a two-parts formula, the first part of the RHS describing the covariates and the second part the instruments. The following two formulations can be used:
formiv1 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp
formiv2 <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dspThe second formulation has two advantages:
- sign in the second RHS part of the formula.The formula is coerced to a Formula, using the
Formula package. model.matrix.pdata.frame then
internally calls model.matrix.Formula in order to extract
the covariates and instruments model matrices:
mfSB1 <- model.frame(pSB, formiv1)
X1 <- model.matrix(mfSB1, rhs = 1)
W1 <- model.matrix(mfSB1, rhs = 2)
head(X1, 3) ; head(W1, 3)##    (Intercept) log(usage) log(percapin)
## 8            1 -0.7985077      9.955748
## 9            1 -0.4155154      9.975622
## 10           1 -0.4155154     10.002110##    (Intercept) log(percapin) ds dp dsp
## 8            1      9.955748  0  0   0
## 9            1      9.975622  1  0   0
## 10           1     10.002110  1  0   0For the second (and preferred formulation), the dot
argument should be set and is passed to the Formula
methods. . has actually two meanings:
which correspond respectively to dot = "seperate" (the
default) and dot = "previous". See the difference between
the following two examples:
##       occfat log(usage) log(percapin) state year farsocc farsnocc usage
## 8  -3.788976 -0.7985077      9.955748    AK 1990      90        8  0.45
## 9  -3.904837 -0.4155154      9.975622    AK 1991      81       20  0.66
## 10 -3.699611 -0.4155154     10.002110    AK 1992      95       13  0.66
##    percapin unemp  meanage   precentb   precenth  densurb   densrur
## 8     21073  7.05 29.58628 0.04157167 0.03252657 1.099419 0.1906836
## 9     21496  8.75 29.82771 0.04077293 0.03280357 1.114670 0.1906712
## 10    22073  9.24 30.21070 0.04192957 0.03331731 1.114078 0.1672785
##         viopcap    proppcap vmtrural vmturban fueltax lim65 lim70p mlda21 bac08
## 8  0.0009482704 0.008367458     2276     1703       8     0      0      1     0
## 9  0.0010787370 0.008940661     2281     1740       8     0      0      1     0
## 10 0.0011257068 0.008366873     2005     1836       8     1      0      1     0
##    ds dp dsp
## 8   0  0   0
## 9   1  0   0
## 10  1  0   0##       occfat log(usage) log(percapin) ds dp dsp
## 8  -3.788976 -0.7985077      9.955748  0  0   0
## 9  -3.904837 -0.4155154      9.975622  1  0   0
## 10 -3.699611 -0.4155154     10.002110  1  0   0In the first case, all the covariates are returned by
model.frame as the . is understood by default
as “everything”.
In plm, the dot argument is internally set
to previous so that the end-user doesn’t have to worry
about these subtleties.
mfSB2 <- model.frame(pSB, formiv2)
X2 <- model.matrix(mfSB2, rhs = 1)
W2 <- model.matrix(mfSB2, rhs = 2)
head(X2, 3) ; head(W2, 3)##    (Intercept) log(usage) log(percapin)
## 8            1 -0.7985077      9.955748
## 9            1 -0.4155154      9.975622
## 10           1 -0.4155154     10.002110##    (Intercept) log(percapin) ds dp dsp
## 8            1      9.955748  0  0   0
## 9            1      9.975622  1  0   0
## 10           1     10.002110  1  0   0The IV estimator can then be obtained as a 2SLS estimator: First, regress the covariates on the instruments and get the fitted values:
##    (Intercept) log(usage) log(percapin)
## 8            1 -1.0224257      9.955748
## 9            1 -0.5435055      9.975622
## 10           1 -0.5213364     10.002110Next, regress the response on these fitted values:
##   (Intercept)    log(usage) log(percapin) 
##     7.5641209     0.1768576    -1.1722590The same can be achieved in one command by using the
formula-data interface with
plm:
##   (Intercept)    log(usage) log(percapin) 
##     7.5641209     0.1768576    -1.1722590or with the ivreg function from package AER
(or with the newer function ivreg in package
ivreg superseding AER::ivreg()):
##   (Intercept)    log(usage) log(percapin) 
##     7.5641209     0.1768576    -1.1722590