This chapter introduces core concepts of xxM as a modeling framework and software from a practical standpoint. Key elements of model specification in xxM are introduced in the context of fitting a bivariate random-intercepts model.
We use the bivariate random intercepts model example from Mehta, Neale, and Flay (2005).
\[ y_{pij} = 1 \times \eta_{pj} + e_{pij}, \] \[ e ~ N(0, R) \]
\[ \eta_{pj} = \alpha_{p0} + u_{pj}, \] \[ u~N(0,G) \]
Mixed-effects model matrices \( R \) and \( G \) correspond to \( \Theta \) and \( \Psi \) matrices in SEM. Henceforth, we will use to \( \Theta \) and \( \Psi \) for observed and latent residual covariance matrix, respectively. For two dependent variables, we can write the equation as:
\[ y_{1ij} = 1 \times \eta_{1j} + e_{1ij} \] \[ y_{2ij} = 1 \times \eta_{2j} + e_{2ij} \]
The above bivariate random-intercepts model has 8 parameters:
The following two-level path diagram accurately represents all parameters except the grand means.
A complete model involves just three objects:
These objects are very simple and are constructed and added in a sequential fashion. An advantage of this approach is that complex models just involve more of these objects. The process of constructing and adding these objects remains the same.
A complete xxM
model script has just five succinct commands to accomplish these goals:
xxmModel()
xxmSubmodel()
xxmWithinMatrix()
xxmBetweenMatrix()
xxmRun()
The following sections describes how these five xxM commands are used to construct a complete model.
An n-level xxM
model is composed of n
submodels. The very first step in specifying an xxM
model is to create a model object by invoking xxmModel()
. The idea is to declare names of all levels.
brim <- xxmModel(levels = c("student", "teacher"))
Internally, xxM assigns level numbers \( (lnum = {1,2,\dotsc,P}) \) to each level declared in xxmModel(). The order of levels is also important. In this case, students are influenced by teachers. Hence, the student level must be declared before the teacher level. The intent behind this function is obvious.The function creates an object called brim (i.e., bivariate random intercepts model) from a list of level names. The left hand side is the name of the xxM
model. The choice of the name is arbitrary and other names such as ladyGaga or cow would work. However, it is better to use a short, but descriptive name, as this name will be used in all subsequent commands. Internally, the above invocation literally creates a model with placeholders for the student and teacher submodels, as depicted below.
More formally, the function takes a single parameter aptly called levels and expects to receive a list of level names. The function creates an xxM
object and returns a handle or a pointer to the object in memory.
The following command presents a generic invocation of xxmModel()
.
xxmObject <- xxmModel(levels = list)
At this stage it appears that we have data for students and teachers. Presumably students are nested within teachers. In multilevel modeling jargon, we have two levels with students hierarchically nested within teachers. While the term level is obvious in this simple case, xxM can be used with fairly complex data-structures in which the notion of levels may not be so easy to identify. Levels represent any concrete or abstract set of entities across which some attribute is expected to vary. Very simply, a level involves multiple entities of some kind (e.g., students, situations, responses, occasions etc.) for whom there is an attribute or a variable (e.g., IQ or achievement) of interest. Each level may have its own set of observed and/or latent variables. The definition of levels also implies a flow of influence. Levels have a one-directional relationship. A higher order level is said to influence a lower order level. In this case, we know that teachers impact student outcomes. The directional relationships are implicitly conveyed to xxM in how the levels are ordered in the list.
Our goal is to be able to specify a latent variable model within and across levels to capture the hypothesized flow of influence. The actual model will be presented later, for now we focus on the mechanics of specifying a model in xxM. We begin with the submodels for students and teachers.
Each level may have its very own complete SEM model with observed dependent and exogenous independent variables, latent variables, measurement model, and structural model involving all possible regressions
(observed on observed, observed on latent, latent on observed, and latent on latent). Before we can begin to specify the actual model, we need to provide our model object- brim
, with basic information about each level. This is accomplished by the xxmSubmodel()
function.
brim <- xxmSubmodel( model = brim,
level = student,
parents = c(teacher),
ys = c(y1, y2),
xs = ,
etas = ,
data = student )
These assignment statements build the submodel with the constituent parts. The xxmSubmodel()
function adds basic information to our xxM
model object, brim
.
xxM
object to which this information is being added.xxM
. In this case, the teacher level is a parent of the student level. If there were additional levels of nesting, these would be added to the list of parents as well. The following code provides an example with four levels:parents = c( family, teacher, school, neighborhood )
The corresponding submodel for the teacher level is:
brim <- xxmSubmodel( model = brim,
level = teacher,
parents = ,
ys = ,
xs = ,
etas = c( eta1, eta2 ),
data = teacher )
The teacher level does not have a parent
, nor does it have observed dependent or independent variables. The teacher level does have two latent variables. The latent variables represent random-intercepts of student level dependent variables (y1 and y2). If teachers were nested within a higher level such as school, the parents argument would be:
parents = c( school )
For two level data structures, a single dataset is adequate. However with complex dependent data-structures it is most convenient to provide data for each level separately. Each dataset must include information about how each observation at a lower level is linked to a higher level. In general, datasets must have three types of variables:
The student data has four columns student, teacher, y1 and y2. The first column is for the ID variable for the current level, in this case student. Student has a single parent, teacher. The ID columns must have the same name as the name of the corresponding level. Practically, it means that if in your dataset the ID variables are named SID and TID, these must be renamed to student and teacher.
The teacher dataset (teacher) has no observed variables, nor does it have parents. Yet, it is necessary to have a dataset listing teacher IDs. The teacher data has a single column teacher.
A complete checklist to ensure that the data requirements are met:
factor
type. str(myLevel1Data)
and str(myLevel2Data)
.So far we created a model object called brim by invoking xxmModel()
and declared submodels for student and teacher by invoking xxmSubModel()
. At this point, our model object called brim, looks as follows and is just a shell of the final desired model. The next logical step would be to specify the actual model, i.e., how observed and latent variables relate to each other. This is accomplished by defining parameter matrices.
From an xxM perspective, the above model is specified in terms of parameters and matrices associated with each level and links among variables across levels. This sounds complicated, but in reality we will simply repeat what we have already stated in previous sections:
Parameters of interest in the above model specification are best specified as matrices. This allows complex models to be expressed succinctly. We begin by translating our scalar model formulation into xxM parameter matrices.
At level-1, we only have variances and a covariance for the residuals. The residual covariance matrix is called the theta matrix (\( \Theta \)). The matrix is symmetric with three free parameters: two variances and a covariance \( (\theta_{12}= \theta_{21}) \).
At level-2, we have a covariance and variances among the latent variables, along with their means. Latent covariance and mean matrices are called \( Psi \) (psi) and \( \alpha \) (alpha), respectively:
The above three matrices (\( \Theta \), \( \Psi \), & \( \alpha \)) capture information about variables within each level. The missing piece is the link among variables across levels. In this case, the teacher level latent variables influence student level observed variables. Very concretely, the effects are:
\[ y_{1ij} = 1 \times \eta_{1j} + 0 \times \eta_{2j} + e_{1ij} \] \[ y_{2ij} = 0 \times \eta_{1j} + 1 \times \eta_{2j} + e_{2ij} \]
We can represent the effects in a table, with columns representing the independent variables and rows representing the dependent variables. In xxM
matrices, columns independent variables and rows represent dependent variables. With this convention we can now represent the coefficients in a single matrix \( \Lambda \) (lambda):
In LISREL and xxM, \( \Lambda \) matrix is used to capture measurement relationship. In this case, level-2 latent variables are said to be measured by level-1 observed variables.
We have now defined four matrices that completely specify the underlying bivariate random-intercepts model. Once the model itself is clearly defined, the actual specification is really very trivial.There are just two commands for specifying parameter matrices:
brim <- xxmWithinMatrix(
)
brim <- xxmBetweenMatrix(
)
Essentially, we want to add the above four matrices to complete the model. Of the four matrices, the first matrix is a within-student matrix \( \theta \), the next two matrices are within-teacher matrices \( \alpha \) & \( \Psi \) and the last matrix (\( \Lambda \)) connects the two levels and is therefore an across-level or between matrix. At this point several things must be obvious:
xxmWithinMatrix()
three times, first for the student level and then twice for the teacher level. xxmBetweenMatrix()
will be called once connecting the teacher level to the student level.brim <- xxmWithinMatrix( model = brim,
level = student,
type = theta
)
brim <- xxmWithinMatrix( model = brim,
level = teacher,
type = psi,
)
brim <- xxmWithinMatrix( model = brim,
level = teacher,
type = alpha,
)
brim <- xxMBetweenMatrix( model = brim,
parent = teacher,
child = student,
type = lambda,
)
Now we know the general procedure for adding a matrix to the model. Let us now examine how a matrix to be added is actually specified.
Note that the first three parameter matrices (\( \Theta \), \( \Psi \), and \( \alpha \)) are somewhat different from the last matrix (\( \Lambda \)). The first three matrices include model parameters that are to be freely estimated.
In contrast, all four elements of the last matrix are fixed. We already know their values. This idea of free vs. fixed parameters is central in SEM. In essence, for each parameter we need to tell xxM
if the parameter is to be estimated or if the parameter is to be fixed to some known value. For each parameter matrix, we need to define two separate matrices:
It is easier done than said. We use a two part name including:
lambda_pattern <- matrix(c(0, 0, 0, 0), 2, 2)
lambda_value <- matrix(c(1, 0, 0, 1), 2, 2)
All elements of pattern matrix for \( \Lambda \) are zero indicating that none of the parameters are free to be estimated. Instead, all parameters are to be fixed to some known values. The value matrix provides the corresponding values.
The diagonal elements are to be fixed to 1.0, whereas the off-diagonal elements are to be fixed to 0.0. Compare the specification of \( \Lambda \) with that of the \( \Theta \) matrix:
theta_pattern <- matrix(c(1, 1, 1, 1), 2, 2)
theta_value <- matrix(c(1.1, 0.2, 0.2, 2.3), 2, 2)
All four elements of the \( \Theta \) matrix are to be freely estimated. Hence all four elements in the pattern matrix are 1s. The value matrix provides start values. At this point, you may complain that you do not actually have any idea as to what these values may be. In general, almost any reasonable set of values will work. More specifically, for a residual covariance matrix, the following rules work very well in practice:
So far, we have described
Now we will see how these matrices are constructed in R
and added to our xxM
model.
We create pattern
and value
matrices for each of the four model matrices, and add these matrices to our model as described earlier:
# Theta create
theta_pattern <- matrix( c(1,1,1,1), 2,2 )
theta_value <- matrix( c(1.1,.2,.2,2.3), 2,2 )
# Theta add
brim <- xxmWithinMatrix( model = brim,
level = student,
type = theta,
pattern = theta_pattern,
value = theta_value)
# Psi create
psi_pattern <- matrix( c(1,1,1,1), 2,2 )
psi_value <- matrix( c(.1,.05,.05,.2), 2,2 )
# psi add
brim <- xxmWithinMatrix( model = brim,
level = teacher,
type = psi,
pattern = psi_pattern,
value = psi_value)
# Alpha create
alpha_pattern <- matrix( c(1,1) , 2,1)
alpha_value <- matrix( c(1.1,2.1) , 2,1)
# Alpha add
brim <- xxmWithinMatrix( model = brim,
level = teacher,
type = alpha,
pattern = alpha_pattern,
value = alpha_value)
# Lambda create
lambda_pattern <- matrix( c(0,0,0,0), 2,2 )
lambda_value <- matrix( c(1,0,0,1), 2,2 )
# Lambda add
brim <- xxmBetweenMatrix( model = brim,
parent = teacher,
child = student,
type = lambda,
pattern = lambda_pattern,
value = lambda_value)
The following diagram illustrates a one-to-one correspondence between the path-diagram, corresponding model matrices, and xxM
specification.
Note:
All elements of the factor covariance matrix (psi: \( \psi \)), residual covariance matrix (theta: \( \Theta \)), and the factor-mean matrix (alpha: \( \alpha \)) are freely estimated. All values in respective pattern matrices are 1.
Factor covariance matrix and residual covariance matrix are symmetric. Hence, there are only three free-parameters. The xxM package constrains off-diagonal elements to be equal \( (\theta_{21} = \theta_{12}) \).
All elements of the across-level factor-loading matrix (lambda:\( \Lambda \)) are fixed to known values. All values in lambda_patten matrix are 0. The corresponding value matrix indicates that the diagonal elements are fixed to 1.0 and off-diagonal elements are fixed to 0.0.
If all went well above and xxM did not produce any error messages, then our model object brim has all the information it needs to estimate the model parameters. We can begin estimation by issuing a simple command:
brim <- xxmRun(brim)
[su_tabs] [su_tab title=“xxM”]
library(xxm)
data(brim.xxm, package = "xxm")
For each parameter matrix, construct three related matrices:
theta_pattern <- matrix(c(1, 1, 1, 1), nrow = 2, ncol = 2)
theta_value <- matrix(c(1, 0, 0, 1), nrow = 2, ncol = 2)
alpha_pattern <- matrix(1, 2, 1)
alpha_value <- matrix(c(0, 0), 2, 1)
psi_pattern <- matrix(c(1, 1, 1, 1), , 2, 2)
psi_value <- matrix(c(0.1, 0, 0, 0.1), 2, 2)
lambda_pattern <- matrix(c(0, 0, 0, 0), 2, 2)
lambda_value <- matrix(c(1, 0, 0, 1), 2, 2)
xxmModel()
is used to declare level names. The function returns a model object that is passed as a parameter to subsequent stattements.
xm <- xxmModel(levels = c("student", "teacher"))
## A new model was created.
For each declared level xxmSubmodel()
is invoked to add corresponding submodel to the model object. The function adds three pieces of information:
1. parents declares a list of parents of the current level.
2. variables declares names of observed dependent (ys), observed independent (xs) and latent variables (etas) for the level.
3. data R data object for the current level.
### Submodel: Student
xm <- xxmSubmodel(model = xm, level = "student", parents = c("teacher"), ys = c("y1",
"y2"), xs = , etas = , data = brim.student)
## Submodel for level `student` was created.
### Submodel: Teacher
xm <- xxmSubmodel(model = xm, leve = "teacher", parents = , ys = , xs = , etas = c("eta1",
"eta2"), data = brim.teacher)
## Submodel for level `teacher` was created.
For each declared level xxmWithinMatrix()
is used to add within-level parameter matrices. For each parameter matrix, the function adds the three matrices constructed earlier:
### Within-Matrix: Level :: Student, Type :: observed residual-covariance
xm <- xxmWithinMatrix(model = xm, level = "student", type = "theta", pattern = theta_pattern,
value = theta_value)
##
## 'theta' matrix does not exist and will be added.
## Added `theta` matrix.
### Within-Matrix: Level :: Teacher, Type :: latent means
xm <- xxmWithinMatrix(model = xm, level = "teacher", type = "alpha", pattern = alpha_pattern,
value = alpha_value)
##
## 'alpha' matrix does not exist and will be added.
## Added `alpha` matrix.
### Within-Matrix: Level :: Teacher, Type :: latent residual covariance
xm <- xxmWithinMatrix(model = xm, level = "teacher", type = "psi", pattern = psi_pattern,
value = psi_value)
##
## 'psi' matrix does not exist and will be added.
## Added `psi` matrix.
Pairs of levels that share parent-child relationship have regression relationships. xxmBetweenMatrix()
is used to add corresponding parameter matrices connecting the two levels.
For each parameter matrix, the function adds the three matrices constructed earlier:
### Between-Matrix: Parent :: Teacher, Child :: Student, Type ::
### factor-loadings
xm <- xxmBetweenMatrix(model = xm, parent = "teacher", child = "student", type = "lambda",
pattern = lambda_pattern, value = lambda_value)
##
## 'lambda' matrix does not exist and will be added.
## Added `lambda` matrix.
Estimation process is initiated by xxmRun()
. If all goes well, a quick printed summary of results is produced.
xm <- xxmRun(xm)
## ------------------------------------------------------------------------------
## Estimating model parameters
## ------------------------------------------------------------------------------
## 456.5781921615
## 454.3359125436
## 452.6051730894
## 451.0652787053
## 450.0531684945
## 448.6173091428
## 447.5320743183
## 447.1954369085
## 445.8740258784
## 445.7721481833
## 445.4488373694
## 445.4357270893
## 445.4338550574
## 445.4331653124
## 445.4331434919
## 445.4331407646
## 445.4331407251
## Model converged normally
## nParms: 8
## ------------------------------------------------------------------------------
## *
## 1: student_theta_1_1 :: 0.969 [ 0.000, 0.000]
##
## 2: student_theta_1_2 :: 0.338 [ 0.000, 0.000]
##
## 3: student_theta_2_2 :: 0.893 [ 0.000, 0.000]
##
## 4: teacher_psi_1_1 :: 0.752 [ 0.000, 0.000]
##
## 5: teacher_psi_1_2 :: 0.128 [ 0.000, 0.000]
##
## 6: teacher_psi_2_2 :: 0.159 [ 0.000, 0.000]
##
## 7: teacher_alpha_1_1 :: -0.107 [ 0.000, 0.000]
##
## 8: teacher_alpha_2_1 :: 0.278 [ 0.000, 0.000]
##
## ------------------------------------------------------------------------------
Once parameters are estimated, confidence inetrvals are estimated by invoking xxmCI()
. Depending on the the number of observations and the complexity of the dependence structure xxmCI()
may take very long. xxMCI()
displays a table of parameter estimates and CIS.
A summary of results may be retrived as an R list by a call to xxmSummary()
s <- xxmSummary(xm)
s
## $fit
## $fit$deviance
## [1] 445.4
##
## $fit$nParameters
## [1] 8
##
## $fit$nObservations
## [1] 150
##
## $fit$aic
## [1] 461.4
##
## $fit$bic
## [1] 485.5
##
##
## $estimates
## child parent to from label estimate low high
## 1 student student y1 y1 student_theta_1_1 0.9691 0.670784 1.4741
## 2 student student y1 y2 student_theta_1_2 0.3381 0.090586 0.6823
## 3 student student y2 y2 student_theta_2_2 0.8935 0.618407 1.3589
## 8 teacher teacher eta1 eta1 teacher_psi_1_1 0.7524 0.293264 1.6633
## 9 teacher teacher eta1 eta2 teacher_psi_1_2 0.1283 -0.156114 0.5437
## 10 teacher teacher eta2 eta2 teacher_psi_2_2 0.1592 0.001000 0.5548
## 11 teacher teacher eta1 One teacher_alpha_1_1 -0.1073 -0.529914 0.3154
## 12 teacher teacher eta2 One teacher_alpha_2_1 0.2781 0.002623 0.5537
xxM
model object may hog a large amount of RAM outside of R's memory. This memory will automatically be released, when R's workspace is cleared by a call to rm(list=ls())
or at the end of the R session. Alternatively, xxmFree()
may be called to release memory.
xm <- xxmFree(xm)
[/su_tab] [su_tab title=“SAS: Proc Mixed”]
Proc Mixed data = brim covtest;
CLASS teacher vars;
MODEL y = vars / solution noint;
RANDOM vars / subject = teacher type = un;
REPEATED vars/ subject = student(teacher) type = un;
RUN;
[/su_tab] [/su_tabs]
For the current dataset, the parameter estimates are:
[latexpage]