Syntax for FMM Regression for Multiple Observations Within Cases

Red Owl

Join Date: Nov 2016
Posts: 127

14 Dec 2018, 14:14

I need to estimate a 3-latent-class FMM regression (with a continuous dependent variable) on a data set that includes multiple observations within cases.

A hypothetical example of the data structure is:

Code:

caseid    obsnum    score    predvar1    predvar2    predvar3    covariate1    covariate2
ID01       1        55        3           10          11           0             0
ID01       2        72        8           9           6            30            0
ID01       3        36        10          11          2            30            0
.          .        .         .           .           .            .             .
.          .        .         .           .           .            .             .
.          .        .         .           .           .            .             .
ID01       46       89        7           4           14           30            0
ID01       47       78        12          2           9            30            0
ID01       48       45        5           6           6            30            0
ID02       1        49        3           10          11           41            1
ID02       2        68        8           9           6            41            1
ID02       3        59        10          11          2            41            1
.          .        .         .           .           .            .             .
.          .        .         .           .           .            .             .
.          .        .         .           .           .            .             .
ID02       46       61        7           4           14           41            1
ID02       47       96        12          2           9            41            1
ID02       48       40        5           6           6            41            1
.          .        .         .           .           .            .             .
.          .        .         .           .           .            .             .
.          .        .         .           .           .            .             .
ID300      48       83        5           6           6            27            0

The dependent variable, score, is continuous, and varies across observations (obsnum) within cases (caseid).
The predictor variables, predvar1-predvar3, are also continuous and also vary across observations within cases.
One covariate, covariate1, is continuous, and the other, covariate2, is binary.
Both covariates are constant across observations within cases and only vary across cases.

There are 300 cases with 48 observations within each case for a total of 14,400 observations.

If I did not have multiple observations within cases, I would structure the FMM regression as:

Code:

fmm 3, lcprob(covariate1 covariate2): regress score predvar1 predvar2 predvar3

I could modify the code to address the correlated-response problem due to the multiple observations within cases by adding robust standard errors clustered on caseid as:

Code:

fmm 3, lcprob(covariate1 covariate2) vce(cluster caseid): regress score predvar1 predvar2 predvar3

However, that would still allow me to predict posterior class membership probabilities only at the observation level within cases rather than at the case level, and I am not sure how I could then classify the cases into a latent class since the posterior class membership probabilities will vary within the cases.
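
One possible workaround, offered as a sketch and not as a documented -fmm- feature: if every observation from a case is assumed to belong to the same latent class, the case-level posterior probability of class k is proportional to the prior class probability times the product of the class-k densities over that case's 48 observations. Those quantities can be recovered from what predict already returns, because the observation-level posterior divided by the prior equals the class-k density up to a factor shared by all classes. The variable names below (prior*, post*, lr*, slr*, lcase*, casepost*, caseclass) are made up for illustration. The parameters still come from a model fit at the observation level, so this is a post hoc approximation and not a true case-level mixture.

Code:

```stata
* Sketch only: run right after the clustered -fmm- command above.
* All new variable names here are hypothetical.
predict double prior*, classpr             // prior class probabilities from lcprob()
predict double post*,  classposteriorpr    // observation-level posterior probabilities

forvalues k = 1/3 {
    * ln(posterior/prior) = class-k log density minus a term shared by all classes
    generate double lr`k' = ln(post`k') - ln(prior`k')
    * sum the log-density contributions over the observations of each case
    bysort caseid: egen double slr`k' = total(lr`k')
    generate double lcase`k' = ln(prior`k') + slr`k'
}

* normalize across the three classes on the log scale to avoid underflow
generate double lmax  = max(lcase1, lcase2, lcase3)
generate double denom = exp(lcase1 - lmax) + exp(lcase2 - lmax) + exp(lcase3 - lmax)
forvalues k = 1/3 {
    generate double casepost`k' = exp(lcase`k' - lmax) / denom
}

* modal assignment: one class per case, identical on all of its rows
generate byte caseclass = cond(casepost1 >= max(casepost2, casepost3), 1, ///
                          cond(casepost2 >= casepost3, 2, 3))
```

Because covariate1 and covariate2 are constant within caseid, prior1-prior3 are constant within cases, and so are casepost1-casepost3 and caseclass. Any observation whose posterior underflows to exactly zero would produce a missing log and would need separate handling.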

I would appreciate any advice about how to structure the syntax for this 3-latent-class FMM regression in order to classify cases into the latent classes identified with FMM. Can this be done with -fmm- or -gsem- syntax?

(I'm familiar with and have used Pacifico's lclogit, from SSC, to conduct latent class conditional logistic regression for binary choice variables. My current project is a conjoint analysis where the choice variable is continuous, and I'm hoping I can use -fmm- or -gsem- for the analysis.)

Red Owl
Stata/IC 15.1, Windows 10 (64-bit)


Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

14 Dec 2018, 16:32

If you were not fitting an FMM, but instead you were just regressing the score on the 3 independent variables, how would you structure your data? Would you be fitting some sort of hierarchical model? If so, be aware that Stata 15 can't mix continuous latent variables (like random intercepts) with categorical ones (like latent classes).

More conceptually, what are you trying to do? If you had point in time data, this would make sense, but the fact is that you have 3 observations per person.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Red Owl

#3

14 Dec 2018, 19:46

Weiwen Ng, thanks as always for your reply.

I don't have three separate dependent variables and the data were all collected at a single time point.

I have one continuous dependent variable that was collected in a single survey as 48 rating scores (0-100) from each of 300 subjects based on their reactions to 48 different profiles (combinations of levels) of three continuous predictor variables (predvar1, predvar2, and predvar3). The profiles of the predictor variable values were designed using optimal experimental design techniques to maximize D-efficiency.

Conceptually, this is a decision or judgment study modeled after a well-established method that is usually called conjoint analysis in marketing or social judgment analysis in social psychology. This type of decision analysis is typically done using either (a) OLS regression with multiple responses from a single subject or (b) multilevel modeling with multiple responses from multiple subjects (controlling for correlated-response error due to the within-subjects measurement).

In my study, subjects were asked to consider multiple (k=48) combinations of levels of three continuous factors (predvar1-predvar3) believed to affect their satisfaction (or a similar judgment) measured on a continuous scale (score). After considering the levels of the factors (predvar1-predvar3) for a given profile, the subjects assigned a rating score for that profile on a continuous scale. Each of the 300 subjects in my project provided 48 rating scores in response to 48 profiles consisting of varying levels of predvar1-predvar3. (That produced 14,400 observations reflecting 48 observations for each of 300 subjects.) I also collected data on two demographic characteristics of the participants to use as covariates in the analysis, but that is of lesser importance in terms of my current question.

I want to use FMM because I do not accept the single-rational-actor assumption that there is only one grand model of decision preferences for the population I am studying. Instead, I want to explore the possibility of multiple latent classes of decision preferences within my population of interest and to develop a separate decision model (essentially a regression model) for each of the discovered latent classes.

The problem I am facing is that I don't know the correct -fmm- or -gsem- syntax to indicate that there are multiple (i.e., 48) responses per subject for each of my 300 subjects. The posterior class membership probabilities that I predict with FMM are, therefore, estimated at the observation level rather than at the case level -- because FMM considers each observation as a case. That makes it difficult (impossible?) to classify cases because the posterior class probabilities vary for the various observations within a case.

What I am hoping to learn in response to this post is the syntax in either -fmm- or -gsem- that I can use to conduct an FMM regression for a single continuous dependent variable with three continuous predictor variables, when multiple instances of the dependent variable are provided by each case at a single point in time. I would also be interested to know how to incorporate in the syntax a set of covariates, which vary only by case and are constant within cases. (I already know how to use lcprob() in FMM when each case provides only a single observation.)

I hope the example data structure and information I provided in my original post will be clearer with these additional details.

Thanks for any help you can provide with the syntax.

Red Owl
Stata/IC 15.1, Windows 10 (64-bit)
Weiwen Ng

#4

15 Dec 2018, 17:38


I thought that the standard procedure with multiple observations per person would be to fit a multilevel model, but I wanted to make sure of this. Here's the problem: Stata 15 isn't capable of fitting models that contain both continuous and categorical latent variables. The continuous latent variable in this context would be the random intercept for each person. The categorical latent variable is, obviously, the latent class.

Perhaps someone who knows better will reply, but I don't think that an FMM extension of the multilevel model approach is feasible in Stata 15. I can tell you how I think the gsem syntax would look; if you ran this, I bet Stata would report more or less the limitation described in the paragraph above:

Code:

gsem (score <- predvar? M1[caseid], gaussian) (C <- covariate1 covariate2), lclass(C 3) lcinvariant(none)

Conceptually, your objection is sensible. I know that finite mixtures of item response theory models are a thing, and what you are trying to do would seem like a parallel concept. I will counter, however, that a single multilevel model isn't assuming only one model of decision preferences; it's more like it's telling you the average set of decision preferences. If readers then assume that everyone responds according to that one set of average preferences, then that's their fault (albeit everyone has probably done this at some point).

Red Owl

#5

16 Dec 2018, 01:02

Weiwen Ng Thanks so much.

I have used multilevel models for this purpose in the past and may have to do it in this case. However, as you know, that will produce a single model rather than the separate models I had hoped to obtain with FMM.

I did try your code with my data set, and I did receive the error message you had predicted. I'm going to contact Stata support to see if there is any other undocumented approach I can try as a last resort with Stata.

If I can't find a way to do this in Stata, I'll explore whether it can be done with one of the R packages that implement FMM or possibly with Mplus or Latent GOLD.

If I find a solution, I'll post it here.

Red Owl
Stata/IC 15.1, Windows 10 (64-bit)
Weiwen Ng

#6

17 Dec 2018, 08:44

If you find a specific R package or packages with functionality similar to Stata for FMM and LCA/LPA, I'd love to know about it. Finding which R package to use is one thing that has hindered my adoption of R.

We are all aware that with a multilevel model, you will receive one set of results with the average response. You are undoubtedly aware, though, that the random slopes and intercepts can give you some sense of how your sample varies. You also alluded to some case-level characteristics that you thought would be associated with the latent class. Example 6 in the documentation for mixed shows a model that estimates different mean growth rates and different variances in growth rates for boys and girls. It may be possible to get something like what you want, albeit with a different set of tools.
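
As a rough sketch of that multilevel alternative (an assumption about how it might be set up for the data described in this thread, not a tested analysis), a random intercept and random slopes for the three predictors could be fit by case and the case-specific deviations then inspected. The names caseno, re*, and firstobs are hypothetical; caseno is created only in case caseid is stored as a string.

Code:

```stata
* Sketch only: random intercept and random slopes by case.
egen long caseno = group(caseid)           // numeric case identifier

mixed score predvar1 predvar2 predvar3 covariate1 i.covariate2 ///
    || caseno: predvar1 predvar2 predvar3, covariance(unstructured)

* empirical Bayes predictions of each case's deviations from the average
* coefficients (three slopes plus the intercept, so re1-re4)
predict double re*, reffects

* keep one row per case when summarizing the case-specific deviations
egen firstobs = tag(caseno)
summarize re1-re4 if firstobs
```

The spread of re1-re4 across cases gives a sense of how much individual decision models differ from the average one, although it does not sort cases into discrete classes the way a latent class model would.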

Meanwhile, it seems entirely logical that we ask for the capability to estimate models with both categorical and continuous latent variables in Stata 16.
