  • Fitting latent class analysis model within continuous and binary variables

    Dear Stata Users,
    I am trying to estimate the following model:
    Code:
    * Profit on Domestic_Good, Govern_aid, and their interaction
    regress Profit i.Domestic_Good##c.Govern_aid
    Here, “Profit” is the profit of a firm (gvkey) in a given year (fyear); “Domestic_Good” is a binary variable equal to 1 if the firm uses domestically produced goods in its production and 0 otherwise; and “Govern_aid” is the percentage of government aid the firm receives. The coefficient of interest is the one on “Domestic_Good x Govern_aid”. Before running the regression, I need to group the observations within each year into homogeneous classes using a latent class model. I assume that there are different types of firms with different exposure to “Domestic_Good x Govern_aid”. I need to estimate the model using the expectation-maximization (EM) algorithm. Could you please help me with how to do this? (The number of classes should equal 10.)





    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 gvkey double fyear float(Profit Govern_aid Domestic_Good Domestic_Good_Govern_aid)
    "001003" 1987  -.04348366    -.3082871 1    -.3082871
    "001003" 1988  -1.0623115    -.7197108 1    -.7197108
    "001004" 1987   .06634676      .218626 0            0
    "001004" 1988   .06223206   .069745064 0            0
    "001004" 1989   .05128117    -.4812982 1    -.4812982
    "001004" 1990   .04331039    -.1567301 1    -.1567301
    "001004" 1991    .0446404   -.29046023 1   -.29046023
    "001004" 1992 .0013825137   -.03921402 1   -.03921402
    "001004" 1993   .04418078   .010171056 0            0
    "001004" 1994   .04576015    .04414248 0            0
    "001004" 1995   .06578331    .14665353 0            0
    "001004" 1996   .06505051    .13149571 0            0
    "001004" 1997  .063185364  -.034426987 1  -.034426987
    "001004" 1998    .0568946   -.29442823 1   -.29442823
    "001004" 1999   .06502337   -.57947755 1   -.57947755
    "001004" 2000   .04971404    1.0814219 0            0
    "001004" 2001  -.15628795   -.51685584 1   -.51685584
    "001004" 2002 -.034037974    .13492393 0            0
    "001004" 2003  .024447165     .2280464 0            0
    "001004" 2004   .06012163     .4134352 0            0
    "001004" 2005    .0672745     .1650939 0            0
    "001004" 2006   .06735225     .2222587 0            0
    "001004" 2007   .06177252    -.4368991 1    -.4368991
    "001004" 2008   .10781982    .25527966 0            0
    "001004" 2009   .07807629   -.13767487 1   -.13767487
    "001004" 2010   .09402896    .33128345 0            0
    "001004" 2011  .064509116    -.4100298 1    -.4100298
    "001004" 2012   .11333438     .4500034 0            0
    "001004" 2013   .09227814   -.09624827 1   -.09624827
    "001004" 2014  -.05669359   -.09596556 1   -.09596556
    "001004" 2015    .0387043    -.0456534 1    -.0456534
    "001004" 2016   .05958378    .28017437 0            0
    "001004" 2017   .06139985    .09440386 0            0
    "001009" 1987   .05621487   -.20167345 1   -.20167345
    "001009" 1988   .12529714     .4917264 0            0
    "001009" 1989   .11172394  -.005341172 1  -.005341172
    "001009" 1990   .08092978    -.2338522 1    -.2338522
    "001009" 1991   .05654528      .770165 0            0
    "001009" 1992   .09453684    .16616976 0            0
    "001009" 1993   .09340063     .6258514 0            0
    "001009" 1994   .07021874 -.0027548075 1 -.0027548075
    "001011" 1988   .20208144    1.1455235 0            0
    "001011" 1989  -.15924017   -.13693374 1   -.13693374
    "001011" 1990  -.26331362    .08611465 0            0
    "001011" 1991   -.3427602   -.17631167 1   -.17631167
    "001011" 1992   -.4025051     .1734942 0            0
    "001011" 1993  -.16086596    2.2963157 0            0
    "001011" 1994  -.08814143   -.07715726 1   -.07715726
    "001012" 1987  -.03114572   .007808805 0            0
    "001012" 1988  -.04062851   -.12377375 1   -.12377375
    "001013" 1987   .08753087     .2179482 0            0
    "001013" 1988   .08167444    -.3476874 1    -.3476874
    "001013" 1989   .10767787    .18414843 0            0
    "001013" 1990   .11269678    .44386065 0            0
    "001013" 1991    .1012144    -.2356139 1    -.2356139
    "001013" 1992    .0642345     .6261415 0            0
    "001013" 1993  .064123236     .4024621 0            0
    "001013" 1994   .04008248     .3668667 0            0
    "001013" 1995   .04199138    .16907322 0            0
    "001013" 1996  .034853037       .52984 0            0
    "001013" 1997  .024422204   -.57246476 1   -.57246476
    "001013" 1998  .033177745     .7924265 0            0
    "001013" 1999   .02824538     .4544656 0            0
    "001013" 2000   .12121974   -.07422364 1   -.07422364
    "001013" 2001  -.07820755    -.5869634 1    -.5869634
    "001013" 2002   -.3177378    -.3542563 1    -.3542563
    "001013" 2003  -.06071073    .11506605 0            0
    "001013" 2004  .015099168    -.3131507 1    -.3131507
    "001013" 2005    .0477568    .25765538 0            0
    "001013" 2006   .04633717    -.4380039 1    -.4380039
    "001013" 2007   .06755581   -.07600397 1   -.07600397
    "001013" 2008  -.02018989    -.4259862 1    -.4259862
    "001013" 2009   -.6599663   -.04695535 1   -.04695535
    "001013" 2010   .09582397      .981482 0            0
    "001016" 1987  -.13767068     .1718787 0            0
    "001017" 1987   .03736314   -.15747285 1   -.15747285
    "001017" 1988  -.05259182    -.3294851 1    -.3294851
    "001017" 1989   -.2873082   -.51127315 1   -.51127315
    "001017" 1990   .08427543     .3156046 0            0
    "001017" 1991    .3920818    .02538979 0            0
    "001017" 1992 -.066185944    -.2947895 1    -.2947895
    "001017" 1993  .066498056    .14714718 0            0
    "001017" 1994   .05605729    1.6039975 0            0
    "001020" 1987    .1196854    .23144484 0            0
    "001021" 1988   .08219262   -.06222588 1   -.06222588
    "001021" 1989  -.03423223    -.4357524 1    -.4357524
    "001021" 1990   .06011396  -.004986525 1  -.004986525
    "001021" 1996   .12422745    -.4367221 1    -.4367221
    "001021" 1997    .1522945    .16471565 0            0
    "001023" 1987   .10116904    .07030249 0            0
    "001028" 1987   .03848406    -.2745371 1    -.2745371
    "001028" 1988  .033615317    -.4615061 1    -.4615061
    "001028" 1989   -.3129978   -.53746355 1   -.53746355
    "001028" 1990  -1.4715513    -.7133849 1    -.7133849
    "001034" 1987  .062956676     .4765148 0            0
    "001034" 1988    .1152793   -.09594667 1   -.09594667
    "001034" 1989   .07660044     .4059094 0            0
    "001034" 1990   .06915995     .0526402 0            0
    "001034" 1991   .01782257    .20336723 0            0
    "001034" 1992   .02723884    -.0946424 1    -.0946424
    end

  • #2
    Quoting #1: "(the number of classes should equal 10)."
    Unless you simulated the data, you would not know in advance that there are 10 classes. Normally, in latent class analysis, we would fit a number of models with varying numbers of latent classes and then use the BIC to select the final number of classes. Also, with just two indicators of the latent class, I doubt you'd be able to identify 10 latent classes.
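    A minimal sketch of that workflow, assuming the -dataex- example above is in memory and that 2 to 4 classes is a sensible range to try (the range is arbitrary; with real data you would try more classes and check that each model actually converged, since -quietly- hides the iteration log):

```stata
* Fit latent class models with 2, 3, and 4 classes and store each one
forvalues k = 2/4 {
    quietly gsem (Govern_aid <- _cons) (Domestic_Good <- _cons, logit), lclass(C `k')
    estimates store lca`k'
}

* Compare information criteria; the model with the lowest BIC is preferred
estimates stats lca2 lca3 lca4
```

    A one-class model is the same command without the lclass() option, and it can be stored and added to the comparison in the same way.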

    If you want to do this, you could read examples 50g through 52g in the Stata [SEM] manual. They show how to fit a latent class model, e.g.,

    Code:
    * Govern_aid is a Gaussian indicator, Domestic_Good a logit indicator; 2 latent classes
    gsem (Govern_aid <- _cons) (Domestic_Good <- _cons, logit), lclass(C 2)
    You could then read the part of the SEM example that deals with modal class assignment, i.e., we assume each observation belongs to the latent class with the highest posterior probability of membership. Note that this isn't quite correct; we are not 100% sure which latent class an observation belongs to. It may do as an approximation.

    I guess that after this, you could fit a model to each class separately, or include the modal class as a categorical variable in your regression.
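    A sketch of both options for the two-class model, assuming the example data are in memory; pr1, pr2, and modal are just illustrative variable names. The caveat about modal assignment still applies: it treats class membership as known when it is only estimated.

```stata
* Two-class latent class model with a Gaussian and a logit indicator
gsem (Govern_aid <- _cons) (Domestic_Good <- _cons, logit), lclass(C 2)

* Posterior probability of membership in each class (creates pr1 and pr2)
predict pr*, classposteriorpr

* Modal class: the class with the higher posterior probability (ties go to class 1)
generate byte modal = 1 + (pr2 > pr1)

* Option 1: fit the regression separately within each modal class
forvalues c = 1/2 {
    regress Profit i.Domestic_Good##c.Govern_aid if modal == `c'
}

* Option 2: modal class as a categorical variable, interacted with the regressors
regress Profit i.modal##i.Domestic_Good##c.Govern_aid
```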

    You might actually want to read about finite mixture modeling (FMM). You are assuming some sort of regression relationship that differs across latent classes. FMM is not LCA. In FMM, you use the independent variables in the model as they are. You're assuming that there are, for example, 3 latent classes in which the betas for domestic goods, government aid, and the interaction term differ. You could expand an FMM to include other observable predictors of class membership.
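    For comparison, a minimal FMM sketch, assuming the example data are in memory. The 3 components only mirror the example above, and putting fyear in lcprob() is purely an illustration of adding an observed predictor of class membership, not a substantive recommendation.

```stata
* Mixture of 3 linear regressions: each class gets its own intercept and betas
fmm 3: regress Profit i.Domestic_Good##c.Govern_aid

* Estimated marginal probability of each latent class
estat lcprob

* Same mixture, with class membership modeled as a function of an observed variable
fmm 3, lcprob(fyear): regress Profit i.Domestic_Good##c.Govern_aid
```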

    I do not know of any type of model where we estimate a latent class model and then separately estimate regression models that differ by those latent classes. There is no way you will be able to identify 10 latent classes with just two indicators, so I question whether this is what you really want to do.

    You mention that you need to estimate the model via the EM algorithm. Stata uses the EM algorithm to obtain starting values and then switches to its usual maximization routine after a default number of EM iterations. Why do you need to estimate the whole model by EM alone? If you have a substantive reason, gsem has an option to change the maximum number of EM iterations (20 by default; the EM algorithm is slower than the usual algorithm). You could set it to some high number, e.g.,

    Code:
    * Allow up to 1000 EM iterations before switching to the usual maximizer
    gsem (Govern_aid <- _cons) (Domestic_Good <- _cons, logit), lclass(C 2) emopts(iterate(1000))
    I thought there was an option to specify the EM algorithm only, but I'm not seeing it in the gsem manual.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
