How to include interactions between an endogenous variable and exogenous variables in GSEM

MJ KIM

Join Date: Apr 2016

Posts: 4
#1

18 Apr 2016, 16:10

Hello all,

I am using GSEM to estimate the effect of X, which is endogenous, on Y, using an instrumental variable IV and a bunch of other exogenous variables (the Zs).
The problem is that X is interacted with two of the Zs (let's call them Z1 and Z2), and I'm not sure how to deal with this.

I've tried the code below, but Stata says that I can't include interaction terms between latent variables.

. gsem (Y <- X Z1 Z2 X#Z1 X#Z2 Z3 Z4 Z5, Oprobit) (X <- IV), vce(cluster Z5)

Y, X, Z1, and Z2 are all ordinal variables and IV is continuous.

Since I've never used the gsem command, I'm not sure about the details... Any help will be greatly appreciated!!

Best,
MJ
MJ KIM

Join Date: Apr 2016

Posts: 4
#2

18 Apr 2016, 16:14

Edit: My dataset is cross-sectional and all variables are observed.
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#3

18 Apr 2016, 19:15

You are expected to provide the exact code you used and show us what Stata returned as output. I don't think 'Oprobit' is a valid option. See section 12 of the FAQ for how to make useful posts. As for your code, I think the problem is that you are not using Stata's factor-variable notation. If you are treating Z1 and Z2 as continuous (assuming X is continuous too), the code should be:

Code:

gsem (Y <- X Z1 Z2 c.X#c.Z1 c.X#c.Z2 Z3 Z4 Z5, oprobit) (X <- IV)

If you are using Z1 Z2 as categorical:

Code:

gsem (Y <- X i.Z1 i.Z2 c.X#i.Z1 c.X#i.Z2 Z3 Z4 Z5, oprobit) (X <- IV)

Roman

MJ KIM

Join Date: Apr 2016
Posts: 4

18 Apr 2016, 19:57

Dear Roman,

I apologize. This is my first time posting here and I didn't know the rules.

Here's the actual code I ran:

Code:

gsem (V29 <- i.V43 i.V32 i.INCOME_Q i.V43#i.V32 i.V43#i.INCOME_Q i.SEX AGE i.MARITAL i.DEGREE i.SUB_KNOW_Q i.OBJ_KNOW_Q i.V38
>  i.ASIA i.NOR_AM i.CT_AM i.W_EUR i.E_EUR i.N_EUR i.S_EUR, oprobit) (i.V43 <- TEMP_ABS), vce(cluster REGION)

So all the main variables are ordinal, including the endogenous variable.
And here's the message I got:

Code:

note: Latent variable V29 was specified with option family(ordinal),but family(gaussian) is the only option allowed.  Assuming
      family(gaussian) for V29.
note: Latent variable V29 was specified with option link(probit),but link(identity) is the only option allowed.  Assuming
      (identity) for V29.
interactions between latent variables are not allowed
r(198);


Roman Mostazir

Join Date: Apr 2014
Posts: 874

18 Apr 2016, 20:41

Not all of the variables are being treated as ordinal categorical in your code; AGE is being treated as continuous.

Is your Stata up to date? Type update all in the Stata command window or a do-file. Showing an example of your data would be helpful for everyone; please see section 12 of the FAQ for how to post example data. If your Stata is up to date, try fitting a simpler model, adding one variable at a time, and see where the problem occurs. Also check your variable names. I will be away and am not sure whether I will be able to look at this, but I can run the following example, which replicates your setup with a few ordinal variables, without any trouble:

Code:

set obs 300

gen o = floor((4-1+1)*runiform()+1) //generate ordinal outcome

//Generate some ordinal categorical variable

gen x= floor((3-1+1)*runiform()+1)
gen z1 = floor((4-1+1)*runiform()+1)
gen z2 = floor((3-1+1)*runiform()+1)
gen t = floor((4-1+1)*runiform()+1)

lis o x z1 z2 t in 1/10, clean //data example

       o   x   z1   z2   t  
  1.   3   1    1    2   3  
  2.   4   3    1    1   2  
  3.   3   3    2    2   1  
  4.   3   2    3    1   3  
  5.   3   1    3    1   1  
  6.   4   3    4    3   1  
  7.   4   2    1    1   1  
  8.   2   3    4    1   2  
  9.   3   2    2    3   2  
 10.   3   1    4    2   4  


/*Run the model*/

gsem (o <- i.x i.z1 i.z2 i.x#i.z1 i.x#i.z2, oprobit) (i.x <- t)

/*Output:*/
*****************************************************

Iteration 0:   log likelihood = -739.97586  
Iteration 1:   log likelihood = -732.38946  
Iteration 2:   log likelihood = -732.38777  
Iteration 3:   log likelihood = -732.38777  

Generalized structural equation model           Number of obs     =        300

Response       : o
Family         : ordinal
Link           : probit

Response       : x
Base outcome   : 1
Family         : multinomial
Link           : logit

Log likelihood = -732.38777

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
o <-         |
             |
           x |
          2  |   .2237729   .4013629     0.56   0.577     -.562884     1.01043
          3  |   .6640192   .4130772     1.61   0.108    -.1455973    1.473636
             |
          z1 |
          2  |  -.0389763   .3200235    -0.12   0.903    -.6662108    .5882581
          3  |   .3964114   .3393525     1.17   0.243    -.2687073     1.06153
          4  |   .1178644   .3305149     0.36   0.721    -.5299329    .7656617
             |
          z2 |
          2  |   .2070285   .2622743     0.79   0.430    -.3070197    .7210768
          3  |   .2599257   .2744904     0.95   0.344    -.2780656     .797917
             |
        x#z1 |
        2 2  |   .0813855   .4534593     0.18   0.858    -.8073785    .9701494
        2 3  |  -.9779816   .4642522    -2.11   0.035    -1.887899    -.068064
        2 4  |    -.05667   .4531824    -0.13   0.900    -.9448911    .8315511
        3 2  |  -.4603238   .4729382    -0.97   0.330    -1.387266    .4666181
        3 3  |  -.7826331   .4878125    -1.60   0.109    -1.738728    .1734618
        3 4  |  -.5461176   .4646152    -1.18   0.240    -1.456747    .3645113
             |
        x#z2 |
        2 2  |  -.2701369   .3806659    -0.71   0.478    -1.016228    .4759545
        2 3  |  -.0952382   .3745396    -0.25   0.799    -.8293224    .6388459
        3 2  |  -.1082127   .3865593    -0.28   0.780     -.865855    .6494295
        3 3  |  -.2724948    .386346    -0.71   0.481    -1.029719    .4847294
-------------+----------------------------------------------------------------
1.x          |  (base outcome)
-------------+----------------------------------------------------------------
2.x <-       |
           t |   .0376513   .1265049     0.30   0.766    -.2102938    .2855963
       _cons |  -.1088241   .3299806    -0.33   0.742    -.7555742     .537926
-------------+----------------------------------------------------------------
3.x <-       |
           t |   .1336069    .125077     1.07   0.285    -.1115395    .3787533
       _cons |  -.3028232    .333219    -0.91   0.363    -.9559205    .3502741
-------------+----------------------------------------------------------------
o            |
       /cut1 |  -.6918708   .2947713    -2.35   0.019    -1.269612   -.1141297
       /cut2 |   .0475338   .2918675     0.16   0.871     -.524516    .6195835
       /cut3 |    .755419   .2940588     2.57   0.010     .1790743    1.331764
------------------------------------------------------------------------------
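
The "fit a simpler model first" suggestion can be sketched with the simulated variables from this example (o, x, z1, z2, t). This is only an illustration of building the model up one piece at a time, not output from an actual run; the order in which terms are added is an assumption.

```stata
* Step 1: outcome equation with main effects only
gsem (o <- i.x i.z1 i.z2, oprobit)

* Step 2: add the first interaction
gsem (o <- i.x i.z1 i.z2 i.x#i.z1, oprobit)

* Step 3: add the second interaction
gsem (o <- i.x i.z1 i.z2 i.x#i.z1 i.x#i.z2, oprobit)

* Step 4: add the equation for the endogenous variable
gsem (o <- i.x i.z1 i.z2 i.x#i.z1 i.x#i.z2, oprobit) (i.x <- t)
```

The first step at which gsem stops with an error points to the term or equation that triggers it.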

Last edited by Roman Mostazir; 18 Apr 2016, 21:25. Reason: Typo corrected

Roman


MJ KIM

Join Date: Apr 2016
Posts: 4

18 Apr 2016, 21:33

Thank you so much for suggesting the update and replicating the code.
I'm afraid that I can't post my data here since it is not public, but here's what my main variables look like. TEMP_ABS is a continuous variable holding the absolute values of temperature anomalies:

Code:

. tab V29

Q12a Protect environment: pay |
           much higher prices |      Freq.     Percent        Cum.
------------------------------+-----------------------------------
                 Very willing |      2,028        4.65        4.65
               Fairly willing |     12,094       27.72       32.36
Neither willing nor unwilling |     10,178       23.33       55.69
             Fairly unwilling |     10,419       23.88       79.57
               Very unwilling |      8,916       20.43      100.00
------------------------------+-----------------------------------
                        Total |     43,635      100.00

. tab V43

     Q14e A rise in world's temperature |
               caused by climate change |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
Extremely dangerous for the environment |     11,823       27.62       27.62
                         Very dangerous |     15,523       36.27       63.89
                     Somewhat dangerous |     11,231       26.24       90.13
                     Not very dangerous |      3,492        8.16       98.29
Not dangerous at all for the environmen |        731        1.71      100.00
----------------------------------------+-----------------------------------
                                  Total |     42,800      100.00

. tab V32

          Q13a To do about |
environment: too difficult |      Freq.     Percent        Cum.
---------------------------+-----------------------------------
            Agree strongly |      3,897        8.82        8.82
                     Agree |     11,972       27.09       35.91
Neither agree nor disagree |      7,347       16.63       52.54
                  Disagree |     16,254       36.79       89.33
         Disagree strongly |      4,716       10.67      100.00
---------------------------+-----------------------------------
                     Total |     44,186      100.00

. tab INCOME_Q

   INCOME_Q |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      2,457       14.05       14.05
          2 |      2,294       13.12       27.16
          3 |      1,474        8.43       35.59
          4 |      2,012       11.50       47.10
          5 |      1,242        7.10       54.20
          6 |      2,136       12.21       66.41
          7 |      1,601        9.15       75.56
          8 |      1,426        8.15       83.72
          9 |      1,611        9.21       92.93
         10 |      1,237        7.07      100.00
------------+-----------------------------------
      Total |     17,490      100.00

I updated my Stata and ran the same code again, but I still get the same error message as before.


wbuchanan

Join Date: Mar 2014
Posts: 1362

19 Apr 2016, 04:40

MJ KIM see:

Code:

help sem_and_gsem_syntax_options

//  nocapslatent              do not treat capitalized Names as latent

I suspect that capitalized variable names being treated as latent is the easiest candidate to rule out as the cause of the error in your previous example:

Code:

note: Latent variable V29 was specified with option family(ordinal),but family(gaussian) is the only option allowed. Assuming
     family(gaussian) for V29.
note: Latent variable V29 was specified with option link(probit),but link(identity) is the only option allowed. Assuming
     (identity) for V29.
interactions between latent variables are not allowed
r(198);


Roman Mostazir

Join Date: Apr 2014

Posts: 874
#8

19 Apr 2016, 08:04

wbuchanan nailed it. That's the culprit. Either rename the variables (e.g. V29) to lowercase or use the option nocapslatent:

Code:

gsem (V29 <- i.V43 i.V32 i.INCOME_Q i.V43#i.V32 i.V43#i.INCOME_Q ///
    i.SEX AGE i.MARITAL i.DEGREE i.SUB_KNOW_Q i.OBJ_KNOW_Q i.V38 ///
    i.ASIA i.NOR_AM i.CT_AM i.W_EUR i.E_EUR i.N_EUR i.S_EUR, oprobit) ///
    (i.V43 <- TEMP_ABS), vce(cluster REGION) nocapslatent
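
The renaming route might look like the sketch below. It assumes that no two variables in the dataset have names differing only by case (otherwise rename would stop with an error), and it simply reuses the variable list from the model above in lowercase. The /// line continuation works in a do-file, not when typed interactively.

```stata
* Assumption: lowercasing every name does not create duplicate names
rename *, lower

* Same model as above, now with lowercase names, so nocapslatent is not needed
gsem (v29 <- i.v43 i.v32 i.income_q i.v43#i.v32 i.v43#i.income_q ///
    i.sex age i.marital i.degree i.sub_know_q i.obj_know_q i.v38 ///
    i.asia i.nor_am i.ct_am i.w_eur i.e_eur i.n_eur i.s_eur, oprobit) ///
    (i.v43 <- temp_abs), vce(cluster region)
```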

Regarding the dataset, we do not actually ask for the whole data, just a small sample (maybe 10 to 20 rows) that we can play with. That saves time for everyone. An easy way to do this is to install dataex (ssc install dataex) and read its help file on how to provide data examples.
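
A minimal sketch of producing such a data example, assuming the main variables named earlier in this thread are the ones worth showing and that the first 20 rows are representative enough:

```stata
* Install the user-written command from SSC (only needed once)
ssc install dataex

* Print a copy-and-paste data example for a few variables, first 20 rows only
dataex V29 V43 V32 INCOME_Q TEMP_ABS in 1/20
```

The output can be pasted into a post as is, so others can recreate those rows on their own machine.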

Roman
wbuchanan

Join Date: Mar 2014

Posts: 1362
#9

19 Apr 2016, 11:46

MJ KIM, depending on your comfort level, you could also simulate some data with properties similar to the data you are working with (particularly if some of those properties are responsible for the issue at hand).