gsem in Stata

Anne Wanyonyi

Join Date: Jun 2016
Posts: 88

04 Aug 2020, 01:39

How can I use gsem when my dependent variables are binary?

Using the example in the manual:

Code:

webuse womenwk

Code:

 
generate selected = 0 if wage < .
generate notselected = 0 if wage >= .

Code:

 
 gsem (wage <- educ age L)(selected <- married children educ age L@1,family(gaussian, udepvar(notselected))), var(L@1 e.wage@a e.selected@a)

I have generated a dummy variable (minimumwage) which I want to use in place of wage.

Code:

 
 generate minimumwage = (wage>20)
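
One thing worth checking at this step: Stata treats missing values as larger than any number, so (wage>20) evaluates to 1 wherever wage is missing. Below is a minimal sketch of a missing-aware version, assuming the intent is for the dummy to be unobserved whenever wage is unobserved. The name minimumwage2 is hypothetical and is used only so the two definitions can be compared side by side.

```stata
webuse womenwk, clear

* Definition used in the post: missing wage counts as "greater than 20"
generate byte minimumwage = (wage > 20)

* Hypothetical alternative: leave the dummy missing where wage is missing
generate byte minimumwage2 = (wage > 20) if !missing(wage)

* Cross-tabulate the two, showing missing values as their own category
tabulate minimumwage minimumwage2, missing
```

If the two columns disagree only in the rows where minimumwage2 is missing, the difference is entirely due to the unobserved wages.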

However, I get an error message when I estimate the model with probit, using minimumwage instead of wage:

Code:

 
gsem (wage <- educ age L, probit)(selected <- married children educ age L@1,family(binomial, udepv
> ar(notselected))), var(L@1 e.wage@a e.selected@a)
option family() invalid;
suboptions not allowed with family bernoulli

What is the right syntax to consider?


Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#2

04 Aug 2020, 05:43

Try the logit option instead of probit.
Anne Wanyonyi

Join Date: Jun 2016

Posts: 88
#3

04 Aug 2020, 06:31

Thank you, Chris Boudreaux.

I still get an error message with logit:

Code:

. gsem (minimumwage <- educ age L, logit)(selected <- married children educ age L@1,family(binomial, udepv
> > ar(notselected))), var(L@1 e.wage@a e.selected@a)
option family() invalid;
suboptions not allowed with family bernoulli

I would also like to know why logit is preferred to probit.

Last edited by Anne Wanyonyi; 04 Aug 2020, 06:36.
Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#4

04 Aug 2020, 06:52

Hi Anne,

It tells you in the error log that the problem is with the family() option. When I use gsem, I use logit and don't specify the family options at all. So my model would look something like:

Code:

gsem (minimumwage <- educ age L, logit)(selected <- married children educ age L@1, logit)

Also, note that if all models have binary dependent variables and you want to use logit, you can simply specify the logit part in the options at the end. If there is a reason for you to specify the family options, I suggest you read through the gsem manual.

I don't "prefer" logit to probit. I only recognize that they are substitutes and you can often use either. The coefficients are scaled differently, but the average partial effects are usually similar.
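
To illustrate the point about scaling, here is a minimal sketch using the womenwk data from #1. The indicator working is a hypothetical variable created only for this comparison; it is not part of the original question.

```stata
webuse womenwk, clear

* Hypothetical 0/1 outcome: 1 if a wage is observed, 0 otherwise
generate byte working = !missing(wage)

* Logit: coefficients are on the log-odds scale
logit working married children educ age
margins, dydx(*)

* Probit: coefficients are on the standard normal index scale
probit working married children educ age
margins, dydx(*)
```

The raw coefficients from the two commands are not directly comparable, so the two margins tables (average marginal effects on the probability scale) are the ones to set side by side.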

Anne Wanyonyi

Join Date: Jun 2016
Posts: 88
#5

04 Aug 2020, 07:15

Hi Chris,

I have run the model with just logit, but I am not getting convergence:

Code:

 gsem (minimumwage <- educ age L)(selected <- married children educ age L@1,logit)

Code:

 gsem (minimumwage <- educ age L, logit)(selected <- married children educ age L@1,logit)

Allowing for correlation between error terms brings an error message:

Code:

 
gsem (minimumwage <- educ age L, logit)(selected <- married children educ age L@1, logit, udepv
> > ar(notselected))), var(L@1 e.wage@a e.selected@a)


Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#6

04 Aug 2020, 07:24

What does the error message say? It might be due to the break between udepv > > ar(notselected). Or perhaps that was just something you directly copied.

Regarding convergence, sometimes gsem models take a long time to converge. Did it actually fail to converge, or is it just taking a long time? I have run gsem models that take several days to find a solution.
Anne Wanyonyi

Join Date: Jun 2016

Posts: 88
#7

04 Aug 2020, 07:34

This is the error message:

Code:

gsem (minimumwage <- educ age L)(selected <- married children educ age L@1,logit, udepvar(notselected)), var(L@1 e.wage@a e.selected@a)
latent variable udepvar not found;
'udepvar' specifies a latent variable.
For 'udepvar' to be valid, 'udepvar' must begin with a capital letter.

To my understanding, however, udepvar is not a latent variable. When I run the model from the manual, which uses wage (a continuous variable) instead of minimumwage (a dummy variable), I don't get the error message. I am interested in a model with dummy dependent variables, which is why I generated minimumwage.

The model with just the covariates and dependent variables seems to be taking a long time to converge. It is currently past 80 iterations.
Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#8

04 Aug 2020, 07:45

You have an extra comma between logit and udepvar(notselected). Try removing that comma so that there is only one after L@1. Perhaps that is why it is treating udepvar as a latent variable. I'm not sure.

I would give it more time. Do you have many observations? If so, you can try fitting the model on a smaller random sample; convergence should be much quicker. If not, you might look into a simpler integration method just to see whether it converges. In the options, you can try 'intmethod(laplace)' or 'intmethod(ghermite)' and specify fewer integration points (e.g., intpoints(3)). You might not want to use these other methods for the final model, but they should at least give you an idea of whether it will converge.
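
A rough sketch of both suggestions combined, for a quick convergence check only. Several things here are assumptions rather than part of the thread: the variable sel01 (a hypothetical 0/1 selection indicator), the missing-aware definition of minimumwage, the seed, the 50% subsample, the iteration cap, and the var(L@1) constraint borrowed from the manual example. It is not meant as a recommended final specification.

```stata
webuse womenwk, clear

* Assumed definitions: outcome missing where wage is missing,
* and a hypothetical 0/1 indicator for an observed wage
generate byte minimumwage = (wage > 20) if !missing(wage)
generate byte sel01 = !missing(wage)

* Work on a random half of the data, then restore the full dataset
preserve
set seed 12345
sample 50

* Non-adaptive Gauss-Hermite quadrature with 3 points, capped at 50 iterations
gsem (minimumwage <- educ age L, logit) (sel01 <- married children educ age L@1, logit), var(L@1) intmethod(ghermite) intpoints(3) iterate(50)

restore
```

If this cheaper version also fails to settle down, that points toward the model specification rather than the sample size or the integration method.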

Anne Wanyonyi

Join Date: Jun 2016
Posts: 88
#9

04 Aug 2020, 08:04

Without the comma, I get this error message:

Code:

latent variable logit not found;
'logit' specifies a latent variable.
For 'logit' to be valid, 'logit' must begin with a capital letter.

I am wondering whether use of a 0/1 dummy variable could be helpful, instead of dividing it into selected and notselected.

Code:

 
generate selected = 0 if wage < .
generate notselected = 0 if wage >= .

I have also looked at a similar example in the manual that incorporates union, and it is still divided into llunion and ulunion:

Code:

generate llunion = 0 if union == 1
(1,433 missing values generated)
generate ulunion = 0 if union == 0
(709 missing values generated)

Code:

gsem (minimumwage <- age grade i.smsa i.black tenure 1.union L)(llunion <- i.black tenure i.south L@1, logit, udepvar(ulunion))),var(L@1 e.wage@a e.llunion@a)

Would it be wrong to estimate the model as follows? Here the 0/1 union dummy is used instead of llunion and ulunion.

Code:

gsem (minimumwage <- age grade i.smsa i.black tenure 1.union L)(union <- i.black tenure i.south L@1, logit, ),var(L@1 e.wage@a e.union@a)

It's my first time using gsem, so I am heavily relying on the manual to understand it.

I have 2000 observations.


Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#10

04 Aug 2020, 08:45

Sorry. I am unfamiliar with the selected and notselected setup, so that probably explains my inability to help you. If you code the dummies 0/1 as you plan, I think it should work.

N=2000 observations should not pose a problem.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#11

04 Aug 2020, 09:13

In #7, there are two commas after L@1, which is not appropriate. If you wish to add several equations, using parentheses for the family and link will be helpful. That said, SEM (as well as gsem) can be challenging at first. Starting with simple models and then building on them is a good remedy.

Best regards,

Marcos
Anne Wanyonyi

Join Date: Jun 2016

Posts: 88
#12

04 Aug 2020, 11:34

Dear Marcos Almeida,

Removing the extra comma after L@1 and using parentheses still returns an error message:

Code:

gsem (minimumwage <- educ age L)(selected <- married children educ age L@1,udepvar(notsele
> cted)), var(L@1 e.wage@a e.selected@a) family(binomial) link(logit)
option udepvar() not allowed

I have previously worked with the Heckman selection model, but it requires the outcome variable to be continuous. Since both of my dependent variables are binary (i.e., the outcome and the selection variables), I was looking for an alternative and settled on gsem, since it allows for this. But I am open to other models that I could use in place of gsem.
Andrew Musau

Join Date: Oct 2014

Posts: 10102
#13

04 Aug 2020, 12:29

See

Code:

help heckprobit

for a probit model with sample selection. I am sure you can program this in gsem, but you need a deep understanding of the syntax.
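
A minimal sketch of what that might look like with the womenwk data from #1. The choice of covariates in each equation simply mirrors the gsem attempts above, and the names observed and the missing-aware definition of minimumwage are assumptions made for illustration, not something confirmed in the thread.

```stata
webuse womenwk, clear

* Assumed definition: the binary outcome is missing where wage is missing
generate byte minimumwage = (wage > 20) if !missing(wage)

* Implicit selection: observations with a missing outcome are treated as not selected
heckprobit minimumwage educ age, select(married children educ age)

* Same idea with an explicit 0/1 selection indicator (hypothetical name)
generate byte observed = !missing(wage)
heckprobit minimumwage educ age, select(observed = married children educ age)
```

In both forms the outcome equation uses only the selected observations, while the selection equation uses all of them, so missing values in the outcome are expected rather than a problem.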
Rich Goldstein

Join Date: Mar 2014

Posts: 4441
#14

04 Aug 2020, 12:36

Re: #12 - I think you need to review the help file. As far as I can see, "udepvar(notselected)" should be a suboption of the "family" option, so it would be more like "family(logit, udepvar(notselected))". Note, however, that I am confused by some of your text, and this might not be what you want or mean.
Anne Wanyonyi

Join Date: Jun 2016

Posts: 88
#15

04 Aug 2020, 15:16

Thank you for the heckprobit suggestion, Andrew Musau.

I have looked at the manual and run the example in Stata. I noticed that the dummy dependent variables in both equations have no missing values. My question is whether heckprobit is a good alternative when one of the dependent variables has a number of missing values.
