
  • gsem in Stata

    How can I use gsem when I have binary dependent variables?

    Using the example in the manual:
    Code:
     webuse womenwk
     generate selected = 0 if wage < .
     generate notselected = 0 if wage >= .
     gsem (wage <- educ age L)(selected <- married children educ age L@1,family(gaussian, udepvar(notselected))), var(L@1 e.wage@a e.selected@a)
    I have generated a dummy variable (minimumwage) which I want to use in place of wage.

    Code:
     
     generate minimumwage = (wage>20)
    However, I get an error message when I use minimumwage instead of wage and estimate that equation with probit:

    Code:
     
     gsem (wage <- educ age L, probit)(selected <- married children educ age L@1,family(binomial, udepvar(notselected))), var(L@1 e.wage@a e.selected@a)
     option family() invalid;
     suboptions not allowed with family bernoulli
    What is the right syntax to consider?
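    One thing worth checking before fitting any model: Stata treats a missing value (.) as larger than any number, so (wage>20) evaluates to 1 for every woman whose wage is missing. A minimal sketch, assuming the womenwk data from the manual, that leaves minimumwage missing for those observations and also builds a conventional 0/1 selection indicator (the name works is a hypothetical choice, not taken from the manual):

```stata
* Load the example data used in the manual (wage is missing for non-workers)
webuse womenwk, clear

* (wage>20) alone would be 1 when wage is missing, because . > 20 is true;
* the if qualifier keeps minimumwage missing for those observations
generate minimumwage = (wage > 20) if !missing(wage)

* Hypothetical 0/1 selection indicator: 1 if wage is observed, 0 otherwise
generate works = !missing(wage)

* Check the coding: minimumwage should be missing exactly when works == 0
tabulate minimumwage works, missing
```

    Note that the manual's selected variable is coded 0 or missing (never 1), so it is not a 0/1 variable in the usual sense; a variable like works is what a logit or probit selection equation would need.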

  • #2
    Try , logit



    • #3
      Thank you, Chris Boudreaux.

      I am still getting an error message with logit:


      Code:
      . gsem (minimumwage <- educ age L, logit)(selected <- married children educ age L@1,family(binomial, udepvar(notselected))), var(L@1 e.wage@a e.selected@a)
      option family() invalid;
      suboptions not allowed with family bernoulli
      I would also like to know why logit is preferred to probit.
      Last edited by Anne Wanyonyi; 04 Aug 2020, 06:36.



      • #4
        Hi Anne,

        It tells you in the error log that the problem is with the family() option. When I use gsem, I use logit and don't specify the family options at all. So my model would look something like:

        Code:
         
         gsem (minimumwage <- educ age L, logit)(selected <- married children educ age L@1, logit)
        Also, note that if all equations have binary dependent variables and you want logit for each of them, you can simply specify logit once among the options at the end of the command. If you have a reason to specify the family() options, I suggest you read through the gsem manual.

        I don't "prefer" logit to probit. I only recognize that they are substitutes and you can often use either. The coefficients are scaled differently, but the average partial effects are usually similar.
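        To illustrate the point about scaling, a sketch that fits the same binary equation with the standalone logit and probit commands and compares average marginal effects with margins. It assumes the manual's womenwk data and a hypothetical 0/1 indicator works (1 if wage is observed). As a rule of thumb, logit coefficients come out larger in absolute value than probit ones (often by a factor of roughly 1.6 to 1.8), while the dydx() estimates tend to be close; that closeness is typical, not guaranteed.

```stata
webuse womenwk, clear

* Hypothetical 0/1 indicator: 1 if wage is observed (the woman works)
generate works = !missing(wage)

* Logit: coefficients are on the log-odds scale
logit works married children educ age
* Average marginal (partial) effects of each covariate
margins, dydx(*)

* Probit: coefficients are on the standard-normal index scale
probit works married children educ age
margins, dydx(*)
```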



        • #5
          Hi Chris,

          I have run the model with just logit, but I am not getting convergence:

          Code:
           gsem (minimumwage <- educ age L)(selected <- married children educ age L@1,logit)
          Code:
           gsem (minimumwage <- educ age L, logit)(selected <- married children educ age L@1,logit)
          Allowing for correlation between the error terms produces an error message:

          Code:
           
             gsem (minimumwage <- educ age L, logit)(selected <- married children educ age L@1, logit, udepv >  > ar(notselected))), var(L@1 e.wage@a e.selected@a)



          • #6
            What does the error message say? It might be due to the break between udepv > > ar(notselected). Or perhaps that was just something you directly copied.

            Regarding convergence, sometimes gsem models take a long time to converge. Did it actually fail to converge, or is it just taking a long time? I have run gsem models that take several days to find a solution.



            • #7
              This is the error message:

              Code:
              gsem (minimumwage <- educ age L)(selected <- married children educ age L@1,logit, udepvar(notselected)), var(L@1 e.wage@a e.selected@a)
              latent variable udepvar not found;
              'udepvar' specifies a latent variable.
              For 'udepvar' to be valid, 'udepvar' must begin with a capital letter.
              But to my understanding, udepvar is not a latent variable. When I run the model from the manual, which uses wage (a continuous variable) rather than minimumwage (a dummy variable), I don't get this error message. (I am interested in a model with dummy dependent variables; that is why I generated minimumwage.)


              The model with just the covariates and dependent variables seems to be taking a long time to converge; it is currently past iteration 80.



              • #8
                You have an extra comma between logit and udepvar(notselected). Try removing that comma so that there is only one after L@1. Perhaps that is why it is being treated as a latent variable, but I'm not sure.

                I would give it more time. Do you have many observations? If so, you can try fitting the model on a smaller random sample; convergence should be much quicker. If not, you might try a simpler integration method just to see whether it converges: in the options, you can type intmethod(laplace) or intmethod(ghermite) and specify fewer integration points (e.g., intpoints(3)). You might not want to use these methods for the final model, but they should at least give you an idea of whether it will converge at all.
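                A sketch of the quick checks described above, assuming the womenwk data and two hypothetical 0/1 variables: minimumwage (left missing when wage is missing) and works (1 if wage is observed), which stands in for selected because the manual codes selected as 0 or missing. It fits the model on a random half of the data with the Laplace approximation and caps the iterations; logit is given once as a global option so it applies to both equations. This is only a convergence diagnostic under those assumptions, not a recommended final specification, and it should be run as a do-file because of the /// continuation.

```stata
webuse womenwk, clear
generate minimumwage = (wage > 20) if !missing(wage)
* Hypothetical 0/1 selection indicator (the manual's selected is 0 or missing)
generate works = !missing(wage)

preserve
* Random 50% subsample; the seed makes the draw reproducible
set seed 12345
sample 50

* logit as a global option applies to both equations;
* Laplace integration is fast but approximate (exploration only);
* iterate(50) stops the run early if it is not converging.
* Alternative to try: intmethod(ghermite) intpoints(3)
gsem (minimumwage <- educ age L) (works <- married children educ age L@1), ///
    logit var(L@1) intmethod(laplace) iterate(50)

* Bring back the full data set (estimation results in e() are kept)
restore
```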



                • #9
                  Without the comma, I get this error message:

                  Code:
                  latent variable logit not found;
                  'logit' specifies a latent variable.
                  For 'logit' to be valid, 'logit' must begin with a capital letter.
                  I am wondering whether using a single 0/1 dummy variable, instead of splitting it into selected and notselected, could help.


                  Code:
                   
                    generate selected = 0 if wage < .
                    generate notselected = 0 if wage >= .

                  I have also looked at a similar example in the manual involving union, and it's still split into llunion and ulunion:
                  Code:
                  generate llunion = 0 if union == 1
                  (1,433 missing values generated)
                  generate ulunion = 0 if union == 0 (709 missing values generated)
                  Code:
                  gsem (minimumwage <- age grade i.smsa i.black tenure 1.union L)(llunion <- i.black tenure i.south L@1, logit, udepvar(ulunion))),var(L@1 e.wage@a e.llunion@a)
                  Would it be wrong to estimate the model as follows (here the 0/1 union dummy is used instead of llunion and ulunion)?

                  Code:
                  gsem (minimumwage <- age grade i.smsa i.black tenure 1.union L)(union <- i.black tenure i.south L@1, logit, ),var(L@1 e.wage@a e.union@a)
                  It's my first time using gsem, so I am heavily relying on the manual to understand it.


                  I have 2000 observations.



                  • #10
                    Sorry. I am unfamiliar with the selected and notselected approach, which probably explains my inability to help you. If you code the dummies 0/1 as you plan, I think it should work.

                    N=2000 observations should not pose a problem.



                    • #11
                      In #7, there are 2 commas after L@1, which is not appropriate. If you wish to add several equations, using parentheses for the family and link will be helpful. That said, SEM (as well as gsem) may be challenging the first time. Starting with simple models and then improving them is a good remedy.
                      Best regards,

                      Marcos



                      • #12
                        Dear Marcos Almeida,

                        Removing the extra comma after L@1 and using parentheses still returns an error message:

                        Code:
                         gsem (minimumwage <- educ age L)(selected <- married children educ age L@1,udepvar(notselected)), var(L@1 e.wage@a e.selected@a) family(binomial) link(logit)
                        option udepvar() not allowed
                        I have previously worked with the Heckman selection model, but it requires the outcome dependent variable to be continuous. Since both my dependent variables are binary (i.e., the outcome and the selection dependent variables), I was looking for an alternative and settled on gsem because it allows for this. But I am open to alternative models that I could use in place of gsem.



                        • #13
                          See
                          Code:
                          help heckprobit
                          for a probit model with sample selection. I am sure you can program this in gsem, but you need a deep understanding of the syntax.
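                          A minimal heckprobit sketch on the manual's womenwk data, assuming a binary outcome minimumwage that is left missing for women whose wage is not observed, and a hypothetical 0/1 selection indicator works. In this setup the outcome equation draws on the selected observations only, while the selection equation uses all of them. The covariate lists simply mirror the gsem example in this thread and are not a modelling recommendation.

```stata
webuse womenwk, clear

* Binary outcome, left missing when wage is unobserved
generate minimumwage = (wage > 20) if !missing(wage)
* 0/1 selection indicator: 1 if the outcome is observed
generate works = !missing(wage)

* Probit outcome equation with a probit selection equation;
* married and children appear only in the selection equation
heckprobit minimumwage educ age, select(works = married children educ age)
```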



                          • #14
                            re: #12 - I think you need to review the help file; as far as I can see, "udepvar(notselected)" should be a suboption of the "family" option, so more like "family(logit, udepvar(notselected))". Note, however, that I am confused by some of your text, and this might not be what you want or mean.



                            • #15
                              Thank you for the heckprobit suggestion, Andrew Musau.

                              I have looked at the manual and run the example in Stata. I noticed that the dummy dependent variables in both equations have no missing values. My question is whether heckprobit is a good alternative when one of the dependent variables has a number of missing values.

