  • xtheckman in Panel Data error

    Hello! I am trying to use the xtheckman command with panel data. The panel has information on 10 individuals over 26 years, from 1996 to 2021, and the data are monthly, so I have 26*12 = 312 observations for each individual.
    My data:
    Code:
    tiempo id_trabajador rem_tot sexo tenure edad desempleo
    "1996-01-01" 5929 673.13074 1  1 34 .171
    "1996-02-01" 5929 700.77625 1  2 34 .171
    "1996-03-01" 5929 704.61829 1  3 34 .171
    "1996-04-01" 5929 738.87048 1  4 34 .171
    "1996-05-01" 5929 668.61676 1  5 34 .171
    "1996-06-01" 5929 1130.1899 1  6 34 .171
    "1996-07-01" 5929 693.80298 1  7 34 .173
    "1996-08-01" 5929 532.82471 1  8 34 .173
    "1996-09-01" 5929 442.36554 1  9 34 .173
    "1996-10-01" 5929 475.64505 1 10 34 .173
    "1996-11-01" 5929 535.51196 1 11 34 .173
    "1996-12-01" 5929 632.32269 1 12 34 .173
    "1997-01-01" 5929         . 1  . 35 .148
    "1997-02-01" 5929         . 1  . 35 .148
    "1997-03-01" 5929         . 1  . 35 .148
    "1997-04-01" 5929 258.59531 1  1 35 .148
    "1997-05-01" 5929 320.48413 1  2 35 .148
    "1997-06-01" 5929 399.21515 1  3 35 .148
    "1997-07-01" 5929         . 1  . 35 .148
    "1997-08-01" 5929         . 1  . 35 .148
    "1997-09-01" 5929         . 1  . 35 .148
    "1997-10-01" 5929         . 1  . 35 .148
    "1997-11-01" 5929         . 1  . 35 .148
    "1997-12-01" 5929         . 1  . 35 .148
    "1998-01-01" 5929         . 1  . 36 .133
    "1998-02-01" 5929         . 1  . 36 .133
    "1998-03-01" 5929         . 1  . 36 .133
    "1998-04-01" 5929         . 1  . 36 .133
    "1998-05-01" 5929         . 1  . 36 .133
    "1998-06-01" 5929         . 1  . 36 .133
    "1998-07-01" 5929         . 1  . 36 .125
    "1998-08-01" 5929         . 1  . 36 .125
    "1998-09-01" 5929         . 1  . 36 .125
    "1998-10-01" 5929         . 1  . 36 .125
    "1998-11-01" 5929         . 1  . 36 .125
    "1998-12-01" 5929         . 1  . 36 .125
    "1999-01-01" 5929         . 1  . 37 .146
    "1999-02-01" 5929         . 1  . 37 .146
    "1999-03-01" 5929         . 1  . 37 .146
    "1999-04-01" 5929         . 1  . 37 .146
    "1999-05-01" 5929         . 1  . 37 .146
    "1999-06-01" 5929         . 1  . 37 .146
    "1999-07-01" 5929         . 1  . 37 .139
    "1999-08-01" 5929         . 1  . 37 .139
    "1999-09-01" 5929         . 1  . 37 .139
    "1999-10-01" 5929         . 1  . 37 .139
    "1999-11-01" 5929         . 1  . 37 .139
    "1999-12-01" 5929         . 1  . 37 .139
    "2000-01-01" 5929         . 1  . 38 .154
    "2000-02-01" 5929         . 1  . 38 .154
    "2000-03-01" 5929         . 1  . 38 .154
    "2000-04-01" 5929         . 1  . 38 .154
    "2000-05-01" 5929         . 1  . 38 .154
    "2000-06-01" 5929         . 1  . 38 .154
    "2000-07-01" 5929         . 1  . 38 .148
    "2000-08-01" 5929         . 1  . 38 .148
    "2000-09-01" 5929         . 1  . 38 .148
    "2000-10-01" 5929         . 1  . 38 .148
    "2000-11-01" 5929         . 1  . 38 .148
    "2000-12-01" 5929         . 1  . 38 .148
    "2001-01-01" 5929         . 1  . 39 .164
    "2001-02-01" 5929         . 1  . 39 .164
    "2001-03-01" 5929         . 1  . 39 .164
    "2001-04-01" 5929         . 1  . 39 .164
    "2001-05-01" 5929         . 1  . 39 .164
    "2001-06-01" 5929         . 1  . 39 .164
    "2001-07-01" 5929         . 1  . 39 .184
    "2001-08-01" 5929         . 1  . 39 .184
    "2001-09-01" 5929         . 1  . 39 .184
    "2001-10-01" 5929         . 1  . 39 .184
    "2001-11-01" 5929         . 1  . 39 .184
    "2001-12-01" 5929         . 1  . 39 .184
    "2002-01-01" 5929         . 1  . 40 .215
    "2002-02-01" 5929         . 1  . 40 .215
    "2002-03-01" 5929         . 1  . 40 .215
    "2002-04-01" 5929         . 1  . 40 .215
    "2002-05-01" 5929         . 1  . 40 .215
    "2002-06-01" 5929         . 1  . 40 .215
    "2002-07-01" 5929         . 1  . 40 .179
    "2002-08-01" 5929         . 1  . 40 .179
    "2002-09-01" 5929         . 1  . 40 .179
    "2002-10-01" 5929         . 1  . 40 .179
    "2002-11-01" 5929         . 1  . 40 .179
    "2002-12-01" 5929         . 1  . 40 .179
    "2003-01-01" 5929         . 1  . 41 .161
    "2003-02-01" 5929         . 1  . 41 .161
    "2003-03-01" 5929         . 1  . 41 .161
    "2003-04-01" 5929         . 1  . 41 .161
    "2003-05-01" 5929         . 1  . 41 .161
    "2003-06-01" 5929         . 1  . 41 .161
    "2003-07-01" 5929         . 1  . 41 .144
    "2003-08-01" 5929         . 1  . 41 .144
    "2003-09-01" 5929         . 1  . 41 .144
    "2003-10-01" 5929         . 1  . 41 .144
    "2003-11-01" 5929         . 1  . 41 .144
    "2003-12-01" 5929         . 1  . 41 .144
    "2004-01-01" 5929         . 1  . 42 .143
    "2004-02-01" 5929         . 1  . 42 .143
    "2004-03-01" 5929         . 1  . 42 .143
    "2004-04-01" 5929         . 1  . 42 .147
    end

    I use the following command to estimate the Heckman model:
    Code:
     xtheckman rem_tot c.edad##c.edad tenure, select(working = c.edad##c.edad desempleo)
    But when I run the command I get the following error:

    Code:
     initial values not feasible
    r(1400);
    
    end of do-file
    I appreciate any help since this work is for my doctoral thesis and I am short on time.
    Thank you so much

  • #2
    If your N = 10, I would say panel Heckman is not the way to go. You will need a substantially larger sample for xtheckman to work.
    F



    • #3
      Originally posted by FernandoRios:
      If your N = 10, I would say panel Heckman is not the way to go. You will need a substantially larger sample for xtheckman to work.
      F
      Thank you very much. Looking at it more carefully, I would like to estimate a fixed-effects model using the xtheckmanfe command, which I believe you wrote. Would I also need a larger N in that case? What I showed is just a sample; my real dataset has an N greater than 6000. Could it work if I use a larger N?
      Last edited by Facundo Duran; 18 Sep 2023, 06:15.



      • #4
        I worked on a different approach: xtheckmanfe.
        It applies a kind of correlated random effects model to the analysis.
        Two other points:
        xtheckman (the official command) is a full MLE program. It can be very hard to get it to converge because of the model's complexity.
        xtheckmanfe is an implementation of Wooldridge's work; look at the help file for the references. It uses a two-step approach, and each year is treated separately for the probit estimation.
        Whether or not it fits your needs may depend on the assumptions, data structure, etc.
        HTH



        • #5
          Originally posted by FernandoRios:
          I worked on a different approach: xtheckmanfe.
          It applies a kind of correlated random effects model to the analysis.
          Two other points:
          xtheckman (the official command) is a full MLE program. It can be very hard to get it to converge because of the model's complexity.
          xtheckmanfe is an implementation of Wooldridge's work; look at the help file for the references. It uses a two-step approach, and each year is treated separately for the probit estimation.
          Whether or not it fits your needs may depend on the assumptions, data structure, etc.
          HTH
          Thank you very much for responding, and sorry for the inconvenience. I expanded the number of individuals to 600, but I still cannot estimate the model.
          The model treats income as a function of age, age^2, and experience (tenure), while the selection equation models a dummy for working as a function of age, age^2, and unemployment (which plays the role of the market variable in the example).
          When I try to run the code
          Code:
          xtset id_trabajador fecha
          xtheckmanfe rem_tot c.edad##c.edad tenure, select(working = c.edad##c.edad desempleo)
          I don't get any results:

          Code:
          . xtset id_trabajador fecha
                 panel variable:  id_trabajador (strongly balanced)
                  time variable:  fecha, 01jan1996 to 01dec2021, but with gaps
                          delta:  1 day
          
          . xtheckmanfe rem_tot c.edad##c.edad tenure, select(working = c.edad##c.edad desempleo)
          r(2000);
          
          end of do-file
          
          r(2000);
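One detail in the output above is that xtset reports `delta: 1 day` "with gaps", which suggests `fecha` is stored as a daily date even though the data are monthly. Below is a minimal sketch of declaring a monthly time variable, assuming the date starts out as a string like the `tiempo` column in post #1. The names `fecha_d` and `fecha_m` are hypothetical, and this may or may not be related to the r(2000) ("no observations") error.

```stata
* Toy rows in the same layout as the data example in post #1
clear
input str10 tiempo long id_trabajador
"1996-01-01" 5929
"1996-02-01" 5929
"1996-03-01" 5929
end

generate fecha_d = date(tiempo, "YMD")   // daily date (days since 01jan1960)
generate fecha_m = mofd(fecha_d)         // monthly date (months since 1960m1)
format fecha_d %td
format fecha_m %tm

* Declare the panel with the monthly variable so consecutive months are one unit apart
xtset id_trabajador fecha_m
```

With a %tm variable, February 1996 is exactly one period after January 1996. With a daily date, consecutive months are 28 to 31 units apart, which xtset reports as gaps.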



          • #6
            Start by estimating a probit model for selection in each year.
            See if that works.
            Then you can do the second step.



            • #7
              Originally posted by FernandoRios:
              Start by estimating a probit model for selection in each year.
              See if that works.
              Then you can do the second step.
              Thank you very much for your reply!
              I am trying to do it manually as you suggested, but I have some doubts about how to calculate the inverse Mills ratio. Here is the code I was writing.
              Code:
              if year==1996 {
                   xtprobit working edad_total edad_total2 desempleo
                   predict working_index, xb
                   gen imr = .  // Create the IMR variable, initially missing
                   replace imr = (normalden(working_index)) / (normal(working_index)) if year ==1996
               }

               if year==1996 xtreg rem_tot edad_total edad_total2 tenure imr, fe
              where "working edad_total edad_total2 desempleo" is my selection equation.

              I have been reading that the IMR can be calculated as follows:

              H(z) = f(z) / (1 − F(z)) (where f() is the density and F() is the cumulative distribution function, both evaluated at each sample point, and z is the predicted probit index)
              So
              IMR=1/H(z)

              Is this calculation correct when applied to panel data?

              If it is correct, should I then run this command for every year up to 2021, the last year in my sample?

              One thing I had to do so that age and age squared would not drop out because of the fixed effects (age is the same within each year) was to express age in decimals: e.g., in January 1996 the age is 30, in February it is 30.1, and so on.

              So I would like to know whether the inverse Mills ratio is calculated correctly and whether the steps I followed are correct.
              Thank you very much
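
For reference, the inverse Mills ratio usually used in a Heckman-type correction for the selected (working) observations is phi(z)/Phi(z), with z the probit index. That quantity equals the hazard H evaluated at -z, and it is not the same as 1/H(z). A small numerical check on hypothetical index values (all variable names are illustrative):

```stata
clear
set obs 9
generate double z = -2 + 0.5*(_n - 1)     // hypothetical probit index values from -2 to 2

generate double imr       = normalden(z) / normal(z)            // phi(z)/Phi(z)
generate double hazard_mz = normalden(-z) / (1 - normal(-z))    // H(-z)
generate double inv_haz   = (1 - normal(z)) / normalden(z)      // 1/H(z)

* phi(z)/Phi(z) matches H(-z) up to rounding ...
assert reldif(imr, hazard_mz) < 1e-12

* ... while 1/H(z) is a different quantity; compare the columns
list z imr hazard_mz inv_haz
```

The `replace imr = normalden(...)/normal(...)` line in the code above already uses the phi(z)/Phi(z) form.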



              • #8
                OK, I see the problem you are having.
                When I say manually estimate the probit model, I mean the following:

                Code:
                webuse wagework
                xtset personid year
                * individual means of the covariates
                foreach i in age tenure market {
                  bysort personid: egen m_`i' = mean(`i')
                }
                * one probit per year
                probit working age market m_* if year==2013
                probit working age market m_* if year==2014
                probit working age market m_* if year==2015
                If any of the probits fails here, it will also fail in your overall model.
                F
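
The year-by-year check above can also be written as a loop, which avoids hard-coding the years. This is a minimal sketch on the same wagework data; the use of `capture` and the status messages are additions for illustration, not part of xtheckmanfe.

```stata
webuse wagework, clear

* Individual means of the covariates, as in the code above
foreach v in age tenure market {
    bysort personid: egen double m_`v' = mean(`v')
}

* One probit per year; report any year that errors out or fails to converge
levelsof year, local(years)
foreach y of local years {
    capture probit working age market m_* if year == `y'
    if _rc {
        display as error "year `y': probit returned error code " _rc
    }
    else if e(converged) == 0 {
        display as error "year `y': probit did not converge"
    }
    else {
        display as text "year `y': ok, N = " e(N)
    }
}
```

`capture` suppresses the probit output, so rerun any flagged year without it to see the full log.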



                • #9
                  Originally posted by FernandoRios:
                  OK, I see the problem you are having.
                  When I say manually estimate the probit model, I mean the following:

                  Code:
                  webuse wagework
                  xtset personid year
                  foreach i in age tenure market {
                    bysort personid: egen m_`i' = mean(`i')
                  }
                  probit working age market m_* if year==2013
                  probit working age market m_* if year==2014
                  probit working age market m_* if year==2015
                  If any of the probits fails here, it will also fail in your overall model.
                  F
                  Ah, now I understand.
                  I have another question. My data are monthly, so if I take one year I have 12 periods for each individual, which is itself a panel. In that case, shouldn't I use xtprobit instead of probit?



                  • #10
                    Nope. If you have monthly data, the probit has to be estimated month by month.
                    Instead of estimating an xtprobit, the command estimates a probit for each period.
                    It then collects the IMRs and uses them to correct for selection.
                    It may be a good idea to try replicating this with a toy dataset like wagework, following the strategy described in the references in the help file (Wooldridge's earlier paper is the most relevant).
                    F



                    • #11
                      Originally posted by FernandoRios:
                      Nope. If you have monthly data, the probit has to be estimated month by month.
                      Instead of estimating an xtprobit, the command estimates a probit for each period.
                      It then collects the IMRs and uses them to correct for selection.
                      It may be a good idea to try replicating this with a toy dataset like wagework, following the strategy described in the references in the help file (Wooldridge's earlier paper is the most relevant).
                      F
                      Thank you very much,
                      do you mean this paper? Wooldridge, Jeffrey M. 1995. "Selection corrections for panel data models under conditional mean independence assumptions."





                      As you suggested, I am trying to replicate this with the "wagework" toy dataset. I am running the xtheckmanfe example and looking at the IMR values.
                      Code:
                      xtset personid year
                      
                      xtheckmanfe wage age tenure, select(working = age market)
                      Then I am trying to calculate the IMR manually to compare the two:

                      Code:
                      probit working age if year == 2013
                      predict probit_resid_2013, xb
                      
                      probit working age if year == 2014
                      predict probit_resid_2014, xb
                      
                      probit working age if year == 2015
                      predict probit_resid_2015, xb
                      
                      probit working age if year == 2016
                      predict probit_resid_2016, xb
                      
                      gen imr = .  // Create the IMR variable, initially missing
                      replace imr = (normalden(probit_resid_2013)/normal(probit_resid_2013)) if year ==2013
                      replace imr = (normalden(probit_resid_2014)/normal(probit_resid_2014)) if year ==2014
                      replace imr = (normalden(probit_resid_2015)/normal(probit_resid_2015)) if year ==2015
                      replace imr = (normalden(probit_resid_2016)/normal(probit_resid_2016)) if year ==2016
                      but I am getting different values.
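
One difference visible in the code above: `select()` lists age and market, whereas the manual probits use only age, and the per-year probits in post #8 also include the individual means (`m_*`). The sketch below computes a manual IMR with the post #8 specification. It is an illustration under those assumptions, not a reproduction of the xtheckmanfe internals, so whether the values match exactly depends on what the ado-file actually does. `xb_sel`, `xb_tmp`, and `imr_manual` are hypothetical names.

```stata
webuse wagework, clear

* Individual means of the covariates, as in post #8
foreach v in age tenure market {
    bysort personid: egen double m_`v' = mean(`v')
}

* Per-year probit index, stored in a single variable
generate double xb_sel = .
levelsof year, local(years)
foreach y of local years {
    quietly probit working age market m_* if year == `y'
    quietly predict double xb_tmp, xb
    quietly replace xb_sel = xb_tmp if year == `y'
    drop xb_tmp
}

* Inverse Mills ratio from the year-specific index
generate double imr_manual = normalden(xb_sel) / normal(xb_sel)
summarize imr_manual
```

Storing the index in one variable and filling it year by year avoids keeping a separate prediction variable for each year.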



                      • #12
                        Originally posted by FernandoRios:
                        OK, I see the problem you are having.
                        When I say manually estimate the probit model, I mean the following:

                        Code:
                        webuse wagework
                        xtset personid year
                        foreach i in age tenure market {
                          bysort personid: egen m_`i' = mean(`i')
                        }
                        probit working age market m_* if year==2013
                        probit working age market m_* if year==2014
                        probit working age market m_* if year==2015
                        If any of the probits fails here, it will also fail in your overall model.
                        F
                        Sorry to keep asking. I ran the probits as you suggested, and none of them shows a problem. However, when I run xtheckman it still does not converge.

                        This is the code:
                        Code:
                        xtset id_trabajador fecha, monthly
                        foreach i in edad tenure desempleo {
                        bysort id_trabajador:egen m_`i'=mean(`i')
                        }
                        probit working edad desempleo m_* if mes==1
                        probit working edad desempleo m_* if mes==2
                        probit working edad desempleo m_* if mes==3
                        probit working edad desempleo m_* if mes==4
                        probit working edad desempleo m_* if mes==5
                        probit working edad desempleo m_* if mes==6
                        probit working edad desempleo m_* if mes==7
                        probit working edad desempleo m_* if mes==8
                        probit working edad desempleo m_* if mes==9
                        probit working edad desempleo m_* if mes==10
                        probit working edad desempleo m_* if mes==11
                        probit working edad desempleo m_* if mes==12
                        These are the results of the probits:

                        [Image: probit.png]

                        Greetings, and thank you very much!
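
Post #10 says the probit is estimated for each period. If `mes` in the code above is the calendar month (1 to 12), then `if mes==1` pools every January from 1996 to 2021 rather than isolating a single period. A toy sketch of the difference, assuming a monthly date variable with the hypothetical name `fecha_m`:

```stata
clear
set obs 36
generate fecha_m = tm(1996m1) + _n - 1      // 36 consecutive months, 1996m1 to 1998m12
format fecha_m %tm
generate mes = month(dofm(fecha_m))         // calendar month, 1 to 12

* Each value of mes covers three different periods (one per year)
bysort mes: assert _N == 3

* Looping over the distinct values of fecha_m visits each period exactly once
levelsof fecha_m, local(periods)
foreach p of local periods {
    quietly count if fecha_m == `p'
    assert r(N) == 1
    * with real data, the per-period probit would go here, for example:
    * probit working edad desempleo m_* if fecha_m == `p'
}
```

With 26 years of monthly data this loop would run 312 probits, one per period, instead of 12.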



                        • #13
                          I think I kept reading your post incorrectly.
                          The procedure I suggested reflects what xtheckmanfe does, not what xtheckman does.
                          The latter relies on full-information ML, which is far more difficult to estimate.
                          The former is what I implemented.
                          They are different strategies.
                          F



                          • #14
                            Originally posted by FernandoRios:
                            I think I kept reading your post incorrectly.
                            The procedure I suggested reflects what xtheckmanfe does, not what xtheckman does.
                            The latter relies on full-information ML, which is far more difficult to estimate.
                            The former is what I implemented.
                            They are different strategies.
                            F
                            No, it's okay. What I want to use is xtheckmanfe.



                            • #15
                              In that case, something else is going on with your data.
                              I can't say much at this point,
                              but open the ado file and try to replicate it yourself.
                              Only that way can you figure out what is going on.
                              It may be something that is specific to your data.
