
  • Can the dependent variable be logged in the xtheckman model?

    Hello,
    I am trying to run an 'xtheckman' command using the following:
    Code:
    xtheckman ln_OFEI ln_X1 corr2 yhat ln_X2 L.ln_X3 ln_X4 Dummies i.cat X5 X6 X7 i.year, select(z=ln_X1 corr2 yhat ln_X2 L.ln_X3 ln_X4 Dummies i.cat X5 X6 X7 i.year i.villagecode) vce(cluster id)
    However, the estimation never converges (I let it run for over 45 minutes).

    My professors asked me to use the log of OFEI as the dependent variable to make the interpretation easier. As you can see, I am also using lagged variables. Could either of these be the reason it is not working?
    I also tried 'xtheckmanfe', but it returned:
    Code:
    time-series operators not allowed
    r(101);
    How do you suggest solving this problem?

  • #2
    Yes; in fact, it usually is something like the log of the hourly wage, which is missing for those not in the workforce. What is your unit of observation? If it's the village, you can't expect this to work well when you include village fixed effects in the selection equation. There are easier, more robust alternatives.
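    For reference, a minimal statement of the textbook cross-sectional Heckman selection model with a logged outcome, assuming jointly normal errors that are independent of the covariates; $w_i$ is an illustrative hourly wage observed only when $s_i = 1$, $x_i$ is a subset of $z_i$, and the notation is not taken from any Stata help file:

```latex
\begin{aligned}
\ln w_i &= x_i\beta + u_i, \qquad \text{observed only if } s_i = 1,\\
s_i &= \mathbf{1}[z_i\gamma + v_i > 0], \qquad v_i \sim \mathcal{N}(0,1), \qquad \mathrm{E}(u_i \mid v_i) = \rho\,\sigma_u v_i,\\
\mathrm{E}(\ln w_i \mid z_i, s_i = 1) &= x_i\beta + \rho\,\sigma_u\,\lambda(z_i\gamma), \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}.
\end{aligned}
```

    Nothing here requires the outcome to be in levels: the log only changes what $\beta$ measures (approximate proportional effects). Selection bias enters through the inverse Mills ratio term, which vanishes when $\rho = 0$.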



    • #3
      My unit of observation is the household (id).
      What alternatives exist in this case?



      • #4
        As in the vast majority of contexts, random effects approaches estimated by full MLE impose too many assumptions: both substantive exogeneity assumptions and auxiliary assumptions such as no serial correlation. A more robust option is xtheckmanfe, written by Fernando Rios-Avila. Install it with ssc install xtheckmanfe.

        It implements the method from my 1995 Journal of Econometrics paper, as well as the extension with instrumental variables in Semykina and Wooldridge (2010). It allows correlation between the covariates and the unobserved heterogeneity, and it allows general serial correlation. Computationally, it's like a cross-sectional Heckman correction.
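        A compact sketch of that two-step procedure as it is usually presented in its Mundlak form, with time averages $\bar z_i$ proxying for the unobserved heterogeneity. The notation ($z_{it}$ for the selection covariates, which include $x_{it}$ plus exclusion restrictions; $d_{r,it}$ for period dummies) is assumed for illustration and is not copied from xtheckmanfe's output:

```latex
\begin{aligned}
&\text{Step 1, for each } t = 1,\dots,T \text{ (probit using all units):}\\
&\qquad \Pr(s_{it} = 1 \mid z_i) = \Phi(\psi_t + z_{it}\delta_t + \bar z_i\xi_t),
  \qquad \hat\lambda_{it} = \lambda(\hat\psi_t + z_{it}\hat\delta_t + \bar z_i\hat\xi_t),
  \qquad \lambda(c) = \phi(c)/\Phi(c);\\
&\text{Step 2, pooled OLS on the selected observations } (s_{it} = 1):\\
&\qquad y_{it} = x_{it}\beta + \bar z_i\eta + \sum_{r=1}^{T}\theta_r\, d_{r,it}
  + \sum_{r=1}^{T}\rho_r\, d_{r,it}\,\hat\lambda_{it} + e_{it}.
\end{aligned}
```

        Because Step 1 is a separate probit for each period and Step 2 includes the means, the period dummies, and the period-by-IMR interactions, the output of this kind of estimator is long by construction. Standard errors need to account for the estimated $\hat\lambda_{it}$ (for example, by bootstrapping over households), while a joint test of $H_0: \rho_1 = \dots = \rho_T = 0$ with cluster-robust standard errors serves as a test for selection bias.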



        • #5
          Thank you for the comment.
          As I said in #1, unfortunately xtheckmanfe was not an option either, and I can't drop my lagged independent variables.

          Originally posted by FatimaZohra SABRANE
          I also tried with 'xtheckmanfe' but it said:
          Code:
          time-series operators not allowed
          r(101);
          However, xtheckman does converge (after about 30 minutes) with OFEI in levels as the dependent variable, but it yields large coefficients.
          I guess there is nothing to be done about it in this case :/



          • #6
            For xtheckmanfe, you need to create the lags yourself as new variables rather than using the lag operator.
            However, I'll try to update it so it allows for that and send it to SSC.
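            A minimal Stata sketch of creating the lags by hand, using a made-up two-household panel. `quarter` is a hypothetical name for an integer time variable, and the generated names follow the ones used later in this thread (Lln_OFEI, L2ln_OFEI, Lln_X3):

```stata
* Toy quarterly panel: 2 households x 3 quarters (illustrative values only).
* `quarter` is a hypothetical integer time variable; substitute your own.
clear
input id quarter ln_OFEI ln_X3
1 1 2.0 0.5
1 2 2.5 0.75
1 3 3.0 1.0
2 1 1.0 0.25
2 2 1.5 0.5
2 3 2.0 0.75
end

* Declare the panel so that L. works within households and respects gaps
xtset id quarter

* Store the lags as ordinary variables, because xtheckmanfe rejected
* time-series operators in its varlist (the r(101) error in #1)
generate Lln_X3    = L.ln_X3
generate Lln_OFEI  = L.ln_OFEI
generate L2ln_OFEI = L2.ln_OFEI

* The first quarter of each household (first two for L2.) is missing
list id quarter ln_OFEI Lln_OFEI L2ln_OFEI, sepby(id)
```

            The new variables can then go into the xtheckmanfe varlist in place of L.ln_X3 and the like. Using L. inside generate (rather than ln_X3[_n-1]) means that a household with a skipped quarter gets a missing lag instead of a value from the wrong period.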



            • #7
              Fernando, I tried as you recommended, but I got a new error with the following:

              Code:
              xtheckmanfe ln_OFEI Lln_OFEI ln_X1 corr2 ln_X2 Lln_X3 ln_X4 Dummies i.cat X5 X6 X7 i.year, select(z=Lln_OFEI ln_X1 corr2 ln_X2 Lln_X3 ln_X4 Dummies i.cat X5 X6 X7 i.year i.villagecode) endog(Lln_OFEI=L2ln_OFEI)
              
              convergence not achieved
              convergence not achieved
              r(430);



              • #8
                Hi Fatima
                If you look at the help file, it says you cannot add year dummies directly to the model specification.
                Instead, you need to either xtset your data or specify the id() and time() variables.
                F



                • #9
                  Hello Fernando,

                  After several trials, removing the year and village dummies, I was able to get results after 3 days. Out of the 50 bootstrap replications, 3 ended with an error message:
                  Code:
                  x: Error occurred when bootstrap executed _xthck.
                  However, I was a bit disappointed by the output, as it is SO long.
                  The results showed coefficients for the variables in the main equation, coefficients for the mean of each variable, coefficients for the time variable, and the interactions between the time variable and the IMR. The selection section then contained interactions of time (15 quarters) with each and every continuous and categorical variable and with their means. But I don't need all of this!
                  Instead, I was expecting to see coefficients for the probability of having OFEI (selection equation) and for the level of OFEI (main equation), plus the IMR, or the correlation between the idiosyncratic errors and the correlation between the individual effects of the two equations, to justify the use of Heckman, just like 'xtheckman' usually shows.

                  I am not sure I have explained the output clearly. I can't paste it here because it is simply too long (it takes up almost the whole Results window in Stata).
                  If this is how it is supposed to be, how can I interpret these results?
                  If not, would other alternatives such as 'heckprobit' for panel data do the job?

                  Fatima Zohra



                  • #10
                    Fatima: How big are N and T? Unless N is really large or the model is poorly identified, this should not be computationally that difficult.

                    The method is NOT valid with a lagged dependent variable. There is a method for that case, but I don’t think it’s coded in Stata.
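                    A short illustration of why, in the notation typically used for this kind of estimator (a sketch of the argument, not a formal proof): the procedure relies on the selection covariates $z_{it}$ being strictly exogenous conditional on the unobserved effect $c_i$, and a lagged dependent variable rules that out. Here $w_{it}$ is an assumed placeholder for the other regressors:

```latex
\begin{aligned}
&\text{Required for all } t:\quad \mathrm{E}(u_{it} \mid z_{i1}, \dots, z_{iT}, c_i) = 0.\\
&\text{With a lagged outcome:}\quad y_{it} = \alpha\, y_{i,t-1} + w_{it}\beta + c_i + u_{it},\\
&\text{so } z_{i,t+1} \text{ contains } y_{it}, \text{ which moves one-for-one with } u_{it}:
  \quad \mathrm{E}(u_{it} \mid z_{i,t+1}) \neq 0 \text{ in general.}
\end{aligned}
```

                    The same feedback also contaminates the time averages $\bar z_i$ that stand in for $c_i$, since they then contain current outcomes.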



                    • #11
                      I see. My N = 6,260 and T = 16 quarters.
                      In the bootstrap results, the number of observations is 4,056 because of missing values, and the replications were based on 452 clusters in id.
                      I used about 39 explanatory variables (both continuous and categorical) and included one endogenous variable that I instrumented manually, using the fitted values (yhat) from the first stage of an IV regression.



                      • #12
                        Please read my paper with Anastasia Semykina on the proper way to implement IV with selection in panel data. I believe she still has code on her website. I don’t know why you’re so set on including a lag. Is this a prediction problem?

                        Comment


                        • #13
                          Also, the output is as it should be: the selection model is estimated separately for each time period, so it has its own set of coefficients for each one.
                          The “means” of the variables are controls that need to be included, even though they do not have an explicit interpretation.
                          I would refer to the original papers for the interpretation.
                          Fernando



                          • #14
                            Jeff and Fernando, thank you so much for your comments.

                            I will try again with Anastasia's code and see.

                            I use lagged variables because I assume that the decision to participate in off-farm employment activities, as well as the level of OFEI, depends on the previous quarter's OFEI, the lagged consumption amount, and the lagged impact of floods and droughts.

                            I first tried the Two-Part Model to see which factors affect both the probability of participation and the level of OFEI, but my professors criticized it because the two parts are independent of each other, unlike the Heckman model, which links them through the inverse Mills ratio.

                            Best

