  • #16
    In a model without the three way interaction term, you are constraining the time trends within Gujarat (and also those within Rajasthan) to be the same for young and old. If you think that the time trends do not differ by age, then this is a very reasonable model.

    What adding the three-way interaction does is relax that constraint and allow the young to have different time trends from the old in Gujarat (and also in Rajasthan).
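
    As a rough sketch of the two specifications being compared, using the variable names that appear later in this thread (workhours, YOUNG, Rajasthan, LAW, year) and assuming the data are already -xtset-, the constrained and relaxed models might look like this. Standard-error options are left out, and this is an illustration rather than the exact commands anyone ran:

    ```stata
    * Constrained: one linear time trend per state, shared by young and old
    xtreg workhours i.YOUNG##i.Rajasthan##i.LAW c.year##i.Rajasthan, fe

    * Relaxed: the trend may also differ between young and old within each state
    xtreg workhours i.YOUNG##i.Rajasthan##i.LAW i.YOUNG##c.year##i.Rajasthan, fe
    ```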

    • #17
      That is very understandable.

      I have implemented my model under different specifications and have some curious results:

      1. I have significant results if I include no controls and simply run the diff-in-diff. So there is significant treatment effect.

      2. I get significant results if I include year fixed effects in the specification. I get a slightly smaller p-value and a small increase in the estimate with this specification. I was thinking the estimate should be smaller after inclusion of year fixed effects (another term for year dummies); why is it actually bigger?

      3. I get significant results if I include the c.year##Rajasthan interaction term. The treatment effect is significant with and without year fixed effects in this specification. The treatment effect in the model with year fixed effects only is very similar to that in the model with year fixed effects and c.year##i.India.

      4. Treatment effect disappears after inclusion of the three-way interaction term i.YOUNG##c.year##Rajasthan suggested by Clyde sir!

      Sorry for the wrong interpretation.

      I am now thinking that the state-specific trend across all workers is not important, so I am getting positive treatment results: the young seem to be increasing work in Rajasthan after the law. After inclusion of the triple interaction term, I can control for different trends in working hours between young and old in each state. Before, I was not controlling for the trend in YOUNG workers only; it was mixed up with the overall trend for all workers in the state. AFTER controlling for the YOUNG worker trend, my result is gone. Does this mean that:

      1. Young workers in both Gujarat and Rajasthan have an upward trend, and the LAW did very little to change that? Does this mean that the underlying trend among young workers is common across the two states?

      EDIT: Wrong interpretation before



      Last edited by sanjay nawaz; 05 Apr 2017, 12:25.

      • #18
        1. I have significant results if I include no controls and simply run the diff-in-diff. So there is significant treatment effect.
        What does this even mean? How can you run a DID analysis with no controls? There is no difference in difference definable without controls.

        2. I get significant results if I include year fixed effects in the specification. I get a slightly smaller p-value and a small increase in the estimate with this specification. I was thinking the estimate should be smaller after inclusion of year fixed effects (another term for year dummies); why is it actually bigger?

        3. I get significant results if I include the c.year##Rajasthan interaction term. The treatment effect is significant with and without year fixed effects in this specification. The treatment effect in the model with year fixed effects only is very similar to that in the model with year fixed effects and c.year##i.India.

        4. Treatment effect disappears after inclusion of the three-way interaction term i.YOUNG##c.year##Rajasthan suggested by Clyde sir!
        You shouldn't be doing this! Your model should be chosen and fixed based on your understanding of the mechanisms at work. Playing around with different models and then cherry-picking p-values is not science. It's p-hacking. Some even call it scientific misconduct (though in my view that's a bit extreme.)

        I was thinking estimate should be smaller after inclusion of year fixed effects (another term for year dummies), why is it actually bigger?
        No, it can go either way. Sometimes the period time-shocks are proxies for other effects that actually happen to obscure the relationship you're looking for. When that happens, adjusting for them can unmask the relationships. It's the same with any other confounding variable: adjustment will change the effect, but the possibilities lie in both directions.

        ...model with year fixed effects and c.year##i.India
        How can you possibly have an i.India variable in this model? Last time I checked, both Gujarat and Rajasthan are part of India. What am I missing here?

        4. Treatment effect disappears after inclusion of the three-way interaction term i.YOUNG##c.year##Rajasthan suggested by Clyde sir!
        Three-way models are complicated. Are you sure you did the right test here? Remember that because of the interactions there are two different treatment effects, one for the young and one for the old, and even these vary by year. What exactly is it that disappeared? What about the average marginal effect of the law variable? (By the way, where did that variable go? You're not mentioning it any more.)

        Your use of language is confusing me. Using interaction terms does not control for anything. It enables you to model separate effects in subgroups (or continuously graded effects.) So I really can't follow your discussion about controlling for things at all. I don't know what you're getting at.

        • #19
          Hi, sorry, in my rush I am making typing mistakes; I meant to write c.year##i.Rajasthan.

          I am running the different models as robustness checks. So I will be showing that any treatment effect simply disappears after inclusion of the three way interaction.

          When I saw significant results, I was only looking at the coefficient on the LAW*YOUNG*Rajasthan variable in my model. This coefficient tells me how the LAW affected the youth in Rajasthan differently from the youth in Gujarat. I am not at the moment paying attention to the other variables, because in diff-in-diff you usually pay attention only to the TREATMENT*GROUP variable: that is the coefficient of interest. I am not used to interpreting coefficients other than the d-i-d (or, in this model, the d-i-d-i-d) because most of the papers I read never mention them. But maybe that is because most of the papers use d-i-d, and things should differ with d-i-d-i-d?

          Thank you for the explanation of the time fixed effects. I am wondering what event the time fixed effects would be proxying for that causes an increase in the d-i-d-i-d estimate after their inclusion. Is it that the time fixed effects have a negative relationship with the LAW*YOUNG*Rajasthan variable, so once they are included, this is filtered out?

          My specification not including the time fixed effects and controls such as GDP is the following:

          Code:
          xtreg workhours i.YOUNG##i.Rajasthan##i.LAW i.YOUNG##c.year##i.Rajasthan, fe vce(cluster state)
          The only significant variable here is:

          YOUNG#c.year

          I am assuming that this means the common trend in the young across both states is significant, which complements the loss in significance once I attach i.YOUNG on the two way interaction term i.Rajasthan##c.year. Although my YOUNG#Rajasthan#c.year term is not significant and my Rajasthan#c.year is also not significant.

          One side issue is that I have an x variable which controls for parent's age, and this is being omitted. I don't understand why this is the case. Is it correlated with the time trend? That is what I am thinking.


          I am very sorry for the wrong use of language.

          • #20
            The only significant variable here is:

            YOUNG#c.year

            I am assuming that this means the common trend in the young across both states is significant, which complements the loss in significance once I attach i.YOUNG on the two way interaction term i.Rajasthan##c.year.
            Well, no. It's really very hard to say what, if anything, it means. In these interaction models, the terms for the individual constituents of the interaction, or the pairwise interactions in the presence of a three-way interaction, no longer mean what they would have meant in a simpler model. The fact that you have two three-way interactions here makes it even harder to keep it all straight. To be honest, I would not even attempt it. Yes, it can be done, but it will take you a long time, and you are likely to make mistakes along the way, because all of the various effects and sub-effects have to be calculated as sums of coefficients. The individual coefficients themselves are not very useful. So in interpreting these models, I estimate and draw conclusions about effects only from the output of -margins-, not from the regression coefficients themselves.
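
            A minimal sketch of that workflow, assuming the workhours specification quoted above and that the interest is in the effect of LAW within each YOUNG-by-Rajasthan group. The noestimcheck option is carried over from commands posted elsewhere in this thread; whether it is appropriate depends on why -margins- reports a term as not estimable:

            ```stata
            xtreg workhours i.YOUNG##i.Rajasthan##i.LAW i.YOUNG##c.year##i.Rajasthan, fe

            * Discrete change in predicted workhours when LAW goes from 0 to 1,
            * reported separately for each combination of YOUNG and Rajasthan
            margins YOUNG#Rajasthan, dydx(LAW) noestimcheck

            * Difference between the young and old Rajasthan-vs-Gujarat gaps in
            * those four effects, i.e. the triple-difference contrast
            lincom 1.YOUNG#1.Rajasthan#1.LAW
            ```

            In this particular specification LAW is not interacted with year, so the four effects reported by -margins- are sums of the LAW coefficients and the -lincom- line reproduces the highest-order LAW interaction with its confidence interval.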

            One side issue is that I have an x variable which controls for parent's age, and this is being omitted. I don't understand why this is the case. Is it correlated with the time trend? That is what I am thinking.
            Close. Within a particular subject's data, the parent's age will be collinear with the time trend. But different subjects presumably start out with different parents' ages. So this is not the whole story. What completes the story is the fixed effect. Parent's age will be a perfect linear combination of the fixed-effects indicators and the time trend. If this were not a fixed-effects model, or if you omitted the time trends, then either way, you would retain the parent age variable. But the fixed effects are clearly necessary, and in the context you have presented, as I understand it, the time trends may well be crucial, too. If so, the effect of parent age is not estimable in this kind of model.
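
            To make the linear-combination argument concrete, suppose (an assumption, since the thread does not show the data) that parent's age is recorded in whole years and rises by exactly one each calendar year. Writing $a_{it}$ for the parent's age of worker $i$ in year $t$, and $t_{0i}$ for the first year in which worker $i$ is observed:

            ```latex
            a_{it} = a_{i t_{0i}} + (t - t_{0i})
                   = \underbrace{\left(a_{i t_{0i}} - t_{0i}\right)}_{\text{constant within worker } i} + 1 \cdot t
            ```

            The first term is absorbed by the worker fixed effect and the second is the linear time trend with a slope of one, so under this assumption nothing is left over to identify a coefficient on parent's age.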

            • #21
              Thank you Clyde

              Whilst interpretation of the individual coefficients may be troublesome, my main concern at this moment is interpretation of the d-i-d-i-d coefficient, which I think might be easier to interpret since it is the coefficient most people care about. I will now read the Stata Journal article you linked, to understand and implement the margins command.

              • #22
                Okay I have implemented the margins command. I have done the following:

                margins YOUNG#Rajasthan#LAW, noestimcheck

                margins Rajasthan, dydx(year)

                In my model where I don't have the 3 way interaction time trend term (i.YOUNG##c.year##i.Rajasthan):
                Code:
                xtreg workhours i.YOUNG##i.Rajasthan##i.LAW, fe vce(cluster state)
                I am getting sensible answers for the first margins command. The trouble is that when I re-introduce the 3 way interaction time trend term (i.YOUNG##c.year##i.Rajasthan):
                Code:
                xtreg workhours i.YOUNG##i.Rajasthan##i.LAW i.YOUNG##c.year##i.Rajasthan, fe vce(cluster state)
                Then I am getting crazy answers (very, very large), even negative ones, which is impossible. Obviously I have miscoded my margins command in this instance. I must not be taking account of the i.YOUNG##c.year##i.Rajasthan term. I am not sure how to proceed here.

                When I run the second margins command I get that the coefficients are not estimable. Not sure why this one is happening.

                • #23
                  So a couple of questions. I didn't really question it before, but what is the variable state that you are using for your cluster robust standard error?

                  And how did you -xtset- your data? That is, what is the panel structure here?

                  In addition, I think I need to see the actual commands and outputs of the -xtset-, -xtreg-, and -margins- commands to troubleshoot this. Please do this by copying directly from your Results window or log file and pasting into a code block here so that nothing gets subtly changed by mistake. (And give the actual output of -xtreg-: don't run it through -esttab- or -outreg- or anything like that.)

                  • #24
                    My panel structure follows each worker on a YEARLY basis. The dependent variable is YEARLY SAVINGS IN USD. I am clustering at the state level; Rajasthan and Gujarat are two Indian states which border each other. This is my latest specification.

                    Code:
                    xtset worker year
                    xtdes
                    gen LAW = (year>=2002)
                    xtreg SAVINGUSD i.YOUNG##i.RAJASTHAN##i.LAW  i.YOUNG##c.year##i.RAJASTHAN i.year, fe vce(cluster state)
                    margins YOUNG#RAJASTHAN#LAW, noestimcheck // EXPECTED HOURS IN EACH CITY BEFORE AND AFTER
                    margins RAJASTHAN, dydx(year)
                    marginsplot
                    The margins output for this specification is:

                    Code:
                    -------------------------------------------------------------------------------------
                                        |            Delta-method
                                        |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    --------------------+----------------------------------------------------------------
                    YOUNG#RAJASTHAN#LAW |
                                 0 0 0  |   6866.411   4487.828     1.53   0.126    -1929.571    15662.39
                                 0 0 1  |   6866.411   4487.828     1.53   0.126    -1929.571    15662.39
                                 0 1 0  |   -298.502   3574.729    -0.08   0.933    -7304.841    6707.837
                                 0 1 1  |  -287.3368   3566.045    -0.08   0.936    -7276.656    6701.982
                                 1 0 0  |  -261.4541   1020.897    -0.26   0.798    -2262.376    1739.468
                                 1 0 1  |  -254.2366   1016.603    -0.25   0.803    -2246.742    1738.269
                                 1 1 0  |  -182.3822   680.4452    -0.27   0.789     -1516.03    1151.266
                                 1 1 1  |  -174.7156    675.039    -0.26   0.796    -1497.768    1148.337
                    -------------------------------------------------------------------------------------
                    This looks like nonsense. If I use this specification:

                    Code:
                    xtset worker year
                    xtdes
                    gen LAW = (year>=2002)
                    xtreg SAVINGUSD i.YOUNG##i.RAJASTHAN##i.LAW, fe vce(cluster state)
                    margins YOUNG#RAJASTHAN#LAW, noestimcheck // EXPECTED HOURS IN EACH CITY BEFORE AND AFTER
                    margins RAJASTHAN, dydx(year)
                    marginsplot

                    Then first margins output is:
                    Code:
                    
                    Expression   : Linear prediction, predict()
                    
                    -------------------------------------------------------------------------------------
                                        |            Delta-method
                                        |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    --------------------+----------------------------------------------------------------
                    YOUNG#RAJASTHAN#LAW |
                                 0 0 0  |   69.31832   1.012689    68.45   0.000     67.33349    71.30315
                                 0 0 1  |   77.90618   6.420397    12.13   0.000     65.32244    90.48993
                                 0 1 0  |   69.31832   1.012689    68.45   0.000     67.33349    71.30315
                                 0 1 1  |   75.22994    5.53278    13.60   0.000     64.38589    86.07399
                                 1 0 0  |   69.31832   1.012689    68.45   0.000     67.33349    71.30315
                                 1 0 1  |   71.39383   2.815018    25.36   0.000      65.8765    76.91116
                                 1 1 0  |   69.31832   1.012689    68.45   0.000     67.33349    71.30315
                                 1 1 1  |   71.97724   1.598195    45.04   0.000     68.84483    75.10964
                    -------------------------------------------------------------------------------------
                    Which makes sense.

                    The second margins command only runs for the following specification:

                    Code:
                    xtset worker year
                    xtdes
                    gen LAW = (year>=2002)
                    xtreg SAVINGUSD i.YOUNG##i.RAJASTHAN##i.LAW i.year, fe vce(cluster state)
                    margins YOUNG#RAJASTHAN#LAW, noestimcheck // EXPECTED HOURS IN EACH CITY BEFORE AND AFTER
                    margins RAJASTHAN, dydx(year)
                    marginsplot
                    But all tables show "(not estimable)".

                    The second margins command fails with the full specification:

                    Code:
                    xtset worker year
                    xtdes
                    gen LAW = (year>=2002)
                    xtreg SAVINGUSD i.YOUNG##i.RAJASTHAN##i.LAW  i.YOUNG##c.year##i.RAJASTHAN i.year, fe vce(cluster state)
                    margins YOUNG#RAJASTHAN#LAW, noestimcheck // EXPECTED HOURS IN EACH CITY BEFORE AND AFTER
                    margins RAJASTHAN, dydx(year)
                    marginsplot
                    Stata reports:

                    Code:
                    margins RAJASTHAN, dydx(year)
                    invalid dydx() option;
                    variable year may not be present in model as factor and continuous predictor
                    I am sorry, I am not sure I am allowed to give the full output without my boss's permission.

                    I can give the following, which is the top of the regression table for the first specification in this post:

                    Code:
                    Fixed-effects (within) regression               Number of obs      =      3360
                    Group variable: worker                          Number of groups   =       625
                    
                    R-sq:  within  = 0.0154                         Obs per group: min =         2
                           between = 0.0000                                        avg =       5.4
                           overall = 0.0002                                        max =        13
                    
                                                                    F(18,73)           =      2.97
                    corr(u_i, Xb)  = -0.9999                        Prob > F           =    0.0005
                    Thank you Clyde, and again I am sorry for not putting up the full output.

                    EDIT: forgot to put the table in CODE delimiters

                    BIG EDIT: I put the wrong specifications in my post. I have 4 or 5 regressions with this dataset and I am copy-pasting all over the place. This problem relates to when SAVINGUSD is the dependent variable, NOT workhours. That one is looking fine actually. It is this one that I am having problems with. I am very, very sorry for the inconvenience.
                    Last edited by sanjay nawaz; 05 Apr 2017, 19:09.

                    • #25
                      Not relevant post
                      Last edited by sanjay nawaz; 05 Apr 2017, 19:09.

                      • #26
                        This is getting a bit beyond my ability to troubleshoot remotely without having my hands on the data. But here's my best shot at it. I think the problem is arising because, in the specifications that produce problems, the regression command contains both c.year and i.year. While that is not necessarily a mis-specified model, it may be getting -margins- confused. I generally discourage people from using year-specific fixed effects and a linear time trend in the same model; with a linear trend, one might perhaps include a small number of indicators for specific years that are likely to be distinctive (e.g. 2008.year to account for the financial crisis; I don't know if that's relevant in India, it is just an example).

                        Nevertheless, it is permissible to have both a linear time trend and year indicators. (In this kind of model, there will be two indicators omitted due to collinearity instead of the usual one; one is the usual omission of the base category and the other arises because of the LAW variable.) But I think -margins- is not handling it properly. Here's an idea to try; I don't know if it will work.

                        Code:
                        clonevar time_trend = year
                        xtset worker year
                        xtdes
                        gen LAW = (year>=2002)
                        
                        xtreg SAVINGUSD i.YOUNG##i.RAJASTHAN##i.LAW i.YOUNG##c.time_trend##i.RAJASTHAN i.year, fe 
                        margins YOUNG#RAJASTHAN#LAW, noestimcheck // EXPECTED HOURS IN EACH CITY BEFORE AND AFTER
                        margins RAJASTHAN, dydx(time_trend)
                        I hope that will give you more sensible results.

                        One other correction I have made here is to remove the error clustering on state. You didn't explain the variable state, but I assume it has only two levels: Rajasthan and Gujarat. Clustered covariance on a variable with so few levels is not good: the cluster robust variance estimator requires a large number of clusters. While different experts disagree on how many is enough, I don't think anyone would say that two is sufficient. It's not just that the cluster robust variance estimator doesn't work well with small numbers of clusters, it is actually worse than the ordinary covariance estimator in this circumstance.

                        • #27
                          Thank you Clyde,

                          I'm getting one more curious result. When I add the state-specific trend, my LAW variable drops out. It does not drop out when I use only time dummies; it drops out only when I include the state-specific time trend. What is the reason for this? My LAW dummy is equal to 1 from a certain year onwards. Is it that it is correlated with the time trend? But this can only be true after the year the LAW was implemented. Before this year the LAW dummy=0 and time trend=1 (correct?), and after this year LAW=1 and time trend=1. Where is the collinearity?

                          • #28
                            Hi

                            I am a little bit confused about the following:

                            I am running a DDD on the effect of a law on the young in an Indian state. One state implemented the LAW (Rajasthan), another state did not (Gujarat), and the LAW will only affect the YOUNG (de facto). My entire sample contains both young and old from both states.

                            My DDD estimate should be:
                            1. [YoungRajasthanPOST-YoungRajasthanPRE]-[YoungGujaratPOST-YoungGujaratPRE]-[OldRajasthanPOST-OldRajasthanPRE]
                            or
                            2. [YoungRajasthanPOST-YoungRajasthanPRE]-[ALLGujaratPOST-ALLGujaratPRE]-[OldRajasthanPOST-OldRajasthanPRE]

                            Previously on this thread it was said that specification 1 is correct, but now I am thinking specification 2 is correct because in specification 2 my treatment group is [Young in Rajasthan], while my control groups are ALL in Gujarat and OLD in Rajasthan, which is what DDD is meant to do.

                            • #29
                              Well, in a sense you can think of it as #2. But, it really isn't a good idea to have the young people included in the control group in one state but not in the other (and you clearly can't have them in the control group in Rajasthan.) Moreover, your outcome appears to be something that would definitely be age related, so the concerns about having a different age distribution among the treatment group than among the controls truly bite here. I would not do it.

                              Besides, we are talking economics here. I'm not an economist, but I know enough about it to know that even if a policy is targeted at one particular group, there are often secondary effects on others under the jurisdiction of the policy (even if they are not directly targeted by it). This strikes me as another reason not to conceive of it as #2.

                              • #30
                                Ok, thank you Clyde. I think now my question is more conceptual in nature:

                                I basically have 4 groups of people in this study: young in Gujarat, old in Gujarat, young in Rajasthan, old in Rajasthan.

                                My treatment group is young in Rajasthan, and everyone else is not affected by the policy. I think if I can understand this as a DD first and then unfold it into a DDD, it will be very helpful.

                                DD approach:

                                1. Delete all observations that are OLD and run a DD on YoungRajasthan (treatment) vs YoungGujarat (control) to get the DD estimate of [YoungRajasthanPOST-YoungRajasthanPRE]-[YoungGujaratPOST-YoungGujaratPRE], which is the difference in outcome of treatment group vs difference in outcome of control group.

                                2. Now if I unfold this into a DDD I keep all observations (so I am not deleting any OLD observations), and say the treatment group is again YoungRajasthan, and OldRajasthan + (YoungGujarat + OldGujarat = ALLGujarat) are my control groups. Now following the same logic of the DD approach, my DDD estimate should be change in outcome of treatment group minus change in outcome of the control group. Do you now see why I am thinking the DDD estimate should be [YoungRajasthanPOST-YoungRajasthanPRE]-[ALLGujaratPOST-ALLGujaratPRE]-[OldRajasthanPOST-OldRajasthanPRE] and not [YoungRajasthanPOST-YoungRajasthanPRE]-[YoungGujaratPOST-YoungGujaratPRE]-[OldRajasthanPOST-OldRajasthanPRE]?

                                In the first case I code the DD dummy as 1 if Rajasthan*Post, because all old observations have been deleted. In the second case I code the DDD dummy as 1 if the observation is Young*Rajasthan*Post.
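
                                A hedged sketch of the two setups just described, assuming the variable names used earlier in the thread (SAVINGUSD, YOUNG, RAJASTHAN, LAW) and data that are already -xtset-. An if qualifier restricts the DD to the young without deleting anything, and factor-variable notation builds the interaction dummies, so no hand-coded dummy is needed:

                                ```stata
                                * DD on the young only: old observations are excluded, not deleted
                                xtreg SAVINGUSD i.RAJASTHAN##i.LAW if YOUNG == 1, fe
                                * DD estimate for the young
                                lincom 1.RAJASTHAN#1.LAW

                                * DDD on the full sample: all four groups stay in the estimation
                                xtreg SAVINGUSD i.YOUNG##i.RAJASTHAN##i.LAW, fe
                                * DDD estimate
                                lincom 1.YOUNG#1.RAJASTHAN#1.LAW
                                ```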

                                I am not sure where I am conceptually going wrong. If you could explain how to go from DD to DDD in order to realise that DDD estimate is [YoungRajasthanPOST-YoungRajasthanPRE]-[YoungGujaratPOST-YoungGujaratPRE]-[OldRajasthanPOST-OldRajasthanPRE], and how this links with which observations to delete and keep, it would be greatly helpful.

                                I think at the most basic level my confusion is how treatment and control groups show up in the DD and DDD estimators because, as I now see it, when I unfold from DD to DDD I have an 'excess control' group (OLD Gujarat), and I am not sure how this affects the terms in the DDD estimator.

                                Sorry for such a long message, but I am really confused.
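
                                For reference, the triple-difference estimator as it is usually written in textbooks uses all four groups, so under that definition old Gujarat workers are not an excess group. With $\bar{y}$ denoting a group mean of the outcome and R/G standing for Rajasthan/Gujarat (notation introduced here only for illustration):

                                ```latex
                                \hat{\delta}_{DDD} =
                                  \left[ \left(\bar{y}^{\,\text{post}}_{\text{young},R} - \bar{y}^{\,\text{pre}}_{\text{young},R}\right)
                                       - \left(\bar{y}^{\,\text{post}}_{\text{young},G} - \bar{y}^{\,\text{pre}}_{\text{young},G}\right) \right]
                                - \left[ \left(\bar{y}^{\,\text{post}}_{\text{old},R} - \bar{y}^{\,\text{pre}}_{\text{old},R}\right)
                                       - \left(\bar{y}^{\,\text{post}}_{\text{old},G} - \bar{y}^{\,\text{pre}}_{\text{old},G}\right) \right]
                                ```

                                The first bracket is the DD among the young and the second is the same DD among the old. In a pooled regression containing only YOUNG, RAJASTHAN, LAW and all of their interactions, this quantity equals the coefficient on the three-way interaction.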
