
  • Fixed-Effects: Dropping those already in the treatment group without ending up with insufficient observations

    I learned that when conducting a fixed-effects analysis one should exclude everybody who is already in the treatment group in wave 1, because one cannot examine the before-and-after effect in this case. While this sounds plausible to me, I wonder:

    Question 1: Do I really have to exclude everyone who is already in treatment in wave 1, or does the analysis automatically disregard those people when estimating the effect? Because if I have to exclude those persons, this is my problem:

    Question 2: My treatment indicator is observed in every wave, so for example

                  Wave 1   Wave 2   Wave 3
    Treatment         45        8       90
    Control        10345    10382    10300

    I excluded everyone who has already been in treatment in wave 1, so:

    drop if treatment == 1 & wave == 1

    But then I suddenly have "insufficient observations" in my analysis! Asking

    count if !missing(Treatment, Control variables)

    it suddenly says there are zero cases. Prior to excluding it said something like 50 cases.


    Am I doing something wrong, or do I not even have to worry about this?

    I am grateful for any feedback. Thank you very much!

  • #2
    You don't give much information about the overall design of your study, nor do you tell us the actual analysis you plan to do, so helping you requires a lot of guesswork.

    In particular, if a person (or firm, or whatever your unit of observation is) is in the Treatment condition at any given wave, does that entity remain in the Treatment condition for the rest of the study? Are these three waves the same entities followed over time, or are they three separate cross-sections? What specific analysis do you have in mind for this data? I infer from your description that different entities enter the treatment condition at different times. Is that correct? Do any of the entities that are initially in the Control condition switch to Treatment at a later wave?



    • #3
      It completely slipped my mind to mention that, thank you so much for asking!

      It is a panel data set, so repeated observations of people over time. Overall I have three waves.
      I am planning to do a Fixed-Effects-Analysis, regressing wage on participation in treatment.
      The treatment is extremely flexible. Units (people) do not necessarily stay in treatment, they can drop out at every wave. They can also stay in treatment for all three waves. They can switch between treatment and control group at any wave, so a person could also enter treatment for the first time in the third wave. Or a person could be in treatment in wave 1, drop out, enter again in wave 3, for example.


      There is one mistake I would like to quickly correct: When I wrote about "insufficient observations", the code for the missing variables is this one:

      count if !missing(treatment, control variables) & treatment == 1

      I forgot to write down the last part.



      • #4
        Guest.
        as Clyde suggested, more details, a sketch of your Stata code(s) and related results or, even better, an excerpt of your dataset via -dataex- would make replying easier.
        Anyway, you may want to try something along the following lines:
        Code:
        . set obs 3
        number of observations (_N) was 0, now 3
        . g id=1
        . g year=_n+2000
        . g treatment=1 in 1
        (2 missing values generated)
        . replace treatment=2 in 2
        (1 real change made)
        . replace treatment=3 in 3
        (1 real change made)
        . expand 2
        (3 observations created)
        . replace id=2 in 4/6
        (3 real changes made)
        . replace treatment=2 in 4
        (1 real change made)
        . label define treatment 1 treatment 2 control 3 drop_out
        
        . label val treatment treatment
        . g wage=1000*runiform()
        
        . xtset id year
               panel variable:  id (strongly balanced)
                time variable:  year, 2001 to 2003
                        delta:  1 unit
        . xtreg wage i.treatment, fe
        Fixed-effects (within) regression               Number of obs     =          6
        Group variable: id                              Number of groups  =          2
        R-sq:                                           Obs per group:
             within  = 0.0758                                         min =          3
             between = 1.0000                                         avg =        3.0
             overall = 0.0358                                         max =          3
                                                        F(2,2)            =       0.08
        corr(u_i, Xb)  = -0.2137                        Prob > F          =     0.9242
        ------------------------------------------------------------------------------
                wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
           treatment |
            control  |  -91.22863   550.3884    -0.17   0.884    -2459.359    2276.901
           drop_out  |  -202.9828   550.3884    -0.37   0.748    -2571.113    2165.147
                     |
               _cons |   446.7334   463.2208     0.96   0.437    -1546.345    2439.812
        -------------+----------------------------------------------------------------
             sigma_u |  138.39729
             sigma_e |  420.36606
                 rho |  .09779265   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(1, 2) = 0.25                        Prob > F = 0.6649
        Last edited by sladmin; 06 Feb 2018, 10:10. Reason: anonymize original poster
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Dear Carlo,

          thank you very much for your syntax. I am not quite sure what to take from it. If I am not mistaken, treatment == 3 are those who drop out after being in treatment. I am sorry, but I don't quite understand how that answers my question.

          Concerning more examples: I completely understand that request. Unfortunately I have no direct access to the data and am not allowed to post anything about it. I can only make up examples.

          My data structure is the following:
          id  wave  wage  treatment  covariates (e.g. health)
          1   1     1800  1          5
          1   2     2000  0          7
          1   3     2000  1          7
          2   1     2300  0          8
          2   2     2300  0          8
          2   3     2350  1          9
          And my code:

          Code:
          use dataset.dta

          /* recoding missing values and the like */

          gen age = year - birthyear

          xtset id wave

          count if !missing(treatment, health) & treatment == 1
          // --> 53
          count if !missing(treatment, health) & treatment == 0
          // --> 13380

          xtreg wage treatment health c.age##c.age##c.age ib(freq).year cov_1 cov_2 cov_3, fe vce(cluster id) /* overall I have 7 indep. variables */
          // --> "insufficient observations"

          /* even prior to excluding those already in treatment it tells me that I don't have enough observations */

          drop if treatment == 1 & wave == 1

          count if !missing(treatment, health) & treatment == 1
          // --> 0
          count if !missing(treatment, health) & treatment == 0
          // --> 13380

          xtreg wage treatment health c.age##c.age##c.age ib(freq).year cov_1 cov_2 cov_3, fe vce(cluster id) /* overall I have 7 indep. variables */
          // --> "insufficient observations"

          So, apparently I have too few people in my treatment group. Or do I have too many covariates? And clearly, removing those already in treatment in the first wave removes all treatment observations.
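
          A side note on the drop line above: `drop if treatment == 1 & wave == 1` removes only the wave-1 rows of those people, not the people themselves, so their later waves stay in the data. A minimal sketch of dropping whole persons instead, using the made-up example data from this post (the helper variable `treated_w1` is a hypothetical name, not something from the real dataset):

          ```stata
          clear
          input id wave wage treatment health
          1 1 1800 1 5
          1 2 2000 0 7
          1 3 2000 1 7
          2 1 2300 0 8
          2 2 2300 0 8
          2 3 2350 1 9
          end

          * flag every row of a person who was in treatment in wave 1
          * (treated_w1 is a hypothetical helper variable)
          bysort id: egen treated_w1 = max(treatment == 1 & wave == 1)

          * drop all waves of those persons, not just their wave-1 row
          drop if treated_w1 == 1

          * treated person-waves that remain
          count if treatment == 1
          ```

          In this toy data only person 2 remains, with one treated wave. Whether excluding such persons is the right choice at all is a separate question that the thread leaves open.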
          Last edited by sladmin; 06 Feb 2018, 10:11. Reason: anonymize user



          • #6
            Guest:
            as per your first post I envisaged that units can drop out of treatment but can show up again in treatment in future waves (but I may have been mistaken).
            Some remarks:
            - the error message that Stata gave you may come from asking too much of your data (you do not say how many observations and groups your sample is composed of);
            - I'm not clear about your strategy for dealing with missing values, which might have contributed to the above error message;
            - you've included a cubic term for age as a predictor, and your choice may be well supported by the literature in your research field; however, the usual approach is to look for a turning point, for which a quadratic term is enough;
            - finally, as you stated from the beginning that you want to follow an -fe- specification in your panel data regression, I assume that you've already ruled out -xtreg, re- via -hausman-.
            Last edited by sladmin; 06 Feb 2018, 10:10. Reason: anonymize original poster
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Dear Carlo,

              you are absolutely right, units can drop in and out of treatment. I just do not know how to apply your guidance to my problem. Maybe you could tell me why you think I should control for "drop-outs"? I would have thought that repeated enrollment into treatment is not a problem for FE.

              I have added all of the "count if" numbers above (in case I added them while you wrote your response). I cannot really say how many groups I have, because I usually get that information from my FE output, which I cannot get at the moment due to the error message. But I probably am indeed asking too much of my data. In that case, do I have to drop the whole analysis or is there another solution?

              True about age. I will recheck that.

              I cannot rule out RE at the moment because my FE analysis does not work, and I usually conduct a Hausman test by first doing an RE, then an FE analysis, storing the estimates and then running -hausman- on them.
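
              The number of groups can also be obtained without a working -xtreg-. A minimal sketch on the made-up example data from #5 (the names `person_tag`, `t_min` and `t_max` are hypothetical helpers). It also counts how many persons ever change treatment status, since persons whose treatment never varies give no within-person contrast for the treatment coefficient:

              ```stata
              clear
              input id wave wage treatment health
              1 1 1800 1 5
              1 2 2000 0 7
              1 3 2000 1 7
              2 1 2300 0 8
              2 2 2300 0 8
              2 3 2350 1 9
              end
              xtset id wave

              * one tagged row per person, so this count is the number of groups
              egen byte person_tag = tag(id)
              count if person_tag

              * persons whose treatment status changes at least once
              bysort id: egen t_min = min(treatment)
              bysort id: egen t_max = max(treatment)
              count if person_tag & t_min != t_max

              * participation pattern across waves
              xtdescribe
              ```

              On the real data, these counts should be taken on the estimation sample, that is, after restricting to rows with no missing values in any model variable.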



              • #8
                Guest:
                - repeated enrollments are actually not an issue: that's why I previously proposed creating a categorical variable that also covers drop-outs, so that you can classify each unit in each wave according to her/his treatment condition (that is: treatment, control group, drop-out), under the assumption that someone who drops out in (say) the second wave is not necessarily lost to follow-up, but can show up again in the third wave on treatment, in the control group, or still as a drop-out;
                - if missing data give you some trouble, I would not drop the whole analysis (unless you can explore another research path with a different dataset), but try to issue some queries about the missing data or, if unfeasible, -ipolate- or -mi- them;
                - once the previous issues have hopefully been fixed, please remember to run -fe- first, store it, and then do the same with -re-; -hausman- needs that sequence to work properly.
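
                The -fe-/-re- sequence from the last remark could look roughly like this. It is a sketch only: it assumes the data are already -xtset id wave-, uses a shortened covariate list taken from #5, and leaves out vce(cluster id) because -hausman- does not accept cluster-robust estimates:

                ```stata
                * fixed effects first (consistent under both hypotheses), then store
                xtreg wage treatment health, fe
                estimates store fe

                * random effects second (efficient under the null), then store
                xtreg wage treatment health, re
                estimates store re

                * consistent estimator first, efficient estimator second
                hausman fe re
                ```

                The stored names `fe` and `re` are arbitrary labels; only their order in the -hausman- call matters.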
                Last edited by sladmin; 06 Feb 2018, 10:11. Reason: anonymize original poster
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Because you, Carlo, said that I might ask too much of my data, I excluded the covariates from my analysis.



                  Excluding all covariates, it works! And I get results. Even if I exclude everyone in treatment in wave 1. So is it a problem with my degrees of freedom? That would really suck, because I would hardly be able to control for any other variables other than treatment.



                  And concerning my zero cases after excluding the first wave-treatment-people, the problem is suddenly solved if I do not account for the covariates.

                  Code:
                    drop if wave == 1 & treatment == 1
                    count if !missing(treatment) & treatment == 1
                    // --> 185
                    count if !missing(treatment) & treatment == 0
                    // --> 14460
                  Suddenly I do not have zero cases anymore after excluding those already in treatment in wave 1. Apparently there were too many missings in my covariates (I did check if there are any variables with a vast amount of missings - there are none - but I never excluded all of them, thinking this could not possibly be the problem.)
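
                  For anyone running into the same thing, a sketch of how the missings could be located before running the model. The variable list is taken from the code in #5 and would need adjusting to the real data; `n_miss` is a hypothetical helper name:

                  ```stata
                  * model variables as named earlier in the thread (adjust as needed)
                  local xvars treatment health age cov_1 cov_2 cov_3

                  * missing values per variable in the whole sample
                  misstable summarize wage `xvars'

                  * number of missing model variables per row
                  egen n_miss = rowmiss(wage `xvars')

                  * complete cases among treated person-waves
                  count if n_miss == 0 & treatment == 1

                  * which variables are missing among the treated
                  misstable summarize wage `xvars' if treatment == 1
                  ```

                  A variable can have few missings overall and still be missing for nearly all treated rows, which would be consistent with the zero count above.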


                  Regarding hausman: Thank you for reminding me. I will do that test now.

                  Regarding drop out: I now understand what you mean. That is a great idea, thank you!

