  • Treatment variable in a Difference-in-Difference analysis using didregress is collinear

    I am trying to run a difference-in-differences (DID) analysis, but the treatment variable is unexpectedly omitted because of collinearity. didregress cannot run without a treatment variable.

    Below is the data using dataex:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float selfefficacy byte(highschatt ethnicity numpartners liveinpartner) float treatment byte facility float dummy
     26 3 2 . 2 1 12 1
     20 3 2 . 2 0 10 1
     36 4 1 . . 0  5 1
     23 5 1 . 2 1  3 1
     31 4 2 . 2 0  4 1
     32 4 1 . . 0  6 1
     29 4 1 . 2 1  1 1
     22 4 1 . . 0  5 1
     23 4 2 . 2 0  8 1
     30 4 1 . . 0  5 1
     21 4 1 . . 0  5 1
     29 2 2 . 2 0  4 1
     27 4 1 . . 0  5 1
     22 5 2 . 1 1  2 1
     35 4 1 . . 0  6 1
     25 3 1 . . 0  6 1
     29 4 2 . 2 1  7 1
     25 4 1 . . 0  5 1
     28 4 1 . . 0  5 1
     36 4 2 . 2 1 12 1
     27 4 1 . . 1  3 1
     27 4 2 . 2 1 11 1
     27 4 2 . 2 0  8 1
     28 3 2 . 2 1  7 1
     35 4 1 . 2 1  1 1
     20 3 1 . . 0  5 1
     30 3 2 . 2 1  2 1
     25 4 1 . . 0  5 1
     28 4 1 . 2 1  9 1
     27 2 2 2 1 0  4 1
     26 2 2 . 1 0  8 1
     36 4 2 . 2 1  2 1
     21 4 2 . 2 1 12 1
     32 4 1 . . 0  5 1
     27 4 2 . 2 1  7 1
     27 4 1 . . 1  3 1
     36 4 2 2 2 1  2 1
    891 4 1 . . 1  3 1
     24 4 1 . . 0  6 1
     32 5 1 . . 1  1 1
     21 4 1 . . 0  6 1
     27 4 2 . 2 1  2 1
     36 4 2 . . 1  7 1
     35 4 1 . . 1  1 1
     31 1 2 . 2 0  4 1
     15 3 2 . 2 0  8 1
     25 4 3 . 2 0 10 1
     31 4 1 . 2 0  6 1
     31 4 2 . 2 1 12 1
     35 5 3 . 2 0  8 1
     32 4 2 . 2 1 11 1
     32 4 2 . 1 1  7 1
     34 4 1 . 2 1  9 1
     29 4 1 . . 1  3 1
     23 4 1 . . 1  3 1
     29 2 1 . . 0  5 1
     36 4 2 . 2 0  8 1
     36 1 2 . 2 0  4 1
     29 4 1 . . 1  9 1
     29 5 1 . . 0  5 1
     23 4 1 . . 1  3 1
     28 3 1 . . 0  5 1
     28 4 1 . . 1  3 1
     25 3 2 . 2 1  2 1
     36 4 1 . . 1  9 1
     24 4 2 . 2 1 11 1
     28 4 2 . 1 1 12 1
     28 4 1 . . 0  5 1
     36 4 2 . 2 1 11 1
     34 3 2 2 2 1 11 1
     30 2 2 . 2 0  4 1
     30 1 2 . 2 0  4 1
     36 4 2 . . 1  7 1
     22 4 2 . 2 0  8 1
     26 4 2 . 2 1 12 1
     35 4 2 . 2 1  2 1
     19 4 1 . . 1  3 1
     31 2 2 . 2 0  4 1
     36 4 1 . . 0  6 1
     27 4 1 . . 1  9 1
     36 4 1 . . 1  1 1
     24 3 1 . . 0  6 1
     23 4 2 . 2 1 11 1
     33 3 2 . 1 1  7 1
     23 4 1 . . 1  1 1
     35 . 5 2 1 0 10 1
     28 4 2 . 2 1  7 1
     28 4 1 . 2 1  3 1
     30 3 1 . . 0  6 1
     28 4 2 . 2 1  2 1
     36 4 2 . 2 1 12 1
     30 4 1 . . 1  9 1
     28 5 2 . . 1  2 1
     30 4 2 . 2 0 10 1
     27 4 1 . . 0  5 1
     23 4 1 . . 1  3 1
     27 3 4 . . 0  6 1
     27 4 2 . . 1 11 1
     36 4 2 . 1 1 11 1
     36 4 1 . . 1  9 1
    end
    label values highschatt highschatt
    label def highschatt 1 "Islamiyyah", modify
    label def highschatt 2 "Primary", modify
    label def highschatt 3 " JSS", modify
    label def highschatt 4 "SSS", modify
    label def highschatt 5 "Higher", modify
    label values ethnicity ethnicity
    label def ethnicity 1 "Yoruba", modify
    label def ethnicity 2 "Hausa", modify
    label def ethnicity 3 "Fulani", modify
    label def ethnicity 4 "Igbo", modify
    label def ethnicity 5 "Others", modify
    label values numpartners numpartners
    label def numpartners 2 "Two Live-in Partner", modify
    label values liveinpartner liveinpartner
    label def liveinpartner 1 "Yes", modify
    label def liveinpartner 2 "No", modify
    label values facility facility
    label def facility 1 "Atan PHC", modify
    label def facility 2 "Baban dodo PHC", modify
    label def facility 3 "Etere PHC", modify
    label def facility 4 "Jaji PHC", modify
    label def facility 5 "Kugba PHC", modify
    label def facility 6 "Kuto PHC", modify
    label def facility 7 "Kwata PHC", modify
    label def facility 8 "Mando PHC", modify
    label def facility 9 "Otun PHC", modify
    label def facility 10 "Rigachikun PHC", modify
    label def facility 11 "Samaru PHC", modify
    label def facility 12 "Tudun Wada", modify

    Below is the output I am getting:

    Code:
    . didregress (selfefficacy i.highschatt i.ethnicity i.numpartners i.liveinpartner) /// 
    > (treatment), group(facility) time(dummy) level(95) aeq aggregate(dlang,constant)
    note: treatment omitted because of collinearity.
    model is not identified
        The treatment variable treatment was omitted because of collinearity.

    I want to use didregress because, unlike ieddtab, it allows factor-variable covariates.

    I would appreciate your suggestions.

  • #2
    You are misinterpreting the treatment variable as used in the -didregress- command. This seems to confuse a lot of people using this command, as this question arises regularly here on Statalist. The treatment variable that -didregress- wants is not merely a division of the observations into treatment and control groups. Rather, it is a variable that is set to 1 for those observations that receive the treatment but only in those time periods where they are receiving it, and 0 otherwise. Your treatment variable must be replaced by one that does that. When you do that, it will no longer be collinear.

    If, in fact, your situation is such that the facilities with treatment = 1 receive the treatment at all times in your data, then a DID estimate of the treatment effect is not possible. For a (generalized) DID analysis there must always be at least three types of observations: those for facilities that will eventually receive treatment but have not yet, those for facilities that have received treatment, and those for facilities that never receive treatment.
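
    A quick way to see why the original variable is dropped is to check whether it ever changes within a facility. The lines below are only a sketch: varies is a hypothetical helper variable, and treatment is assumed to have no missing values. If varies is 0 for every observation, the indicator is constant within each facility, so it cannot be separated from the facility (group) effects.

    ```stata
    * Sketch only: flag facilities in which the indicator changes value.
    * Assumes treatment has no missing values; varies is a hypothetical name.
    bysort facility (treatment): generate byte varies = treatment[1] != treatment[_N]
    tabulate varies
    ```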



    • #3
      In reply to #2 (Clyde Schechter):
      Thank you for your response. I quite appreciate it. I actually struggled with that part of the treatment when I was reading the didregress manual. I now understand it better.

      The treatment is an intervention in the form of lessons. Facilities were designated as either treatment (to receive the intervention) or comparison (not to receive it). Lessons were given to the participants in the intervention facilities, and participants were interviewed in a survey at baseline and endline. This implies that the facilities with treatment = 1 received the treatment at all times. Hence, a DID estimate of the treatment effect is not possible, as you asserted.

      Will it be valid if I subjectively set one of the facilities that received treatment to "facilities that will eventually receive treatment but have not yet"?

      Also, I tried specifying the nogteffects option while keeping the group and time variables, and it gave me results. The covariate coefficients were not displayed at first, even though I requested them with the aeq option, but they were displayed once I removed the aggregate(dlang,constant) option (not so important to me).

      Does didregress at least consider the time component (here the dummy variable: baseline/endline) if the nogteffects option is specified?

      Code:
      . didregress (selfefficacy i.highschatt i.ethnicity i.numpartners i.liveinpartner) /// 
      > (treatment), group(facility) time(dummy) level(95) aeq nogteffects
      
      Number of groups and treatment time
      
      Time variable: dummy
      Control:       treatment = 0
      Treatment:     treatment = 1
      -----------------------------------
                   |   Control  Treatment
      -------------+---------------------
      Group        |
          facility |         4          5
      -------------+---------------------
      Time         |
           Minimum |         0          0
           Maximum |         0          0
      -----------------------------------
      
      Difference-in-differences regression                       Number of obs = 173
      Data type: Repeated cross-sectional
      
                                               (Std. err. adjusted for 9 clusters in facility)
      ----------------------------------------------------------------------------------------
                             |               Robust
                selfefficacy | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -----------------------+----------------------------------------------------------------
      ATET                   |
                   treatment |
                   (1 vs 0)  |  -2.575428   2.269046    -1.14   0.289    -7.807858    2.657002
      -----------------------+----------------------------------------------------------------
      Controls               |
                  highschatt |
                    Primary  |   1.083784   1.144349     0.95   0.371     -1.55509    3.722658
                        JSS  |    1.14773    .595503     1.93   0.090    -.2255023    2.520963
                        SSS  |   3.840893   2.673948     1.44   0.189    -2.325242    10.00703
                     Higher  |    .636369   1.305331     0.49   0.639    -2.373731    3.646469
                             |
                   ethnicity |
                      Hausa  |   2.169859   1.166972     1.86   0.100     -.521184    4.860902
                     Fulani  |   .5044398   1.810871     0.28   0.788    -3.671437    4.680316
                     Others  |   .9085437   .9961483     0.91   0.388    -1.388578    3.205666
                             |
                 numpartners |
        Two Live-in Partner  |   4.688281   3.189657     1.47   0.180    -2.667081    12.04364
      Three Live-in Partner  |   5.754489   2.813575     2.05   0.075    -.7336262     12.2426
       Four Live-in Partner  |     2.1937   2.429528     0.90   0.393    -3.408803    7.796202
                             |
               liveinpartner |
                         No  |   1.076926   .9277845     1.16   0.279    -1.062549    3.216401
                       _cons |   21.96793   2.691596     8.16   0.000      15.7611    28.17476
      ----------------------------------------------------------------------------------------
      Note: ATET estimate adjusted for covariates.

      Or does it make sense if I specify the time variable in the manner below:

      Code:
      . didregress (selfefficacy i.highschatt i.ethnicity i.numpartners i.liveinpartner i.dummy) /// 
      > (treatment), group(facility) time(dummy) level(95) aeq nogteffects
      
      Number of groups and treatment time
      
      Time variable: dummy
      Control:       treatment = 0
      Treatment:     treatment = 1
      -----------------------------------
                   |   Control  Treatment
      -------------+---------------------
      Group        |
          facility |         4          5
      -------------+---------------------
      Time         |
           Minimum |         0          0
           Maximum |         0          0
      -----------------------------------
      
      Difference-in-differences regression                       Number of obs = 173
      Data type: Repeated cross-sectional
      
                                               (Std. err. adjusted for 9 clusters in facility)
      ----------------------------------------------------------------------------------------
                             |               Robust
                selfefficacy | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -----------------------+----------------------------------------------------------------
      ATET                   |
                   treatment |
                   (1 vs 0)  |  -2.254226   2.599101    -0.87   0.411    -8.247763    3.739312
      -----------------------+----------------------------------------------------------------
      Controls               |
                  highschatt |
                    Primary  |   .9538704   1.293969     0.74   0.482    -2.030028    3.937768
                        JSS  |   1.321295   .5422613     2.44   0.041     .0708382    2.571752
                        SSS  |   3.744676   2.806848     1.33   0.219    -2.727927    10.21728
                     Higher  |    .630829   1.375975     0.46   0.659    -2.542175    3.803833
                             |
                   ethnicity |
                      Hausa  |   3.249392   1.737261     1.87   0.098    -.7567392    7.255523
                     Fulani  |   2.042662   2.465594     0.83   0.431    -3.643009    7.728332
                     Others  |   2.252425   1.891984     1.19   0.268    -2.110497    6.615347
                             |
                 numpartners |
        Two Live-in Partner  |   4.526972    3.39056     1.34   0.219    -3.291673    12.34562
      Three Live-in Partner  |   5.313405   3.115641     1.71   0.127    -1.871276    12.49809
       Four Live-in Partner  |   2.287578   2.444687     0.94   0.377    -3.349879    7.925036
                             |
               liveinpartner |
                         No  |   .9592618   .9986596     0.96   0.365    -1.343651    3.262175
                     1.dummy |   -1.76205   1.608711    -1.10   0.305    -5.471745    1.947644
                       _cons |   21.99936   2.766846     7.95   0.000       15.619    28.37972
      ----------------------------------------------------------------------------------------
      Note: ATET estimate adjusted for covariates.
      Thank you.



      • #4
        Quoting #3: "The treatment is an intervention in the form of lessons. Facilities were designated as either treatment (to receive the intervention) or comparison (not to receive it). Lessons were given to the participants in the intervention facilities, and participants were interviewed in a survey at baseline and endline. This implies that the facilities with treatment = 1 received the treatment at all times. Hence, a DID estimate of the treatment effect is not possible, as you asserted."

        There is really nothing more to say about this in relation to DID. Your study design is simply not compatible with DID estimation. Any manipulations you do to the data or the code that create the appearance of a DID estimate of the treatment effect is just a sham and the results are not meaningful. Had you gathered pre-treatment data from both groups, you would have had the necessary data structure to do DID estimation. But without that crucial piece, it is just not possible.

        All you can do here is a simple two-group comparison of the outcomes in the two groups. If the treatment and control facilities were selected by true randomization, then you could analyze this as a cluster-randomized trial; it is not an ideal design for estimating treatment effects but is reasonably strong.

        If the treatment and control facilities were not randomly assigned then you really have very little to work with. You can do a simple two-group comparison. But you could not honestly call the findings anything more than just suggestive, preliminary findings which might be of value in planning a better study to be carried out in the future.



        • #5
          In reply to #4 (Clyde Schechter):
          Thank you very much. Your comments are quite eye-opening. I have learned a lot. Please permit me to ask one more question: is the ieddtab approach a true DID analysis? The approach seems like a two-group comparison of outcomes. Below are the steps of ieddtab:
          1. Calculate the before-after difference in the outcome (Y) for the treatment group (B-A).
          2. Calculate the before-after difference in the outcome (Y) for the comparison group (D-C).
          3. Calculate the difference between the difference in outcomes for the treatment group (B-A) and the difference for the comparison group (D-C). This is the difference-in-differences: (DD)=(B-A)-(D-C).
          Thanks.

          Code:
          . ieddtab selfefficacy, t(dummy) treatment(treatment) 
          (0 observations deleted)
          +--------------------------------------------------------------------------------------------+
          |              |           Control           |          Treatment          |  Difference-in  |
          |              |  Baseline  |   Difference   |  Baseline  |   Difference   |   -difference   |
          |              |    Mean    |     Coef.      |    Mean    |     Coef.      |     Coef.       |
          |              |   (SE)     |    (SE)        |   (SE)     |    (SE)        |    (SE)         |
          | Variable     |    N       |     N          |    N       |     N          |     N           |
          |--------------+------------+----------------+------------+----------------+-----------------|
          | selfefficacy |     30.34  |        0.79    |     35.15  |        1.34    |         0.55    |
          |              |     (0.50) |       (1.28)   |     (2.23) |       (3.64)   |        (4.08)   |
          |              |       481  |         919    |       507  |        1057    |         1976    |
          +--------------------------------------------------------------------------------------------+
              The baseline means only include observations not omitted in the 1st and 2nd differences. The number of
              observations in the 1st and 2nd differences includes both baseline and follow-up observations. ***, **, and *
              indicate significance at the .01, .05, and .1 percent critical level.
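
          In the table above, the three steps work out as (B-A) = 1.34, (D-C) = 0.79, and DD = 1.34 - 0.79 = 0.55. The same unadjusted 2x2 quantity can also be read off an interacted regression. The lines below are only a sketch, assuming treatment and dummy are both coded 0/1; clustering on facility is a choice made here, not something taken from the ieddtab run, so the standard error need not match the table.

          ```stata
          * Sketch only: unadjusted 2x2 DID by OLS, no covariates.
          * Assumes treatment and dummy are both coded 0/1.
          regress selfefficacy i.treatment##i.dummy, vce(cluster facility)
          * The coefficient on 1.treatment#1.dummy is the DD estimate (B-A)-(D-C).
          ```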



          • #6
            I am not familiar with -ieddtab-, but from your description it appears to be a DID analysis. However, you cannot apply it to your data either. That's because with no before data, you cannot calculate a before-after difference (in either group).



            • #7
              In reply to #6 (Clyde Schechter):
              Oh! I think something is being missed here. The data are actually before-after data. The dummy variable is the before-after (baseline/endline) indicator. Survey data were collected before the intervention (classes) was administered. The baseline (before) survey included both the intervention group and the comparison (control) group. The endline survey was done after the intervention. The only "error" has to do with the facilities, i.e., the "facilities that will eventually receive treatment but have not yet". However, I found in the didregress manual what is called a 2x2 DID, which seems to be "somewhat" compatible, though I am still examining it. I only have to ensure that the before-after variable (dummy) is coded 0 for before and 1 for after, as it already is in the data.
              Thanks.



              • #8
                I see. Well, this puts things in an entirely different light. Returning to your original question in #1, then it is a question of properly defining the variable to use where you have put -(treatment)- in the -didregress- command. Your treatment variable distinguishes the two groups: 1 = treated (eventually), 0 = never-treated. You also have a before vs after variable. You don't say what its name is or how it is coded, but let me assume that it is called pre_post and that it is coded 0 = before, and 1 = after. Then you need to calculate a new variable to use in your -didregress-:
                Code:
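                * pre_post is an assumed name for the 0/1 before-after indicator;
                * substitute the actual variable (the one given in time()) before running
                gen under_treatment = treatment*pre_post
                didregress (selfefficacy i.highschatt i.ethnicity i.numpartners i.liveinpartner) ///
                (under_treatment), group(facility) time(dummy) level(95) aeq aggregate(dlang,constant)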
                Added: I've just copied over the options to -didregress- that you gave. I haven't given any thought to what they do and whether those choices make sense in terms of your particular problem. I'm just narrowly focused on the specific issue raised in #1 here.
                Last edited by Clyde Schechter; 04 Mar 2023, 10:31.



                • #9
                  In reply to #8 (Clyde Schechter):
                  Thank you very much. Your last reply did the magic.

                  In my own case,

                  Code:
                  gen under_treatment = treatment*pre_post
                  would mean

                  Code:
                  gen under_treatment = treatment*dummy
                  where treatment is the intervention (1) / comparison (0) indicator
                  and dummy is the pre-post indicator (baseline = 0, endline = 1).

                  So on implementing it, the code no longer gave the collinearity error.

                  Code:
                  . gen under_treatment = treatment*dummy
                  
                  . didregress (selfefficacy i.highschatt i.ethnicity i.numpartners i.liveinpartner) /// 
                  > (under_treatment), group(facility) time(dummy) level(95) aeq
                  note: 5.ethnicity omitted because of collinearity.
                  
                  Number of groups and treatment time
                  
                  Time variable: dummy
                  Control:       under_treatment = 0
                  Treatment:     under_treatment = 1
                  -----------------------------------
                               |   Control  Treatment
                  -------------+---------------------
                  Group        |
                      facility |         4          5
                  -------------+---------------------
                  Time         |
                       Minimum |         0          1
                       Maximum |         0          1
                  -----------------------------------
                  
                  Difference-in-differences regression                       Number of obs = 173
                  Data type: Repeated cross-sectional
                  
                                                           (Std. err. adjusted for 9 clusters in facility)
                  ----------------------------------------------------------------------------------------
                                         |               Robust
                            selfefficacy | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -----------------------+----------------------------------------------------------------
                  ATET                   |
                         under_treatment |
                               (1 vs 0)  |  -.3968961    2.53471    -0.16   0.879    -6.241947    5.448155
                  -----------------------+----------------------------------------------------------------
                  Controls               |
                              highschatt |
                                Primary  |   .3753249   .8441032     0.44   0.668     -1.57118     2.32183
                                    JSS  |   1.295256   1.997663     0.65   0.535    -3.311362    5.901875
                                    SSS  |   4.590735   1.392573     3.30   0.011     1.379455    7.802014
                                 Higher  |   .7578937    2.36491     0.32   0.757    -4.695598    6.211385
                                         |
                               ethnicity |
                                  Hausa  |   3.953234   2.010347     1.97   0.085    -.6826354    8.589104
                                 Fulani  |   2.461051   2.375102     1.04   0.330    -3.015943    7.938046
                                 Others  |          0  (omitted)
                                         |
                             numpartners |
                    Two Live-in Partner  |   8.000055   1.899367     4.21   0.003     3.620106       12.38
                  Three Live-in Partner  |    8.61689   1.642082     5.25   0.001     4.830243    12.40354
                   Four Live-in Partner  |   7.074087   2.382874     2.97   0.018      1.57917      12.569
                                         |
                           liveinpartner |
                                     No  |    .935683   .6016205     1.56   0.158    -.4516565    2.323022
                                         |
                                   dummy |
                                Endline  |  -1.692988   1.917618    -0.88   0.403    -6.115023    2.729048
                                   _cons |   17.50397   1.992999     8.78   0.000     12.90811    22.09984
                  ----------------------------------------------------------------------------------------
                  Note: ATET estimate adjusted for covariates, group effects, and time effects.
                  Thanks a million.
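
                  For anyone repeating this, it may be worth confirming that the new variable is 1 only in the treated-group, endline cell before trusting the ATET. The lines below are only a sketch: they assume under_treatment already exists and that treatment and dummy are coded 0/1 with no missing values.

                  ```stata
                  * Sketch only: assumes under_treatment already exists and that
                  * treatment and dummy are coded 0/1 with no missing values.
                  * Cell means should be 1 for treatment = 1, dummy = 1 and 0 elsewhere.
                  tabulate treatment dummy, summarize(under_treatment) nostandard nofreq
                  * Stops with an error if any observation is coded differently.
                  assert under_treatment == (treatment == 1 & dummy == 1)
                  ```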
