  • Failing Common Trend Assumption testing

    Dear Statslisters

    I'm currently doing a Diff-in-Diff analysis of whether access to micro credit programs helps the children of participating families spend more time in school each day.

    I define a treatment variable as the participation in a credit program, which is called IKP, and I'm interested in those who get treated in the fourth of my four rounds.
    So my treatment variable is defined as:

    gen treatment=ikp==1 & round==4

    I'm able to run an OLS and some FE regression with this without any problems.
    Now I want to test the common trend assumption and see whether those who get treated between rounds 3 and 4 and those who do not have the same prior development.
    So I try to look at the means of hschool (the variable I'm targeting):


    egen school11=mean(hschool) if round==1 & treatment==1
    egen school12=mean(hschool) if round==2 & treatment==1
    egen school13=mean(hschool) if round==3 & treatment==1

    egen school01=mean(hschool) if round==1 & treatment==0
    egen school02=mean(hschool) if round==2 & treatment==0
    egen school03=mean(hschool) if round==3 & treatment==0


    Of course, for all the variables in the first block (treatment==1), Stata tells me there are "no observations", since an observation cannot be in round 4 and in round 1, 2 or 3 at the same time.
    How do I fix that?

    Thank you so much for your help!
    Best
    Arto



  • #2
    It would have been helpful had you shown some example data. Without it, I'm left to guess at what the data looks like, so the code I'm suggesting here conveys the logic but might not actually work with your data.

    Code:
    forvalues i = 1/3 {
        by child_id, sort: egen treated_round_`i' = max(round == `i' & ikp == 1)
        forvalues j = 1/3 {
            egen school`i'`j' = mean(cond( round == `j' & treated_round_`i', hschool, .))
        }
    }
    This code assumes there is a variable, child_id, that identifies the different children in your study.

    This code is not tested, so it may contain typos, unbalanced quotes or parentheses, etc.

    Also, I'm not completely sure I have understood what you want these school* variables to signify, but I think I have.

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      ... why does it require the max command in the generation of the first variable
      So, the purpose of this code is to create a variable that takes on the value 1 in every observation of a given child based on a condition (ikp == 1 in a given round) that only happens in some (even just one) of those observations. The syntax:

      Code:
      by child_id: egen it_happened = max(condition)
      tells Stata to examine all of the observations for a given child and evaluate the condition in each of the observations. Now, if you have properly coded condition, it will evaluate to 0 in those observations where the condition is not met, and 1 in those observations where it is met. So if there are no such observations, the maximum of those values of condition will be 0, but if there is at least one such observation, the maximum of those values will be 1. This is a frequently used Stata construction. By the way, -max()- is a function, not a command.
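To make the construction concrete, here is a minimal sketch on made-up data. The values and the variable name ever_ikp are assumptions for illustration only; child_id, round and ikp follow the names used in this thread.

```stata
clear
input child_id round ikp
1 1 0
1 2 0
1 3 1
2 1 0
2 2 0
2 3 0
end

// (ikp == 1) is 1 where true and 0 where false, so its maximum within
// a child is 1 if any of that child's observations has ikp == 1
by child_id (round), sort: egen ever_ikp = max(ikp == 1)

list, sepby(child_id) noobs
```

In this toy example child 1 has ikp == 1 in one round, so ever_ikp is 1 in all three of that child's observations, while child 2 gets 0 throughout.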

      Code:
      twoway (scatter school control round if i==3 & j<=3)
      i ambiguous abbreviation
      r(111);
      I'm going to guess that you tried to insert this command inside loops indexed by i and j (perhaps you added this to the loops in #2). This error is occurring because, in order to refer to the index of the loop, you need to say `i' and `j'. Because you used i without the surrounding macro quotes, Stata thought you wanted it to look for a variable named i. And because Stata allows you to abbreviate variable names, when there was no variable named exactly i, it thought that perhaps you meant some variable whose name begins with i. So it looked for that, and discovered there was more than one, and it had no way to know which to use. So it told you that i is an ambiguous abbreviation. Put the quotes in there and this problem will go away. Be attentive to the distinction between the left quote (`) and the right quote ('). You must use each in the appropriate place or you will get other errors.
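As a small illustration of the macro quotes, the sketch below loops over rounds in made-up data. The data values are assumptions; only the quoting pattern matters.

```stata
clear
input child_id round hschool
1 1 4
1 2 5
2 1 3
2 2 6
end

forvalues i = 1/2 {
    // `i' is replaced by 1, then 2, before Stata runs the line;
    // a bare i would instead be read as a variable name or abbreviation
    summarize hschool if round == `i'
}
```

Any command that refers to the loop index, including -twoway-, needs the same `i' quoting and has to sit inside the braces.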





      • #4
        Just a pedantic note. Everyone who tells you that you can test the common trend assumption is lying to you. The CT assumption is that, without the treatment, the treated would have behaved the same as the untreated at the moment of treatment. There is no requirement that they also behave the same in the pretreatment period.

        Of course, one can make the assumption that similarity in the pretreatment period implies similarity during the treatment period, but this is once again an assumption. Whether it's stronger or weaker than the common trend assumption depends on the context of your problem.



        • #5
          You need to show the complete context of this code, not just this one line. I had suggested this correction to your code on the assumption, which I stated when I presented it, that this code is located inside the -forvalues i = ...- and -forvalues j = ...- loops. The error message you are getting is what I would expect to see if `i' is not defined at the time this command is being executed. That, in turn, means that it is outside those loops. So either that code needs to be put inside those loops, or there needs to be some other definition of i and j before this code. So please post the code in its full context.



          • #6
            OK. I see what you've done. I'm still not entirely clear on what you want to do.

            As I suspected, you are trying to refer to `i' and `j' in your -twoway- command, but that command is outside the loops that define `i' and `j', so Stata sees -twoway (scatter school control round if == 3 & <= 3)- because `i' and `j' are undefined. The solution is either to create a new set of -forvalues- loops around the -twoway- command, or move the -twoway- command inside the existing final loop, just under the -egen control`i'`j'...- command.

            But there is another problem you will encounter then. Your -twoway- command refers to a variable, control, that does not exist: you have created a number of control`i'`j' variables, but there is no single variable called control. Maybe you mean control`i'`j' here; if so, that's what you need to say. It is also unclear to me whether the variable school in the -twoway- command is what you really want, or whether you mean to use school`i'`j'.



            • #7
              I don't think we can make any progress here without an example of your data, accompanied by an explanation of what each variable in it means.



              • #8
                Code:
                twoway (scatter school`i'`j' control`i'`j' round if `i'==4 & `j'<=4)
                ==4 invalid name
                r(198);
                Where do you define i and j? I only see them as iterators in the loops above, but these don't persist beyond the loop.
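A minimal sketch of that behaviour, assuming nothing else in the session defines a local macro named i:

```stata
forvalues i = 1/3 {
    display "inside the loop: i = `i'"
}
// per the point above, `i' is no longer defined here, so it expands to nothing
display "after the loop: i = [`i']"
```

This is why the -twoway- command ends up containing a bare ==4: the macro references expand to nothing before Stata parses the line.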



                • #9
                  The link you give for your data is not helpful. It leads to a bunch of PDF files. You can't extract the data from those and import it into Stata. Your -dataex- examples show only one variable at a time. None of this gives anybody something they can work with. What is needed is a -dataex- example that includes all of the relevant variables.

                  That said, it seems to me that you are conceptualizing this problem incorrectly. You don't need whole series of control* variables here, nor a whole series of hschool* variables. If you have a classic DID (i.e. the intervention begins precisely in round 4 for everybody who gets it), then it sounds like you want this:

                  Code:
                  by childid, sort: egen treat_group = max(ikp)
                  gen byte post = round >= 4
                  //  CLASSIC DID ANALYSIS
                  xtreg hschool i.treat_group##i.post, fe
                  
                  // MAKE A PLOT
                  // round is not a covariate in the model above, so plot by post instead
                  margins treat_group#post, noestimcheck
                  marginsplot, xdimension(post)
                  If your intervention begins at different times for different children, then it's a generalized DID analysis that looks like:

                  Code:
                  by childid, sort: egen treat_group = max(ikp)
                  // GENERALIZED DID ANALYSIS
                  xtreg hschool i.ikp i.round, fe
                  
                  // MAKE A PRE-TREATMENT PLOT
                  keep if ikp == 0
                  collapse (mean) hschool, by(treat_group round)
                  reshape wide hschool, i(round) j(treat_group)
                  graph twoway line hschool* round, sort




                  • #10
                    Originally posted by Arto Arman

                    Why don't they exist beyond the loop? How do I make them stay as I defined them?
                    I tried to put the -twoway- command inside of the loop to fix this, but as you can see from yesterday's post the output is not what I wanted.
                    Why would you want them to stay? In defining the loop you defined the last value of both i and j to be 4, which is the value they would have if they persisted. There are tricks to make them persistent (e.g. putting global i = `i' in the loop and then referring to the global i through $i) but I don't think I've ever used that, and I've been using Stata for a very long time (though peanuts compared to some others here). Either way, I recommend looking at Clyde's answer, which at least conceptually to me makes more sense than continuing along this line.



                    • #11
                      To clarify about the data example:

                      1. The issue was not the number of observations but the variables. For example, your problem involves a key variable round, but round did not appear in any of your examples. Also you posted two examples, each containing only a single variable--you can't do any modeling with that.

                      2. It is true that by default -dataex- only shows the first 100 observations. But it has a -count()- option that lets you show whatever number of observations you choose. That's not relevant here, as the issue was not with the number of observations anyway.

                      3. Nobody wants to see your entire data set. Just enough of an example that you could try out code to solve your problem in it. So it needs to be large enough (both in terms of number of observations and inclusion of all the relevant variables). But no larger. And attachments are, indeed, discouraged here.
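For illustration, a sketch of the kind of single example meant here. The data values are made up, and the variable names are taken from this thread rather than from your actual file.

```stata
clear
input childid round ikp hschool
1 1 0 4
1 2 0 5
1 3 0 5
1 4 1 6
2 1 0 3
2 2 0 3
end

// one -dataex- call listing all relevant variables together;
// count() caps the number of observations shown
dataex childid round ikp hschool, count(6)
```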

                      You continue to be unclear about the definition of your treatment group:
                      On one hand I need two groups:
                      People who only had access to credit in period 4 and people who never had access to credit.
                      On the other hand I need:
                      People who had access to credit in period 4 and people who did not have access to credit in period 4.
                      I have no idea what to make of that. As you seem to recognize, these two classifications are different. But you do not indicate which is the one you want to consider "treatment" in your analysis.

                      For "On one hand I need two groups:
                      People who only had access to credit in period 4 and people who never had access to credit." you would do this:
                      Code:
                      by childid, sort: egen credit_period_4 = max(cond(round == 4, ikp, .))
                      // max over the other rounds is 1 if the child had credit in any round other than 4
                      by childid: egen credit_other_periods = max(cond(round != 4, ikp, .))
                      gen byte treatment = .
                      replace treatment = 1 if credit_period_4 & !credit_other_periods
                      replace treatment = 0 if !credit_period_4 & !credit_other_periods
                      For "On the other hand I need:
                      People who had access to credit in period 4 and people who did not have access to credit in period 4." you would do this:
                      Code:
                      by childid, sort: egen treatment = max(cond(round == 4, ikp, .))
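To see how the two classifications can differ, here is a sketch on three made-up children. It reads "only in period 4" as no credit in any of rounds 1 to 3, which is why it takes the maximum of ikp over those rounds. The names credit_r4, credit_before, treat_a and treat_b are hypothetical.

```stata
clear
input childid round ikp
1 1 0
1 2 0
1 3 0
1 4 1
2 1 0
2 2 0
2 3 1
2 4 1
3 1 0
3 2 0
3 3 0
3 4 0
end

// credit status in round 4, copied to every observation of the child
by childid (round), sort: egen credit_r4 = max(cond(round == 4, ikp, .))
// 1 if the child had credit in any of rounds 1 to 3
by childid: egen credit_before = max(cond(round != 4, ikp, .))

// definition A: credit only in round 4 versus never; other children stay missing
gen byte treat_a = credit_r4 if credit_before == 0
// definition B: credit in round 4 versus no credit in round 4
gen byte treat_b = credit_r4

list childid ikp treat_a treat_b if round == 4, noobs
```

Child 2, who already had credit in round 3, is treated under definition B but excluded (missing) under definition A, so the choice changes the estimation sample.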



                      • #12
                        Please reread post #15 in this thread. You will never get the graph you want with these school`i'`j' variables. You are thinking about the problem in the wrong way. The code in #15 shows you how to do it. The loops you show in the bottom code block of #19 are not necessary, and the variables they create are not useful.

                        Also, I still do not understand why you are creating -dataex- examples with one variable at a time. What would be helpful is a single -dataex- output that includes all of the relevant variables.
