  • Diff-in-Diff estimator collinearity issue

    Hi all, I am researching the causal effect of plain packaging on the proportion of smokers, using difference-in-differences (DiD) estimation with UK and Spanish data (the UK is the treatment group and Spain is the control). Due to limited data, I have only 2009, 2011, 2012, 2014 and 2017 for both datasets.

    To estimate the DiD, I have constructed:

    Code:
    reg p_csmoker time treatment pp Sex age_gr socialc hqual dnnow incomelv cigtax i.year
    However, with this regression, treatment, pp, 2014.year (the 2014 dummy) and 2017.year are omitted due to collinearity.

    I have seen previous posts about collinearity and concluded that you have to drop variables to overcome it, but I cannot drop treatment and pp, as they are my main independent variables.

    Here is the definition of each variable;

    p_csmoker (dependent variable): a yearly measure of the proportion of smokers.

    time (dummy variable): 1 if 2017 (after plain packaging was introduced) and 0 otherwise.

    treatment (dummy variable): 1 if UK resident and 0 if Spanish

    pp (interaction variable): time*treatment which is essentially what I am looking for.

    Sex,dnnow and i.year (control dummy): Sex, whether participants drink nowadays and a dummy for each year.

    age_gr, socialc,hqual and incomelv (categorical control variables): age group, social class, highest educational qualifications and income level

    cigtax (continuous control variable): cigarette tax (yearly measure)

    I am sorry for being a bit wordy here, but as I cannot post my dataset, this is the best I can do...

    If any additional information is needed to solve this, please let me know!

    Thank you in advance!
    Last edited by Fuga Iwama; 03 Feb 2019, 16:31.

  • #2
    Well, it is quite expected that you will lose a couple of year indicators in this context. The omission of the treatment variable is difficult for me to understand here, although it is of no actual consequence. The loss of pp is obviously a big problem, but again it is not clear why this is happening.

    My guess is that the loss of the treatment variable (which is really an indicator for country) is caused by the inclusion of the cigtax variable, as you can probably calculate the country from the cigtax or the cigtax and year combined. So eliminating cigtax might bring back treatment.

    But it is not at all clear to me why pp is being dropped if it is truly the interaction of treatment and time. It suggests to me that some of your variables are incorrectly coded. But without access to a representative sample of the data, it is impossible to troubleshoot more specifically. The key thing is to discover what pp is colinear with. It shouldn't be colinear with anything--so when you find out what the colinearity comes from, you should review all those variables because there is almost certainly an error to be found. To find the culprit variables you can run
    Code:
    regress pp time treatment Sex age_gr socialc hqual dnnow incomelv cigtax i.year
    You will, within perhaps a small amount of rounding error, get an R2 of 1 with this regression and the coefficients will show you how the colinearity arises. You will then have to explore those variables to understand why this is happening and what to do about it. Perhaps, again, the combination of cigtax and year may be at fault, and removing cigtax would then likely resolve the matter.



    • #3
      Clyde,

      First of all, thanks for the reply.

      I have tried the regression of pp on the other variables (that is, without the dependent variable), but the treatment variable was omitted again, and excluding some of the year dummies or cigtax did not help either...
      I have attached the results here.
      [Attached images: regression.jpg, regression 1.jpg]

      As I am doing DiD, I cannot drop time, treatment (just a country dummy) or pp, yet pp and treatment keep being omitted...

      I am simply stuck and do not know what to do. Please help me out.
      Last edited by Fuga Iwama; 06 Feb 2019, 04:39.



      • #4
        If you look at these results, you will clearly see that all of them say that your pp variable and your time variable are equal (the coefficient of time is 1 and all the other coefficients are, within small rounding errors, 0). Clearly that is an error in your data--the interaction should not equal the time variable; one of them is wrong. You need to go back and fix that. It is more likely that the pp variable is wrong than the time variable.

        One way of getting interaction variables correct is to not create them yourself but to use factor variable notation, so that Stata creates the interaction term for you, without errors or omissions.

        Code:
        * pp is left out: i.time##i.treatment already supplies both main effects and their interaction
        reg p_csmoker i.time##i.treatment Sex age_gr socialc hqual dnnow incomelv cigtax i.year
        Of course, this will not help if the time variable itself is incorrect. So before trying this code, go back and review your data to make sure that the time variable is properly coded: 0 before treatment began, and 1 thereafter.

        Read -help fvvarlist- and the associated sections of the PDF documentation for a full explanation.
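
        A minimal sketch of the comparison, on made-up data (the outcome y, the sample size and the coefficient values are assumptions for illustration only). It checks a hand-built pp against the product of its components and then confirms that factor variable notation reproduces the same interaction coefficient.

```stata
* Toy data (hypothetical): 400 rows with all four time-by-treatment cells populated
clear
set seed 12345
set obs 400
generate byte treatment = mod(_n, 2)
generate byte time = _n > 200
generate byte pp = time * treatment
generate y = 25 - 2*time + 3*treatment - 1.5*pp + rnormal()

* A correctly coded interaction must equal the product in every row
assert pp == time * treatment

* Hand-built interaction term
regress y time treatment pp
scalar b_hand = _b[pp]

* Same model with Stata building the interaction
regress y i.time##i.treatment
display "hand-built: " b_hand "   factor notation: " _b[1.time#1.treatment]
```

        If the assert fails on real data, the rows where pp differs from time*treatment are the ones to inspect first.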






        • #5
          Clyde,

          That is exactly what I thought, and I have tried the regression with Stata creating the interaction term for me.

          Also, I have checked my dataset, especially the time, pp and treatment variables, but there seems to be no error there.
          Here is a dataex excerpt of my dataset.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte Sex float age_gr byte socialc float hqual byte(cigst1 dnnow) float(year nationality rpprice) byte incomelv float(time treatment pp p_csmoker cigtax d2009 d2011 d2012 d2014 d2017)
          1 8 3 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 3 3 2 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 2 2 2 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 7 3 7 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 6 3 3 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 4 2 2 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 2 1 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 4 1 1 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 4 3 7 1 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 2 1 2 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 5 2 3 4 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 2 1 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 4 2 3 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 3 1 1 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 5 2 1 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 5 1 1 1 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 3 1 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 7 3 7 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 7 3 1 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 7 2 7 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 . 7 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 3 4 2 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 5 2 2 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 6 3 4 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 4 3 4 4 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 5 2 3 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 7 3 4 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 2 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 2 . 3 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 2 . 3 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 3 3 3 1 1 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 7 3 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 4 2 3 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 7 3 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 8 3 7 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 4 1 1 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 6 2 3 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 4 . 7 4 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 3 3 7 1 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 3 3 7 1 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 3 3 2 1 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 3 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 4 3 3 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 7 3 7 2 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 3 3 7 1 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 8 3 7 2 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 4 2 1 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 3 7 2 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 2 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 8 3 7 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 4 2 3 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 5 2 2 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 3 3 3 2 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 3 3 3 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 8 . 7 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 3 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 2 3 4 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 3 1 1 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 . 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 7 3 4 2 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 3 3 4 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 5 3 4 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 3 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 4 3 7 1 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 2 7 4 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 6 3 4 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 5 3 7 4 1 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 5 3 7 2 1 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 5 1 4 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 4 3 2 4 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 5 3 4 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 2 3 2 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 5 2 1 2 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 2 3 3 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 5 3 2 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 5 3 3 4 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 2 . 4 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 3 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 2 3 4 4 2 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 6 2 1 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 7 2 4 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 3 2 2 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 7 1 1 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 2 2 1 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 7 1 1 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 6 1 1 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 7 2 4 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 2 1 1 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 1 1 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 7 2 3 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 6 3 1 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 3 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 7 1 1 1 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 2 3 3 4 1 2017 3 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 2 4 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 6 3 4 2 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 7 3 2 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          0 6 1 2 2 1 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 1 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          1 8 2 7 1 2 2017 2 5.6 . 1 0 0 23.4 .494 0 0 0 0 1
          end
          label values Sex Sex
          label def Sex 0 "Male", modify
          label def Sex 1 "Female", modify
          label values age_gr age_gr
          label def age_gr 2 "16-24", modify
          label def age_gr 3 "25-34", modify
          label def age_gr 4 "35-44", modify
          label def age_gr 5 "45-54", modify
          label def age_gr 6 "55-64", modify
          label def age_gr 7 "65-74", modify
          label def age_gr 8 "75 over", modify
          label values socialc socialc
          label def socialc 1 "Managerial and professional occupations", modify
          label def socialc 2 "Intermediate occupations", modify
          label def socialc 3 "Routine and manual occupations", modify
          label values hqual hqual
          label def hqual 1 "Degree or equiv", modify
          label def hqual 2 "High ed below degree", modify
          label def hqual 3 "A-level equiv", modify
          label def hqual 4 "O-level equiv", modify
          label def hqual 7 "No qual", modify
          label values cigst1 cigst1
          label def cigst1 1 "Never smoked cigarettes at all", modify
          label def cigst1 2 "Used to smoke cigarettes occasionally", modify
          label def cigst1 4 "Current cigarette smoker", modify
          label values dnnow dnnow
          label def dnnow 1 "Yes", modify
          label def dnnow 2 "No", modify
          label values year year
          label def year 2017 "2017", modify
          label values nationality nationality
          label def nationality 2 "Spanish", modify
          label def nationality 3 "Other", modify
          label values incomelv incomelv
          label values time time
          label def time 1 "2017", modify
          label values treatment treatment
          label def treatment 0 "non-british", modify

          I hope this helps solve my issue.
          Last edited by Fuga Iwama; 07 Feb 2019, 06:01.



          • #6
            Thank you. Unfortunately, the subset of the data you showed as an example doesn't support a resolution to this problem. In the example shown, both treatment and time are constants. The latter is always 2017, and the former is always non-british. If this is true in your complete data set, then you do not have adequate data to do the analysis. But I suspect that is not the case. So to troubleshoot this, you need to provide a subset of the data that includes all four combinations of the possible values of treatment and time. The first thing would be to determine whether pp is properly computed--the results you show in #3 strongly suggest that it is not. In any case, you don't need pp: you are better off using i.treatment##i.time, as suggested in #4.

            If your data encompasses all four combinations of treatment and time, and if you use i.treatment##i.time and still find that the interaction is being dropped, then there is some other problem, probably arising from colinearity with one or more of the covariates, that can be explored.
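
            A quick way to see which combinations are present is a cross-tabulation. The sketch below uses made-up data that mimics the excerpt in #5, where only one of the four cells is populated (the sample size and the helper variable cell are assumptions for illustration).

```stata
* Toy data (hypothetical): every row is a control-country row from the post period
clear
set obs 100
generate byte treatment = 0
generate byte time = 1

* A usable DiD design shows four non-empty cells in this table
tabulate treatment time, missing

* Count the distinct (treatment, time) combinations actually present
egen byte cell = group(treatment time)
summarize cell, meanonly
display "Non-empty cells: " r(max)
```

            On this toy data the count is 1, so neither the main effects nor the interaction can be estimated.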



            • #7
              Clyde,

              It is very unfortunate that dataex will not show my whole dataset. I could not get it to produce a sample covering all the time periods and nationalities, so I have created one by hand in a text file. Is there any chance you could have a look at it?

              Furthermore, as you mentioned, I suspected an error in computing the pp, treatment and time variables, so I checked every variable, but I could not find one.

              Finally, I thought the collinearity might involve more than one variable and hence ran a correlation test. The treatment column came out with no results (just filled with .), so I suspect a coding error, but I did not find any....

              I am doing what I can, but I keep hitting a brick wall...



              • #8
                Well, just looking at correlations is ordinarily not sufficient, but in your case it seems you hit on the problem. The fact that the correlations of treatment came out as all missing values means that your "variable" treatment is actually a constant. Clearly, that should not be the case. So you need to fix the treatment variable so that it is 1 in UK observations and 0 in the Spain observations. (If I understood #1 correctly, you have only two countries in your data, and the UK is the treatment country and Spain is the control--is this right?)

                Now, as for fixing the problem, there are two possibilities. The first possibility is that somehow you have lost the data from one of your countries and you really only have one of them. In that case you have to go back and get the data from the other country and append it to your data set.

                The other possibility is that you actually have data from both countries, but your variable treatment is incorrectly coded. In that case, you have to re-create the treatment variable to be 1 in UK observations and 0 in Spain observations.
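
                A sketch of one way to check which rows actually reach the estimator, on made-up data (the outcome y, the values and the pattern of missingness are assumptions for illustration, not the poster's data). Both regress and correlate drop any observation with a missing value in any listed variable, so a variable can look constant in their output even though it varies in the full data set. In the rows listed in #5, incomelv is missing throughout, so this may be worth ruling out.

```stata
* Toy data (hypothetical): both countries are present, but one covariate is
* missing for every control-country row
clear
set seed 2019
set obs 200
generate byte treatment = _n > 100
generate byte time = mod(_n, 2)
generate byte pp = time * treatment
generate y = rnormal()
generate incomelv = ceil(5*runiform()) if treatment == 1

* In the full data, treatment varies
tabulate treatment time, missing

* After listwise deletion only treatment == 1 rows remain, so treatment is
* constant and pp equals time in the estimation sample
regress y time treatment pp incomelv
tabulate treatment time if e(sample), missing
count if e(sample) & treatment == 0
```

                If the second table has fewer cells than the first, the covariates with missing values are the place to look, rather than the coding of treatment itself.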



                • #9
                  Clyde,

                  Thank you so much for sticking with me.

                  I have looked at my data in the browser, confirmed that there are two countries, and re-coded the treatment variable as UK, but it still yields the same results. Here is some proof (I hope): Spanish and other = 0, and UK = 1.

                  The code I used to create UK is

                  Code:
                  gen UK=1 if nationality==1
                  replace UK=0 if nationality>=2
                  [Attached image: regression 4.jpg]


                  As shown previously, I thought the missing correlations for treatment meant that it is constant as well. However, there are clearly two values....



                  • #10
                    You're confusing me. You show code for a variable UK. But until now we have been talking about a variable called treatment, which appeared in your example in #5 and was constant there.

                    I remind you that the FAQ specifically asks people not to post screenshots. If I wanted to try to work with this data, there is no way to import it into Stata. In this case, I don't plan to do that anyway, but I also note that the screenshot is just barely large enough to read. Had it been any smaller it would have been equivalent to showing nothing at all. Please pay attention to the advice in the FAQ: it's there to make communication as clear and efficient as possible, so that we avoid spinning our wheels and wasting our time.

                    Looking at what I can see in your screenshot, at the moment I see only observations where UK == 1 and time == 2017, and where UK == 0 and time == other than 2017. So that's still not adequate data for a DID analysis (or really any analysis aimed at this problem). You must have all four combinations: UK == 0, time == 2017; UK == 1, time == 2017; UK == 0, time == other than 2017; and UK == 1, time == other than 2017. Do you? If so, the following should run with no difficulties:

                    Code:
                    regress p_csmoker i.UK##i.time
                    Once you get that going, I suggest you expand your analysis by adding in your covariates one at a time, re-running the regression at each step. If there is a covariate that is colinear with the interaction, the regression will either omit that new variable or will omit the interaction once that variable is added to the model, and you will have identified the source of your problem.
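
                    The stepwise procedure can be sketched as a loop, shown here on made-up data (y, x1 and x2 are hypothetical names; x2 is deliberately built to duplicate the interaction). The sketch assumes continuous covariates, so each one should raise the model rank e(rank) by exactly one; a categorical covariate entered with i. would raise it by more.

```stata
* Toy data (hypothetical): all four UK-by-time cells populated
clear
set seed 42
set obs 400
generate byte UK = mod(_n, 2)
generate byte time = _n > 200
generate y = 25 - 2*time + 3*UK - 1.5*time*UK + rnormal()
generate x1 = rnormal()
* x2 is built to be perfectly collinear with the interaction
generate x2 = time*UK

* Start from the basic DiD model and add one covariate at a time
quietly regress y i.UK##i.time
local prev = e(rank)
local covars
foreach v of varlist x1 x2 {
    local covars `covars' `v'
    quietly regress y i.UK##i.time `covars'
    if e(rank) == `prev' {
        display "`v': rank unchanged, something was omitted"
    }
    else {
        display "`v': ok"
    }
    local prev = e(rank)
}
```

                    Re-running the flagged model without quietly shows which term Stata omitted.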



                    • #11
                      Clyde,

                      Thanks for everything.

                      I was able to start with the very basic DiD model and figure out the issue.
