  • Panel data, Stata omits i.Year dummies due to collinearity

    Hi all,

    I am fairly new to Stata and have run into a problem with my panel dataset. It covers multiple countries, each with the same 7 industries, over the period 2010-2017. My supervisor advised me to use time dummies, so I added them with the factor-variable notation i.Year. My problem: when I run [by Country, sort : xtreg BERD FDIinflow GDPPCG Enterprises Turnover Educationlvl ULC Inflation i.Year, fe], Stata notes that it omits the 2015 year dummy (or another year dummy, depending on the country) because of collinearity. I don't understand why Stata does this, or whether it affects the validity of my results. If I run xtreg on the whole dataset instead of by country, I get an R² that is almost zero.

    Thanks in advance, and apologies if anything is unclear. Let me know if I should include any attachments!

    [Image attachment: Example collinearity.png]

    (Using Stata 16.0)


  • #2
    Some of your variables, e.g., inflation, will be collinear with the year indicators. In a panel of firms (or, in your case, industries), such variables do not vary across firms within a country in a given year. If these variables are just control variables, there is nothing to worry about. However, if they are your independent variables of interest, you have a problem, as you cannot identify their coefficients in the presence of year indicators. I would switch to reghdfe from SSC, which makes it easy to identify the collinear variables.

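    Code:
    * reghdfe requires ftools; both are installed from SSC
    ssc install reghdfe, replace
    ssc install ftools, replace
    * countryind is presumably a country-industry identifier; Year gives the year fixed effects
    reghdfe BERD FDIinflow GDPPCG Enterprises Turnover Educationlvl ULC Inflation, absorb(countryind Year)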
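A quick way to see which regressors will collide with the year indicators in a by-country regression is to check whether they vary across industries within a country-year. The sketch below is only an illustration under stated assumptions: it builds a small simulated panel (3 countries x 7 industries x 2010-2017) with hypothetical variables Country, Industry, Inflation and BERD, plus a countryind identifier made with egen group(). With real data you would skip the simulation and run only the loop over your own regressors.

```stata
* Hypothetical simulated panel: 3 countries x 7 industries x 8 years
clear
set seed 2021
set obs 168
gen int Country = ceil(_n/56)
bysort Country: gen int Industry = ceil(_n/8)
bysort Country Industry: gen int Year = 2009 + _n
* Country-industry identifier, analogous to countryind in the thread (assumed)
egen countryind = group(Country Industry)
* Inflation is set at the country-year level (identical across industries)
gen double Inflation = 1 + 0.5*Country + 0.25*mod(Year*Country, 5)
* BERD also varies across industries
gen double BERD = 2 + 0.8*Inflation + 0.3*Industry + rnormal()

* For each variable, flag observations in country-years where its value differs
* across industries (missing values sort last, so they are flagged as well)
foreach v of varlist Inflation BERD {
    bysort Country Year (`v'): gen byte `v'_varies = `v'[1] != `v'[_N]
    quietly count if `v'_varies
    display "`v' differs across industries within a country-year in " r(N) " observations"
}
```

A regressor reported with 0 observations here (Inflation in this simulation) depends only on Year within a single country, so in a by-country regression it is perfectly collinear with i.Year and Stata has to omit either it or one of the year dummies.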
    Quote from #1: "Now my problem, when I run the code [by Country, sort : xtreg BERD FDIinflow GDPPCG Enterprises Turnover Educationlvl ULC Inflation i.Year, fe] I get the note that Stata omits variable i.2015 or other i.Years varying across countries due to collinearity. I don't get why Stata does this and if this affects the validity of my database."
    Reading #1 again, you are creating the problem by running your regressions by country. You have cross-country panel data, and your advisor probably wants a cross-country analysis. Do not worry about the R-squared (xtreg, fe reports within, between and overall R-squared, which are computed differently from the R-squared of OLS on the LSDV specification); just specify your model, run it and interpret the results.
    Last edited by Andrew Musau; 23 Jul 2021, 08:50.
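The point that running the regressions by country creates the problem can be reproduced with built-in commands on simulated data. The sketch below is an illustration only, with hypothetical variables (Country, Industry, countryind, Inflation, BERD) and made-up values rather than the poster's data: within one country, Inflation changes only with Year, so the single-country fixed-effects regression has to omit one term, while the pooled cross-country regression can estimate the Inflation coefficient alongside the year dummies.

```stata
* Hypothetical simulated panel: 3 countries x 7 industries x 8 years
clear
set seed 2021
set obs 168
gen int Country = ceil(_n/56)
bysort Country: gen int Industry = ceil(_n/8)
bysort Country Industry: gen int Year = 2009 + _n
egen countryind = group(Country Industry)
* Inflation is identical across industries within a country-year
gen double Inflation = 1 + 0.5*Country + 0.25*mod(Year*Country, 5)
gen double BERD = 2 + 0.8*Inflation + 0.3*Industry + rnormal()
xtset countryind Year

* One country only: Inflation is a function of Year here, so Stata reports
* that a term (Inflation or one of the year dummies) is omitted because of collinearity
xtreg BERD Inflation i.Year if Country == 1, fe

* All countries: Inflation also varies across countries within a year,
* so its coefficient is identified together with the year dummies
xtreg BERD Inflation i.Year, fe
```

Which term gets dropped in the single-country run depends on how Stata orders the regressors, which is why #1 sees different year dummies omitted in different countries.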



    • #3
      Andrew Musau
      You are correct that inflation is used as a control variable. Thanks a lot for your helpful insight; I really appreciate your help, and your solution was very useful!
      Attached are the results I now get with your solution. Just to be clear, are all these coefficients meant to be interpreted across countries? If I want to run xtreg for just one country, what would be the best way to approach that?
      [Image attachment: Results solution.png]

      Last edited by Sander Wyckmans; 23 Jul 2021, 10:37.



      • #4
        For a particular country, use the -if- qualifier:

        Code:
        * Restrict the estimation sample to one country (equality tests in Stata use ==)
        reghdfe BERD FDIinflow GDPPCG Enterprises Turnover Educationlvl ULC Inflation if country=="UK", absorb(countryind Year)
        If you are doing this for all countries:

        Code:
        bys country: reghdfe BERD FDIinflow GDPPCG Enterprises Turnover Educationlvl ULC Inflation, absorb(countryind Year)
        or

        Code:
        * Store the distinct values of country, then run one regression per country
        levelsof country, local(countries)
        foreach country of local countries {
            di "Regression for `country'"
            reghdfe BERD FDIinflow GDPPCG Enterprises Turnover Educationlvl ULC Inflation if country=="`country'", absorb(countryind Year)
        }
        Here I assume that country is a string variable. If country is numeric:

        Code:
        * Numeric country codes are compared without quotes
        levelsof country, local(countries)
        foreach country of local countries {
            di "Regression for country code `country'"
            reghdfe BERD FDIinflow GDPPCG Enterprises Turnover Educationlvl ULC Inflation if country==`country', absorb(countryind Year)
        }
        The issue with running the regressions by country is that variables such as inflation, which do not vary across industries within a country at a given point in time, will drop out in the presence of year indicators. In addition, you should cluster your standard errors in the cross-country regression; when running the regressions by country, you do not appear to have enough clusters to do so.

        Code:
        reghdfe BERD FDIinflow GDPPCG Enterprises Turnover Educationlvl ULC Inflation, absorb(countryind Year) cluster(countryind)
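The point about the number of clusters can be checked directly by counting the distinct cluster identifiers, per country and overall. The sketch below uses a simulated panel with the structure described in #1 (7 industries per country, here 3 countries; hypothetical variable names Country, Industry and countryind). A by-country regression would have only 7 clusters, which is usually regarded as far too few for reliable cluster-robust standard errors, whereas the pooled regression clusters over all country-industry pairs.

```stata
* Hypothetical simulated panel: 3 countries x 7 industries x 8 years
clear
set obs 168
gen int Country = ceil(_n/56)
bysort Country: gen int Industry = ceil(_n/8)
bysort Country Industry: gen int Year = 2009 + _n
egen countryind = group(Country Industry)

* Tag one observation per cluster (country-industry pair)
egen byte first_ci = tag(countryind)
* Clusters available in each by-country regression
tabulate Country if first_ci
* Clusters available in the pooled cross-country regression
count if first_ci
```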
