
  • Problem with fixed effects regression - Urgent help needed

    Hi to the community of Statalist.

    I am currently doing research for a political science paper. For this I am using the very large V-Party dataset (https://www.v-dem.net/data/v-party-dataset/) in Stata 15.

    I am trying to run a fixed effects regression, but the xtreg command either reports "no observations" (r(2000)) or fails with "repeated time values within panel" (r(451)).

    This is where I am so far:

    * EU after 2004
    keep if country_name == "Austria" | country_name=="Belgium" | country_name=="Bulgaria" | country_name=="Croatia" | country_name=="Cyprus" | country_name=="Czech Republic" | country_name=="Denmark" | country_name=="Estonia" | country_name=="Finland" | country_name=="France" | country_name=="Germany" | country_name=="Greece" | country_name=="Hungary" | country_name=="Ireland" | country_name=="Italy" | country_name=="Latvia" | country_name=="Lithuania" | country_name=="Luxembourg" | country_name=="Malta" | country_name=="Netherlands" | country_name=="Poland" | country_name=="Portugal" | country_name=="Romania" | country_name=="Slovakia" | country_name=="Slovenia" | country_name=="Spain" | country_name=="Sweden"
    keep if year >= 2004


    * Generate the independent variable - liberal governing party
    gen libgov = 0
    replace libgov = 1 if v2pagovsup <= 2 & ep_type_populism >= 3
    bysort country_id year: replace libgov = libgov[_n-1] if libgov == .
    label var libgov "Liberal Governing Party"

    * Generate the dependent variable - right wing parties
    gen rightwing = 0
    replace rightwing = 1 if v2pagovsup >= 2 & ep_type_populism <= 3
    bysort country_id year: replace rightwing = rightwing[_n-1] if rightwing == .
    label var rightwing "Right Wing Opposition Parties"

    * Sort and structure the dataset as a panel data
    sort country_id year
    drop if year==year[_n-1]
    xtset country_id year

    * Lag the independent variable by one year for each European state individually
    by country_id: gen libgov_lag = L1.libgov
    label var libgov_lag "Lagged Liberal Governing Party"

    * Control variables - countries, economic politics and societal politics
    gen country_controls = country_id
    gen econ_politics = v2pariglef
    gen soc_politics = ep_v6_lib_cons
    label var country_controls "Country Controls"
    label var econ_politics "Economic Politics"
    label var soc_politics "Societal Politics"

    * Check for collinearity using the vif command
    * vif libgov_lag country_controls econ_politics soc_politics

    * Run the fixed effects regression
    xtreg rightwing libgov_lag country_controls econ_politics soc_politics, fe

    The drop command above (drop if year==year[_n-1]) seems to ruin the regression.
    Help is much appreciated! Thank you in advance!

  • #2
    The problem is that you do not understand this data set properly. In its current form, it is not suitable for the type of analysis you are trying to do.

    The problem is that the data set is not a data set of countries and years. It is a dataset of countries, political parties, and years. So trying to analyze variables like libgov and rightwing that are defined at the country-year level requires reducing the data set to a single observation for each country and each year. Your code shows some awareness of this issue, with the command that you have put into boldface. But you created the libgov and rightwing variables in such a way that they are defined at the party level. And then when you reach the boldfaced command that reduces to one observation per country-year, you are, unfortunately, selecting one observation at random (and, consequently, a random guess about these variables). Now, from the code you wrote for creating libgov and rightwing, I can make a pretty good guess what you actually need.

    But more problematic are the variables econ_politics and soc_politics. These variables also are defined, and vary at, the party level, and I do not know how you want to combine these to become, somehow, country level variables.
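
    Purely to illustrate the mechanics, and not as a recommendation, one possibility would be an unweighted mean across the parties in each country-year. The names econ_mean and soc_mean are made up for this sketch, and whether a simple mean is a sensible country-level construct is a substantive question I cannot answer for you:
    Code:
    * Hypothetical aggregation: unweighted mean across parties within each country-year.
    * econ_mean and soc_mean are illustrative names, not variables in the dataset.
    * Run this while the party-level observations are still in memory.
    by country_id year, sort: egen econ_mean = mean(v2pariglef)
    by country_id year, sort: egen soc_mean = mean(ep_v6_lib_cons)
    label var econ_mean "Economic Politics (mean over parties)"
    label var soc_mean "Societal Politics (mean over parties)"
    -egen- with mean() ignores missing values, so a country-year in which every party is missing on a variable ends up missing on the aggregate as well.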

    The variable country_controls is of no use at all. It is just a clone of country_id, which you are using as your fixed effect. So it will be collinear with the fixed effects and will be dropped.

    You need to reduce the data set to one observation per country-year, but in doing that you must aggregate your existing variables up to one consistent value per country-year. For the variables libgov and rightwing, I think what you want is:
    Code:
    by country_id year, sort: egen libgov = max(v2pagovsup <= 2 & ep_type_populism >= 3)
    label var libgov "Liberal Governing Party"
    
    by country_id year, sort: egen rightwing = max(v2pagovsup >= 2 & ep_type_populism <= 3)
    label var rightwing "Right Wing Opposition Parties"
    There is another problem in this data. L1.libgov refers to the value of libgov in the immediately preceding calendar year. But in nearly all of your data, there is a gap between consecutive observations. Only occasionally does the next observation for a country fall in the very next year. Gaps are typically 4 years, but sometimes less and sometimes more. You cannot define a one-year lag with this kind of data. Perhaps what you want instead is:
    Code:
    by country_id (year), sort: gen libgov_prior = libgov[_n-1]
    This will give you the value of libgov in the latest preceding year for which the data has an observation, which will, typically, be several years earlier, not the year before. I don't know if that's useful for your purposes. But there's no other way to do it with this data.
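
    Putting the pieces together, the order of operations might look like the sketch below. It assumes libgov and rightwing have already been created with the two -egen- commands above, while the party-level observations are still in memory. The variable gap is a name I made up, used only to inspect the spacing between observations:
    Code:
    * Keep one observation per country-year. The aggregated variables are
    * constant within country-year, so it does not matter which one is kept.
    by country_id year, sort: keep if _n == 1

    * Inspect the spacing between consecutive observations within each country
    by country_id (year), sort: gen gap = year - year[_n-1]
    tabulate gap

    * Declare the panel, then take the value from the previous available observation
    xtset country_id year
    by country_id (year), sort: gen libgov_prior = libgov[_n-1]
    label var libgov_prior "Liberal Governing Party, previous observation"

    xtreg rightwing libgov_prior, fe
    The first observation of each country has no predecessor, so libgov_prior is missing there and that observation drops out of the regression.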


    • #3
      Thank you so much. That helps a lot!
      I had already suspected that I must be misunderstanding the dataset. Now the FE regression works.
      Is there any way you could think of to improve the reliability of the regression by omitting the elimination of that many cases by "drop"?
      I'm also very curious how I could introduce control variables and a test for collinearity (the "estat vif" command just produces the error r321).

      Thanks again!



      • #4
        Is there any way you could think of to improve the reliability of the regression by omitting the elimination of that many cases by "drop"?
        No. Your analysis is at the country-year level and the observations you are dropping are, for that purpose, completely redundant. Including them would not enhance the reliability of the regression. It would just make its results appear misleadingly strong.

        I'm also very curious how I could introduce control variables and a test for collinearity (the "estat vif" command just produces the error r321).
        -estat vif- only runs after a -regress- command. In any case, even if you got it to run, it's just a waste of time. Multicollinearity is a bogus issue over which far too much ink and too many pixels have been spilled. When multicollinearity exists, it is only a problem if it involves the key variables whose coefficients are the focus of your research. If it just involves covariates ("control" variables) it makes no difference at all. If it does involve the key variables, in your case the previous value of libgov, it seems, then you need to simply look at the confidence interval around that coefficient. If that confidence interval is narrow enough that the answer to your research question is the same regardless of where in that interval the "truth" lies, then you have no problem: your question is answered. If the confidence interval extends into territories where your research question would get different answers, then you have a problem. But in that case, your study is inconclusive, and the only thing you can do about it is to get a new, better, and probably much larger data set.
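
        If you want to see the mechanics anyway, here is a minimal sketch. It assumes the data have already been reduced to one observation per country-year and that libgov_prior exists as in #2. The -regress- version uses country indicators in place of -xtreg, fe- only so that -estat vif- has something to run after:
        Code:
        * -estat vif- is a postestimation command for -regress-, so fit the model
        * with explicit country indicators instead of -xtreg, fe-
        regress rightwing libgov_prior i.country_id
        estat vif

        * What actually matters: the confidence interval for libgov_prior,
        * shown in the regular output table of the fixed effects model
        xtreg rightwing libgov_prior, fe
        The coefficient on libgov_prior is the same in both commands; only the way the country effects are handled differs.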

        Arthur Goldberger's econometrics textbook has a nice chapter about why multicollinearity should be relegated to the dustbin of history. You can see a shorter version of it by Bryan Caplan at https://www.econlib.org/archives/200...ollineari.html.



        • #5
          I quite agree with your position on collinearity! Thanks for the article: I will certainly send it around when VIF comes up next time.
          I'm currently trying to combine the generated "Right Wing Opposition" variable with one for radicalization, "v2paviol", but I can't seem to figure it out. Can you think of a solution?
          by country_id year, sort: egen rightwing = max(v2pagovsup >= 2 & ep_type_populism <= 3)
          label var rightwing "Right Wing Opposition Parties"
          by country_id year, sort: egen rightwing_rad = rightwing*v2paviol   -->?
          And then again, I am asking myself whether, instead of looking deeper into the FE regression, there could be a chance to pursue the case by implementing interaction effects?



          • #6
            At this point you are really asking me for advice on the substance of your work in political science. While I have no shortage of opinions on all sorts of political matters, they are the uninformed ideas of a layperson. I don't think I am qualified to give you guidance on how to define constructs from this kind of data.
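
            On the syntax alone: -egen- expects one of its functions, such as max() or mean(), after the equals sign, so a bare product like rightwing*v2paviol is not accepted. A minimal sketch, assuming (and this is only my guess at your intent) that you want the largest v2paviol value among the parties that meet your right-wing opposition criterion in each country-year:
            Code:
            * Hypothetical construct: v2paviol is kept only for parties meeting the
            * criterion (others are set to missing), and max() ignores missing values.
            * Run this on the party-level data, before reducing to one row per country-year.
            by country_id year, sort: egen rightwing_rad = max(cond(v2pagovsup >= 2 & ep_type_populism <= 3, v2paviol, .))
            Replacing max() with min() or mean() works the same way. Which of these, if any, is a meaningful measure is exactly the substantive question I cannot answer.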

            On the general question of whether adding interaction effects would improve your model, there we have a statistical question I feel comfortable answering. Your data set is large enough to handle a couple of interaction terms, and doing that usually does improve models. In the real world, the presence of interaction is common, and models that do not reflect that fact are basically mis-specified. Unfortunately, adding interaction terms severely degrades the statistical power of analyses, so doing this is only sensible in a data set large enough to accommodate it.
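
            Mechanically, an interaction is easiest to specify with factor-variable notation. The sketch below assumes a country-year data set containing libgov_prior from #2 and some continuous country-year covariate, here hypothetically called econ_mean; substitute whatever moderator your theory suggests:
            Code:
            * ## includes both main effects and their interaction;
            * i. marks the 0/1 indicator, c. marks the continuous covariate
            xtreg rightwing i.libgov_prior##c.econ_mean, fe

            * Test of the interaction term
            testparm i.libgov_prior#c.econ_mean
            Letting Stata build the interaction this way, rather than generating a product variable by hand, also keeps postestimation commands such as -margins- aware of how the terms are related.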



            • #7
              Your help is much appreciated. I will look into it. Thank you!
