
  • Correlation in Panel Data Regressions

    Hi,

    I have an unbalanced panel dataset with one dependent variable and 15 independent variables, covering 700 cross-sectional units over 16 years. As suggested by the Hausman test, I am using a fixed-effects model. My questions are as follows:

    1. Should I calculate a correlation matrix of the independent variables? Given that Stata simply computes correlations among the independent variables while ignoring the cross-sectional and time dimensions, what is the relevance or interpretation of such correlations?
    2. Should we worry about multicollinearity in panel data regressions? If yes, how do we test for it, and what are the potential solutions?
    3. I have a few cross-sectionally invariant variables in my dataset (such as the interest rate and inflation). Since each of them is a pure time series, should I calculate and report a correlation matrix for them?

    Thanks a lot!!


  • #2
    Prateek:
    1. & 2. A correlation matrix (-estat vce, corr-) might be useful for investigating quasi-extreme multicollinearity. Please note that, as often advised by Clyde Schechter on this forum, the best way to investigate quasi-extreme multicollinearity is to look at the confidence intervals: if they are suspiciously wide, quasi-extreme multicollinearity might be an issue.
    See also http://www.hup.harvard.edu/catalog.p...ontent=reviews, which devotes a chapter to the (oversold) issue of multicollinearity.
    A high within R-squared combined with high p-values for most of your predictors might also be a clue to quasi-extreme multicollinearity.
    3. I do not see the reason why you should report a correlation matrix for them.
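
    The checks in points 1 and 2 could be run roughly as follows. This is only a sketch: it uses Stata's bundled Grunfeld panel (`webuse grunfeld`, with variables `company`, `year`, `invest`, `mvalue`, `kstock`) as a stand-in, since the poster's own variable names are not given.

    ```stata
    * Stand-in data: Stata's Grunfeld panel, not the poster's dataset
    webuse grunfeld, clear
    xtset company year

    * Fixed-effects regression; check the width of the confidence intervals in the output
    xtreg invest mvalue kstock, fe

    * Correlation matrix of the estimated coefficients
    estat vce, correlation
    ```

    Suspiciously wide intervals on the variables of interest, together with large off-diagonal entries in the second table, would be the pattern to look for.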
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Alright. Actually, I intend to show a correlation matrix of the macroeconomic variables for the following two reasons:

      1. Since macroeconomic variables only vary across time (and not cross-sections), a meaningful interpretation of their correlation matrix is possible.
      2. Some of these macroeconomic variables are my variables of interest and, since they are highly correlated, they perhaps explain the same variation in the dependent variable. I would like to avoid including highly correlated macroeconomic variables together in my model, as doing so leads to insignificant coefficients for many of them. In order to include only those macroeconomic variables that explain unique variation in the dependent variable, I would like to report a correlation table and identify those that should not be introduced together.
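
      One possible way to build such a table is to reduce the data to one observation per year before correlating, so that each year counts once regardless of how many firms are observed in it (this matters in an unbalanced panel). The variable names `year`, `interest`, `inflation` and `gdp_growth` below are hypothetical placeholders, not names from my dataset as posted.

      ```stata
      * Hypothetical variable names: year, interest, inflation, gdp_growth
      preserve

      * The macro series are constant within a year, so keep one observation per year
      collapse (first) interest inflation gdp_growth, by(year)

      * Pairwise correlations with significance levels and observation counts
      pwcorr interest inflation gdp_growth, sig obs

      restore
      ```

      With only 16 yearly observations, the resulting correlations would be estimated rather imprecisely.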

      Thanks!



      • #4
        Prateek:
        1. I'm surely missing something, but I fail to get what appears to be your substantive statement here: if I consider the GDP per capita of Italy and, say, the Netherlands, they are very different (hence they vary between countries and, within each country, they probably vary over time). That said, in my experience a meaningful correlation matrix for panel data is difficult to obtain, as it is a mix of within and between variance.
        2. I agree with your second point (given the above-mentioned caveat about the -estat vce, corr- matrix with panel data). I would also add that the best way to avoid highly correlated predictors is a good knowledge of the data-generating process.
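
        The mix of within and between variation mentioned in point 1 can be pulled apart by hand. The sketch below assumes Stata's bundled Grunfeld panel as stand-in data; the generated variables (`mean_*`, `within_*`, `onefirm`) are illustrative names only.

        ```stata
        * Stand-in data: Stata's Grunfeld panel
        webuse grunfeld, clear
        xtset company year

        * Overall correlation: pools within-firm and between-firm variation
        correlate mvalue kstock

        * Within correlation: subtract each firm's own mean from its observations
        foreach v of varlist mvalue kstock {
            bysort company: egen double mean_`v' = mean(`v')
            generate double within_`v' = `v' - mean_`v'
        }
        correlate within_mvalue within_kstock

        * Between correlation: correlate the firm means, one observation per firm
        egen byte onefirm = tag(company)
        correlate mean_mvalue mean_kstock if onefirm
        ```

        Since the fixed-effects estimator uses only within-firm variation, the within correlation is arguably the one most relevant to collinearity in an -xtreg, fe- model.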
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Now I understand the missing link in our conversation, Sir. I forgot to mention (my apologies) that my cross-sectional units are companies in the same country, India. Hence, in any given year, the macroeconomic variables take the same value for all cross-sectional units. I completely appreciate your argument that these macroeconomic variables do vary across countries.

          Further, please help me understand how the "-estat vce, corr- matrix" calculates correlations among variables in a panel setting. In my humble opinion, it is not possible to obtain a meaningful correlation among variables in a panel setting without controlling for one dimension (cross-section or time). This is because we can either calculate the correlation across time (by averaging over cross-sections) or vice versa.



          • #6
            Prateek:
            Thanks for the clarification: now everything makes sense.
            I've never challenged myself to understand in full to what extent -estat vce, corr- is informative: usually, I use this postestimation command to get a rough idea about quasi-extreme multicollinearity/"weird" coefficients. But my opinion is that, being a mix of within and between correlation, it cannot provide fine-grained information.
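
            For what it's worth, -estat vce, correlation- is documented as displaying the correlations among the estimated coefficients, derived from the stored matrix `e(V)`, rather than correlations among the variables themselves. A small sketch to check this, assuming Stata's bundled Grunfeld panel as stand-in data (the matrix names `V` and `C` are arbitrary):

            ```stata
            * Stand-in data: Stata's Grunfeld panel
            webuse grunfeld, clear
            xtset company year
            xtreg invest mvalue kstock, fe

            * e(V) holds the estimated variance-covariance matrix of the coefficients
            matrix V = e(V)

            * corr() rescales a variance matrix into a correlation matrix
            matrix C = corr(V)
            matrix list C

            * Should display the same correlations as the matrix listed above
            estat vce, correlation
            ```

            If the two outputs agree, the table describes how the coefficient estimates covary in the fitted model, which is a different thing from a -correlate- table of the regressors.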
            Kind regards,
            Carlo
            (Stata 19.0)
