  • Panel analysis over 4-year average data (fe, re, hausman)

    Dear Statalist

    I am new to Stata and would appreciate your help with a panel analysis.

    I have a balanced panel dataset covering 2004 to 2015 (12 years, no missing values), and my estimating equation is:

    Q_it = a*GDP_int_i + b*X1_it + c*X2_it + d*X3_it + e_it

    where
    Q_it: quality of life (Human Development Index) of country i in year t
    GDP_int_i: GDP per capita of country i in 2004 (the same value in every year)
    X1, X2, X3: explanatory variables that affect Q

    My first question is how to run the panel analysis after grouping the years into three periods.
    I want to estimate the same model twice:
    1) using the annual data
    2) using 4-year averages over three periods (2004-2007, 2008-2011, and 2012-2015)

    For 1), I used the following commands:

    Code:
    xtset id year
           panel variable:  id (strongly balanced)
            time variable:  year, 2004 to 2015
                    delta:  1 unit
    
    xtreg Q GDP_int X1 X2 X3, fe
    xtreg Q GDP_int X1 X2 X3, re
    For 2), I created a "period" variable and period averages of Q and all explanatory variables.
    I then tried to redeclare the data set as a panel with "period" as the time variable:

    Code:
    gen period=1 // year 2004-2007
    replace period=2 if year >=2008 & year <2012
    replace period=3 if year >=2012
    
    bysort code period : egen aQ = mean(Q)
    bysort code period : egen aX1 = mean(X1)
    bysort code period : egen aX2 = mean(X2)
    bysort code period : egen aX3 = mean(X3)
    
    xtset id period
    But then I got the error message "repeated time values within panel",
    so I ran the panel analysis as below without redeclaring the data set.

    Code:
    xtreg aQ GDP_int aX1 aX2 aX3, fe
    xtreg aQ GDP_int aX1 aX2 aX3, re
    Is it okay to run the analysis like this, without redeclaring the data set with "period"?
    The estimates using the averaged variables show stronger significance, and I am worried that this is caused by duplication
    (for example, for id 1 in period 1, aX1 takes the same value in every year from 2004 to 2007).
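    A minimal sketch on simulated data (10 hypothetical countries; the variable names mirror the post, but the values are invented) makes the duplication visible: after the -egen- step every id-period cell still holds four rows with identical averages, which is also why -xtset id period- reports repeated time values. Regressing on those rows treats four copies of one averaged observation as independent data, which would tend to understate the standard errors.

    ```stata
    * Simulated balanced panel: 10 hypothetical countries observed 2004-2015
    clear
    set seed 12345
    set obs 10
    gen id = _n
    expand 12
    bysort id: gen year = 2003 + _n
    gen X1 = rnormal()

    * Period indicator: 1 = 2004-2007, 2 = 2008-2011, 3 = 2012-2015
    gen period = 1 + (year >= 2008) + (year >= 2012)

    * Period means attached to every year, as in the -egen- approach
    bysort id period: egen aX1 = mean(X1)

    * Every id-period cell still has 4 rows sharing one value of aX1
    duplicates report id period aX1
    bysort id period: gen byte rows_in_cell = _N
    tab rows_in_cell
    ```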

    My second problem is that when I run analyses 1) and 2) with the fixed-effects option,
    Stata returns the message "note: GDP_int omitted because of collinearity".
    This does not happen with the random-effects model.

    I understand this is because GDP_int does not vary over time.
    My question is whether it is okay to run a Hausman test in this situation,
    where the coefficient on GDP_int is omitted in the fixed-effects model but estimated in the random-effects model.
    If it is not, what would be the solution?
    Do I have to drop this term from my equation if I want to use the fixed-effects model?
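    One quick check that GDP_int is time-invariant is -xtsum-, whose "within" row reports the variation left after removing each country's mean. The within transformation behind -xtreg, fe- removes exactly that mean, which is why GDP_int is dropped. The sketch below uses simulated data with hypothetical coefficients, not the poster's data.

    ```stata
    * Simulated panel: one (hypothetical) GDP_int value per country
    clear
    set seed 2024
    set obs 10
    gen id = _n
    gen GDP_int = exp(rnormal(8, 1))
    expand 12
    bysort id: gen year = 2003 + _n
    gen X1 = rnormal()
    gen Q = 0.5 + 0.00001*GDP_int + 0.2*X1 + rnormal(0, 0.1)

    * Flag countries whose GDP_int is the same in every year (should be all of them)
    bysort id (GDP_int): gen byte gdp_const = (GDP_int[1] == GDP_int[_N])
    xtset id year

    * The "within" row shows zero variation for GDP_int
    xtsum GDP_int

    * FE removes country means, so GDP_int is omitted; RE still estimates it
    xtreg Q GDP_int X1, fe
    xtreg Q GDP_int X1, re
    ```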

    It would be a great help if someone could answer any of these questions.
    Thank you very much.

  • #2
    I am confused about the panel structure in your data. Your xtset command suggests that the panels are identified by the variable id. But when you compute your period-average values, you use a different variable called code. What's this about?

    To do your analysis based on period-average data, you need to reduce the data set to a single observation per panel per period. So, after you create the period variable, instead of the block of -egen- statements you used, do this:

    Code:
    collapse (mean) Q X1 X2 X3, by(code period)
    xtset code period
    Stata will not complain now, and your data will be properly structured for
    Code:
    xtreg Q X1 X2 X3
    with either fe or re.
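    A fuller sketch of the collapse approach, on simulated data with invented coefficients, follows. It assumes the numeric id rather than code (-xtset- needs a numeric panel variable) and lists GDP_int in the -collapse- call: -collapse- drops every variable it is not told about, and since GDP_int is constant within a country, its period mean is just its 2004 value. Only the random-effects model can use GDP_int; the fixed-effects model would omit it.

    ```stata
    * Simulated balanced panel: 10 hypothetical countries, 2004-2015
    clear
    set seed 7
    set obs 10
    gen id = _n
    gen GDP_int = exp(rnormal(8, 1))
    expand 12
    bysort id: gen year = 2003 + _n
    gen X1 = rnormal()
    gen X2 = rnormal()
    gen X3 = rnormal()
    gen Q = 0.5 + 0.2*X1 - 0.1*X2 + 0.05*X3 + rnormal(0, 0.1)

    * Period indicator: 1 = 2004-2007, 2 = 2008-2011, 3 = 2012-2015
    gen period = 1 + (year >= 2008) + (year >= 2012)

    * Reduce to one row per country-period (120 rows -> 30 rows)
    collapse (mean) Q X1 X2 X3 GDP_int, by(id period)
    isid id period
    xtset id period

    * RE can include the time-invariant GDP_int; FE cannot identify it
    xtreg Q GDP_int X1 X2 X3, re
    xtreg Q X1 X2 X3, fe
    ```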

    Regarding your second question, the issue is whether or not you need an estimate of the effect of GDP_int in order to accomplish your research goals. If you do, then it is simply not possible to do so in a fixed-effects model and no Hausman test can prevail over linear algebra. If you do not need an estimate of the effect of GDP_int, then you should simply drop it from your model altogether. In that situation, the Hausman test on the model without GDP_int might be used to guide your choice between -fe- and -re-.
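    For the model without GDP_int, the usual Stata workflow is to store both sets of estimates and pass them to -hausman-, with the consistent (fe) estimates first. The data below are simulated, with a hypothetical country effect u deliberately correlated with X1, only so that the block runs on its own; on real data, the -xtreg-, -estimates store-, and -hausman- lines would be run on your own panel (annual or period-averaged).

    ```stata
    * Simulated panel with a hypothetical country effect u that is correlated with X1
    clear
    set seed 99
    set obs 20
    gen id = _n
    gen u = rnormal()
    expand 12
    bysort id: gen year = 2003 + _n
    gen X1 = rnormal() + 0.5*u
    gen X2 = rnormal()
    gen X3 = rnormal()
    gen Q = 1 + 0.3*X1 + 0.2*X2 - 0.1*X3 + u + rnormal()
    xtset id year

    * Fit both models on the specification without GDP_int and store the results
    quietly xtreg Q X1 X2 X3, fe
    estimates store fe
    quietly xtreg Q X1 X2 X3, re
    estimates store re

    * Consistent estimator first, efficient estimator second
    hausman fe re
    ```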

    • #3
      Dear Clyde Schechter

      Thank you for your clear answer.
      Regarding my code, the variable "id" is the same as "code". Because "code" is a string variable, I created "id" as a numeric version of it.
      Sorry for the confusion, and thank you again for your help! :-)

      • #4
        OK. In that case, the -xtset- command will have to use id, not code, as the panel variable, because -xtset- does not allow string variables.
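        If a numeric identifier has not been built yet, -encode- is one standard way to create it from a string code; it also attaches value labels, so output still shows the original codes. The two-country data below are made up for illustration only.

        ```stata
        * Hypothetical string country codes
        clear
        input str3 code int year
        "AUS" 2004
        "AUS" 2005
        "BRA" 2004
        "BRA" 2005
        end

        * Numeric id with value labels taken from code (alphabetical: AUS = 1, BRA = 2)
        encode code, generate(id)
        xtset id year
        ```

        -egen id = group(code)- is an alternative way to get a numeric panel variable.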
