  • within-effects, between-effects, and residual regressions

    Hello everybody,

    my problem is twofold. I have a huge dataset: hourly weather and hourly emission data for about 15 years, about 10 GB in total.
    Now, I want to run a regression of emissions on weather data and other variables, some of which vary over time (not every hour, but every year) and some of which do not vary at all. Of the time-varying variables, one is relatively stable over the years but varies a lot between units. Thus, I wanted to run a mixed-effects model, which captures the fixed effects but also allows for the analysis of between-effects. Now the questions are: can I take the residuals from a first-stage regression and regress them on other variables in a second stage, thereby reducing the size of the dataset a lot? And what is the best way to run such a hybrid model?

    Regarding the model variant:
    I've read of two ways to do so. Allison (2009) suggests taking the unit-specific means (xbar_i = (1/n_i) * SUM_t (x_it)) and the unit-specific deviations from the mean (x*_it = x_it - xbar_i) and running a random effects model on both. As far as I understand the approach, the deviations from the means then give the within-effect of a variable, while the unit-specific means give the between-effect. Furthermore, one is able to include non-varying variables and to assess their effects. (I know that you can also use the mixed command in Stata, but so far I do not fully understand it, or how it gets me results comparable to the Allison approach.)
    The problem is that this would be very tedious to set up and would take a very long time to run, because of the many variables on the right-hand side of the regression (all the weather variables, some of them also squared, lagged, interacted, and so on).
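
    A minimal Stata sketch of the decomposition described above may make the mechanics concrete. It uses the nlswork example panel that ships with Stata as a stand-in for the emissions data, so idcode, ln_wage, tenure, age, and race are placeholders, not the variables from this thread. The loop is one way to avoid building the means and deviations by hand for every regressor.

    ```stata
    webuse nlswork, clear                 // example panel, stands in for the real data
    xtset idcode year

    * complete cases only, so the unit means match the estimation sample
    gen byte touse = !missing(ln_wage, tenure, age, race)

    * unit-specific mean (between part) and deviation from it (within part)
    foreach v of varlist tenure age {
        bysort idcode: egen double `v'_mean = mean(`v') if touse
        gen double `v'_dev = `v' - `v'_mean
    }

    * random-effects model on the decomposed regressors plus a time-invariant one
    xtreg ln_wage tenure_dev tenure_mean age_dev age_mean i.race if touse, re vce(cluster idcode)

    * do the within- and between-effects of tenure differ?
    test tenure_dev = tenure_mean

    * maximum-likelihood analogue with a random intercept per unit
    mixed ln_wage tenure_dev tenure_mean age_dev age_mean i.race if touse || idcode:
    ```

    In this setup the coefficients on the _dev variables are the within-effects and those on the _mean variables are the between-effects. The mixed call fits the same specification by maximum likelihood, which is why its estimates should be comparable to the xtreg, re results.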

    Henderson (1996) takes another route to assess the effects of non-varying variables. He runs a fixed effects model, then takes the residuals of this model and runs an OLS regression of them on all of his non-varying variables. His fixed effects model, which he uses to assess the effects of time-variant determinants, is O3_it = C + b*X_it + y*Z_i + f_i + e_it, while his model for assessing the role of non-varying variables uses RES_i = O3bar_i - b^hat * Xbar_i, where the RES_i are the average residuals for each unit in his dataset from the fixed effects estimation of the first equation, and the bar-variables are the time averages for each unit. (Here I do not fully understand how he gets b^hat, because each year should normally have a different prediction if he takes it from the first regression?!)
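
    On the b^hat question: a fixed effects fit produces a single estimated coefficient vector for all periods, so RES_i as defined above is the estimated unit effect (up to the constant). The two-stage idea can be sketched in Stata as follows, again with the nlswork example panel and its variables as placeholders. This illustrates the general approach under those assumptions and is not a reproduction of Henderson's specification.

    ```stata
    webuse nlswork, clear                 // example panel, stands in for the real data
    xtset idcode year

    * stage 1: fixed effects regression on the time-varying regressors only
    xtreg ln_wage tenure age, fe

    * estimated unit effect: ybar_i - xbar_i * b_hat - constant, with one b_hat for all periods
    predict double alpha_hat, u

    * stage 2: one row per unit, regress the unit effect on time-invariant variables
    keep if e(sample)
    collapse (mean) alpha_hat (first) race, by(idcode)   // race assumed constant within unit
    regress alpha_hat i.race, vce(robust)
    ```

    The second stage uses one row per unit, which is where the reduction in data size comes from. Its standard errors do not account for the fact that alpha_hat is itself an estimate, so they should be read with caution.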

    So, I asked myself whether I could regress hourly emissions on hourly weather, take the residuals of that regression, and then run a hybrid model like the one suggested by Allison on those residuals. I've now played around with the data a bit and was somewhat confused: when I take the residuals of a regression such as y = c + b*x1 + e with the command "predict resid, residuals" and regress the residuals like this: resid = c + a*x2 + u, I get very different results than from running y = c + b*x1 + a*x2 + e. So, how can I interpret the coefficient a from the full model compared to the coefficient from the residual model?
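
    The gap between the two sets of results is what the Frisch-Waugh-Lovell theorem predicts when x1 and x2 are correlated: regressing the residual of y on the raw x2 does not reproduce the full-model coefficient, whereas regressing it on x2 residualized on x1 does. A small Stata sketch with the auto toy dataset, where price, weight, and mpg are placeholders for y, x1, and x2:

    ```stata
    sysuse auto, clear                    // toy dataset shipped with Stata

    * full model: the coefficient on mpg is net of weight
    regress price weight mpg
    scalar b_full = _b[mpg]

    * naive two-step: only the outcome is residualized
    regress price weight
    predict double res_price, residuals
    regress res_price mpg
    scalar b_naive = _b[mpg]

    * Frisch-Waugh-Lovell: residualize the second regressor on weight as well
    regress mpg weight
    predict double res_mpg, residuals
    regress res_price res_mpg
    scalar b_fwl = _b[res_mpg]

    display "full: " b_full "   naive: " b_naive "   FWL: " b_fwl
    ```

    Here b_fwl should equal b_full up to rounding, while b_naive is b_full shrunk by the factor (1 - R2), with R2 taken from the regression of x2 on x1. The standard errors of the residual regression differ slightly from those of the full model, because its degrees of freedom do not account for the first step.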

    Can somebody give me some hints on how to run such a model convincingly?

    Thanks a lot in advance.



    Literature:
    Allison, Paul D. (2009) - Fixed Effects Regression Models
    Henderson, Vernon (1996) - Effects of Air Quality Regulation

  • #2
    Okay, to put it a little differently: is someone aware of an approach that provides me with the within- as well as the between-effects of a variable that differs on average ten times as much between units as within units, assuming that the fixed effects (let's call them alpha_i) are potentially correlated with my explanatory variables, which keeps me from using a pure random effects model? And how can I implement such a model in Stata?

    • #3
      You didn't get a quick answer. You'll increase your chances of a quick answer by providing some Stata code in code delimiters, Stata output, and (often) sample data using dataex. However, since your problem is around data size, I understand the problem with sample data.

      First, I'd try to ignore processing time. Especially with the multi-processor versions and fast computers, Stata is pretty fast. I'd make sure the program works, then let it run over the weekend or whatever. xtreg is the obvious first choice. If you want both within and between effects and don't want to impose the orthogonality assumptions of xtreg, re, then I'd look at Mundlak estimators. There are also some other correlated error estimators, but I'm not up on them.
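
      A rough sketch of what the Mundlak approach could look like, assuming the nlswork example panel as a stand-in for the real data (idcode, ln_wage, tenure, age, and race are placeholders): the unit means of the time-varying regressors are added to a random effects model, and the fixed effects fit is run afterwards as a benchmark.

      ```stata
      webuse nlswork, clear               // example panel, stands in for the real data
      xtset idcode year
      gen byte touse = !missing(ln_wage, tenure, age, race)

      * Mundlak device: unit means of the time-varying regressors
      foreach v of varlist tenure age {
          bysort idcode: egen double `v'_mean = mean(`v') if touse
      }

      * random effects model with the raw regressors and their unit means
      xtreg ln_wage tenure age tenure_mean age_mean i.race if touse, re vce(cluster idcode)

      * joint test of the means: rejection suggests the unit effects are correlated with the regressors
      test tenure_mean age_mean

      * fixed effects benchmark for the coefficients on tenure and age
      xtreg ln_wage tenure age if touse, fe vce(cluster idcode)
      ```

      The coefficients on tenure and age in the first model should be close to the fixed effects estimates, while the coefficients on the means pick up the difference between the between- and within-effects. Time-invariant variables such as race stay in the model, which plain xtreg, fe does not allow.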
