
  • #16
    So can you comment on the validity of a fixed-effects model specified as y = b0 + b1(time variant var) + b2(time variant var * time invariant var) + e?



    • #17
      It's absolutely fine.

      (Actually, the fixed effects model also incorporates a term, u, representing the panel-specific fixed effect. But, at least in Stata, you don't have to explicitly specify it in the model: it comes with the use of the -xt- command.)

      Again, in Stata, I would still code this as -time_invariant_var##time_variant_var- and let Stata both create and then omit the term for time_invariant_var. First, it's just easier: you don't have to think about it; you just code the way you would in any other regression model. Second, and more important, it provides a built-in reality check. If you think your variable is time invariant but, due to a data error or a misunderstanding on your part, it isn't, then Stata will not omit it, and when you see it in the output you'll know that something is amiss. And if you think your variable is not time invariant but, again due to misunderstanding or data error, it is, Stata will surprise you by omitting it, and reviewing the output will call your attention to the problem. [This is one of my general principles of programming: when possible, write code so that it gracefully handles foreseeable exceptions but clearly tells you about any unexpected exceptions.]
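
      A quick way to check the time-invariance assumption directly, before estimating, is sketched below. It borrows Stata's nlswork example data, with idcode as the panel identifier and race standing in for the supposedly time-invariant variable; both names are stand-ins for your own, and race_varies is a name made up for this illustration.

      Code:
      webuse nlswork, clear
      xtset idcode year
      * a within-panel standard deviation of 0 means no variation over time
      xtsum race
      * flag panels in which race takes more than one value
      * (a missing value counts as a different value here)
      bysort idcode (race): generate byte race_varies = (race[1] != race[_N])
      tabulate race_varies
      Any panels flagged with 1 would be worth inspecting before relying on what the regression output omits.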




      • #18



        Can you clarify the coding scheme you described? I've been generating interaction terms manually with -generate interaction_var = time_variant_var*time_invariant_var- and then entering interaction_var into the regression command.



        • #19
          With modern Stata it is unnecessary to generate your own interaction terms like that: factor-variable notation will do it for you, correctly, and you can then avail yourself of the capabilities of -margins- following your regression command. (-margins- is a very powerful, flexible tool that enables you to calculate predicted expected values and marginal effects from your model. It is particularly helpful with models containing interaction terms.) So the coding would follow this schema:

          Code:
          regression_command dependent_variable var1##var2 other_covariates
          
          // OR MORE SPECIFICALLY IN YOUR CASE
          xtreg outcome_var time_variant_var##time_invariant_var other_covariates, fe
          Stata will expand the ## term into the separate main terms and an interaction term (which it creates for you as a "virtual" variable). In this instance the time_invariant_var effect is collinear with the fixed effects, so it will show up marked as omitted in the coefficient table.

          Now, it actually makes a difference whether these variables are discrete or continuous. Discrete variables should be prefixed with i., and continuous variables need the prefix c. Omitting the i. will do no harm (Stata assumes variables in interaction terms are discrete if no prefix is specified), but omitting the c. will cause Stata to erroneously treat the continuous variable as discrete. I generally code the i. anyway just for clarity. So, i.var1##c.var2 or the like.
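
          To see that the factor-variable form reproduces a hand-made interaction term, here is a small sketch using Stata's nlswork example data. The indicator black and the product black_x_hours are names made up for this illustration, and the coding race == 2 for black is an assumption about that dataset that you can verify with -label list-.

          Code:
          webuse nlswork, clear
          xtset idcode year
          * hand-made version: 0/1 indicator times the continuous variable
          * (assumes race == 2 is "black" in nlswork; check with -label list-)
          generate byte black = (race == 2) if !missing(race)
          generate black_x_hours = black*hours
          xtreg ln_wage hours black_x_hours, fe
          
          * factor-variable version: Stata builds the interaction itself
          xtreg ln_wage i.black##c.hours, fe
          The coefficient on black_x_hours in the first model should equal the one on 1.black#c.hours in the second. Only the second form tells -margins- that hours enters the model in two places.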

          Factor-variable notation and -margins- were introduced in Stata 11 and have made it so much easier to estimate and interpret models with interaction terms. Do read -help fvvarlist- and the associated chapter in the online manual for details about factor-variable notation. As for -margins-, the online manual section is quite extensive and full of good examples, but it's a bit of a heavy lift. I think an easier introduction to the basics of -margins- is in Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. Once you understand that PDF, the manual chapter will be able to enhance your understanding of the more exotic and subtle capabilities -margins- offers.



          • #20
            Sorry to return to such an old thread, but daniel klein when you describe the "Hybrid-model" as a means of including time-invariant predictors in a model, could this be applied to a situation where a statistical test (be it Hausman or Mundlak) has suggested that a fixed effects estimator should be used but you do not wish to exclude important time-invariant predictors in your model?

            In my own analysis I have a dependent variable (health behaviors) and a time-variant independent variable (unemployment). I wish to report the coefficient on unemployment in a model which controls for theoretically relevant time-invariant coefficients. In such a situation, which coefficient should be reported: the *_within or the *_between coefficient on unemployment?

            Very best,

            John




            Technically you should in each case use factor variable notation (see help fvvarlist).

            From a substantive perspective, do not use interactions as a way of including time-invariant predictors in the model. By interacting such a predictor with time, your model answers the theoretical question of how the effect of that predictor varies over time. It by no means estimates a main effect of this predictor. If you are not interested in testing interaction effects then you should not use interactions.

            The "hybrid-model" is actually a rather simple thing, that can be explained in three steps

            1. Calculate the panel-unit-specific mean for all time-varying predictors (but not the response/outcome). This is something along the lines of: by <id>, sort: egen x1_between = mean(x1)

            2. Subtract the panel-unit-specific mean from the original values, i.e. perform the fixed-effects/within transformation. This is as simple as: generate x1_within = x1 - x1_between

            3. Run a random-effects/mixed model where you include the time-varying predictors in their de-meaned form (those from step 2) and their mean (those calculated in step 1) along with the time-invariant predictors. This is, in the simplest form, xtreg depvar x1_within x1_between x2_within x2_between x3_within x3_between x4

            You are done. The coefficients for the *_within variables resemble the fixed-effects estimates, while the *_between variables can be interpreted as a between estimator. The coefficients for time-invariant predictors are those from a random-effects model.
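
            The three steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses Stata's nlswork example data, treats hours and tenure as the time-varying predictors and race as the time-invariant one, and restricts the data to complete cases first so that the panel means are computed on the same rows that enter the model. None of these choices come from the original recipe.

            Code:
            webuse nlswork, clear
            xtset idcode year
            * keep complete cases so the means match the estimation sample
            keep if !missing(ln_wage, hours, tenure, race)
            
            foreach v of varlist hours tenure {
                * step 1: panel-unit-specific mean
                bysort idcode: egen `v'_between = mean(`v')
                * step 2: within transformation
                generate `v'_within = `v' - `v'_between
            }
            
            * step 3: random-effects model with within, between and time-invariant terms
            xtreg ln_wage hours_within hours_between tenure_within tenure_between i.race, re
            
            * for comparison: the plain fixed-effects estimator
            xtreg ln_wage hours tenure, fe
            Comparing the two tables, the *_within coefficients should line up with the fixed-effects estimates for hours and tenure.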

            Be warned that interactions are not as straightforward to implement in such models as one might think. But see Schunck (2013) for more on this point.

            Best
            Daniel



            • #21
              Originally posted by John Adler
              In my own analysis I have a dependent variable (health behaviors) and a time-variant independent variable (unemployment) I wish to report the coefficient on unemployment in a model which controls for theoretically relevant time-invariant coefficients
              If controlling for time-invariant confounders [not coefficients] is all that you want, then you do not need a "hybrid" model at all; just use the fixed-effects/within estimator.

              Best
              Daniel



              • #22
                Dear Daniel,

                Thank you for your response,

                This makes perfect sense now that I realise my time-invariant variables are confounders and not coefficients I wish to measure.

                Could I ask how this might extend to correlated random effects (CRE)? I had heard that in some cases CRE approaches lead to widely used estimators, such as fixed effects (FE) in a linear model. Since I am using a linear model in my own analysis, I was wondering whether this suggests an approach that provides "the best of both worlds" when it comes to fixed and random effects. Is there any model out there that can?

                Thank you again

                Very best,

                John



                • #23
                  I do not really understand what "the best of both worlds" is supposed to mean. Technically, the CRE is nearly identical to the so-called hybrid model; just leave the time-variant predictors in their original form. The within-estimates are exactly the same for the two models, it is only the coefficients for the between (mean) variables that are interpreted differently. The article that I cite explains the differences in quite some detail.
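
                  As a rough sketch of that CRE variant, assuming Stata's nlswork example data with hours and tenure as the time-varying predictors and race as the time-invariant one (illustrative choices, as are the *_mean names):

                  Code:
                  webuse nlswork, clear
                  xtset idcode year
                  * keep complete cases so the means match the estimation sample
                  keep if !missing(ln_wage, hours, tenure, race)
                  
                  foreach v of varlist hours tenure {
                      bysort idcode: egen `v'_mean = mean(`v')
                  }
                  
                  * time-varying predictors stay in their original form; their means are added
                  xtreg ln_wage hours tenure hours_mean tenure_mean i.race, re
                  
                  * joint test that the coefficients on the means are zero
                  test hours_mean tenure_mean
                  The joint test on the mean terms is the usual Mundlak-style check of whether the random-effects assumption is tenable.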

                  One of the reasons for choosing the fixed-effects/within estimator is the desire to control for (possibly unobserved) time-invariant confounders. If this is what you want, then there is really nothing that the CRE or hybrid model adds. If you want something else, please explain what it is that you want.

                  Best
                  Daniel



                  • #24
                    You can estimate time-invariant effects using the Fixed Effect Filter (FEF) two-step procedure. See Pesaran and Zhou (2014)
                    https://www.tandfonline.com/doi/abs/...8.2016.1222225



                    • #25
                      Assuming exogeneity of the time-invariant variables, this FEF estimator is essentially just a simple instrumental variables estimator where all time-varying regressors are instrumented with their own within-groups deviations and all time-invariant variables are instrumented by themselves. In that regard, it is just a special case of the Hausman-Taylor estimator where all time-varying regressors are allowed to be correlated with the unobserved unit-specific effects but all time-invariant regressors are assumed to be uncorrelated with them. This assumption should be kept in mind when saying that the effects of the time-invariant regressors can be identified with this method.

                      The estimates for the coefficients of the time-varying regressors are once again the "fixed-effects" within-groups estimates. Yet, this approach differs from the correlated random-effects estimator by not including the within-groups averages of the time-varying regressors. The exogeneity assumption imposed on the time-invariant regressors is thus different. While the CRE estimator assumes that the time-invariant regressors are exogenous conditional on the time-varying regressors, the FEF / Hausman-Taylor estimator assumes unconditional exogeneity. This distinction is important if the time-invariant regressors are correlated with (the averages of) the time-varying regressors.
                      Last edited by Sebastian Kripfganz; 22 Dec 2018, 02:54.
                      https://twitter.com/Kripfganz



                      • #26
                        I think that, roughly speaking, the upshot is this:
                        if we are interested in estimating the structural impact of a (roughly) time-invariant variable, say education, on an outcome, say wages, and we want to allow for an individual-specific unobserved effect correlated with the regressor of interest, say individual ability correlated with education,
                        then we need to find instruments external to the model to identify this effect, say proximity to college as an instrument for education.

                        The effect of a (roughly) time-invariant variable is simply not identified in the absence of instruments external to the model.



                        • #27
                          Dear all,
                          after reading through this thread I am a bit puzzled by the huge number of potential solutions. I understand that there are many options, all with pros and cons, so I assume no clear recommendation is feasible. Still, I wonder whether a (conceptually) easier solution, one that is not as complex as some of these "hybrid" models (just as an example), is also fine?
                          Imagine a very basic example: you want to test whether men and women react differently to becoming unemployed. The dependent var is overall satisfaction (2 time points; t1, before the event and t2 after the event). Since gender is time constant, simply using an FE model does not work. We assume there are no time-variant controls required (experimental design). What would be a simple yet robust (FE-related?) option? Given that these questions are so popular, I wonder if there are any new insights or recommendations.
                          Best wishes

                          (Stata 16.1 MP)



                          • #28
                            Imagine a very basic example: you want to test whether men and women react differently to becoming unemployed.
                            [emphasis added]
                            For this purpose, the sex variable itself is not needed, and will be dropped from a fixed-effects model due to collinearity with the fixed effects. The question you are posing depends instead on the sex#unemployment interaction term, which is not time invariant and will process just fine in a fixed-effects model.
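
                            A minimal sketch of that setup, using simulated data because no dataset was posted. Every name and number here (id, t, female, unemployed, satisfaction, the effect sizes, the seed) is invented for illustration only.

                            Code:
                            clear
                            set seed 12345
                            set obs 500
                            generate id = _n
                            generate byte female = runiform() < 0.5
                            * unobserved person-specific effect
                            generate u = rnormal()
                            * two time points per person: t = 1 before, t = 2 after the event
                            expand 2
                            bysort id: generate byte t = _n
                            generate byte unemployed = (t == 2)
                            generate satisfaction = 7 + u - 1.0*unemployed + 0.4*unemployed*female + rnormal()
                            
                            xtset id t
                            * the main effect of female is omitted; the interaction is estimated
                            xtreg satisfaction i.unemployed##i.female, fe
                            
                            * effect of becoming unemployed for women (the 1.unemployed row is the effect for men)
                            lincom 1.unemployed + 1.unemployed#1.female
                            The coefficient on 1.unemployed#1.female is the estimated difference between women's and men's reactions, which is the quantity the question asks about.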



                            • #29
                              Dear all,

                              Can someone please confirm whether my interpretation of the following results is correct? It involves an interaction between a time-constant dummy and a continuous variable in a panel fixed-effects model.

                              My code

                              Code:
                              webuse nlswork.dta, clear
                              xtset idcode year
                              xtreg ln_wage i.race##c.hours i.year, fe
                              
                              .....
                              ------------------------------------------------------------------------------
                                   ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                                      race |
                                    black  |          0  (omitted)
                                    other  |          0  (omitted)
                                           |
                                     hours |    .001518   .0002699     5.63   0.000      .000989    .0020469
                                           |
                              race#c.hours |
                                    black  |  -.0037024   .0005952    -6.22   0.000    -.0048691   -.0025357
                                    other  |   -.001572   .0024067    -0.65   0.514    -.0062893    .0031452
                              
                              ....
                              I would interpret this the following way:

                              - one additional hour is associated with 0.001518 * 100 percent increase in wage for whites (i.e. not black or others)
                              - one additional hour is associated with (0.001518 - 0.0037024) * 100 percent increase in wage for blacks
                              - one additional hour is associated with (0.001518 - 0.001572) * 100 percent increase in wage for others

                              Is this correct? I am struggling because there is an omitted category.

                              Thank you!

                              All the best
                              Leon



                              • #30
                                Your interpretation is, indeed, correct.

                                If you prefer not doing arithmetic, and worry about getting things mixed up, there is the -margins- command. Among the many things it can do for you, it can do this:

                                Code:
                                margins race, dydx(hours)
                                which will show you the marginal effect of hours worked on ln_wage in each of the racial categories (including the omitted one).

                                By the way, bear in mind that the heuristic of multiplying the coefficient by 100 to come up with a percent change in X when ln X has been the outcome variable is an approximation. With very small coefficients like you have, it is an excellent approximation, so no worries. But I frequently see this misapplied to large coefficients, where the approximation becomes bad. When the coefficient exceeds 0.1, it is better to do the exact calculation. (Actually, given that everybody who needs to think about this has a readily-available computer, it is unclear to me why anyone uses the approximate formula anymore. But, whatever.)
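
                                For anyone who wants to see the size of the gap, the exact percent change implied by a coefficient b on a logged outcome is 100*(exp(b) - 1). A quick sketch with -display-, using the hours coefficient from the output above and a made-up large coefficient of 0.5 for contrast:

                                Code:
                                * small coefficient: approximation, then exact
                                display 100*0.001518
                                display 100*(exp(0.001518) - 1)
                                
                                * large (hypothetical) coefficient: approximation, then exact
                                display 100*0.5
                                display 100*(exp(0.5) - 1)
                                For the small coefficient the two results agree to about three significant digits; for 0.5 the approximation says 50 percent while the exact figure is roughly 64.9 percent.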
