  • Differences in validity at different times

    Hi all:

    I'm conducting a study examining whether a specific intervention improves the quality of the performance ratings that managers give their employees.
    I have these performance ratings as well as data on the employees' "objective performance," which is my criterion for assessing the quality of the performance ratings: the stronger the relationship between performance ratings and objective performance, the better their quality (akin to the notion of "criterion validity" in educational or organizational psychology).

    I have data at two points in time, t1 and t2, regarding the same employees and the same managers (employees are clustered within managers). Our intervention was aimed at improving the performance ratings; this was done after t1 and before t2. So, what I'd like to test is whether the relationship between objective performance and the performance ratings at time 2 was larger than the relationship at time 1.

    How would you set this up in Stata?
    One way I was thinking about is the following:

    Code:
    xtreg y c.x##i.time, vce(cluster id_e id_s)

    where y is objective performance, x is the performance ratings, time is a dummy variable (time1 vs time2), id_e is the employees' identifier and id_s is the supervisors' identifier.

    What is the difference between the above and the following, where I add the fixed-effects (fe) option?

    Code:
    xtreg y c.x##i.time, fe vce(cluster id_e id_s)

    Remember that the same employees were rated by the same supervisors at both times, and we also have measures of their objective performance at these two times (four measures per employee in total).

    Any advice or comment (even if it's an entirely different regression or command) is greatly appreciated.

    Ed

  • #2
    -xtreg- performs three different kinds of regression, corresponding to different models: between effects (-be-), fixed effects (-fe-) and random effects (-re-). The default, when you don't specify, is random effects. So your first model is a random effects model. The second one is fixed effects. The difference is as follows. Both have the same algebraic form:

    y_it = constant + b1*x_it + b2*time_t + b3*x_it*time_t + u_i + e_it

    where u_i is an effect for whatever you designated as the panel variable in your -xtset- command (you don't show it, and it's unclear whether it would be the employee or the supervisor--more on this later), and e_it is an error term. u_i is constant within panels; e_it is not constrained in that way and can vary from one observation to the next. The difference is in the handling of the u_i term. In a random effects model, u_i is assumed to be a random variable with mean zero, and the variance of the distribution is estimated from the data. In a fixed effects model the u_i are assumed to be unknown fixed constants and are estimated from the data with no assumptions imposed about their distribution.
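
    In this parameterization, b3 is the quantity of interest: it is the difference between the time-2 and time-1 slopes of y on x, so a test of b3 = 0 is exactly a test of whether the rating-performance relationship changed after the intervention. Here is a minimal sketch of how to examine it after estimation (my addition, assuming time is coded 0 at t1 and 1 at t2, and that id_s is the panel variable):

    Code:
    xtreg y c.x##i.time, fe vce(cluster id_s)
    lincom 1.time#c.x    // b3: the change in the x-y slope from t1 to t2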

    From a practical perspective, the differences between random and fixed effects models are these:

    For the random effects model, in addition to assuming that the u_i follow a normal distribution, we also have to assume that they are independent of everything else. In the fixed effects model no such assumption is necessary. An implication of this is that the fixed effects model automatically adjusts for any bias that might arise from unobserved effects that do not vary over time. The random effects model makes no such guarantee unless its assumptions are fully met. For this reason, whereas fixed effects models, if properly specified, provide consistent estimation of the model parameters, random effects models may not (if their assumptions are not fully met). However, the random effects model has a compensating advantage: if its assumptions are fully met, its estimates are not just consistent but also more efficient (that is, they have smaller standard errors) than those of the fixed effects model.

    So, to put it in slightly different terms, we can contrast precision with accuracy. The fixed effects model has better guaranteed accuracy, but the random effects model gives greater precision. In some disciplines, such as economics and finance, there is a strong preference for consistent estimates and considerable worry about omitted-variable bias, so random effects models are seldom used and are viewed very skeptically. In other disciplines, such as epidemiology, these concerns are less pressing, and random effects models are used frequently, fixed effects models less so.

    Another thing to bear in mind is that the -fe- model estimates only within-panel effects. The -re- estimator is a mixture of within- and between-panel effects. So if only within-panel effects are of interest, the -fe- estimator is more appropriate.
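
    As an aside (my addition, not something you asked about): a standard way to adjudicate between the two estimators is the Hausman specification test. Note that -hausman- requires the conventional (non-robust) standard errors, so the sketch below fits both models without vce(cluster):

    Code:
    xtset id_s                     // taking id_s as the panel variable (see below)
    xtreg y c.x##i.time, re        // random effects (the default)
    estimates store re_model
    xtreg y c.x##i.time, fe        // fixed effects
    estimates store fe_model
    hausman fe_model re_model      // rejection of H0 casts doubt on the -re- assumptions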

    I hope I've answered your question.

    Now I want to turn to an issue that you didn't ask about, but one that looks important to me. You have repeated observations on both employees and supervisors. I don't know what the organizational structure here is, and it may be that employees are nested within supervisors, or they might be crossed, or somewhere in between (e.g. a "matrix" organization). In any case, the -xt- commands only encompass a single level of nesting. So, whichever of these you used as your panel variable in your -xtset- command, you are failing to appropriately adjust for the nesting of observations within the other. To simplify matters, from here on I will assume that you made id_s your panel variable. So you need to also incorporate the effects of id_e in your model. There are a couple of approaches you can use. One is to just include i.id_e as a covariate in the model:

    Code:
    xtreg y c.x##i.time i.id_e, fe vce(cluster id_e id_s)

    Another is to do a "double fixed-effects" model where both id_e and id_s are absorbed. There is no official Stata command for that, but you can get the -reghdfe- command, written by Sergio Correia, from SSC and use that.
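
    A minimal sketch of that approach (my addition; unlike official -xtreg-, -reghdfe- does accept more than one variable in vce(cluster)):

    Code:
    ssc install reghdfe            // one-time installation from SSC
    reghdfe y c.x##i.time, absorb(id_e id_s) vce(cluster id_e id_s)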

    Another approach is to go to a multilevel model with the -mixed- command. The exact syntax to use would differ depending on whether employees are nested within supervisors or whether there is multiple-membership or a crossed design. The simplest case is if employees are nested within supervisors, and then it would be:

    Code:
    mixed y c.x##i.time || id_s: || id_e:, vce(cluster id_e id_s)

    Note that these multi-level models are like -re- models in their underlying assumptions.
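
    For completeness, here is the standard -mixed- idiom for the crossed case (my addition, a sketch only; whether it applies depends on your actual organizational structure):

    Code:
    mixed y c.x##i.time || _all: R.id_s || id_e: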

    I should add that in both your code and in mine above, the option -vce(cluster id_e id_s)- is not valid syntax. You can only specify a single variable here. So you may need to create a new variable that corresponds to combinations of id_e and id_s (see -egen, group()-) for this purpose. I have retained your incorrect syntax here mostly out of laziness as a shorthand.
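
    A sketch of that construction (the variable name id_es is mine):

    Code:
    egen id_es = group(id_e id_s)    // one category per employee-supervisor combination

    You would then specify, for example, vce(cluster id_es). Bear in mind that -xtreg- requires panels to be nested within clusters, so not every cluster level is admissible.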

    Note also that the use of cluster-robust variance estimators is not a good idea if the number of clusters is small.



    • #3
      Excellent! Thanks for your thorough answer.
