Commands for lagged dependent regression when using three indexes

Jakob Andersen

Join Date: May 2016

Posts: 10
#1

Commands for lagged dependent regression when using three indexes

27 May 2016, 10:06

I hope some of you can help me with which command to use when analyzing panel data with a lagged dependent variable. I couldn't find any threads covering the specific commands to use.

I'm analyzing X: leadership styles (three separate indexes from 0-100) --> Y: sick absence (in number of days), which is the one I need to lag.
I have a time variable: 0: before treatment, 1: after treatment. I also have some control variables.

I've set the dataset as time series.

Now, what would the command look like? Should I have all three of my indexes in one command, or use three separate commands?

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#2

27 May 2016, 10:28

I'm not sure I understand how your data is set out. Let me assume that you have three leadership variables, X1, X2, and X3, which are "continuous" variables that range from 0-100. Your dependent variable is sick_days. It is entirely unclear to me whether the unit of observation here is the manager or the employee. In the former case, there would be a variable manager_id identifying the manager, X1-X3 would pertain to that manager, and Y would be total sick_days among that manager's direct reports. Or you might have the employee as the unit of analysis, X1-X3 would represent their managers' leadership styles (which might vary over time), and there would be an employee_id variable. Or you might have observations which report both the manager_id and the employee_id, along with the manager's X1-X3 and the employee's Y. And you have these things observed over various time periods (months or years or quarters or something like that.) Perhaps there are even other designs for this. The exact command would depend on the design. So more information is needed to give a precise answer to your question.

That said, you seem to be concerned with the fact that there is something special you need to do because you want to lag your dependent variable. That is not a problem. Whatever the regression command you want is, you can just incorporate the lag operator into it:

Code:

regression_command L.Y X1 X2 X3 // AND MAYBE OTHER VARIABLES
// INCLUDING FIXED OR RANDOM EFFECTS FOR EMPLOYEES & MANAGERS

And that said, it is strange that you want to lag the dependent variable. Generally, we try to model things to try to reflect the independent variables causing the dependent variable, not the other way around. And it would be very bizarre to argue that current values of X1-X3 are causing earlier values of Y. Did you perhaps mean that you want to lag the independent variables? If so:

Code:

regression_command Y L.X1 L.X2 L.X3 // ETC.
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#3

27 May 2016, 10:32

Yes, you would normally include all three indices in your model. However, this is a substantive question of developing your analysis. Once you have the model you wish to estimate, you can come back here and develop the proper Stata command. There are many different types of commands for time-series data, so you will want to think about the time (and panel?) structure of your data. It also appears you have a treatment effect you want to estimate? Or at least take into account? That is another wrinkle you will have to deal with in developing your model.

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#4

27 May 2016, 10:34

Crossed with Clyde. There is also the possibility that Jakob wants to include a lag of the dependent variable on the right-hand side:

Code:

regression_command Y L.Y X1 X2 X3

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#5

27 May 2016, 10:42

Carole,

Yes, good point. There are so many design possibilities that fit with the description in #1 that it's really hard to give concrete advice here.
Jakob Andersen

Join Date: May 2016

Posts: 10
#6

27 May 2016, 11:39

Thanks for the quick answer!

My indexes reflect the employees' perception of their leader's leadership style, so the actual managers are not included. The sickness absence is also only for employees.

I realize I'm not yet sure which model to use. It sounds like lagging the independent variables (perceived leadership style) is more accurate.
I have two time periods: one before treatment, and one after treatment one year later.
I hope that makes sense.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#7

27 May 2016, 12:11

Well, if you only have two time periods, using a lagged variable is a bit of a problem: the lag will be undefined (i.e. missing) for the pre-treatment period, and so you will be unable to incorporate any of the pre-treatment observations into your regression.
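A minimal sketch of why the lag is a problem with only two periods, using a made-up toy panel (the variable names, panel size, and seed are assumptions for illustration, not the actual data). After -xtset- with both a panel and a time variable, the lagged value is missing in every pre-treatment row:

```stata
* Hypothetical toy panel: 3 employees observed at time 0 (pre) and 1 (post)
clear
set obs 3
gen int employee_id = _n
expand 2
bysort employee_id: gen byte time = _n - 1
set seed 2016
gen x1 = floor(100*runiform())

* The lag operator needs both the panel and the time variable declared
xtset employee_id time
gen lag_x1 = L.x1

* lag_x1 is missing wherever time == 0, so those rows drop out of
* any regression that uses L.x1
list employee_id time x1 lag_x1, sepby(employee_id)
count if missing(lag_x1) & time == 0
```

Any estimation command that includes L.x1 would therefore use only the post-treatment observation for each employee.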

I would suggest having separate observations for each employee pre and post intervention and fitting a mixed-effects model (or, if those aren't really accepted in your field, a fixed-effects model). So the data layout and analysis might look something like this:

Code:

// CREATE A RANDOM DATA SET JUST TO
// ILLUSTRATE DATA LAYOUT
clear*
set obs 100 // EMPLOYEES
gen int employee_id = _n
expand 2 // PRE vs POST
by employee_id, sort: gen byte time = _n-1
label define time 0 "Pre" 1 "Post"
label values time time
set seed 1234
forvalues i = 1/3 {
    gen x`i' = floor(100*runiform())
    label var x`i' "Perception of Leader's Style `i'"
}
gen y = rpoisson(5)
label var y "Sick Days Past Year"

// ILLUSTRATIVE MODELING COMMAND
mepoisson y x1 x2 x3 i.time || employee_id:

The coefficient of 1.time in the output will represent the effect of the intervention, adjusted for the X values (both current and lagged), to the extent that this can be estimated in a pre-post design with no concurrent control group. (I am assuming here that this is your principal, or one of your principal goals.)

The choice of -mepoisson- is not critical here. I'm treating sick days as a count variable and assuming that, in general, the conditional means are small so that a normal approximation and use of -mixed- would not be appropriate. Also assuming there is no overdispersion, so -menbreg- is not needed. Obviously, adjust this decision to your actual circumstances.
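As a rough first look at the overdispersion question, one can compare the raw mean and variance of the outcome. The sketch below uses simulated counts (the sample size and seed are assumptions for illustration). Note that this is only a crude marginal check: the Poisson assumption concerns the distribution conditional on the covariates and the employee-level effect, so a ratio above 1 in the raw data does not by itself settle the choice.

```stata
* Hypothetical simulated outcome, standing in for real sick-day counts
clear
set seed 1234
set obs 200
gen y = rpoisson(5)

* For a Poisson variable the variance equals the mean, so a
* variance/mean ratio far above 1 hints at overdispersion
quietly summarize y
display "mean = " r(mean) "  variance = " r(Var) "  ratio = " r(Var)/r(mean)
```

If the ratio were far above 1 in real data, -menbreg- takes the same model syntax as -mepoisson- and could be fitted for comparison.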

Note that this analysis does not explicitly use lagged variables. It does, however, take into account both the pre- and post- intervention values of all the variables, which I imagine is what you are most concerned with. This is probably what I would do in your situation, assuming I have correctly understood it.
Jakob Andersen

Join Date: May 2016

Posts: 10
#8

27 May 2016, 12:31

Fixed effects seem to make sense here, yes.

Just to clarify, my prime goal is to see whether change in perceived leadership affects sickness absence among employees. I have four groups: one for each treatment and a control group. 10,000 observations.

Thanks again for the good answers! Appreciate it!
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#9

27 May 2016, 14:56

It isn't clear to me whether your intervention is directed at changing these leadership style perceptions, or at changing use of sick days, or at something different but perhaps affecting the variables we are working with here. Assuming that it's not directed at the leadership style perceptions (or the actual leadership style), a model like this would make sense:

Code:

xtset employee_id
xtpoisson y x1 x2 x3 i.treatment_group##i.time, fe
margins, dydx(x1 x2 x3)

If the intervention is directed at the leadership styles or perceptions thereof, then it is a bit weird to have a model of this sort that includes both treatment effects and the leadership style perceptions (which would presumably then be variables mediating the treatment effect on sick days) among the predictors in this way. Interpreting such a model in a way that separates the effects of the treatment (which might also work through other mechanisms than just leadership style perceptions) and the effects of leadership style perceptions (which might operate in part independently of the treatment) would be difficult. I would think more in terms of a path analysis approach using -gsem-, which would enable you to separately estimate direct and indirect effects. I know that there are ways to incorporate the nesting of repeated observations within employees in that kind of model, but I'm not personally familiar with them and cannot advise in greater detail.
Jakob Andersen

Join Date: May 2016

Posts: 10
#10

28 May 2016, 01:31

Sorry for the confusion. I realize I probably need a difference-in-differences design.

To sum up, I need to analyze the effect of employee-perceived leadership style (three indexes, 0-100) on employee sickness absence (number of days). I thus have 4 groups: 3 treatments and 1 control, recorded in "treatment".
I have two time periods recorded in "wave", where 0 is before and 1 is after treatment.

Problem 1 is setting the data as time series in Stata. I get the "repeated time values in sample" note.

Problem 2 is that I don't know which commands to use once problem 1 is fixed.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#11

28 May 2016, 09:54

The code in #9 is a difference in difference design. And it will work equally well with three treatments as with 1. But, again, I'm confused about the relationship between treatment(s) and the perceptions. I'm even more worried now as I'm suspicious that each of your three treatments is targeted towards influencing one of the three perceptions. So my concerns expressed in the last paragraph of #9 still prevail, and are, if anything, even stronger.

With this design, having only two time periods, if you follow my approach, you do not need to use lag operators. So you do not need to specify a time variable in your -xtset- command. So just -xtset employee_id- and you won't get that error message.
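If it is worth finding out why the time values repeat, a minimal sketch along these lines may help. The toy data below are invented for illustration (employee 2 is entered twice in wave 0); the variable names follow #10, and dup is a hypothetical helper variable.

```stata
* Hypothetical toy data: employee 2 appears twice in wave 0
clear
input int employee_id byte wave
1 0
1 1
2 0
2 0
2 1
end

* Summarize how many observations share the same employee_id and wave
duplicates report employee_id wave

* Flag the offending rows so they can be inspected
duplicates tag employee_id wave, generate(dup)
list if dup > 0

* Declaring only the panel variable avoids the repeated-time-values
* complaint, because no time variable is declared at all
xtset employee_id
```

Whether the flagged rows are data-entry duplicates or genuinely distinct records is something only inspection of the real data can settle.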
Jakob Andersen

Join Date: May 2016

Posts: 10
#12

28 May 2016, 10:22

Alright, I thought xt was only for lagged variables.

To clarify: I have 10,000 employees who answered a range of questions about how they perceive their leader. From those I made 3 indexes that reflect 3 different leadership styles. Their sickness absence was also measured.
Then, the leaders were randomly divided into 4 groups: one control, and 3 groups that received the 3 different kinds of leadership style training. A year after the training the employees were asked the same questions again, and sickness absence was measured again.
What I want to know is whether the training has affected the employees' sickness absence (because I assume that the leaders have become better during the training, leading to more motivated employees = lower sickness). Does it make sense?

So if I understand you right, I can put my variables into these commands, and it will be a difference-in-differences design?
And one more thing: I don't have to run a command for each of my indexes?
xtset employee_id
xtpoisson y x1 x2 x3 i.treatment_group##i.time, fe
margins, dydx(x1 x2 x3)
Again, I can't tell you how much I appreciate your patience.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#13

28 May 2016, 12:12

This is much clearer.

There are really three research questions here.

Did the interventions affect sickness days?

Code:

xtset employee_id
xtpoisson y i.treatment##i.time, fe
margins treatment#time, predict(nu0)

The significance of each intervention's effect on sick days will be represented by its interaction coefficient--this is a difference in differences analysis. Thus for significance testing (I'll assume treatments are coded 0, 1, 2, 3, with 0 being control) you will want to look at the coefficients of 1.treatment#1.time, 2.treatment#1.time, and 3.treatment#1.time. Note that you should not include the x variables here. To control for the x variables when they are thought to be the very means by which the intervention will affect y is to cancel out the intervention effects in the model. So no x variables here.
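To make the "look at the interaction coefficients" step concrete, here is a hedged sketch on simulated data. The data are pure noise (y is drawn independently of treatment, so no real effect is expected), and the panel size and seed are assumptions for illustration. It shows a joint test of the three interaction terms and one of them expressed as an incidence-rate ratio:

```stata
* Hypothetical simulated panel: 400 employees, 4 groups, 2 waves
clear
set seed 1234
set obs 400
gen int employee_id = _n
gen byte treatment = mod(_n, 4)   // 0 = control, 1-3 = trainings
expand 2
bysort employee_id: gen byte time = _n - 1
gen y = rpoisson(5)               // no true treatment effect built in

xtset employee_id
* The treatment main effects are constant within employee, so the
* fixed-effects estimator omits them; the interactions remain
xtpoisson y i.treatment##i.time, fe

* Joint test that all three difference-in-differences terms are zero
testparm i.treatment#i.time

* One treatment's effect as a ratio of incidence rates
lincom 1.treatment#1.time, irr
```

With real data, the same two post-estimation commands would follow the actual -xtpoisson- fit; the coefficient names depend on how treatment is coded.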

Did the interventions affect the three measures of leadership style?

Code:

xtset employee_id
forvalues i = 1/3 {
    xtreg x`i' i.treatment##i.time, fe
    margins treatment#time
}

Again, for each of the three x`i' regressions, the coefficients to focus on for treatment effectiveness are 1.treatment#1.time, 2.treatment#1.time, and 3.treatment#1.time. This is a difference-in-differences analysis.

The third research question is whether perceptions of leadership style influence sick days (regardless of interventions).

Code:

xtset employee_id
xtpoisson y x1 x2 x3 i.time, fe
margins, dydx(x1 x2 x3) at(time = (0 1)) predict(nu0)
marginsplot

This is not a difference in differences analysis. Note that here we do not include treatment. We are interested only in the direct relationship between the perceptions of leadership style and sick days, and not the fact that the perceptions were influenced by interventions. Including the interventions here would confound that. Even the inclusion of the time variable here is questionable, as it is a proxy for pre-intervention vs post-intervention status. I chose, nevertheless, to include it because of the high likelihood that there would be a secular trend in sick day usage that needs to be adjusted for. The confounding of time with intervention would have been less with only a single intervention (vs. control) in play, but we have to do the best we can with the data we have.

It is also possible to use -gsem- to distinguish the part of the intervention effect(s) that are mediated by the changes in leadership perceptions. But I don't know how to set that up in a way that accounts for the repeated measures data. Perhaps somebody with more experience in that will chime in here.

Jakob Andersen

Join Date: May 2016

Posts: 10
#14

29 May 2016, 00:07

Super!

1. Regarding "Did the interventions affect sickness days?"
1,1) Yes, the treatment variable is coded 1,2,3,4 with 1 being control. Am I understanding it right that the command xtpoisson y i.treatment##i.time, fe includes all my treatments? In other words, do I only need this one command, and not one for each of my 4 groups?

1,2) What does this command do, and how am I supposed to understand the output? margins treatment1#wave, predict(nu0)

1,3) How am I supposed to control for other variables?

2. Regarding "Did the interventions affect the three measures of leadership style?"
I don't think I'm interested in this part.

3. Regarding "whether perceptions of leadership style influence sick days (regardless of interventions)"
I'm not sure I understand this part.

xtset employee_id
xtpoisson y x1 x2 x3 i.time, fe
margins, dydx(x1 x2 x3) at(time = (0 1)) predict(nu0)
marginsplot

3,1) Does this measure leadership style --> sickness (t) and leadership style --> sickness (t+1)?

I think I'm supposed to control for initial sickness?
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#15

29 May 2016, 00:41

1.1 Correct, this one command will deal with all four groups.
1.2 It will show you, in each of your four groups, and at both wave 1 and wave 2, your model's predicted number of sick days, conditional on the fixed effect being zero (so, more or less, an average).
1.3 Add them to the list of variables in the -xtpoisson- command. You don't have to change the -margins- command: it will automatically adjust for them.

2. OK.

3.1. So, you probably had in mind an analysis of covariance with just one observation per employee, that observation containing the present and lagged values of the x's and the lagged value of y. Something like -poisson y x1 x2 x3 L.x1 L.x2 L.x3 L.y if time == 1-. Given that employee_id and time do not uniquely identify observations in your data set (though I don't understand why that is the case), you can't -xtset employee_id time-, and so you can't get the lag operator working for you. But in any case, that approach has a drawback: the coefficients of the x's all get attenuated by a factor equal to the intra-class correlation. If you're aware of it (which many people who use this approach are not), you can correct for that. But the approach I'm outlining avoids this problem for you and still appropriately accounts for the values of all variables at both times. It is, in fact, an algebraic transformation of the one-observation-per-employee approach, and one that overcomes the attenuation of the regression coefficients that I mentioned earlier, so it is better because its interpretation is straightforward: every coefficient in the model actually means what it appears to mean.