fixed effects regression - Effects of binary variable

Alex Mueller

Join Date: May 2018

Posts: 51
#1


06 Jun 2018, 03:34

Dear all,

I have estimated a fixed-effects panel regression model for work performance (dependent variable y) on my data set of about N = 100 workers over T = 15 years.
Now I want to analyze whether a change in a binary variable x1 (promoted yes/no) leads to a significant improvement in work performance, controlling for some other variables (x2, …, xn).

Some workers in my data set get promoted after 10 years, some after 12, and some not at all, implying x1 = 0 for all periods.
Additionally, many start working after t = 1 or stop working before t = 15, which raises the problem of attrition.

Can I simply write:

xtreg y x1 x2 … xn if y==0, fe
xtreg y x1 x2 … xn if y==1, fe

and then compare coefficients?
Do you have any other idea how to address this problem?
As a first try I created a variable capturing whether a manager was promoted at all during the observation period. I included that variable in my model, but because of the fe model Stata omitted it.
Additionally, is there a way to see whether attrition is a problem, meaning whether dropping out of the panel is correlated with the performance y?

Many thanks!

Kind regards,
Alexander-Florian
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

06 Jun 2018, 03:53

Alex:
welcome to this forum.
Some comments about your query:
- if, as it would seem from your description, you have a continuous dependent variable (y), I fail to get the last lines of code you share (...if y==0);
- due to -fe- machinery, the variable you created will be omitted by Stata if time-invariant or collinear with other predictors;
- attrition can surely be a problem, especially in the case you envisage: if attrition is due, say, to poor performance, those values are missing not at random and -mi- assumptions are untenable.

Kind regards,
Carlo
(Stata 19.0)
Alex Mueller

Join Date: May 2018

Posts: 51
#3

06 Jun 2018, 06:41

Carlo,
many thanks for your kind reply.
I made a mistake here:

Originally posted by Alex Mueller:

xtreg y x1 x2 … xn if y==0, fe
xtreg y x1 x2 … xn if y==1, fe

It should be ... if x1==0 and ... if x1==1, in order to indicate whether the respective worker has been promoted.

- Regarding attrition: is there any way to put that in numbers? I.e., can I check whether there is a significant correlation between performance y and dropping out of the panel?
(Logically there should be a connection via poor performance.)

best
Alex
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

06 Jun 2018, 07:16

Alex:
you can use -fvvarlist- notation to investigate the contribution of -x1- (adjusted for the other predictors) to the variation of the dependent variable.
As far as dealing with missing values is concerned, if panel units scoring low performance in wave 1 are missing in wave 2, it might be that your data are missing at random (MAR).
But if you cannot rely on a previous measurement of performance, you should assume that data are missing not at random (MNAR).
Unfortunately, you cannot test MAR vs MNAR mechanism.
For more details (and interesting references) on missing data, see -mi-related entries in Stata .pdf manual.
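
A minimal sketch of the -fvvarlist- idea, using the placeholder names y, x1 and x2 from post #1 (id and year are hypothetical names for the panel and time variables, not taken from the thread). Splitting the sample by x1 would make x1 constant within each subsample, so -fe- would drop it; an interaction keeps all workers in one model instead:

```stata
* Assumption: id and year are hypothetical names for the panel and time variables
xtset id year

* i.x1 marks x1 as categorical; ## adds the main effects plus the x1-by-x2 interaction
xtreg y i.x1##c.x2, fe vce(cluster id)

* Slope of x2 separately for non-promoted (x1 = 0) and promoted (x1 = 1) observations
margins x1, dydx(x2)
```

Only one control is interacted here to keep the sketch short; which controls, if any, should vary with promotion status is a modelling decision.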

Kind regards,
Carlo
(Stata 19.0)
Alex Mueller

Join Date: May 2018

Posts: 51
#5

07 Jun 2018, 00:27

Carlo, thanks a lot for your comments.

I don’t really get the fv notation. I understand from the manual that it “allows the model to contain categorical variables”.
However, I don’t see why I need that specification: my fe model contains some other categorical variables (time-variant for some individuals, e.g. post code), but I did not use the fv specification.
Does that mean the effect of (in this case) the region of residence is estimated wrongly in my model?

Regarding the attrition:

Originally posted by Carlo Lazzaro:

Alex:
As far as dealing with missing values is concerned, if panel units scoring low performance in wave 1 are missing in wave 2, it might be that your data are missing at random (MAR).
But if you cannot rely on a previous measurement of performance, you should assume that data are missing not at random (MNAR).
Unfortunately, you cannot test MAR vs MNAR mechanism.
For more details (and interesting references) on missing data, see -mi-related entries in Stata .pdf manual.

Sorry, I am a little confused: shouldn't it be the other way round? If performance is related to attrition, then I would assume MNAR.

Thanks for your patience

Cheers, Alex

Alex Mueller

Join Date: May 2018
Posts: 51
#6

07 Jun 2018, 03:08

Maybe it's best to share part of the dataset and the code I used:

Dataset:

Code:

year  workerID  promoted_in_observation_period  performance  age  postcode  work_exp
2000  1                                         1,10         32   10115     10
2000  2                                         0,90         44   10319     22
2000  3                                         1,20         28   10435     8
2001  4                                         0,80         50   10435     33
2001  1         1                               1,50         33   10115     11
2001  2                                         1,20         45   10319     23
2002  1         1                               1,60         34   10245     12
2002  3         1                               0,90         29   10435     9
2002  4                                         1,10         51   10435     34

Code:

xtset workerID year, yearly
xtreg performance promoted_in_observation_period age postcode work_exp i.year, fe (vce) robust

In my model promoted_in_observation_period would be the variable of interest. Does this deliver the desired output?

Best, A.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

07 Jun 2018, 05:48

Alex:
- if you have a two-level categorical variable, using -fvvarlist- or not does not make any difference;
- if you have a three(or more)-level categorical variable, -fvvarlist- makes a relevant difference, because it tells Stata that the predictor is not continuous and integers are, in fact, levels.
As far as the missing values question is concerned:
- if, in a previous wave of data, those who reported poor performance (assumption: performance score is self-reported) are missing in the next wave, you can assume that missingness is MAR, as it depends on the observed data (that is, the poor performance score registered in the previous wave of data) and not on unobserved values;
- if you do not have a previous wave of data and panel units do not report any performance score, you can suspect that missing is not at random, as it probably depends on the unreported data (instead of reporting a poor performance score, panel units skip the question altogether).
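
One rough way to put the attrition question into numbers is to check whether observed performance predicts leaving the panel. The sketch below assumes one row per worker-year and the variable names from the data excerpt in post #6; last_obs and dropout_next are hypothetical names introduced only for illustration. It speaks to observed performance only and cannot separate MAR from MNAR:

```stata
* Assumption: variable names as in the data excerpt; last_obs and dropout_next are hypothetical
xtset workerID year

* Flag each worker's last observed year
bysort workerID (year): generate byte last_obs = (_n == _N)

* A worker counts as dropping out if the last observed year precedes the final panel year
summarize year, meanonly
generate byte dropout_next = last_obs & (year < r(max))

* Does observed performance predict leaving the panel after this year?
logit dropout_next performance age work_exp, vce(cluster workerID)
```

A negative and clearly non-zero coefficient on performance would be consistent with low performers being more likely to leave.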
About your code:
- if performance is a continuous variable, -xtreg- is the way to go (the user-written command -xtoverid- comes in handy to test which specification, -fe- or -re-, fits your data better; please note that -hausman- allows default standard errors only);
- -(vce)- is pleonastic with -robust-;
- if working experience is expressed in years, you may want to test whether there's a turning point (that is, a squared relationship between -work_exp- and -performance-) and rewrite the regression code accordingly (I still assume that -fe- is the right specification for your data):

Code:

xtreg performance promoted_in_observation_period age postcode c.work_exp##c.work_exp i.year, fe robust

Kind regards,
Carlo
(Stata 19.0)
Alex Mueller

Join Date: May 2018

Posts: 51
#8

07 Jun 2018, 06:30

Carlo,
Many thanks again.
So does this imply if I code the postal code variable as region 1,2,3,…,n (postcode_regiondummy) my code would be

... fv postcode_regiondummy … ?

For the missing values, I still don’t see why I would assume missingness at random if attrition is correlated with performance. If poor performance is linked to being fired, i.e. dropping out of the panel, it seems to me to be systematic attrition.

- Yes, performance is continuous. I want to go for the fe model as I assume performance depends on the unobservable, though time-invariant, natural skill of a worker.
I will have a look at xtoverid. Does that mean the Hausman test is not applicable to my setting? Why would the standard errors of my model not be the default ones?

- The squared relationship approach sounds very interesting, but I have never worked with non-linear models. Does the model now simply square the experience in years? I don't really see why that is happening =) Additionally, Stata help says the c. prefix implies a continuous variable, whereas work_exp would be discrete (0, 1, 2, ..., n) (countably infinite; maybe in this context limited to a maximum of 50 years or so).
- Thanks for the hint about vce robust.

Code:

xtreg performance promoted_in_observation_period age i.postcode_regiondummy c.work_exp##c.work_exp i.year, fe robust

Carlo, many thanks for your advice!!!
Best, A.

Last edited by Alex Mueller; 07 Jun 2018, 07:03.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#9

07 Jun 2018, 08:06

Alex:
- your chunk of code will be -i.postcode_regiondummy-;
- there's an interesting example on the (subtle) difference between MAR and MNAR data (it focuses on repeated math tests, but that's immaterial) in https://www.amazon.com/Missing-Data-.../dp/1593853939, pages 43-50;
- if you introduce a squared term, -xtreg- is still linear in the coefficients but non-linear in the relationship between the predictor and the dependent variable. The logic behind squaring experience is that performance peaks after some years of experience and then usually declines. In your case the variable is discrete because it's more convenient to deal with integers representing years (all in all, it's the same for age), but time is ontologically continuous, so check whether squaring experience makes sense with your data.
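
To make the squared term concrete, here is a minimal sketch assuming the variable names from the data excerpt in post #6; work_exp_sq is a hypothetical name and the experience values passed to -margins- are arbitrary. The factor-variable form and the hand-made squared term give the same coefficients, but only the former lets -margins- know that the two terms belong together:

```stata
* Assumption: variable names as in the data excerpt; work_exp_sq is a hypothetical name
xtset workerID year

* Factor-variable form: Stata adds work_exp and work_exp squared itself
xtreg performance c.work_exp##c.work_exp i.year, fe vce(robust)

* Slope of work_exp at a few chosen experience levels (values are arbitrary)
margins, dydx(work_exp) at(work_exp=(5 15 25))

* Same coefficients with a squared term generated by hand
generate work_exp_sq = work_exp^2
xtreg performance work_exp work_exp_sq i.year, fe vce(robust)
```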

Kind regards,
Carlo
(Stata 19.0)

Alex Mueller

Join Date: May 2018
Posts: 51

#10

08 Jun 2018, 00:53

Carlo,
thank you for your help!
The book you mentioned is unfortunately not available in our library, but I will try to find alternative literature.
I ran the regression again with your hints:

Code:

. xtreg performance promoted_in_observation_period c.Age##c.Age c.work_exp##c.work_exp former_achievments job_quits number_former_employers job_rotation i.year graduate sex medium_size_enterprise, fe robust

which generated the following outcome:

Code:

note: 2015.year omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        301
Group variable: ID                              Number of groups  =        97

R-sq:                                           Obs per group:
     within  = 0.4426                                         min =          1
     between = 0.1227                                         avg =        3.1
     overall = 0.1647                                         max =         14

                                                F(24,96)         =      25.57
corr(u_i, Xb)  = -0.4997                        Prob > F          =     0.0000

                                                     (Std. Err. adjusted for 97 clusters in ID)
------------------------------------------------------------------------------------------------
                               |               Robust
                   performance |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
promoted_in_observation_period |   .0090608   .0052476     1.73   0.086    -.0012893    .0194108
                           Age |  -.0191297    .011546    -1.66   0.099    -.0419024    .0036429
                               |
                   c.Age#c.Age |   .0002122    .000092     2.31   0.022     .0000307    .0003937
                               |
                      work_exp |    .005203   .0060487     0.86   0.391    -.0067269     .017133
                               |
         c.work_exp#c.work_exp |   -.000138   .0000785    -1.76   0.080    -.0002928    .0000168
                               |
            former_achievments |  -.0000657   .0002536    -0.26   0.796     -.000566    .0004345
                     job_quits |  -.0051987   .0075022    -0.69   0.489    -.0199954    .0095981
       number_former_employers |   .0032433   .0076221     0.43   0.671      -.01179    .0182766
                  job_rotation |   .0004076   .0008302     0.49   0.624    -.0012299    .0020451
                               |
                          year |
                         2003  |   .0018279   .0042328     0.43   0.666    -.0065205    .0101763
                         2004  |   .0050953   .0044401     1.15   0.253    -.0036621    .0138527
                         2005  |   .0115785   .0063166     1.83   0.068    -.0008798    .0240369
                         2006  |  -.0008196   .0044127    -0.19   0.853    -.0095229    .0078838
                         2007  |  -.0032424   .0053398    -0.61   0.544    -.0137742    .0072894
                         2008  |  -.0116758   .0054543    -2.14   0.034    -.0224335   -.0009182
                         2009  |  -.0100215   .0068462    -1.46   0.145    -.0235244    .0034814
                         2010  |   -.004674   .0092071    -0.51   0.612    -.0228335    .0134855
                         2011  |  -.0132499   .0093119    -1.42   0.156    -.0316162    .0051163
                         2012  |  -.0155686   .0063318    -2.46   0.015    -.0280571   -.0030802
                         2013  |  -.0108537   .0073446    -1.48   0.141    -.0253397    .0036323
                         2014  |  -.0004855   .0077775    -0.06   0.950    -.0158254    .0148544
                         2015  |          0  (omitted)
                               |
                      graduate |  -.0115111   .0058921    -1.95   0.052    -.0231323    .0001101
                           sex |  -.0027046   .0046743    -0.58   0.564    -.0119238    .0065147
        medium_size_enterprise |   .0590412   .0068199     8.66   0.000     .0455901    .0724923
                         _cons |   .4233649   .3144232     1.35   0.180    -.1967821    1.043512
-------------------------------+----------------------------------------------------------------
                       sigma_u |  .07633407
                       sigma_e |  .02945436
                           rho |  .87040613   (fraction of variance due to u_i)
------------------------------------------------------------------------------------------------

Does this imply that Age and work_exp are not significant, but Age squared and work_exp squared are (the latter at least at p = 0.08)? Does this mean there is the quadratic relationship you mentioned?
Few of the year dummies seem to be significant (only 2008 and 2012 at the 5 percent level). In my opinion that does not have a lot of explanatory power, but just controls for the year.

--> Can I conclude that whether a worker was promoted matters positively (coef. 0.009), at least at the 10 percent level?

Cheers, A.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#11

08 Jun 2018, 01:48

Alex:
let's focus on the usual arbitrary level p<0.05:
- the results for age imply a turning point at (-_b[Age])/(2*_b[c.Age#c.Age]) = 45.07 years.
Studying the first derivative, it should be a minimum (please check it).
If the above is true, and provided that 45.07 years falls within the range of -Age- in your dataset, performance increases after 45.07 years, when adjusted for the remaining predictors;
- -i.year- seems redundant (you can test whether the year indicators are jointly significant via -testparm(i.year)-); under -fe-, -i.year- tells that, within each panel, time shows a negligible effect in explaining variation in performance, when adjusted for the remaining predictors;
- I would conclude that previous promotions show a dubious contribution to performance: all in all, that makes sense if we consider that many organizations follow the so-called promoveatur ut amoveatur scheme (that is, some people, obviously not all, are promoted to stop them doing damage in their previous positions).
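
The joint test of the year indicators can be sketched as follows. The model is deliberately reduced to a few regressors from post #10 for brevity, so its results will not match the posted output; the panel variable name ID is taken from that output:

```stata
* Assumption: reduced version of the posted model, for illustration only
xtset ID year
xtreg performance promoted_in_observation_period c.Age##c.Age i.year, fe vce(robust)

* H0: all year coefficients are jointly zero
testparm i.year
```

A small p-value would speak against dropping the year indicators.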

Kind regards,
Carlo
(Stata 19.0)
Alex Mueller

Join Date: May 2018

Posts: 51
#12

08 Jun 2018, 02:44

Carlo,
sorry, I don't understand that calculation:

Originally posted by Carlo Lazzaro:

- the results for age imply a turning point at (-_b[Age])/(2*_b[c.Age#c.Age]) = 45.07 years.
Studying the first derivative, it should be a minimum (please check it).
If the above is true, and provided that 45.07 years falls within the range of -Age- in your dataset, performance increases after 45.07 years, when adjusted for the remaining predictors.

How do you come up with 45.07? In my data set the variable is rounded to full years (min 33, max 70, mean 47,63). How can I take the derivative of a variable?

testparm (i.years) delivers

Code:

2003.year = 0 ... until 2015.year = 0 chi2( 14) = 26.44 Prob > chi2 = 0.0148

testparm (i.years), equal delivers

Code:

( 1) 2003.year +2004.year = 0 ... until 2003.year + 2015.year = 0 chi2( 12) = 23.54 Prob > chi2 = 0.0235

Which of the two options is correct?
As the null hypothesis is "coefficients are equal", that means year does have an impact (and is NOT redundant) since the hypothesis is rejected, right?
thanks, Alex

Last edited by Alex Mueller; 08 Jun 2018, 02:53.
daniel klein

Join Date: Mar 2014

Posts: 3859
#13

08 Jun 2018, 03:07

I just skimmed through the posts here; the two first potential problems I gather are

1. Should promotion not be a function of performance, i.e., higher performance leads to promotion rather than the other way round? Perhaps you should somehow address this problem of reverse causality.

2. Ignoring 1., do you expect promotion to affect performance for one year only? I ask, because you are looking at the effect of promotion in one/this year. I do not know about the theoretical background, but I could imagine that promotion should be coded 1 for every year following initial promotion.

Best
Daniel
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#14

08 Jun 2018, 03:23

Alex:
- calculation of the turning point with respect to -Age- (which is expressed in years); it comes from setting the first derivative to zero (see below):

Code:

. di -(-.0191297)/(2*.0002122)
45.074694

As 45.07 falls within the range of -Age- in your dataset, you can report a turning point.
- the fitted function of -Age- is: .0002122x² - .0191297x;
- the first derivative of the above function is: [2*(.0002122)x - .0191297];
- to identify the turning point as a minimum or a maximum, you should find the values for which the first derivative is >0 and <0, that is:
- [2*(.0002122)x - .0191297]>0 and [2*(.0002122)x - .0191297]<0. The slope of the function is positive for values >45.074694 and negative for values <45.074694. Hence, the turning point is a minimum;
- -i.year- (I would consider the first code) are jointly significant and should be kept in your regression model. Jointly, they tell that time gives a relevant contribution in explaining the within-panel variation of the dependent variable.
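
The turning point can also be obtained from the stored coefficients, which adds a delta-method standard error. This is a sketch on a reduced version of the model from post #10 (fewer regressors, for brevity), so its numbers will not match the posted output; turning_point is just a label chosen here:

```stata
* Assumption: reduced version of the posted model, for illustration only
xtset ID year
xtreg performance promoted_in_observation_period c.Age##c.Age i.year, fe vce(robust)

* Turning point -b1/(2*b2) with a delta-method standard error
nlcom (turning_point: -_b[Age]/(2*_b[c.Age#c.Age]))

* Sign of the squared term: 1 means the parabola opens upward (turning point is a minimum)
display sign(_b[c.Age#c.Age])
```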

PS: crossed in the cyberspace with Daniel's reply, who, as usual, gives valuable inputs to reflect upon.

Last edited by Carlo Lazzaro; 08 Jun 2018, 03:25.

Kind regards,
Carlo
(Stata 19.0)
Alex Mueller

Join Date: May 2018

Posts: 51
#15

08 Jun 2018, 03:42

Daniel, thanks for your hints:

Originally posted by daniel klein:

I just skimmed through the posts here; the two first potential problems I gather are

1. Should promotion not be a function of performance, i.e., higher performance leads to promotion rather than the other way round? Perhaps you should somehow address this problem of reverse causality.

2. Ignoring 1., do you expect promotion to affect performance for one year only? I ask, because you are looking at the effect of promotion in one/this year. I do not know about the theoretical background, but I could imagine that promotion should be coded 1 for every year following initial promotion.

Best
Daniel

You are absolutely right! To address these problems, I look at the effect of promotion on performance starting in the year after the promotion took place, i.e. in the year after promotion the indicator turns to 1 and stays at 1 in all following periods. promoted_in_observation_period will therefore look like, for example, 0 0 0 1 1 1 1 1 1 1 ... for a worker promoted in t = 3, staying at 1 until T = 15.
This approach should address both reverse causality and the second issue.
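
A coding like the one described could be built roughly as follows. The sketch assumes a hypothetical variable promo_event that equals 1 in the year a promotion happens and 0 otherwise; ever_promoted and promoted_lag are likewise hypothetical names, and workerID and year follow the data excerpt in post #6:

```stata
* Assumption: promo_event (1 in the promotion year, 0 otherwise) is a hypothetical variable
* Running sum within worker: switches to 1 in the promotion year and stays 1 afterwards
bysort workerID (year): generate byte ever_promoted = sum(promo_event) > 0

* Shift by one year so the indicator turns on in the year after the promotion
xtset workerID year
generate byte promoted_lag = L.ever_promoted
```

The lagged indicator is missing in each worker's first observed year and after any gap in the panel, so those rows drop out of the estimation sample.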

Cheers, Alex
