fixed effects regression - Effects of binary variable

Alex Mueller

Join Date: May 2018

Posts: 51
#16

08 Jun 2018, 04:00

Carlo,
thanks for the hints!
I was a little confused about the composition of the Age quadratic function, but your demonstration made it much clearer, thanks.
Can the output with c.Age##c.Age's p value of 0.002 be interpreted as proof of the quadratic relation? (by the way why is the second "#" missing in the output?)
So 45 is the "worst age" for performance and afterwards it gets better?

(you can test if they're jointly significant via -testparm(i.year)-; under -fe- -i.year- tells that, within each panel, time shows a negligible effect in explaining variation in promotion, when adjusted for the remaining predictors;

you mean variation in performance (dependent variable) and not promotion as explanatory variable, right?

Am I right to conclude from the testparm result that, even though almost none of the individual years plays an important role, their joint effect is important for the output?

Thanks to Daniel and you again.

Best wishes, Alex
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#17

08 Jun 2018, 04:24

Alex:
- the second # is omitted because -Age##Age- calls -Age- twice (and each call wants an #, one for the linear term and one for the squared term). The first mention of -Age- is the linear term (that is the derivative of the parabola equation or, put differently, the main conditional effect of -Age-), whereas the second call of -Age- is the interaction between -Age- and itself (or the squared term);
- the minimum of the parabola function tells that 45 years is the "worst age" for performance and afterwards it gets better;
- my bad: I actually meant performance;
- just to make things simpler, I would report the joint significance of -i.year- without delving into the role of each year (which is negligible, by the way).
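
For readers who want to reproduce the turning-point calculation, here is a minimal sketch. It assumes Stata's shipped example panel -nlswork- (variables -ln_wage-, -age-, -idcode-) as a stand-in, because the dataset discussed in this thread is not available; only the syntax carries over. -c.age##c.age- expands to the linear term plus -c.age#c.age-, which is why the output shows a single #, and the vertex of the fitted parabola sits at -b1/(2*b2).

```stata
* Stand-in data (assumption): Stata's example panel, not the thread's dataset
webuse nlswork, clear
xtset idcode year

* c.age##c.age expands to:  age  c.age#c.age
xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

* Turning point of the fitted parabola, -b1/(2*b2), with a delta-method CI
nlcom (turning_point: -_b[age] / (2 * _b[c.age#c.age]))
```

Whether the turning point is a minimum or a maximum depends on the sign of the squared term: a positive coefficient on -c.age#c.age- gives a minimum, a negative one a maximum.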

Kind regards,
Carlo
(Stata 19.0)
Comment
daniel klein

Join Date: Mar 2014

Posts: 3861
#18

08 Jun 2018, 04:29

Originally posted by Alex Mueller View Post

You are absolutely right! To address this problem, I look at the effect of promotion on performance in the year after the promotion took place, i.e. in the year after promotion the indicator turns to 1 and stays at 1 in all following periods. promoted_in_observation_period will therefore look like 0 0 0 1 1 1 1 1 1 1... for a worker promoted in t=3, and it stays at 1 until T = 15.
This approach should control for reverse causality and the second issue.

Sounds plausible. I do not believe that this really "controls" for reverse causality, since you are still looking at correlations. It is an intuitive approach that will probably convince most reviewers (well, perhaps not in economics), though. You might want to have a look at this blog entry by Paul Allison and the cited literature for statistically more rigorous ways to address reverse causality.

Best
Daniel
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#19

08 Jun 2018, 04:48

Carlo,

Originally posted by Carlo Lazzaro View Post

- the second # is omitted because -Age##Age- calls -Age- twice (and each call wants an #, one for the linear term and one for the squared term). The first mention of -Age- is the linear term (that is the derivative of the parabola equation or, put differently, the main conditional effect of -Age-), whereas the second call of -Age- is the interaction between -Age- and itself (or the squared term);
- the minimum of the parabola function tells that 45 years is the "worst age" for performance and afterwards it gets better;

Does that mean the supposed quadratic impact of age can be proved (in this dataset)?

Originally posted by Carlo Lazzaro View Post

- just to make things simpler, I would report the joint significance of -i.year- without delving into the role of each year (which is negligible, by the way).

This means I run the regression xtreg performance Budget_Constraint ... i.year ... , fe robust and point out that, even though single years may be insignificant, the joint effect is important. And as proof of this claim I use testparm i.year?

Many thanks, A.
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#20

08 Jun 2018, 04:55

daniel,

Originally posted by daniel klein View Post

It is an intuitive approach that will probably convince most reviewers (well, perhaps not in economics), though.

Hahaha, as I am writing my thesis in economics, it would be quite important for me to convince economists. I see your point. In the theoretical part of my work I try to justify the approach I use and the causality this way. The methods presented in the paper seem quite complex, and I try to avoid "inflating" the model any further as it is already very bulky.
Long story short: my thesis looks at the psychological factors behind performance. I try to find out whether promoted workers have more motivation because their work is appreciated more. Therefore I go this way, although I see that reverse causality is an issue.
Thanks for your hints, Daniel!
regards, A.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#21

08 Jun 2018, 04:58

Alex:
1) your data support the evidence of a quadratic relationship between -Age- and -performance-;
2) according to the results of your regression model, the hypothesis of no joint significance of -i.year- is rejected. I would not delve into the role of each year (it's immaterial for your audience).
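
A minimal sketch of the joint test, again assuming Stata's example panel -nlswork- as a stand-in for the data in this thread (-ln_wage- and -tenure- are placeholders for -performance- and its predictors). The usual form of the command is -testparm i.year-, without parentheses.

```stata
* Stand-in data (assumption): Stata's example panel, not the thread's dataset
webuse nlswork, clear
xtset idcode year

* Fixed-effects model with year dummies and cluster-robust standard errors
xtreg ln_wage c.tenure##c.tenure i.year, fe vce(cluster idcode)

* Joint Wald test that all year-dummy coefficients are zero
testparm i.year
```

A small p-value here rejects the hypothesis that the year dummies are jointly zero, even when few of them are individually significant.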

Kind regards,
Carlo
(Stata 19.0)
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#22

08 Jun 2018, 06:30

Carlo,
cheers for the help so far.

To address the attrition I created an indicator variable "dfpnp (drop_from_panel_next_periode)". It is 0 if a manager is still in the panel in the next period and 1 if he is no longer reported in t+1. He could report again in t+2, but in that case the value of dfpnp in period t would still be 1.
I would assume that if there is a significant correlation between my dependent variable "performance" and "dfpnp", then observations are systematically missing from my panel and I need to correct for that. If there is no correlation, I would assume the data are missing at random.

There are two possibilities now:
I) I include dfpnp in the full model

Code:

xtreg performance promotion i.year c.Age##c.Age (...) dfpnp, fe robust

In this case Stata delivers

Code:

             |               Robust
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
       (...) |   .0080527   .0037771     2.13   0.034     .0006007    .0155047

--> can i conclude from the significant result that i have a problem with systematic attrition?

II) just regress performance and dfpnp:

Code:

regress performance dfpnp

delivers:

Code:

 performance |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dfpnp |   .0157323   .0050593     3.11   0.002     .0057947      .02567
       _cons |   .0516722    .003076    16.80   0.000     .0456303    .0577141
------------------------------------------------------------------------------

Also here I have a significant value.

- Which is the right approach? Or are both correct? Both would lead to the same result.

Cheers! A.

Last edited by Alex Mueller; 08 Jun 2018, 06:32.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#23

08 Jun 2018, 07:01

Alex:
https://www.amazon.com/Applied-Econo.../dp/0415676827 covers this issue in Chapter 10.
In brief, methods for dealing with attrition assume a MAR (missing at random) mechanism and include fixes such as the inverse probability weighted estimator.
However, to test for the existence of an attrition bias, you should consider the missing indicator as the regressand of a -probit- model, predictors being covariates observed for all the individuals at the first wave of data.

Kind regards,
Carlo
(Stata 19.0)
Comment

Alex Mueller

Join Date: May 2018
Posts: 51

#24

08 Jun 2018, 07:31

Carlo,
thanks again.
My model

Code:

 probit dfpnp performance promoted_in_observation_period c.Age##c.Age c.work_exp##c.work_exp former_achievments job_quits number_former_employers job_rotation i.year graduate sex medium_size_enterprise

delivers:

Code:

 Probit regression                               Number of obs     =        229
                                                LR chi2(23)       =      94.82
                                                Prob > chi2       =     0.0000
Log likelihood = -323.01865                     Pseudo R2         =     0.1280

As the model is significant I have attrition, right?

In my data not all workers start in t=1; many come in at a later t and some may leave early. So there isn't a linear process with a full "first wave" and then people dropping from the panel step by step. It is rather a permanent in and out.

Cheers, A.

Comment

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#25

08 Jun 2018, 08:58

Alex:
the Prob > chi2 = 0.0000 is telling you that your coefficients are jointly different from zero.
You should look at the coefficients of your model to get an idea of a possible attrition bias.
Moreover, the -probit- test should be repeated for all the waves (see the reference I quoted in my previous post).
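
A rough sketch of a wave-by-wave probit, under loud assumptions: it uses Stata's example panel -nlswork- as a stand-in, a simplified drop-out flag (-last_obs-, hypothetical, equal to 1 on a person's last observed wave rather than the -dfpnp- definition above), and an arbitrary short covariate list. It illustrates the looping syntax only, not the procedure in the cited reference.

```stata
* Stand-in data (assumption): Stata's example panel, not the thread's dataset
webuse nlswork, clear
xtset idcode year

* Simplified drop-out flag (assumption): 1 on a person's last observed wave
bysort idcode (year): gen byte last_obs = (_n == _N)

* The final survey year cannot inform drop-out, so it is skipped
summarize year, meanonly
local last = r(max)
levelsof year if year < `last', local(waves)

* One probit per wave; capture noisily keeps the loop going if a wave fails
foreach w of local waves {
    display _newline "Wave `w'"
    capture noisily probit last_obs ln_wage age tenure if year == `w'
}
```

Covariates that predict the drop-out flag in a given wave are the ones to inspect when judging whether attrition is related to the outcome.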

Kind regards,
Carlo
(Stata 19.0)
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#26

11 Jun 2018, 03:53

Carlo, Daniel,
Thanks for your helpful advice.
Best, A.
Comment

Alex Mueller

Join Date: May 2018
Posts: 51

#27

08 Aug 2018, 00:55

Hi guys,
some time has gone by, but I am still working on this data set, and another question arose.
In the meantime my code has changed to:

Code:

xtreg performance promoted_in_observation_period c.Age##c.Age c.work_exp##c.work_exp former_achievments job_quits number_former_employers job_rotation i.year graduate sex##medium_size_enterprise, fe robust

so it now also captures the joint effect of sex##medium_sized_company.
My research now also focuses on the interaction between sex and company size.
The coding is: sex=0: woman; sex=1: man; medium_sized_company=1: medium company; medium_sized_company=0: large company
The output is as follows:

Code:

 
note: 2015.year omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        301
Group variable: ID                              Number of groups  =        97

R-sq:                                           Obs per group:
     within  = 0.4733                                         min =          1
     between = 0.1475                                         avg =        3.1
     overall = 0.1517                                         max =         14

                                                F(22,90)         =      24.69
corr(u_i, Xb)  = -0.5111                        Prob > F          =     0.0000

                                                     (Std. Err. adjusted for 97 clusters in ID)
------------------------------------------------------------------------------------------------
                               |               Robust
                   performance |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
(...)
                         1.sex |  -.0034122   .0044845    -0.52   0.584    -.0120426    .0064253
      1.medium_size_enterprise |   .0601524   .0071202     6.32   0.001     .0447883    .0702673
                               |
1.sex#1.medium_size_enterprise |  -.2141357   .0065118     5.11   0.004     .0475141    .0698513
                         _cons |   .3954576    .323874     1.23   0.197    -.2052154    1.124127
-------------------------------+----------------------------------------------------------------
                       sigma_u |  .06544805
                       sigma_e |  .03072684
                           rho |  .83111947   (fraction of variance due to u_i)
------------------------------------------------------------------------------------------------

Now I am quite unsure about the interpretation of the results:

1) sex alone has no significant impact in my model, but a weak negative tendency (sign of the coefficient); company size has a significant positive influence
2) The joint effect is very strong (larger coefficient), negative and significant
Does this imply that sex alone does not play a role, but if I work in a medium sized company (=1), being a man (sex=1) has a negative influence on my coefficients, whereas if I work in a large corporation (medium company = 0), I have a higher coefficient?

3) (Maybe more statistical than Stata related): Is there a way to interpret the sex#company term, analogous to Age#Age (with derivatives, quadratic functions, minimum, etc.)?
i.e. is there a primitive function for sex##medium_sized_company?
For example: -.0034122x + .0601524y - .2141357xy or something like that?

Many thanks in advance!

Best, Alexander-Florian

Comment

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#28

08 Aug 2018, 02:13

Alex:
- the first coefficient expresses the conditional main effect of being male when medium_sized_company=0 (you seem to have two levels, medium and large, but it is not clear how you numbered them; moreover, I would have expected another level for small firms);
- the second coefficient expresses the conditional main effect of being female when medium_sized_company=1;
- the third coefficient expresses the interaction between being male and medium_sized_company=1;
- there's no scope for investigating whether a turning point exists, as both the terms of your interactions are categorical.

As an aside, to have a comprehensive idea of what's going on with your regression model, you can add the useful -allbaselevels- option (see -help estimation options-).

Kind regards,
Carlo
(Stata 19.0)
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#29

08 Aug 2018, 03:24

Carlo,

I hope you are doing well! Thanks for your (as always) quick and helpful reply.

To clarify: I am only looking at medium sized companies (Medium_Company = 1) and large corporations (Medium_Company = 0).

1) I understand that the coefficients express the conditional effect; however, I am unsure how to interpret them.
- Since 1.sex is insignificant, can I conclude that being male or female does not make a difference in a large corporation?
- Since 1.medium_size is significantly positive, can I conclude that being female in a medium size company enhances my performance by 0.0601524?
--> is it possible to write a functional term for the interaction? ( F(z) = x + y + x*y )

2) Sorry, I am still unsure about the interaction effect. If sex=1 and medium company=1, why would that differ from the conditional effects discussed under 1.sex and 1.medium_company?

3) Thanks for the hint, I will try that asap.

kr,

A.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#30

08 Aug 2018, 07:16

Alex:
I must correct my previous reply, as it seems I've looked at the wrong lines:
1) the first coefficient expresses the conditional main effect of being male vs female: regardless of the size of the company, there's no evidence of gender-related difference;
2) the second coefficient expresses the conditional main effect of being medium_size_enterprise vs large_size_enterprise: other things being equal, being a medium size company significantly explains variation in the dependent variable vs large size company;
3) the third coefficient tells you that being male (sex=1) at the top (I guess) of a medium size company significantly explains variation in the dependent variable vs large size company. To build a functional form yourself, try to replicate the results of -predict- on a handful of observations.
As an aside, please always use CODE delimiters to share what you typed and what Stata gave you back. Thanks.
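
One way to see the functional form of a two-by-two interaction, sketched under assumptions: Stata's example panel -nlswork- stands in for the thread's data, with -union- playing the role of -sex- and -south- the role of -medium_size_enterprise-. With 0/1 factor variables the linear predictor is _cons + b1*x + b2*y + b3*x*y, so under the usual reading of such interactions each "main" coefficient is the effect of one variable when the other equals 0, and the interaction is added only when both equal 1.

```stata
* Stand-in data (assumption): Stata's example panel, not the thread's dataset
webuse nlswork, clear
xtset idcode year

* union ~ sex, south ~ medium_size_enterprise (placeholders only)
xtreg ln_wage i.union##i.south, fe vce(cluster idcode)

* Effect of union when south == 0 is b1; when south == 1 it is b1 + b3
lincom 1.union
lincom 1.union + 1.union#1.south

* Effect of south when union == 0 is b2; when union == 1 it is b2 + b3
lincom 1.south + 1.union#1.south

* Predicted linear index in each of the four cells
margins union#south
```

Because both variables are categorical there is no derivative or turning point to compute; the four cell predictions from -margins- summarize the whole interaction.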

Kind regards,
Carlo
(Stata 19.0)
Comment
