  • fixed effects regression - Effects of binary variable

    Dear all,

    I have estimated a fixed effect panel regression model for work performance (dependent variable y) in my data set for about N = 100 workers for T = 15 years.
    Now I want to analyze whether a change in one binary variable x1 (promoted yes/no) leads to a significant improvement of the work performance with some control variables (x2,…, xn).


    Some in my data set get promoted after 10 years, some after 12 and some not at all, implying x1 = 0 for all periods.
    Additionally, many start working in t != 1 and stop working in t != 15, which raises the problem of attrition.

    Can I simply write:

    xtreg y x1 x2 … xn if y==0, fe
    xtreg y x1 x2 … xn if y==1, fe


    and then compare coefficients?
    Do you have any other idea how to address this problem?
    In a first try I created a variable capturing whether a manager was promoted at all during the observation period. I included that variable in my model but because of the fe model Stata omitted that variable.
    Additionally is there a possibility to see if attrition is a problem? Meaning if the dropping out of the panel is correlated with the performance y?

    Many thanks!

    Kind regards,
    Alexander-Florian

  • #2
    Alex:
    welcome to this forum.
    Some comments about your query:
    - if, as it would seem from your description, you have a continuous dependent variable (y), I fail to get the last lines of code you share (...if y==0);
    - due to -fe- machinery, the variable you created will be omitted by Stata if time-invariant or collinear with other predictors;
    - attrition can surely be a problem, especially in the case you envisage: if attrition is due, say, to poor performance, those values are missing not at random and -mi- assumptions are untenable.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Carlo,
      many thanks for your kind reply.
      I made a mistake here:
      Originally posted by Alex Mueller

      xtreg y x1 x2 … xn if y==0, fe
      xtreg y x1 x2 … xn if y==1, fe
      It should be ... if x1==0 and ... if x1==1, in order to indicate whether the respective worker has been promoted.

      - Regarding attrition: is there any possibility to put that in numbers? I.e., can I check whether there is a significant correlation between performance y and dropping out of the panel?
      (Logically, there should be a connection via poor performance.)

      best
      Alex



      • #4
        Alex:
        you can use -fvvarlist- notation to investigate the contribution of -x1- (adjusted for the other predictors) to the variation of the dependent variable.
        As far as dealing with missing values is concerned, if panel units scoring low performance in wave 1 are missing in wave 2, it might be that your data are missing at random (MAR).
        But if you cannot rely on a previous measurement of performance, you should assume that data are missing not at random (MNAR).
        Unfortunately, you cannot test MAR vs MNAR mechanism.
        For more details (and interesting references) on missing data, see -mi-related entries in Stata .pdf manual.
        Kind regards,
        Carlo
        (Stata 19.0)
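
        One rough way to put the attrition question in numbers, offered as a sketch rather than a formal test: flag each worker's last observed year whenever it falls before the final wave, then check whether performance in that year predicts leaving. The names leaves_next and lastwave are hypothetical; workerID, year, performance, age and work_exp follow the example data posted in this thread. Gaps in the middle of a worker's history are not treated as dropout here.

        ```stata
        * Sketch only: variable names follow the example data in this thread;
        * leaves_next and lastwave are hypothetical names introduced here.
        xtset workerID year

        * Final wave of the panel
        summarize year, meanonly
        local lastwave = r(max)

        * 1 in a worker's last observed year if that year precedes the final wave
        bysort workerID (year): generate byte leaves_next = (_n == _N) & (year < `lastwave')

        * Does observed performance predict leaving the panel after this year?
        logit leaves_next performance age work_exp i.year if year < `lastwave', ///
            vce(cluster workerID)
        ```

        A clearly negative coefficient on performance would suggest that dropout is related to observed performance. It says nothing about dependence on unobserved values, so it cannot separate MAR from MNAR.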



        • #5
          Carlo, thanks a lot for your comments.

          I don’t really get the fv command. I understand from the manual it “allows the model to contain categorical variables”.
          However, I don’t see why I need that specification: my fe model contains some other (for some individuals time-variant) categorical variables (e.g. post code), but I did not use the fv specification.
          Does that mean the effect of (in this case) the region of residence is estimated wrongly in my model?

          Regarding the attrition:


          Originally posted by Carlo Lazzaro
          Alex:
          As far as dealing with missing values is concerned, if panel units scoring low performance in wave 1 are missing in wave 2, it might be that your data are missing at random (MAR).
          But if you cannot rely on a previous measurement of performance, you should assume that data are missing not at random (MNAR).
          Unfortunately, you cannot test MAR vs MNAR mechanism.
          For more details (and interesting references) on missing data, see -mi-related entries in Stata .pdf manual.
          Sorry, I am a little confused: shouldn't it be the other way round? If performance is related to attrition, then I would assume MNAR.

          Thanks for your patience


          Cheers,Alex



          • #6
            Maybe it's best to share part of the dataset and the code I used:

            Dataset:
            Code:
             
            year  workerID  promoted_in_observation_period  performance  age  postcode  work_exp
            2000  1                                         1,10         32   10115     10
            2000  2                                         0,90         44   10319     22
            2000  3                                         1,20         28   10435     8
            2001  4                                         0,80         50   10435     33
            2001  1         1                               1,50         33   10115     11
            2001  2                                         1,20         45   10319     23
            2002  1         1                               1,60         34   10245     12
            2002  3         1                               0,90         29   10435     9
            2002  4                                         1,10         51   10435     34
            Code:
            xtset workerID year, yearly
            xtreg performance promoted_in_observation_period age postcode work_exp i.year, fe (vce) robust
            In my model, promoted_in_observation_period would be the variable of interest. Does this deliver the desired output?

            Best, A.



            • #7
              Alex:
              - if you have a two-level categorical variable, using -fvvarlist- or not does not make any difference;
              - if you have a three(or more)-level categorical variable, -fvvarlist- makes a relevant difference, because it tells Stata that the predictor is not continuous and integers are, in fact, levels.
              As far as the missing values question is concerned:
              - if, in a previous wave of data, those who reported poor performance (assumption: performance score is self-reported) are missing in the next wave, you can assume that missingness is MAR, as it depends on the observed data (that is, the poor performance score registered in the previous wave of data) and not on unobserved values;
              - if you do not have a previous wave of data and panel units do not report any performance score, you can suspect that missingness is not at random, as it probably depends on the unreported data (instead of reporting a poor performance score, panel units skip the question altogether).
              About your code:
              - if performance is a continuous variable, -xtreg- is the way to go (the user-written command -xtoverid- comes in handy to test which specification, -fe- or -re-, fits your data better; please note that -hausman- allows default standard errors only);
              - -(vce)- is pleonastic with -robust-;
              - if working experience is expressed in years, you may want to test whether there's a turning point (that is, a quadratic relationship between -work_exp- and -performance-) and rewrite the regression code accordingly (I still assume that -fe- is the right specification for your data):
              Code:
              xtreg performance promoted_in_observation_period age postcode c.work_exp##c.work_exp i.year, fe robust
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Carlo,
                Many thanks again.
                So does this imply if I code the postal code variable as region 1,2,3,…,n (postcode_regiondummy) my code would be

                ... fv postcode_regiondummy … ?

                For the missing values, I still don’t see why I would assume missingness at random if attrition is correlated with performance. If poor performance is linked to being fired, i.e. dropping out of the panel, it seems to me to be systematic attrition.

                - Yes, performance is continuous. I want to go for the fe model, as I assume performance depends on the unobservable, though time-invariant, natural skill of a worker.
                I will have a look at xtoverid. Does that mean the hausman test is not applicable to my setting? Why would the standard errors of my models not be the default ones?

                - The squared relationship approach sounds very interesting, but I have never worked with non-linear models. Does the model now simply square the experience in years? I don't really see why that is happening =) Additionally, Stata help says the c. prefix implies a continuous variable, whereas work_exp would be discrete (0,1,2,...,n) (countably infinite; in this context maybe limited to a max of, say, 50 years or so).
                - Thanks for that hint about vce robust.

                Code:
                xtreg performance promoted_in_observation_period age i.postcode_regiondummy c.work_exp##c.work_exp i.year, fe robust
                Carlo, many thanks for your advice!!!
                Best, A.
                Last edited by Alex Mueller; 07 Jun 2018, 07:03.



                • #9
                  Alex:
                  - your chunk of code will be -i.postcode_regiondummy-;
                  - there's an interesting example on the (subtle) difference between MAR and MNAR data (it focuses on repeated math tests, but that's immaterial) in https://www.amazon.com/Missing-Data-.../dp/1593853939, pages 43-50;
                  - if you introduce a squared term, -xtreg- is still linear in the coefficients but non-linear in the relationship between the predictor and the dependent variable. The logic behind squaring experience is that performance peaks after some years of experience and then usually declines. In your case the variable is discrete because it's more convenient to deal with integers representing years (all in all, it's the same for age), but time is ontologically continuous, so check whether squaring experience makes sense with your data.
                  Kind regards,
                  Carlo
                  (Stata 19.0)
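
                  To make the squared term concrete, here is a minimal sketch under stated assumptions: work_exp_sq is a hypothetical name, the other variables follow the regression posted in #8, and the values in -at()- are arbitrary illustration points. It shows that c.work_exp##c.work_exp adds work_exp and its square as regressors, and what the factor-variable notation buys afterwards.

                  ```stata
                  * Sketch only: work_exp_sq is a hypothetical name; the other variables
                  * follow the regression posted in this thread.

                  * Manual version: the model simply gains work_exp^2 as an extra regressor
                  generate work_exp_sq = work_exp^2
                  xtreg performance promoted_in_observation_period age i.postcode_regiondummy ///
                      work_exp work_exp_sq i.year, fe vce(robust)

                  * Factor-variable version: same coefficients, but Stata knows the two terms
                  * belong together, so -margins- can compute the slope at chosen values
                  xtreg performance promoted_in_observation_period age i.postcode_regiondummy ///
                      c.work_exp##c.work_exp i.year, fe vce(robust)
                  margins, dydx(work_exp) at(work_exp=(5 15 25 35))
                  ```

                  Both regressions should return the same estimates; only the second lets -margins- treat the linear and squared terms as one variable when computing the marginal effect of experience.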



                  • #10
                    Carlo,
                    thank you for your help!
                    The book you mentioned is unfortunately not available in our library, but I will try to find alternative literature.
                    I ran the regression again with your hints:
                    Code:
                     .xtreg performance promoted_in_observation_period c.Age##c.Age c.work_exp##c.work_exp former_achievments job_quits number_former_employers job_rotation i.year graduate sex medium_size_enterprise , fe robust
                    which generated the following outcome:
                    Code:
                    note: 2015.year omitted because of collinearity
                    
                    Fixed-effects (within) regression               Number of obs     =        301
                    Group variable: ID                              Number of groups  =        97
                    
                    R-sq:                                           Obs per group:
                         within  = 0.4426                                         min =          1
                         between = 0.1227                                         avg =        3.1
                         overall = 0.1647                                         max =         14
                    
                                                                    F(24,96)         =      25.57
                    corr(u_i, Xb)  = -0.4997                        Prob > F          =     0.0000
                    
                                                                         (Std. Err. adjusted for 97 clusters in ID)
                    ------------------------------------------------------------------------------------------------
                                                   |               Robust
                                       performance |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------------------------+----------------------------------------------------------------
                    promoted_in_observation_period |   .0090608   .0052476     1.73   0.086    -.0012893    .0194108
                                               Age |  -.0191297    .011546    -1.66   0.099    -.0419024    .0036429
                                                   |
                                       c.Age#c.Age |   .0002122    .000092     2.31   0.022     .0000307    .0003937
                                                   |
                                          work_exp |    .005203   .0060487     0.86   0.391    -.0067269     .017133
                                                   |
                             c.work_exp#c.work_exp |   -.000138   .0000785    -1.76   0.080    -.0002928    .0000168
                                                   |
                                former_achievments |  -.0000657   .0002536    -0.26   0.796     -.000566    .0004345
                                         job_quits |  -.0051987   .0075022    -0.69   0.489    -.0199954    .0095981
                           number_former_employers |   .0032433   .0076221     0.43   0.671      -.01179    .0182766
                                      job_rotation |   .0004076   .0008302     0.49   0.624    -.0012299    .0020451
                                                   |
                                              year |
                                             2003  |   .0018279   .0042328     0.43   0.666    -.0065205    .0101763
                                             2004  |   .0050953   .0044401     1.15   0.253    -.0036621    .0138527
                                             2005  |   .0115785   .0063166     1.83   0.068    -.0008798    .0240369
                                             2006  |  -.0008196   .0044127    -0.19   0.853    -.0095229    .0078838
                                             2007  |  -.0032424   .0053398    -0.61   0.544    -.0137742    .0072894
                                             2008  |  -.0116758   .0054543    -2.14   0.034    -.0224335   -.0009182
                                             2009  |  -.0100215   .0068462    -1.46   0.145    -.0235244    .0034814
                                             2010  |   -.004674   .0092071    -0.51   0.612    -.0228335    .0134855
                                             2011  |  -.0132499   .0093119    -1.42   0.156    -.0316162    .0051163
                                             2012  |  -.0155686   .0063318    -2.46   0.015    -.0280571   -.0030802
                                             2013  |  -.0108537   .0073446    -1.48   0.141    -.0253397    .0036323
                                             2014  |  -.0004855   .0077775    -0.06   0.950    -.0158254    .0148544
                                             2015  |          0  (omitted)
                                                   |
                                          graduate |  -.0115111   .0058921    -1.95   0.052    -.0231323    .0001101
                                               sex |  -.0027046   .0046743    -0.58   0.564    -.0119238    .0065147
                            medium_size_enterprise |   .0590412   .0068199     8.66   0.000     .0455901    .0724923
                                             _cons |   .4233649   .3144232     1.35   0.180    -.1967821    1.043512
                    -------------------------------+----------------------------------------------------------------
                                           sigma_u |  .07633407
                                           sigma_e |  .02945436
                                               rho |  .87040613   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------------------------
                    Does this imply that Age and work_exp are not significant but Age squared and work_exp squared are (the latter at least at p=0.08)? Does this mean there is the quadratic relationship you mentioned?
                    None of the years seems to be significant apart from 2008 and 2012. In my opinion that does not have a lot of explanatory power, but just controls for the year.

                    --> Can I conclude that whether a worker was promoted matters positively (coef 0.009), at least at the 10 percent level?

                    Cheers, A.



                    • #11
                      Alex:
                      let's focus on the usual arbitrary level p<0.05:
                      - the results for age show a turning point at [(-Age)/(2*c.Age#c.Age)]=45.07 years, where the variable names stand for their estimated coefficients.
                      Studying the first derivative, it should be a minimum (please check it).
                      If the above is true, and provided that 45.07 years falls within the range of -Age- in your dataset, performance increases after 45.07 years, when adjusted for the remaining predictors;
                      - -i.year- seems redundant (you can test whether the year indicators are jointly significant via -testparm i.year-); under -fe-, -i.year- tells that, within each panel, time shows a negligible effect in explaining variation in performance, when adjusted for the remaining predictors;
                      - I would conclude that previous promotions make a dubious contribution to performance: all in all, that makes sense if we consider that the so-called promoveatur ut amoveatur scheme is at work in many organizations (that is, some people, obviously not all, are promoted to stop them doing damage in their previous positions).
                      Kind regards,
                      Carlo
                      (Stata 19.0)



                      • #12
                        Carlo,
                        sorry, I don't understand that calculation:

                        Originally posted by Carlo Lazzaro
                        - results for age reports a turning point at [(-Age)/(2*c.Age#c.Age)]=45.07 years.
                        Studying the first derivative, it should be a minimum (please check it).
                        If what above is true, provided that 45.07 years is included among the values of -Age- in your dataset, performance increases after 45.07 years, when adjusted for the remaining predictors..
                        How do you come up with 45.07? In my data set the variable is rounded to full years (min 33, max 70, mean 47.63). How can I take the derivative of a variable?

                        testparm (i.years) delivers
                        Code:
                        2003.year = 0
                        ... until 2015.year = 0
                        chi2( 14) =   26.44
                        Prob > chi2 =    0.0148
                        testparm (i.years), equal delivers
                        Code:
                         ( 1)  2003.year +2004.year = 0
                        ... until 2003.year + 2015.year = 0
                        chi2( 12) =   23.54
                        Prob > chi2 =    0.0235
                        Which of the two options is correct?
                        As the null hypothesis is "coefficients are equal", that means year does have an impact (and is NOT redundant), since the hypothesis is rejected, right?
                        thanks, Alex
                        Last edited by Alex Mueller; 08 Jun 2018, 02:53.



                        • #13
                          I just skimmed through the posts here; the two first potential problems I gather are

                          1. Should promotion not be a function of performance, i.e., higher performance leads to promotion rather than the other way round? Perhaps you should somehow address this problem of reverse causality.

                          2. Ignoring 1., do you expect promotion to affect performance for one year only? I ask, because you are looking at the effect of promotion in one/this year. I do not know about the theoretical background, but I could imagine that promotion should be coded 1 for every year following initial promotion.

                          Best
                          Daniel



                          • #14
                            Alex:
                            - calculation of the turning point with respect to -Age- (which is expressed in years) (this calculation implies the first derivative: see below):
                            Code:
                            . di [(-(-.0191297))/(2*.0002122)]
                            45.074694
                            As 45.07 falls within the range of -Age- in your dataset, you can report a turning point.
                            - the function of -Age- implied by the coefficients is: .0002122x^2 - .0191297x;
                            - the first derivative of the abovementioned function is: [2*(.0002122)x - .0191297];
                            - to identify the turning point as a minimum or a maximum, you should find out the values for which the first derivative is >0 and <0, that is:
                            - [2*(.0002122)x - .0191297]>0 and [2*(.0002122)x - .0191297]<0. The slope of the function is positive for values >45.074694 and negative for values <45.074694. Hence, the turning point is a minimum;
                            - -i.year- (I would consider the first code) are jointly significant and should be kept in your regression model. Jointly, they tell that time makes a relevant contribution in explaining the within-panel variation of the dependent variable.

                            PS: crossed in the cyberspace with Daniel's reply, who, as usual, gives valuable inputs to reflect upon.
                            Last edited by Carlo Lazzaro; 08 Jun 2018, 03:25.
                            Kind regards,
                            Carlo
                            (Stata 19.0)
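
                            The hand calculation above can also be reproduced from the stored estimates. A minimal sketch, assuming it is run directly after the -xtreg- command shown in #10 (the label turning_point is a hypothetical name): -nlcom- computes -b1/(2*b2) and adds a delta-method standard error and confidence interval, which indicates how precisely the turning point is estimated.

                            ```stata
                            * Run directly after the -xtreg- command shown in #10.
                            * The turning point of b1*Age + b2*Age^2 is -b1/(2*b2); -nlcom- adds a
                            * delta-method standard error and confidence interval.
                            * turning_point is a hypothetical label introduced here.
                            nlcom (turning_point: -_b[Age] / (2 * _b[c.Age#c.Age]))
                            ```

                            Since the linear -Age- coefficient is only weakly significant in that output (p = 0.099), the resulting interval may well be wide.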



                            • #15
                              Daniel, thanks for your hints:

                              Originally posted by daniel klein
                              I just skimmed through the posts here; the two first potential problems I gather are

                              1. Should promotion not be a function of performance, i.e., higher performance leads to promotion rather than the other way round? Perhaps you should somehow address this problem of reverse causality.

                              2. Ignoring 1., do you expect promotion to affect performance for one year only? I ask, because you are looking at the effect of promotion in one/this year. I do not know about the theoretical background, but I could imagine that promotion should be coded 1 for every year following initial promotion.

                              Best
                              Daniel
                              You are absolutely right! To address these problems, I look at the effect of promotion on performance from the year after the promotion took place. I.e., in the year after promotion the indicator turns to 1 and stays at 1 in all following periods. promoted_in_observation_period will therefore look like, for example, 0 0 0 1 1 1 1 1 1 1... for a worker promoted in t=3, staying at 1 until T = 15.
                              This approach should control for reverse causality and address the second issue.

                              Cheers, Alex
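
                              The coding described above could be generated along these lines. This is only a sketch: promo_year (calendar year of promotion, missing if the worker was never promoted) and promoted_post are hypothetical names not taken from the posted data; workerID and year follow the example data in #6.

                              ```stata
                              * Sketch only: promo_year (calendar year of promotion, missing if the
                              * worker was never promoted) and promoted_post are hypothetical names.
                              generate byte promoted_post = !missing(promo_year) & (year > promo_year)

                              * Check the pattern for a few workers: 0 up to and including the
                              * promotion year, 1 in every later year
                              sort workerID year
                              list workerID year promo_year promoted_post if workerID <= 3, sepby(workerID)
                              ```

                              Workers who are never promoted keep 0 in every year, so under -fe- the coefficient is identified only from workers whose indicator switches during the observation period.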
