
  • Probit/Logit with Fixed Effect and Repeated values within Panel

    Dear all,

    I'm having trouble running a probit model with fixed effects on a dataset with repeated time values within the panel.

    My data is a firm-year panel, but the observations are at the worker level. The data run from 2000 to 2020, there are many firms each year, and each firm hires multiple new workers every year.
    I've created a dummy variable that indicates whether a worker is skilled, and I'd like to see whether the probability of hiring a skilled worker increases or decreases over time (the trend in the coefficients on the year dummies), controlling for firm fixed effects.

    Originally, I used a linear probability model and ran OLS, regressing the dummy variable on year dummies with firm fixed effects:
    reghdfe skilldummy i.year, absorb(firmID)

    Now the referee is not happy with OLS and has asked us to run a probit/logit model with firm fixed effects.
    My understanding is that probit with FE doesn't exist in Stata because of the incidental parameters problem.

    Here are two methods I've already tried, without success:
    1. I tried xtprobit and xtlogit, but they seem to work only when each firm has a single observation per year. I have repeated time values within the panel (i.e., each firm has multiple new workers within a year).
    2. I used probit directly: probit skilldummy i.year i.firmID. However, I have too many firms, and Stata failed to execute the code.

    Is there any way to fit a probit or logit model with fixed effects on data with repeated time values within the panel?

    Many thanks in advance!
    Last edited by Lucia Jiang; 28 Aug 2023, 04:25.

  • #2
    Lucia:
    1) if Stata returns the -repeated time values within panel- error, you can simply -xtset- your dataset with -panelid- only. This fix makes time-series operators, such as lags and leads, unavailable, but still allows you to include -timevar- as a predictor on the right-hand side of your regression equation;
    2) -xtlogit, fe- fits the conditional fixed-effects logit, which is what Stata offers because of the possible incidental parameter bias. That said, you can run that code with multiple observations per panel, as in the following toy example:
    Code:
    . use https://www.stata-press.com/data/r17/union
    (NLS Women 14-24 in 1968)
    
    . xtlogit union age grade not_smsa south##c.year, fe
    note: multiple positive outcomes within groups encountered.
    note: 2,744 groups (14,165 obs) omitted because of all positive or
          all negative outcomes.
    
    Iteration 0:   log likelihood = -4516.5881  
    Iteration 1:   log likelihood = -4510.8906  
    Iteration 2:   log likelihood =  -4510.888  
    Iteration 3:   log likelihood =  -4510.888  
    
    Conditional fixed-effects logistic regression        Number of obs    = 12,035
    Group variable: idcode                               Number of groups =  1,690
    
                                                         Obs per group:
                                                                      min =      2
                                                                      avg =    7.1
                                                                      max =     12
    
                                                         LR chi2(6)       =  78.60
    Log likelihood = -4510.888                           Prob > chi2      = 0.0000
    
    ------------------------------------------------------------------------------
           union | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             age |   .0710973   .0960536     0.74   0.459    -.1171643    .2593589
           grade |   .0816111   .0419074     1.95   0.051    -.0005259     .163748
        not_smsa |   .0224809   .1131786     0.20   0.843     -.199345    .2443069
         1.south |  -2.856488   .6765694    -4.22   0.000    -4.182539   -1.530436
            year |  -.0636853   .0967747    -0.66   0.510    -.2533602    .1259896
                 |
    south#c.year |
              1  |   .0264136   .0083216     3.17   0.002     .0101036    .0427235
    ------------------------------------------------------------------------------
    Kind regards,
    Carlo
    (Stata 19.0)
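
For anyone who wants to check what -xtlogit, fe- is doing in the toy example above: as far as I know, it fits the same conditional logit as -clogit- with the panel variable passed to group(), and -clogit- needs no -xtset- at all. A minimal sketch on the same union data follows; the scalar name ll_xt is my own and not from the thread.

```stata
* Same toy data as in the example above
use https://www.stata-press.com/data/r17/union, clear

* Conditional fixed-effects logit through the xt interface,
* declaring the panel variable only (no time variable)
xtset idcode
xtlogit union age grade not_smsa south##c.year, fe
scalar ll_xt = e(ll)          // hypothetical scalar name, keeps the log likelihood

* The same conditional likelihood through -clogit-; no -xtset- required
clogit union age grade not_smsa south##c.year, group(idcode)

* The two log likelihoods should coincide
display "xtlogit, fe: " ll_xt "    clogit: " e(ll)
```

If the two numbers agree, either command can be reported; -clogit- may be the more convenient one when the data are not naturally a panel.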



    • #3
      The -repeated time values within panel- error indicates that your dataset is not a panel. As you describe it, you have worker-level data, not firm-level data. On average, how many workers do you have per firm? If this number is sufficiently large, there is no problem with running logit with firm dummies (note the terminology: dummies, not fixed effects; see https://www.statalist.org/forums/for...uated-at-means). Note that -xtlogit, fe- is conditional logit, in which case you are conditioning the fixed effects out of the likelihood. It should also be possible to use this estimator. I believe your problem is that you are trying to xtset using both firm and year, whereas using firm only is sufficient. See https://journals.sagepub.com/doi/ful...6867X231162020.
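
To spell out what "conditioning the fixed effects out of the likelihood" means, here is a short sketch in notation of my own (none of it is from the thread): y_{ij} is the skill dummy of worker j hired by firm i, x_{ij} holds the year dummies, alpha_i is the firm effect, n_i is the number of hires in firm i, and k_i is the number of skilled hires among them.

```latex
% Unconditional logit with a firm-specific intercept
\Pr(y_{ij}=1 \mid x_{ij}, \alpha_i)
  = \frac{\exp(\alpha_i + x_{ij}'\beta)}{1 + \exp(\alpha_i + x_{ij}'\beta)}

% Conditioning on the firm's total number of skilled hires k_i = \sum_j y_{ij}:
% the factor \exp(\alpha_i k_i) appears in the numerator and in every term of
% the denominator, so \alpha_i cancels
\Pr\Bigl(y_{i1},\dots,y_{in_i} \Bigm| \sum_{j} y_{ij} = k_i\Bigr)
  = \frac{\exp\bigl(\sum_{j} y_{ij}\, x_{ij}'\beta\bigr)}
         {\sum_{d \in S_i} \exp\bigl(\sum_{j} d_j\, x_{ij}'\beta\bigr)},
\qquad
S_i = \Bigl\{ d \in \{0,1\}^{n_i} : \sum_{j} d_j = k_i \Bigr\}
```

When k_i is 0 or n_i, the set S_i has a single element and the conditional probability equals 1 for any beta. Such firms carry no information about beta, which is why the estimator omits groups with all positive or all negative outcomes.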

      Code:
      xtset firm
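
To make this concrete, here is a minimal sketch on simulated worker-level data. The variable names firmID, year and skilldummy follow the original post; everything else (200 firms, 3 hires per firm-year, the variable alpha, the trend of 0.03 per year) is made up for illustration and says nothing about the real data.

```stata
clear
set seed 12345
set obs 200                              // 200 hypothetical firms
gen firmID = _n
gen alpha  = rnormal()                   // made-up firm-specific effect
expand 21                                // one row per firm-year
bysort firmID: gen year = 1999 + _n      // years 2000 to 2020
expand 3                                 // 3 hires per firm-year, so year repeats within firm

* Skill dummy with a firm effect and an upward trend (illustrative values)
gen byte skilldummy = runiform() < invlogit(-0.5 + alpha + 0.03*(year - 2000))

* Declaring both firm and year fails because year repeats within firmID
capture noisily xtset firmID year

* Declaring the panel variable only is enough for the conditional logit
xtset firmID
xtlogit skilldummy i.year, fe
```

The year dummies are still identified because they vary across hires within a firm; only lags, leads and other time-series operators are lost by dropping the time variable from -xtset-.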



      • #4
        Originally posted by Carlo Lazzaro
        Thanks a lot, Carlo!



        • #5
          Originally posted by Andrew Musau
          Thanks Andrew! Now xtlogit works.
