  • Implausibly small standard error

    Hi,

    I am trying to figure out why my standard errors are so small, and whether this model is feasible given that the main effects are all omitted, leaving only the interaction terms. Should I just use OLS instead, and is it possible to use fixed effects for quarter or year only? Any thoughts on this?

    I have three-way panel data, for which I create a panel identifier with the command below:

    Code:
    egen pan_id = group(gender agegroup)
    I then xtset the data on the panel and time variables and run the regression below:
    Code:
    xtset pan_id qyear
    Code:
    . xtreg lnemp i.gender##ib5.agegroup##i.covid i.quarter i.year, fe vce(r)
    note: 2.gender omitted because of collinearity
    note: 1.agegroup omitted because of collinearity
    note: 2.agegroup omitted because of collinearity
    note: 3.agegroup omitted because of collinearity
    note: 4.agegroup omitted because of collinearity
    note: 2.gender#1.agegroup omitted because of collinearity
    note: 2.gender#2.agegroup omitted because of collinearity
    note: 2.gender#3.agegroup omitted because of collinearity
    note: 2.gender#4.agegroup omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =        160
    Group variable: pan_id                          Number of groups  =         10
    
    R-sq:                                           Obs per group:
         within  = 0.6450                                         min =         16
         between = 0.1870                                         avg =       16.0
         overall = 0.0070                                         max =         16
    
                                                    F(6,9)            =          .
    corr(u_i, Xb)  = -0.1448                        Prob > F          =          .
    
                                             (Std. Err. adjusted for 10 clusters in pan_id)
    ---------------------------------------------------------------------------------------
                          |               Robust
                    lnemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------------+----------------------------------------------------------------
                   gender |
                    Male  |          0  (omitted)
                          |
                 agegroup |
                   15-24  |          0  (omitted)
                   25-34  |          0  (omitted)
                   35-44  |          0  (omitted)
                   45-54  |          0  (omitted)
                          |
          gender#agegroup |
              Male#15-24  |          0  (omitted)
              Male#25-34  |          0  (omitted)
              Male#35-44  |          0  (omitted)
              Male#45-54  |          0  (omitted)
                          |
                  1.covid |   .0765138   .0136277     5.61   0.000     .0456857    .1073419
                          |
             gender#covid |
                  Male#1  |  -.0253093   1.19e-16 -2.1e+14   0.000    -.0253093   -.0253093
                          |
           agegroup#covid |
                 15-24#1  |  -.2049255   7.45e-17 -2.8e+15   0.000    -.2049255   -.2049255
                 25-34#1  |  -.1201755   5.82e-17 -2.1e+15   0.000    -.1201755   -.1201755
                 35-44#1  |  -.0738413   5.28e-17 -1.4e+15   0.000    -.0738413   -.0738413
                 45-54#1  |  -.0746435   5.48e-17 -1.4e+15   0.000    -.0746435   -.0746435
                          |
    gender#agegroup#covid |
            Male#15-24#1  |   .0587779   1.45e-16  4.1e+14   0.000     .0587779    .0587779
            Male#25-34#1  |   .0281182   1.28e-16  2.2e+14   0.000     .0281182    .0281182
            Male#35-44#1  |   .0289398   1.22e-16  2.4e+14   0.000     .0289398    .0289398
            Male#45-54#1  |   -.016157   1.28e-16 -1.3e+14   0.000     -.016157    -.016157
                          |
                  quarter |
                       2  |   .0045243    .006701     0.68   0.517    -.0106345    .0196831
                       3  |   .0179993   .0073301     2.46   0.036     .0014176    .0345811
                       4  |   .0175383   .0051526     3.40   0.008     .0058823    .0291944
                          |
                     year |
                    2018  |   .0276628   .0057882     4.78   0.001      .014569    .0407565
                    2019  |   .0511159   .0088087     5.80   0.000     .0311892    .0710425
                    2020  |   .0689929   .0159836     4.32   0.002     .0328355    .1051504
                          |
                    _cons |   7.124184   .0073743   966.09   0.000     7.107502    7.140866
    ----------------------+----------------------------------------------------------------
                  sigma_u |  .58711638
                  sigma_e |  .02629098
                      rho |  .99799878   (fraction of variance due to u_i)
    Thank you!
    Last edited by Ali Zul; 08 Apr 2021, 14:05.

  • #2
    Several things stand out.

    1. You use age group and gender to define your panel. Therefore, by definition, they are constant within panel. That is why every term built only from age group and gender is omitted; only their interactions with covid, which varies over time, are retained.

    2. The sigma_e is very small, and rho is extremely close to 1. This suggests that there is almost no variation in lnemp within your panels (defined by age group and gender). It would also imply that those small standard errors are probably correct, or, if they are not, it is because your data are wrong. So I suggest you run:

    Code:
    tabstat lnemp, by(pan_id) statistics(mean sd)
    This will show you how much variability there is in lnemp within your groups. I'm pretty sure you will see it is very small. You then have to decide whether this reflects data errors or whether that is really the way it is.

    3. With only 10 panels, you don't have enough to justify using cluster robust standard errors (which is what you get from -xtreg, fe- when you specify -vce(r)-). Cluster robust standard errors are only asymptotically correct, and while there is no consensus about the minimum acceptable number of clusters, I think pretty much everyone agrees that 10 is too few.

    4. It is a bit unusual, though not unheard of, to define panels in terms of age groups and sex. What is the rationale for this in your context? Maybe you should, in fact, be using OLS--but please explain why you chose to do it as panel data as it may be appropriate.
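
    A minimal sketch of how points 1, 3 and 4 could be checked, using only the variable names from #1. It assumes gender and agegroup are stored as numeric variables with value labels (which the factor-variable notation in #1 suggests), and it is meant as a starting point for comparison rather than a recommended final specification:

    ```stata
    * Point 1: gender and agegroup should show zero within-panel variation
    xtset pan_id qyear
    xtsum gender agegroup

    * Point 3: same fixed-effects model, conventional (non-clustered) standard errors
    xtreg lnemp i.gender##ib5.agegroup##i.covid i.quarter i.year, fe

    * Point 4: pooled OLS, where the gender and agegroup main effects are estimable
    regress lnemp i.gender##ib5.agegroup##i.covid i.quarter i.year
    ```

    Because gender##agegroup spans the same ten cells as pan_id, the pooled regression should reproduce the fixed-effects coefficients on covid and its interactions; compared with #1, what changes is how the standard errors are computed.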


    • #3
      Hi Clyde,

      My panel dataset consists only of age group and gender observed over time. I have each gender within each age group across different quarters (Q1-Q4), which is why I assumed it would be panel data. Is this interpretation wrong?
      Additionally, I've created a dummy variable covid from the post impacted quarters. I would like to look at the impact of covid on these two variables age group and gender.

      Thank you!

      Best,
      Aliya


      • #4
        You can do it that way if you want, but the results in #1 suggest that nearly all of the variation is happening across the gender-by-age-group panels, with almost nothing occurring within them. So the results are not like what is usually seen in a panel regression, and that suggests there may be a better way to think about it. You have 16 observations in each panel. Are these 16 different people, or do they represent something else? If so, what are they?

        "Additionally, I've created a dummy variable covid from the post impacted quarters. I would like to look at the impact of covid on these two variables age group and gender."
        This is completely unclear.
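
        One way to answer both questions is to look at the structure of the data directly. The sketch below uses only the variable names from #1 and assumes the data are still xtset on pan_id and qyear; the choice of pan_id == 1 is arbitrary:

        ```stata
        * Pattern of time points observed for each of the 10 panels
        xtdescribe

        * Which year-quarter cells make up the 16 observations in one panel
        tabulate year quarter if pan_id == 1

        * Which time periods the covid dummy switches on for
        tabulate qyear covid
        ```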
