Implausibly small standard error

Ali Zul

Join Date: Mar 2021
Posts: 23


08 Apr 2021, 13:55

Hi,

I am trying to figure out why my standard errors are so small, and whether this model is feasible given that the main effects are all omitted, leaving only the interaction terms. Should I just use OLS instead, and is it possible to use fixed effects only for quarter or year? Any thoughts on this?

I have three-way panel data, for which I create a panel identifier with the command below:

Code:

egen pan_id = group(gender agegroup)

I then xtset the data on the panel identifier (pan_id) and the time variable (qyear), and run the regression as below:

Code:

xtset pan_id qyear

Code:

. xtreg lnemp i.gender##ib5.agegroup##i.covid i.quarter i.year, fe vce(r)
note: 2.gender omitted because of collinearity
note: 1.agegroup omitted because of collinearity
note: 2.agegroup omitted because of collinearity
note: 3.agegroup omitted because of collinearity
note: 4.agegroup omitted because of collinearity
note: 2.gender#1.agegroup omitted because of collinearity
note: 2.gender#2.agegroup omitted because of collinearity
note: 2.gender#3.agegroup omitted because of collinearity
note: 2.gender#4.agegroup omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        160
Group variable: pan_id                          Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.6450                                         min =         16
     between = 0.1870                                         avg =       16.0
     overall = 0.0070                                         max =         16

                                                F(6,9)            =          .
corr(u_i, Xb)  = -0.1448                        Prob > F          =          .

                                         (Std. Err. adjusted for 10 clusters in pan_id)
---------------------------------------------------------------------------------------
                      |               Robust
                lnemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
               gender |
                Male  |          0  (omitted)
                      |
             agegroup |
               15-24  |          0  (omitted)
               25-34  |          0  (omitted)
               35-44  |          0  (omitted)
               45-54  |          0  (omitted)
                      |
      gender#agegroup |
          Male#15-24  |          0  (omitted)
          Male#25-34  |          0  (omitted)
          Male#35-44  |          0  (omitted)
          Male#45-54  |          0  (omitted)
                      |
              1.covid |   .0765138   .0136277     5.61   0.000     .0456857    .1073419
                      |
         gender#covid |
              Male#1  |  -.0253093   1.19e-16 -2.1e+14   0.000    -.0253093   -.0253093
                      |
       agegroup#covid |
             15-24#1  |  -.2049255   7.45e-17 -2.8e+15   0.000    -.2049255   -.2049255
             25-34#1  |  -.1201755   5.82e-17 -2.1e+15   0.000    -.1201755   -.1201755
             35-44#1  |  -.0738413   5.28e-17 -1.4e+15   0.000    -.0738413   -.0738413
             45-54#1  |  -.0746435   5.48e-17 -1.4e+15   0.000    -.0746435   -.0746435
                      |
gender#agegroup#covid |
        Male#15-24#1  |   .0587779   1.45e-16  4.1e+14   0.000     .0587779    .0587779
        Male#25-34#1  |   .0281182   1.28e-16  2.2e+14   0.000     .0281182    .0281182
        Male#35-44#1  |   .0289398   1.22e-16  2.4e+14   0.000     .0289398    .0289398
        Male#45-54#1  |   -.016157   1.28e-16 -1.3e+14   0.000     -.016157    -.016157
                      |
              quarter |
                   2  |   .0045243    .006701     0.68   0.517    -.0106345    .0196831
                   3  |   .0179993   .0073301     2.46   0.036     .0014176    .0345811
                   4  |   .0175383   .0051526     3.40   0.008     .0058823    .0291944
                      |
                 year |
                2018  |   .0276628   .0057882     4.78   0.001      .014569    .0407565
                2019  |   .0511159   .0088087     5.80   0.000     .0311892    .0710425
                2020  |   .0689929   .0159836     4.32   0.002     .0328355    .1051504
                      |
                _cons |   7.124184   .0073743   966.09   0.000     7.107502    7.140866
----------------------+----------------------------------------------------------------
              sigma_u |  .58711638
              sigma_e |  .02629098
                  rho |  .99799878   (fraction of variance due to u_i)

Thank you!

Last edited by Ali Zul; 08 Apr 2021, 14:05.


Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

08 Apr 2021, 14:14

Several things stand out.

1. You use age group and gender to define your panel. Therefore, by definition, they are constant within panel. That is why every term involving only age group and gender gets omitted, while the terms that interact them with covid are retained.

2. The sigma_e is very small, and rho is extremely close to 1. This suggests that there is almost no variation in lnemp within your panels (defined by age group and gender). It would also imply that those small standard errors are probably correct, or, if they are not, it is because your data are wrong. So I suggest you run:

Code:

tabstat lnemp, by(pan_id) statistics(mean sd)

This will show you how much variability there is in lnemp within your groups. I'm pretty sure you will see it is very small. You then have to decide whether this reflects data errors or whether that is really the way it is.

3. With only 10 panels, you don't have enough to justify using cluster-robust standard errors (which is what you get from -xtreg, fe- when you specify -vce(r)-). Cluster-robust standard errors are only asymptotically correct, and while there is no consensus about the minimum acceptable number of clusters, I think pretty much everyone agrees that 10 panels is too few.

4. It is a bit unusual, though not unheard of, to define panels in terms of age groups and sex. What is the rationale for this in your context? Maybe you should, in fact, be using OLS--but please explain why you chose to do it as panel data as it may be appropriate.
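
As a minimal sketch of points 3 and 4, assuming the data are still xtset on pan_id and qyear as in #1, the same specification can be refit with conventional standard errors and as pooled OLS. Neither is offered as the right model here, only as a way to compare the results:

```stata
* Same fixed-effects model as in #1, but with conventional
* (non-clustered) standard errors instead of vce(r)
xtreg lnemp i.gender##ib5.agegroup##i.covid i.quarter i.year, fe

* Pooled OLS: the gender and agegroup main effects are now
* estimated instead of being absorbed by the panel fixed effects
regress lnemp i.gender##ib5.agegroup##i.covid i.quarter i.year
```

Because gender##agegroup is fully interacted, the OLS model spans the same 10 group intercepts as the fixed effects, so the covid coefficients should match across the two commands. Only the reported main effects and the standard errors should differ.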
Ali Zul

Join Date: Mar 2021

Posts: 23
#3

16 Apr 2021, 04:54

Hi Clyde,

My panel dataset consists only of age groups and gender observed across different quarters: I have each gender within each age group, across quarters (Q1-Q4). That is why I assumed it would be panel data. Is this interpretation wrong?
Additionally, I've created a dummy variable covid from the post impacted quarters. I would like to look at the impact of covid on these two variables age group and gender.

Thank you!

Best,
Aliya
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#4

18 Apr 2021, 12:08

You can do it that way if you want, but the results in #1 suggest that all of the variation is happening across age groups with almost nothing occurring within age group. So the results are not like what is usually seen in a panel regression and it suggests that perhaps there is a better way to think about it. You have 16 observations in each age-group. Are these 16 different people or do they represent something else? If so, what are they?
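
One way to check how the variation splits between and within panels, as a minimal sketch that assumes the data are still xtset on pan_id and qyear as in #1:

```stata
* Decompose the standard deviation of lnemp into
* between-panel and within-panel components
xtsum lnemp

* Plot lnemp over time for all 10 panels on a single graph
xtline lnemp, overlay
```

A within standard deviation that is tiny relative to the between one would be consistent with the small sigma_e and the rho near 1 reported in #1.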

"Additionally, I've created a dummy variable covid from the post impacted quarters. I would like to look at the impact of covid on these two variables age group and gender."

This is completely unclear.