
  • #16
    Thanks Clyde for your help. I have a clearer idea now.

    Regards

    Stefano



    • #17
      This is an old post, which I only saw today. I am getting the error message "insufficient observations" when I attempt to run the following simple regression using 14,500 observations:
      quietly eststo est1: xtreg total_employer_cost year##i.race_grp
      where total_employer_cost is the outcome and race_grp is a race-group variable with 4 groups. The years range from 2000 to 2022. How can I address this problem? Thanks



      • #18
        Have you checked if you have missing values? Show us the results of the following:

        Code:
        describe total_employer_cost year race_grp
        misstable summarize total_employer_cost year race_grp
        Last edited by Andrew Musau; 06 Jul 2023, 13:53.



        • #19
          Thanks Andrew for getting back to me. Here are the results you requested:
          describe total_employer_cost year race_grp

                        storage   display    value
          variable name   type    format     label      variable label
          -------------------------------------------------------------
          total_employe~t float   %9.0g
          year            int     %8.0g               * Year
          race_grp        byte    %10.0g     wbho_only
                                                      * Race: white only, black only, hispanic, other

          . misstable summarize total_employer_cost year race_grp
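                                                                         Obs<.
                                                            +------------------------------
                                |                           | Unique
                       Variable |   Obs=.    Obs>.    Obs<. | values        Min        Max
            --------------------+---------------------------+------------------------------
            total_employer_cost | 139,955            14,727 |   >500   1.710358   316.5543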



          • #20
            So the output shows that you have 139,955 missing values for your outcome total_employer_cost. It is highly likely that this is the cause of your error, as Stata applies listwise deletion to observations with missing values. You can count how many observations would be kept in each year if you ran the regression, using the code below:

            Code:
            gen available= !missing(total_employer_cost) & !missing(race_grp) & !missing(year)
            tab year available if available
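
            The same check can be written more compactly. This is only a sketch: it relies on missing() accepting several arguments and returning 1 if any of them is missing, and the toy data below are made up for illustration, not taken from this thread.

            ```stata
            * Toy data for illustration only (values are made up)
            clear
            input int year float total_employer_cost byte race_grp
            2000 31.9 1
            2000    . 2
            2001 39.2 1
            2001 23.5 .
            2002    . 3
            2002 46.7 1
            end

            * missing() is 1 if ANY argument is missing, so its negation flags complete cases
            count if !missing(total_employer_cost, race_grp, year)

            * Complete cases per year: what listwise deletion would leave for the regression
            tabulate year if !missing(total_employer_cost, race_grp, year)
            ```

            If the count is far below the number of observations in memory, missing values are a plausible cause of the error; if it is large, the cause lies elsewhere.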



            • #21
              Thanks Andrew for your comments, very helpful. I did as you suggested and found I still have at least 130,000 observations each year. Please see the table below. Do you still think too few observations are the reason behind the error I am getting?
              . tab year available if available

                         | available
                    Year |         1 |     Total
              -----------+-----------+----------
                    2000 |   161,163 |   161,163
                    2001 |   171,478 |   171,478
                    2002 |   184,035 |   184,035
                    2003 |   180,657 |   180,657
                    2004 |   177,702 |   177,702
                    2005 |   178,959 |   178,959
                    2006 |   178,484 |   178,484
                    2007 |   176,647 |   176,647
                    2008 |   174,593 |   174,593
                    2009 |   169,298 |   169,298
                    2010 |   167,352 |   167,352
                    2011 |   165,801 |   165,801
                    2012 |   165,697 |   165,697
                    2013 |   165,492 |   165,492
                    2014 |   166,730 |   166,730
                    2015 |   165,002 |   165,002
                    2016 |   165,427 |   165,427
                    2017 |   163,393 |   163,393
                    2018 |   159,524 |   159,524
                    2019 |   154,476 |   154,476
                    2020 |   133,339 |   133,339
              -----------+-----------+----------
                   Total | 3,525,249 | 3,525,249



              • #22
                Let's try this with the first four nonmissing observations in each year. Copy and paste the result of the following:

                Code:
                gen missing= missing(total_employer_cost)
                bys year (missing `r(panelvar)'): gen tag= _n<=4
                dataex  `r(panelvar)' year total_employer_cost race_grp if tag
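
                Note that `r(panelvar)' in the code above is a result left behind by xtset, so it is empty unless the data have been xtset. A minimal sketch for checking this first, using only official commands (the display messages are my own wording):

                ```stata
                * -xtset- with no arguments reports the current settings,
                * and exits with an error if no panel variable has been declared
                capture xtset
                if _rc {
                    display as error "data are not xtset: r(panelvar) will be empty"
                }
                else {
                    display "panel variable: `r(panelvar)'   time variable: `r(timevar)'"
                }
                ```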



                • #23
                  Thanks Andrew. Here is what I get when I run the code you provided.

                  ----------------------- copy starting from the next line -----------------------
                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input int year float total_employer_cost byte race_grp
                  2000  31.95709 1
                  2000  31.67816 1
                  2000  31.52447 1
                  2000  34.48257 1
                  2001   39.2215 1
                  2001 37.063946 1
                  2001  35.45247 1
                  2001  23.55646 1
                  2002 22.883417 1
                  2002 17.048145 1
                  2002  46.76455 1
                  2002  34.40501 1
                  2003 30.457506 1
                  2003  24.59294 1
                  2003  43.18762 1
                  2003 15.800676 1
                  2004 32.914497 1
                  2004 27.315195 1
                  2004 27.550533 1
                  2004 16.457249 3
                  2005   21.6062 3
                  2005  35.65023 1
                  2005 22.146355 1
                  2005  29.10329 1
                  2006  41.19201 1
                  2006   94.1767 1
                  2006  61.91007 1
                  2006 16.620155 1
                  2007  20.25581 1
                  2007  38.88393 1
                  2007 25.076696 1
                  2007  50.80833 3
                  2008  41.39631 3
                  2008  37.97585 1
                  2008 14.913792 1
                  2008  41.48207 1
                  2009   39.5593 1
                  2009  19.21947 1
                  2009  29.98279 1
                  2009  49.38342 1
                  2010 30.604553 1
                  2010 14.638657 1
                  2010 31.229134 1
                  2010 19.127846 3
                  2011 36.411217 1
                  2011 20.729683 1
                  2011 34.354736 1
                  2011  55.00809 1
                  2012  26.69479 1
                  2012  65.17803 1
                  2012 17.450657 3
                  2012  54.47948 1
                  2013  52.22928 3
                  2013 33.180954 1
                  2013  17.22085 1
                  2013 20.738096 3
                  2014 37.632053 1
                  2014  85.86938 2
                  2014 69.278915 1
                  2014 13.217688 3
                  2015 37.869564 1
                  2015  29.30383 1
                  2015 23.968664 3
                  2015 14.426502 4
                  2016  39.06137 1
                  2016 21.407064 1
                  2016  16.90869 4
                  2016   19.6449 1
                  2017  17.46366 1
                  2017  25.93992 2
                  2017  34.92732 1
                  2017 18.162205 1
                  2018 19.473356 1
                  2018 36.806915 1
                  2018  26.56913 1
                  2018 32.696007 1
                  2019 24.718136 4
                  2019 25.137085 1
                  2019  78.06974 1
                  2019    17.837 1
                  2020  75.32077 1
                  2020  28.20381 1
                  2020  19.90857 4
                  2020   65.1387 1
                  end
                  label values race_grp wbho_only
                  label def wbho_only 1 "White only", modify
                  label def wbho_only 2 "Black only", modify
                  label def wbho_only 3 "Hispanic", modify
                  label def wbho_only 4 "Other", modify
                  ------------------ copy up to and including the previous line ------------------

                  Listed 84 out of 136487 observations



                  • #24
                    Where is your panel identifier? That is, the variable that uniquely identifies an individual in the sample. You need this to xtset your data and run the regression. Had you properly done this, the code I gave you in #22 should have added this variable to the dataex output. So for now, I will speculate that this is the cause of your problem. If I create one as below, I run the regression with no issues.

                    Code:
                    bys year: gen id=_n
                    xtset id year
                    xtreg total_employer_cost year##i.race_grp
                    Results:

                    Code:
                    . xtset id year
                           panel variable:  id (strongly balanced)
                            time variable:  year, 2000 to 2020
                                    delta:  1 unit
                    
                    . xtreg total_employer_cost year##i.race_grp
                    note: 2000.year#2.race_grp identifies no observations in the sample.
                    note: 2000.year#3.race_grp identifies no observations in the sample.
                    note: 2000.year#4.race_grp identifies no observations in the sample.
                    note: 2001.year#2.race_grp identifies no observations in the sample.
                    note: 2001.year#3.race_grp identifies no observations in the sample.
                    note: 2001.year#4.race_grp identifies no observations in the sample.
                    note: 2002.year#2.race_grp identifies no observations in the sample.
                    note: 2002.year#3.race_grp identifies no observations in the sample.
                    note: 2002.year#4.race_grp identifies no observations in the sample.
                    note: 2003.year#2.race_grp identifies no observations in the sample.
                    note: 2003.year#3.race_grp identifies no observations in the sample.
                    note: 2003.year#4.race_grp identifies no observations in the sample.
                    note: 2004.year#2.race_grp identifies no observations in the sample.
                    note: 2004.year#4.race_grp identifies no observations in the sample.
                    note: 2005.year#2.race_grp identifies no observations in the sample.
                    note: 2005.year#4.race_grp identifies no observations in the sample.
                    note: 2006.year#2.race_grp identifies no observations in the sample.
                    note: 2006.year#3.race_grp identifies no observations in the sample.
                    note: 2006.year#4.race_grp identifies no observations in the sample.
                    note: 2007.year#2.race_grp identifies no observations in the sample.
                    note: 2007.year#4.race_grp identifies no observations in the sample.
                    note: 2008.year#2.race_grp identifies no observations in the sample.
                    note: 2008.year#4.race_grp identifies no observations in the sample.
                    note: 2009.year#2.race_grp identifies no observations in the sample.
                    note: 2009.year#3.race_grp identifies no observations in the sample.
                    note: 2009.year#4.race_grp identifies no observations in the sample.
                    note: 2010.year#2.race_grp identifies no observations in the sample.
                    note: 2010.year#4.race_grp identifies no observations in the sample.
                    note: 2011.year#2.race_grp identifies no observations in the sample.
                    note: 2011.year#3.race_grp identifies no observations in the sample.
                    note: 2011.year#4.race_grp identifies no observations in the sample.
                    note: 2012.year#2.race_grp identifies no observations in the sample.
                    note: 2012.year#4.race_grp identifies no observations in the sample.
                    note: 2013.year#2.race_grp identifies no observations in the sample.
                    note: 2013.year#4.race_grp identifies no observations in the sample.
                    note: 2014.year#4.race_grp identifies no observations in the sample.
                    note: 2015.year#2.race_grp identifies no observations in the sample.
                    note: 2015.year#3.race_grp omitted because of collinearity.
                    note: 2016.year#2.race_grp identifies no observations in the sample.
                    note: 2016.year#3.race_grp identifies no observations in the sample.
                    note: 2017.year#2.race_grp omitted because of collinearity.
                    note: 2017.year#3.race_grp identifies no observations in the sample.
                    note: 2017.year#4.race_grp identifies no observations in the sample.
                    note: 2018.year#2.race_grp identifies no observations in the sample.
                    note: 2018.year#3.race_grp identifies no observations in the sample.
                    note: 2018.year#4.race_grp identifies no observations in the sample.
                    note: 2019.year#2.race_grp identifies no observations in the sample.
                    note: 2019.year#3.race_grp identifies no observations in the sample.
                    note: 2020.year#2.race_grp identifies no observations in the sample.
                    note: 2020.year#3.race_grp identifies no observations in the sample.
                    note: 2020.year#4.race_grp omitted because of collinearity.
                    
                    Random-effects GLS regression                   Number of obs     =         84
                    Group variable: id                              Number of groups  =          4
                    
                    R-sq:                                           Obs per group:
                         within  = 0.4918                                         min =         21
                         between = 0.3166                                         avg =       21.0
                         overall = 0.4850                                         max =         21
                    
                                                                    Wald chi2(35)     =      45.21
                    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.1158
                    
                    ----------------------------------------------------------------------------------
                    total_employer~t | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                    -----------------+----------------------------------------------------------------
                                year |
                               2001  |   1.413021   11.28749     0.13   0.900    -20.71005    23.53609
                               2002  |  -2.135292   11.28749    -0.19   0.850    -24.25837    19.98778
                               2003  |  -3.900887   11.28749    -0.35   0.730    -26.02396    18.22219
                               2004  |  -3.150497   12.19188    -0.26   0.796    -27.04615    20.74515
                               2005  |  -3.443947   12.19188    -0.28   0.778     -27.3396     20.4517
                               2006  |   21.06416   11.28749     1.87   0.062    -1.058914    43.18723
                               2007  |  -4.338427   12.19188    -0.36   0.722    -28.23408    19.55722
                               2008  |  -.9533353   12.19188    -0.08   0.938    -24.84899    22.94232
                               2009  |   2.125672   11.28749     0.19   0.851     -19.9974    24.24875
                               2010  |  -6.919791   12.19188    -0.57   0.570    -30.81544    16.97586
                               2011  |   4.215359   11.28749     0.37   0.709    -17.90771    26.33843
                               2012  |   16.37353   12.19188     1.34   0.179    -7.522122    40.26918
                               2013  |  -7.209671    13.8243    -0.52   0.602    -34.30479    19.88545
                               2014  |   21.04491    13.8243     1.52   0.128    -6.050209    48.14003
                               2015  |   1.176124    13.8243     0.09   0.932      -25.919    28.27124
                               2016  |  -5.706128   12.19188    -0.47   0.640    -29.60178    18.18952
                               2017  |  -8.892845   12.19188    -0.73   0.466     -32.7885    15.00281
                               2018  |   -3.52422   11.28749    -0.31   0.755    -25.64729    18.59885
                               2019  |   7.937369   12.19188     0.65   0.515    -15.95828    31.83302
                               2020  |   23.81052   12.19188     1.95   0.051    -.0851288    47.70617
                                     |
                            race_grp |
                         Black only  |   2.422193   18.43239     0.13   0.895    -33.70463    38.54902
                           Hispanic  |  -9.618032   19.55051    -0.49   0.623    -47.93632    28.70025
                              Other  |  -36.31252   18.43239    -1.97   0.049    -72.43935   -.1856972
                                     |
                       year#race_grp |
                    2000#Black only  |          0  (empty)
                      2000#Hispanic  |          0  (empty)
                         2000#Other  |          0  (empty)
                    2001#Black only  |          0  (empty)
                      2001#Hispanic  |          0  (empty)
                         2001#Other  |          0  (empty)
                    2002#Black only  |          0  (empty)
                      2002#Hispanic  |          0  (empty)
                         2002#Other  |          0  (empty)
                    2003#Black only  |          0  (empty)
                      2003#Hispanic  |          0  (empty)
                         2003#Other  |          0  (empty)
                    2004#Black only  |          0  (empty)
                      2004#Hispanic  |  -3.184794    26.8696    -0.12   0.906    -55.84824    49.47865
                         2004#Other  |          0  (empty)
                    2005#Black only  |          0  (empty)
                      2005#Hispanic  |   2.257607    26.8696     0.08   0.933    -50.40584    54.92105
                         2005#Other  |          0  (empty)
                    2006#Black only  |          0  (empty)
                      2006#Hispanic  |          0  (empty)
                         2006#Other  |          0  (empty)
                    2007#Black only  |          0  (empty)
                      2007#Hispanic  |   32.35422    26.8696     1.20   0.229    -20.30923    85.01767
                         2007#Other  |          0  (empty)
                    2008#Black only  |          0  (empty)
                      2008#Hispanic  |    19.5571    26.8696     0.73   0.467    -33.10634    72.22055
                         2008#Other  |          0  (empty)
                    2009#Black only  |          0  (empty)
                      2009#Hispanic  |          0  (empty)
                         2009#Other  |          0  (empty)
                    2010#Black only  |          0  (empty)
                      2010#Hispanic  |   3.255097    26.8696     0.12   0.904    -49.40835    55.91855
                         2010#Other  |          0  (empty)
                    2011#Black only  |          0  (empty)
                      2011#Hispanic  |          0  (empty)
                         2011#Other  |          0  (empty)
                    2012#Black only  |          0  (empty)
                      2012#Hispanic  |  -21.71541    26.8696    -0.81   0.419    -74.37886    30.94804
                         2012#Other  |          0  (empty)
                    2013#Black only  |          0  (empty)
                      2013#Hispanic  |   20.90082   25.23959     0.83   0.408    -28.56788    70.36951
                         2013#Other  |          0  (empty)
                    2014#Black only  |    29.9917    26.8696     1.12   0.264    -22.67175    82.65515
                      2014#Hispanic  |  -30.61976   27.64859    -1.11   0.268    -84.81001    23.57048
                         2014#Other  |          0  (empty)
                    2015#Black only  |          0  (empty)
                      2015#Hispanic  |          0  (omitted)
                         2015#Other  |   17.15233    26.8696     0.64   0.523    -35.51112    69.81578
                    2016#Black only  |          0  (empty)
                      2016#Hispanic  |          0  (empty)
                         2016#Other  |   26.51677   26.06734     1.02   0.309    -24.57428    77.60782
                    2017#Black only  |          0  (omitted)
                      2017#Hispanic  |          0  (empty)
                         2017#Other  |          0  (empty)
                    2018#Black only  |          0  (empty)
                      2018#Hispanic  |          0  (empty)
                         2018#Other  |          0  (empty)
                    2019#Black only  |          0  (empty)
                      2019#Hispanic  |          0  (empty)
                         2019#Other  |   20.68272   26.06734     0.79   0.428    -30.40833    71.77377
                    2020#Black only  |          0  (empty)
                      2020#Hispanic  |          0  (empty)
                         2020#Other  |          0  (omitted)
                                     |
                               _cons |   32.41057   7.981461     4.06   0.000      16.7672    48.05395
                    -----------------+----------------------------------------------------------------
                             sigma_u |          0
                             sigma_e |  16.196293
                                 rho |          0   (fraction of variance due to u_i)
                    ----------------------------------------------------------------------------------
                    
                    .
                    Now, do not attempt to create this individual identifier the way I have above as your results will make no sense if you mismatch individuals. Go back to the dataset and figure out what variable uniquely identifies an individual and then properly xtset your data.
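
                    Once you have a candidate identifier, you can verify it before calling xtset. The following is only a sketch: person_id is a hypothetical variable name and the toy data are made up, so substitute whatever identifier your dataset actually contains.

                    ```stata
                    * Toy data; person_id is a hypothetical identifier name, values are made up
                    clear
                    input int person_id int year float total_employer_cost byte race_grp
                    1 2000 31.9 1
                    1 2001 39.2 1
                    2 2000 22.8 3
                    2 2001 17.0 3
                    3 2000 46.7 2
                    3 2001 34.4 2
                    end

                    * Exits with an error unless person_id and year jointly identify each observation
                    isid person_id year

                    * Summarizes any surplus copies of the same person-year combination
                    duplicates report person_id year

                    * Declare the panel structure only after the checks above pass
                    xtset person_id year
                    ```

                    If isid fails, duplicates report shows how many person-year combinations are repeated, which is worth investigating before dropping anything.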



                    • #25
                      Hi Andrew,
                      You guessed right: my unique identifier was causing the massive data loss once I invoked "duplicates drop". I fixed it to ensure each observation is uniquely identified, and it worked. Thanks a lot for your kind help and time.
                      Best

