  • Splitting data in regression analysis

    Hi,

    I am new to Statalist, so I apologise for any errors in my post.
    I am currently running a regression model of crime (dependent) on unemployment (independent), using a panel-data approach covering 50 U.S. states over 2000-2015 with a multitude of control variables. I have used the xtreg command and it is a fixed effects (FE) model.

    I have run my model on the full period, i.e. 2000-2015, but now want to split my data into pre-recession (2000-2007) and post-recession (2008-2013) periods. I am unsure whether this is econometrically correct. I have read that time-specific dummy variables may be useful; however, I wanted to understand whether 'splitting the data' is an acceptable basis for drawing conclusions.

    Any help on this would be greatly appreciated.
    Thanks

  • #2
    Dan:
    welcome to the list.
    Including a categorical variable (before/after crisis) among the predictors sounds to me like a wiser approach.
    Please see -help fvvarlist- for categorical variables (and interactions) creation.
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Hi Carlo,

      Is the best way to do this to use i.Year in Stata? My concern is that this will only give me a coefficient on the unemployment variable for the overall time period (2000-2015), and not a coefficient on unemployment for each individual time period (i.e. the pre- and post-recession periods).

      Thanks,
      Dan

      • #4
        Dan:
        not quite.
        With -i.year- you would estimate time fixed effects.
        I meant something along the following lines:
        Code:
        g recession=0 if year<=2007
        * missing values count as larger than any number, so exclude them explicitly
        replace recession=1 if year>2007 & !missing(year)
        label define recession 0 "Before recession" 1 "After recession"
        label val recession recession
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Hi Carlo,

          [Screenshot: Screen Shot 2017-05-06 at 16.57.23.png]

          The image above is the output generated by Stata.

          Could you advise me on how to interpret these results (concerning Unemployment and recession only)?
          As I explained before, I wanted to obtain a coefficient on unemployment for each time period, so that I could see how it changes in magnitude or direction between periods. However, these results give me only one coefficient. I am probably interpreting the results incorrectly, so any guidance would be helpful.

          • #6
            Dan:
            for the future please post what you typed (that is missing in your present post) and what Stata gave you back (as you did) via CODE delimiters, not screenshots (see the FAQ on these topics). Thanks.
            As far as your query is concerned:
            - moving from before recession to after recession reduces the -depvar- by 189.9299;
            - each 1-unit increase in unemployment increases the -depvar- by 0.196296;
            - I'm not clear about your research goal of obtaining a coefficient for unemployment for each time period.
            Do you mean interacting -year- with -unemployment-?
            I cannot say whether your model (and the fixed effect machinery) will allow you to do so; anyway, you can try plugging in, among your predictors, the following interaction:
            Code:
            c.unemployment##i.year
            I'm editing this post to make myself clearer with a toy example:
            Code:
            . use "http://www.stata-press.com/data/r14/nlswork.dta", clear
            (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
            
            . xtreg ln_wage c.tenure##i.year, fe
            
            Fixed-effects (within) regression               Number of obs     =     28,101
            Group variable: idcode                          Number of groups  =      4,699
            
            R-sq:                                           Obs per group:
                 within  = 0.1392                                         min =          1
                 between = 0.1943                                         avg =        6.0
                 overall = 0.1538                                         max =         15
            
                                                            F(29,23373)       =     130.34
            corr(u_i, Xb)  = 0.1394                         Prob > F          =     0.0000
            
            -------------------------------------------------------------------------------
                  ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
                   tenure |   .0460022   .0099618     4.62   0.000     .0264764     .065528
                          |
                     year |
                      69  |   .0767027   .0164924     4.65   0.000     .0443764    .1090289
                      70  |   .0171287   .0157895     1.08   0.278    -.0138198    .0480772
                      71  |    .066154   .0153825     4.30   0.000     .0360034    .0963047
                      72  |   .0666567   .0159018     4.19   0.000     .0354882    .0978253
                      73  |   .0735309   .0151855     4.84   0.000     .0437662    .1032956
                      75  |   .1023057   .0149763     6.83   0.000     .0729512    .1316602
                      77  |   .1508128    .014786    10.20   0.000     .1218312    .1797944
                      78  |   .1902381   .0155584    12.23   0.000     .1597427    .2207335
                      80  |   .2171641   .0152967    14.20   0.000     .1871815    .2471466
                      82  |   .2059075   .0150451    13.69   0.000     .1764181    .2353969
                      83  |   .2302122    .015574    14.78   0.000     .1996861    .2607383
                      85  |   .2760856   .0152147    18.15   0.000     .2462638    .3059074
                      87  |   .2740928   .0150693    18.19   0.000     .2445561    .3036296
                      88  |   .3541284    .015308    23.13   0.000     .3241237    .3841331
                          |
            year#c.tenure |
                      69  |   -.001617   .0131544    -0.12   0.902    -.0274004    .0241664
                      70  |   .0164736    .011756     1.40   0.161    -.0065688    .0395161
                      71  |   .0059405   .0110512     0.54   0.591    -.0157206    .0276016
                      72  |   .0061243   .0109935     0.56   0.577    -.0154238    .0276723
                      73  |    .002744   .0106346     0.26   0.796    -.0181005    .0235884
                      75  |  -.0117935   .0104658    -1.13   0.260    -.0323071    .0087201
                      77  |  -.0146246     .01026    -1.43   0.154    -.0347348    .0054856
                      78  |   -.020725   .0102293    -2.03   0.043    -.0407752   -.0006749
                      80  |  -.0269004   .0101589    -2.65   0.008    -.0468125   -.0069883
                      82  |  -.0230145   .0101088    -2.28   0.023    -.0428284   -.0032007
                      83  |  -.0259333    .010093    -2.57   0.010    -.0457163   -.0061503
                      85  |   -.026761   .0100576    -2.66   0.008    -.0464745   -.0070475
                      87  |  -.0251743   .0100392    -2.51   0.012    -.0448518   -.0054969
                      88  |   -.031173   .0100295    -3.11   0.002    -.0508316   -.0115144
                          |
                    _cons |   1.431128   .0108593   131.79   0.000     1.409843    1.452412
            --------------+----------------------------------------------------------------
                  sigma_u |  .38945547
                  sigma_e |  .29661148
                      rho |  .63289411   (fraction of variance due to u_i)
            -------------------------------------------------------------------------------
            F test that all u_i=0: F(4698, 23373) = 8.03                 Prob > F = 0.0000
            Last edited by Carlo Lazzaro; 06 May 2017, 10:24. Reason: Example added.
            Kind regards,
            Carlo
            (Stata 19.0)

            • #7
              Hi Carlo,

              Sorry I was unaware of how to do that, I hope this is ok.

              Code:
              . xtreg  TotalCrime Unemployment Income HighSchool Migrants Poverty Police Youth TOTCG recession c.Unemployment##i.Year, fe
              note: Unemployment omitted because of collinearity
              note: 2015.Year omitted because of collinearity
              
              Fixed-effects (within) regression               Number of obs      =       743
              Group variable: State1                          Number of groups   =        50
              
              R-sq:  within  = 0.7470                         Obs per group: min =        13
                     between = 0.0056                                        avg =      14.9
                     overall = 0.0494                                        max =        15
              
                                                              F(36,657)          =     53.89
              corr(u_i, Xb)  = -0.4639                        Prob > F           =    0.0000
              
              -------------------------------------------------------------------------------------
                       TotalCrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              --------------------+----------------------------------------------------------------
                     Unemployment |   .0966797   .0918944     1.05   0.293    -.0837624    .2771218
                           Income |   .0112565   .0089662     1.26   0.210    -.0063494    .0288625
                       HighSchool |   .0067321   .0079051     0.85   0.395    -.0087903    .0222544
                         Migrants |   .0780313   .0277428     2.81   0.005     .0235561    .1325066
                          Poverty |   48.71014   9.849692     4.95   0.000     29.36947    68.05081
                           Police |  -.2984332   .2804785    -1.06   0.288    -.8491755    .2523091
                            Youth |   .1074687   .0366193     2.93   0.003     .0355636    .1793737
                            TOTCG |  -.0658798   .0112357    -5.86   0.000     -.087942   -.0438176
                        recession |  -298.3668   343.0545    -0.87   0.385    -971.9821    375.2485
                     Unemployment |          0  (omitted)
                                  |
                             Year |
                            2002  |  -368.1669   292.7322    -1.26   0.209    -942.9705    206.6366
                            2003  |  -379.4704   300.2437    -1.26   0.207    -969.0233    210.0825
                            2004  |  -197.7837   299.8283    -0.66   0.510    -786.5209    390.9536
                            2005  |  -91.08073   301.3621    -0.30   0.763    -682.8298    500.6683
                            2006  |  -398.8091   288.7708    -1.38   0.168     -965.834    168.2158
                            2007  |  -336.0185   289.8194    -1.16   0.247    -905.1023    233.0654
                            2008  |  -199.2803   277.5728    -0.72   0.473    -744.3171    345.7564
                            2009  |   35.19003   274.7342     0.13   0.898    -504.2729     574.653
                            2010  |   31.16998   270.2913     0.12   0.908    -499.5689    561.9089
                            2011  |   6.831486   265.5592     0.03   0.979    -514.6156    528.2786
                            2012  |  -121.5598   263.7558    -0.46   0.645    -639.4657    396.3462
                            2013  |  -71.93492    266.255    -0.27   0.787    -594.7483    450.8785
                            2014  |   -191.881     274.26    -0.70   0.484    -730.4129    346.6509
                            2015  |          0  (omitted)
                                  |
              Year#c.Unemployment |
                            2002  |   .1028646   .1140267     0.90   0.367    -.1210361    .3267652
                            2003  |   .1132165    .112505     1.01   0.315    -.1076962    .3341293
                            2004  |    .057738   .1159682     0.50   0.619     -.169975    .2854509
                            2005  |  -.0074513   .1185145    -0.06   0.950    -.2401641    .2252615
                            2006  |   .0321847   .1163639     0.28   0.782    -.1963053    .2606747
                            2007  |  -.0402392    .115915    -0.35   0.729    -.2678478    .1873694
                            2008  |  -.0230607   .1059873    -0.22   0.828    -.2311755     .185054
                            2009  |  -.1366447   .0959978    -1.42   0.155    -.3251441    .0518547
                            2010  |  -.1425046   .0959586    -1.49   0.138    -.3309271    .0459179
                            2011  |  -.1483504   .0973943    -1.52   0.128     -.339592    .0428912
                            2012  |  -.1389991   .0992809    -1.40   0.162    -.3339452    .0559471
                            2013  |  -.1901316   .1020541    -1.86   0.063    -.3905231      .01026
                            2014  |  -.2235618   .1099225    -2.03   0.042    -.4394036     -.00772
                            2015  |  -.3637617   .1231523    -2.95   0.003    -.6055813    -.121942
                                  |
                            _cons |   -2921.93   1120.728    -2.61   0.009    -5122.569   -721.2899
              --------------------+----------------------------------------------------------------
                          sigma_u |  954.30321
                          sigma_e |   245.4577
                              rho |  .93794753   (fraction of variance due to u_i)
              -------------------------------------------------------------------------------------
              F test that all u_i=0:     F(49, 657) =    87.82             Prob > F = 0.0000
              The aim of my study is to see whether the coefficient on Unemployment changes between the pre- and post-recession periods, as the literature suggests the crime-unemployment relationship is positive in pre-recession/economically good periods and negative in post-recession/economically bad periods. Hence, I wanted a coefficient for each period.

              When I split the data into pre and post recession periods I obtained a positive coefficient in the pre-recession period and a negative coefficient in the post-recession period. With the exception of years 2005 and 2007, these results are similar to splitting the data into two periods and running two separate regressions.

              In your opinion, would this be a more appropriate way of analysing the data?


              Thanks,
              Dan

              • #8
                I think that for your goal of determining whether the coefficient of unemployment is different in the pre- and post-recession years, you need to do this:

                Code:
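                * 0 = 2007 or earlier, 1 = 2008 onwards; stays missing if Year is missing
                gen byte recession = (Year > 2007) if !missing(Year)
                * controls as listed in the command in #7
                xtreg TotalCrime i.recession##c.Unemployment i.Year Income HighSchool Migrants Poverty Police Youth TOTCG, fe
                margins recession, dydx(Unemployment)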
                If you want to run a hypothesis test that the coefficient changed after the onset of the recession, the statistics in the 1.recession#c.Unemployment row of the -xtreg- output will give you that. To see the actual pre- and post-recession coefficients themselves, look at the -margins- output.
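
                As a rough illustration of this pattern, here is a minimal sketch on the nlswork toy data used earlier in the thread, not on Dan's crime panel. The period indicator late and the cutoff at year 77 are hypothetical choices made only for this example; tenure stands in for Unemployment and ln_wage for TotalCrime.

                ```stata
                * Toy data from earlier in the thread; not the crime panel
                use "http://www.stata-press.com/data/r14/nlswork.dta", clear
                xtset idcode year

                * Hypothetical period indicator: 0 = up to 77, 1 = after 77
                gen byte late = (year > 77) if !missing(year)

                * Pooled FE model: the tenure slope is allowed to differ by period.
                * late is collinear with the year dummies, so Stata drops one term.
                xtreg ln_wage i.late##c.tenure i.year, fe

                * Slope of tenure in each period, with standard errors
                margins late, dydx(tenure)
                ```

                The row for late = 0 should match the tenure coefficient, and the row for late = 1 should match the sum of the tenure and 1.late#c.tenure coefficients.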

                • #9
                  Dan:
                  thanks for providing further clarifications about your research goal (and using CODE delimiters appropriately).
                  However, if you're interested in investigating whether unemployment (rate?) changes between pre and post recession period, there's probably a better interaction to test, that would change your original regression code a bit:
                  Code:
                  xtreg  TotalCrime c.Unemployment##i.recession  Income HighSchool Migrants Poverty Police Youth TOTCG i.Year, fe
                  PS: I realized only now that my post crossed in cyberspace with Clyde's reply.
                  Last edited by Carlo Lazzaro; 06 May 2017, 11:31.
                  Kind regards,
                  Carlo
                  (Stata 19.0)

                  • #10
                    Carlo - Thank you for your continued input. My aim is not to see how unemployment changed between periods, but how crime changes between periods depending on unemployment levels (per 100,000). I think your previous code was more appropriate for this analysis and I may use the interaction term you suggested.

                    Thanks for the input, Clyde. I also did as you suggested and obtained the following results:

                    Code:
                    . xtreg TotalCrime i.recession##c.Unemployment i.Year Income HighSchool Migrants Poverty Police Youth TOTCG DeathPen, fe
                    note: 1.recession omitted because of collinearity
                    
                    Fixed-effects (within) regression               Number of obs      =       743
                    Group variable: State1                          Number of groups   =        50
                    
                    R-sq:  within  = 0.7418                         Obs per group: min =        13
                           between = 0.0023                                        avg =      14.9
                           overall = 0.0537                                        max =        15
                    
                                                                    F(24,669)          =     80.08
                    corr(u_i, Xb)  = -0.4709                        Prob > F           =    0.0000
                    
                    ------------------------------------------------------------------------------------------
                                  TotalCrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------------------+----------------------------------------------------------------
                                 1.recession |          0  (omitted)
                                Unemployment |   .1406037   .0417833     3.37   0.001     .0585614    .2226459
                                             |
                    recession#c.Unemployment |
                                          1  |  -.1751107   .0388409    -4.51   0.000    -.2513755   -.0988459
                                             |
                                        Year |
                                       2002  |  -104.8444    55.6655    -1.88   0.060    -214.1445    4.455723
                                       2003  |  -110.6037   94.27026    -1.17   0.241    -295.7049    74.49753
                                       2004  |  -96.53367   95.05399    -1.02   0.310    -283.1737    90.10639
                                       2005  |   -166.519   102.6882    -1.62   0.105    -368.1491    35.11104
                                       2006  |  -369.4891   101.5318    -3.64   0.000    -568.8484   -170.1298
                                       2007  |  -474.5525   109.5162    -4.33   0.000    -689.5893   -259.5157
                                       2008  |  -151.2357   153.4505    -0.99   0.325    -452.5382    150.0669
                                       2009  |  -241.7451   172.9573    -1.40   0.163    -581.3496    97.85951
                                       2010  |  -268.0589   176.6529    -1.52   0.130    -614.9198    78.80192
                                       2011  |  -315.7569   178.4954    -1.77   0.077    -666.2354    34.72171
                                       2012  |  -408.3058    180.022    -2.27   0.024    -761.7819    -54.8297
                                       2013  |  -521.3037    179.461    -2.90   0.004    -873.6782   -168.9292
                                       2014  |  -710.9725   184.2288    -3.86   0.000    -1072.709   -349.2363
                                       2015  |  -828.9256    188.795    -4.39   0.000    -1199.628   -458.2236
                                             |
                                      Income |   .0161796   .0090791     1.78   0.075    -.0016474    .0340065
                                  HighSchool |   .0040221    .007797     0.52   0.606    -.0112875    .0193317
                                    Migrants |   .0777707   .0277966     2.80   0.005     .0231916    .1323498
                                     Poverty |   49.03345   9.599759     5.11   0.000     30.18417    67.88274
                                      Police |  -.2108817   .2782808    -0.76   0.449    -.7572905    .3355272
                                       Youth |     .09153   .0358617     2.55   0.011      .021115    .1619451
                                       TOTCG |  -.0712152   .0110616    -6.44   0.000    -.0929348   -.0494956
                                    DeathPen |   76.65132   54.11071     1.42   0.157    -29.59594    182.8986
                                       _cons |  -2966.944   1110.604    -2.67   0.008    -5147.633   -786.2544
                    -------------------------+----------------------------------------------------------------
                                     sigma_u |  955.20221
                                     sigma_e |  245.74479
                                         rho |  .93792106   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------------------
                    F test that all u_i=0:     F(49, 669) =    84.24             Prob > F = 0.0000
                    I am unsure how to interpret these results. Could you offer me any advice?
                    As I say, I am looking to analyse how unemployment has caused changes in crime levels between the pre- and post-crisis periods, in case that needed clarifying.

                    Thanks

                    • #11
                      Dan, I described how to interpret these results in #8:

                      If you want to run a hypothesis test that the coefficient changed after the onset of the recession, the statistics in the 1.recession#c.Unemployment row of the -xtreg- output will give you that. To see the actual pre- and post-recession coefficients themselves, look at the -margins- output.
                      You need to run the -margins- command shown there as well.

                      • #12
                        Dan:
                        I do share Clyde's remark about -margins- (and, I would add, -marginsplot-).
                        That said, in a hypothetical pre -margins- era, referring to the same panel (as you're estimating within-panel variance as years go by) and when adjusted for the remaining predictors, I would comment on -unemployment- as follows:
                        • in the pre-recession era, a 1-unit increase in unemployment rate increases the criminality rate by .1406037;
                        • in the post-recession era, a 1-unit increase in unemployment rate changes the criminality rate by (.1406037-.1751107)=-0.034507, i.e. reduces it by 0.034507.
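
                        The same sum can also be obtained together with a standard error and confidence interval via -lincom-. A minimal sketch, run on the nlswork toy data from #6 rather than on Dan's panel; the indicator late and the cutoff at year 77 are assumptions made for illustration only:

                        ```stata
                        * Toy data from #6; tenure plays the role of Unemployment
                        use "http://www.stata-press.com/data/r14/nlswork.dta", clear
                        xtset idcode year

                        * Hypothetical period indicator: 0 = up to 77, 1 = after 77
                        gen byte late = (year > 77) if !missing(year)
                        xtreg ln_wage i.late##c.tenure i.year, fe

                        * Later-period slope = main effect + interaction, with SE and CI
                        lincom tenure + 1.late#c.tenure
                        ```

                        On the model in #10, the analogous line would be -lincom Unemployment + 1.recession#c.Unemployment-, run immediately after -xtreg-.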
                        Kind regards,
                        Carlo
                        (Stata 19.0)
