  • Omitted Time Dummy Variables.

    Hi all,

    I'm relatively new to Stata and I was wondering if anyone could help with a problem I'm encountering. I'm attempting to model crime rates on a number of explanatory variables (inequality, median income, poverty, unemployment, population density, young males, ethnic minorities, prison population and police force strength). The data I am using is panel data for ten different cities over a ten-year period, and I am estimating my model using pooled OLS.

    I have carried out the regression itself, and I am now running standard robustness checks by investigating whether my coefficients hold up when I include dummy variables for each year covered in the data. However, I am encountering an apparent collinearity problem.
    When I add these dummies (using i.Year), the last year in my sample is omitted due to collinearity.

    I have no idea why this may be the case, but I'm keen to know whether there's a way around it, or what the cause may be.

    I have attached some photos for reference.

    Thanks for any replies in advance!
    [Attached screenshots: "Pooled OLS Regression"; "Year Dummies included (2014 omitted)"]
    Last edited by Wesley Bell; 30 Mar 2016, 20:34.

  • #2
    This is not a problem. This is expected and there is nothing you need to get around.

    So, imagine you had created your indicator (dummy) variables for years 2006 through 2014 by hand. The first one, call it 2006.year, would equal 1 for all observations in which year == 2006, and 0 in all other observations. Similarly for each year, all the way up through 2014.year equaling 1 for all observations in which year == 2014, and 0 in all others. So, in every observation, one and only one of the year indicator variables will be 1 and all the others will be zero. So in every observation, if you add them up, the total is always 1. It follows that these variables are collinear with the constant term. This is always the case with a complete set of indicator variables: they sum to 1 in every observation and are, as a group, collinear with the constant term.

    Since you can't solve the regression equations if there are collinear terms, something has to give. So Stata will resolve this by automatically dropping one of the indicators, typically the last one (though you can choose a different one if you want to).
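
    The "sum to 1" argument can be checked directly. Below is a minimal sketch that uses Stata's bundled nlswork example data rather than the poster's file; the names yd and ind_total are made up for illustration.

    ```stata
    * Sketch only: webuse downloads an example panel provided by StataCorp
    webuse nlswork, clear

    * One hand-made indicator per year: yd1, yd2, ...
    tabulate year, generate(yd)

    * Add the indicators up within each observation
    egen ind_total = rowtotal(yd*)

    * The total is 1 in every observation, i.e. identical to the constant term
    assert ind_total == 1

    * Factor-variable notation leaves one year out as the base level
    regress ln_wage age i.year

    * ib(last). asks for the last year to be the base level instead
    regress ln_wage age ib(last).year
    ```

    Whichever year is left out, the fit of the model is the same; only the reference year against which the other year coefficients are measured changes.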

    Where I think you may have a real problem is your choice of pooled OLS to investigate this problem. Since your data is panel data, normally the observations within panels exhibit some intra-panel correlation, violating the assumption underlying OLS regression that all the observations are independent. So I'm wondering why you aren't using one of the panel-data estimators, like -xtreg- instead.

    As an aside, going forward, please do not attach screenshots as a way of showing output. They are typically unreadable on many people's computers. These particular ones happen to be readable on mine, but that was just a lucky break. The way to show output so that everyone can read it well is to copy from your Results window or log file to the clipboard and then paste into a code block on this forum. To set up a code block, follow the instructions in FAQ #12, 7th paragraph. Also, FYI, if you need to show an example of your data (which is often a very good thing to do), the best way to do that is to use the -dataex- command. You can get -dataex- by running -ssc install dataex- and then following the instructions in -help dataex-. By using -dataex- you ensure that someone who wants to experiment with your data to help you can quickly and easily create an exact replica of your example in their own Stata setup.
    Last edited by Clyde Schechter; 30 Mar 2016, 20:43.


    • #3
      Wesley:
      I agree with Clyde's remarks.
      On top of that, you seem to have chosen a halfway house between pooled OLS (POLS) and panel data analysis.
      If you want to go POLS (and you should have strong reasons for preferring POLS to, say, -xtreg-), you should cluster your standard errors (SEs) on the panel identifier, because you do not have independent observations but multiple measurements of the same unit over time (which is not the same thing).
      An F test that investigates whether panel data regression outperforms POLS is given at the foot of the output table after -xtreg, fe- (with default SEs).
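
      A sketch of both suggestions, meant to be run from a do-file. It assumes the variable names that appear in the output posted in this thread (City, Year, the nine regressors, and a dependent variable shown as Log_Crime in one post and Crime_Log in another); adjust them to the actual dataset.

      ```stata
      * Names are assumptions taken from the posted output, not verified
      local y Log_Crime
      local xvars Inequality Median_Income Poverty Unemployed Pop_Density ///
          Males Ethnic Prison_Pop Police_Strength

      * Pooled OLS with standard errors clustered on the panel identifier
      regress `y' `xvars', vce(cluster City)

      * Fixed effects with default SEs: the foot of the output table reports
      * "F test that all u_i=0", which compares FE against pooled OLS
      xtset City Year
      xtreg `y' `xvars', fe
      ```

      With only ten cities there are only ten clusters, so the clustered SEs should be read with some caution.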
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Correct me if I'm wrong, but if Stata were omitting one of the years because it is the base level of the factor variable year, it would not have reported the omission as due to collinearity. Also, Bell said that he has "10 time periods" and the regression output only shows 9 (including the dropped 2014), so I suspect there's something more going on.

        Bell - what I would do is some sort of elimination process. Start by regressing your dependent variable on just i.year. If that works (Stata doesn't drop anything due to collinearity), continue adding variables until the issue pops up again. Then investigate with just i.year and that variable, and so on. I once had an issue where I had two variables, x1 and x2, whose sum was constant within each year, and Stata indeed dropped one of the years for me.
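
        The elimination process could be scripted roughly as follows. This is only a sketch: the variable names are taken from the output posted in this thread and may need adjusting, and the macro names y, xvars and sofar are arbitrary.

        ```stata
        * Names are assumptions taken from the posted output, not verified
        local y Log_Crime
        local xvars Inequality Median_Income Poverty Unemployed Pop_Density ///
            Males Ethnic Prison_Pop Police_Strength

        * Step 0: year dummies only
        regress `y' i.Year, noheader

        * Then add one regressor at a time and watch for the first model in
        * which a year is reported as omitted because of collinearity
        local sofar
        foreach v of local xvars {
            local sofar `sofar' `v'
            display as text _newline "Regressors so far: `sofar'"
            regress `y' `sofar' i.Year, noheader
        }
        ```

        The regressor added in the first model that shows the omission is the one to investigate together with i.Year.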

        I also, by the way, recommend using XT and reading about XT. A good place to start is this document, which I've linked to numerous times on this forum:
        https://www.princeton.edu/~otorres/Panel101.pdf



        • #5
          Hi, thank you all for your responses, much appreciated.


          Clyde and Carlo,
          It may be beneficial to explain why I chose to use OLS over RE and FE.

          When I attempt to estimate the RE model I get exactly the same results as I do for OLS. The variance of u I get is zero and I also get rho = 0. When I conduct the BP-LM test for RE, I'm told that Var(u) = 0 and my p-value is 1. This can be seen in the attached screenshots. Apologies, I can't seem to get the dataex command to work (I am running Small Stata 14), so I have copied my results to the clipboard and pasted them below.

          . xtset
                 panel variable:  City (strongly balanced)
                  time variable:  Year, 2005 to 2014
                          delta:  1 year

          . xtreg Log_Crime Inequality Median_Income Poverty Unemployed Pop_Density Males Ethnic Prison_Pop Police_Strength, re

          Random-effects GLS regression                   Number of obs     =         93
          Group variable: City                            Number of groups  =         10

          R-sq:                                           Obs per group:
               within  = 0.7888                                         min =          6
               between = 0.8125                                         avg =        9.3
               overall = 0.8036                                         max =         10

                                                          Wald chi2(9)      =     339.50
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

          ---------------------------------------------------------------------------------
                Log_Crime |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ----------------+----------------------------------------------------------------
               Inequality |   .0714046   .0396458     1.80   0.072    -.0062997     .149109
            Median_Income |  -.0000336   8.48e-06    -3.96   0.000    -.0000503    -.000017
                  Poverty |  -.0516008   .0178942    -2.88   0.004    -.0866729   -.0165288
               Unemployed |  -.0518952   .0067212    -7.72   0.000    -.0650685    -.038722
              Pop_Density |   .0000785   .0000198     3.96   0.000     .0000396    .0001173
                    Males |   .0913716   .0149505     6.11   0.000     .0620692     .120674
                   Ethnic |   .0014268   .0013552     1.05   0.292    -.0012292    .0040829
               Prison_Pop |  -13.24905   4.270266    -3.10   0.002    -21.61862   -4.879483
          Police_Strength |   .1041006   .0277715     3.75   0.000     .0496695    .1585316
                    _cons |   7.133959   .6987943    10.21   0.000     5.764348    8.503571
          ----------------+----------------------------------------------------------------
                  sigma_u |          0
                  sigma_e |  .07606655
                      rho |          0   (fraction of variance due to u_i)
          ---------------------------------------------------------------------------------

          . xttest0

          Breusch and Pagan Lagrangian multiplier test for random effects

                  Log_Crime[City,t] = Xb + u[City] + e[City,t]

                  Estimated results:
                                   |       Var     sd = sqrt(Var)
                          ---------+-----------------------------
                         Log_Crime |   .0660038       .2569121
                                 e |   .0057861       .0760665
                                 u |          0              0

                  Test:   Var(u) = 0
                                       chibar2(01) =     0.00
                                    Prob > chibar2 =   1.0000


          Ariel,
          I was of the same belief: my sample is 2005-2014, and I would have understood 2005 being dropped, but not 2014. I had in fact carried out an elimination process, and it appears that my prison population variable is the one that causes the collinearity drop. The Princeton resource has been a great help, and I have made use of it and similar ones provided by my university. I would prefer to run an xt regression, but I just can't seem to get results from random effects that differ from OLS (as I mention in my response above).
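
          One possible explanation, offered as an assumption that nothing in this thread confirms, is that Prison_Pop takes the same value for every city within a given year (for example a national figure). Such a variable is an exact linear combination of the year dummies, so Stata cannot estimate it together with a full set of them. A quick check, where pp_sd is a made-up variable name:

          ```stata
          * How much does Prison_Pop differ across cities within the same year?
          bysort Year: egen pp_sd = sd(Prison_Pop)
          summarize pp_sd

          * If the year dummies explain Prison_Pop perfectly, R-squared is 1
          regress Prison_Pop i.Year
          ```

          If pp_sd is zero throughout and the R-squared is 1, the omitted year is a by-product of including Prison_Pop alongside i.Year, and one of the two has to be dropped or the variable replaced by a city-level measure.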

          I have attached my dataset (it is quite small) in case there is anything glaringly obvious I may not be seeing.

          Thanks again all.

          • #6
            I would suggest:
            1. Paste output in [CODE] delimiters so that it is clearer.
            2. Attach the data in Stata (.dta) format, so we can avoid the hassle of importing/exporting from Excel.


            • #7
              Hi Ariel,

              I'm not too sure how to paste the output in code delimiters, but please find attached the .dta file.

              Thanks for the response again.

              Kind regards

              • #8
                Originally posted by Wesley Bell:
                I'm not too sure how to paste the output in code delimiters
                http://www.statalist.org/forums/foru...ode-delimiters

                You can play with this by posting attempts in the Sandbox forum

                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------


                • #9
                  As a sidelight, I've grown fond of the dataex command for posting data, at least if the file isn't too humongous. That way users can know for sure they aren't downloading something malicious. Do

                  ssc install dataex
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  StataNow Version: 19.5 MP (2 processor)

                  EMAIL: [email protected]
                  WWW: https://academicweb.nd.edu/~rwilliam/


                  • #10
                    Hi again, thank you all for your help, much appreciated.

                    Below are my findings, formatted a bit more clearly.
                    I have also managed to make use of the dataex command. However, its output looks quite unclear when I preview my post, but I can share it if need be.
                    Ideally I would like to estimate an XT model, but I keep getting the same results as I do under OLS. Does anyone have any idea why this might be?

                    Code:
                    . xtset
                           panel variable:  City (strongly balanced)
                            time variable:  Year, 2005 to 2014
                                    delta:  1 year
                    
                    . xtreg Crime_Log Inequality Median_Income Poverty Unemployed Pop_Density Males Ethnic Prison_Pop Police_Strength, re
                    
                    Random-effects GLS regression                   Number of obs     =         93
                    Group variable: City                            Number of groups  =         10
                    
                    R-sq:                                           Obs per group:
                         within  = 0.7888                                         min =          6
                         between = 0.8125                                         avg =        9.3
                         overall = 0.8036                                         max =         10
                    
                                                                    Wald chi2(9)      =     339.50
                    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                    
                    ---------------------------------------------------------------------------------
                          Crime_Log |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    ----------------+----------------------------------------------------------------
                         Inequality |   .0714046   .0396458     1.80   0.072    -.0062997     .149109
                      Median_Income |  -.0000336   8.48e-06    -3.96   0.000    -.0000503    -.000017
                            Poverty |  -.0516008   .0178942    -2.88   0.004    -.0866729   -.0165288
                         Unemployed |  -.0518952   .0067212    -7.72   0.000    -.0650685    -.038722
                        Pop_Density |   .0000785   .0000198     3.96   0.000     .0000396    .0001173
                              Males |   .0913716   .0149505     6.11   0.000     .0620692     .120674
                             Ethnic |   .0014268   .0013552     1.05   0.292    -.0012292    .0040829
                         Prison_Pop |  -13.24905   4.270266    -3.10   0.002    -21.61862   -4.879483
                    Police_Strength |   .1041006   .0277715     3.75   0.000     .0496695    .1585316
                              _cons |   7.133959   .6987943    10.21   0.000     5.764348    8.503571
                    ----------------+----------------------------------------------------------------
                            sigma_u |          0
                            sigma_e |  .07606655
                                rho |          0   (fraction of variance due to u_i)
                    ---------------------------------------------------------------------------------
                    
                    . xttest0
                    
                    Breusch and Pagan Lagrangian multiplier test for random effects
                    
                            Crime_Log[City,t] = Xb + u[City] + e[City,t]
                    
                            Estimated results:
                                             |       Var     sd = sqrt(Var)
                                    ---------+-----------------------------
                                   Crime_Log |   .0660038       .2569121
                                           e |   .0057861       .0760665
                                           u |          0              0
                    
                            Test:   Var(u) = 0
                                                 chibar2(01) =     0.00
                                              Prob > chibar2 =   1.0000
                    
                    .
                    Thanks again, Wesley.
                    Last edited by Wesley Bell; 31 Mar 2016, 07:58.


                    • #11
                      Wesley:
                      thanks for providing easily readable output via CODE delimiters.
                      Two remarks about your regression model:
                      - the main issue that catches my eye is that there's no variance at the individual effect level (sigma_u=0); I would check my data and try to track down what causes this weird result;
                      - with 9 predictors out of 93 observations you're getting very near the red zone of the gauge (i.e. you're at risk of asking too much out of your data).
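
                      A few checks that might help track this down, as a sketch only. The variable names are taken from the posted output and may need adjusting. For background: the default random-effects estimator derives its variance components from auxiliary within and between regressions, and a negative estimate of the u variance is reported as zero, in which case RE coincides with pooled OLS. With 10 cities and 9 regressors plus a constant, the between regression has almost no degrees of freedom, which may be part of the story.

                      ```stata
                      * Names are assumptions taken from the posted output, not verified
                      local y Crime_Log
                      local xvars Inequality Median_Income Poverty Unemployed Pop_Density ///
                          Males Ethnic Prison_Pop Police_Strength

                      * Between-city versus within-city variation in the outcome
                      xtsum `y'

                      * Fixed effects: the footer reports "F test that all u_i=0"
                      xtreg `y' `xvars', fe

                      * Random effects with the Swamy-Arora variance components
                      xtreg `y' `xvars', re sa

                      * Random effects by maximum likelihood
                      xtreg `y' `xvars', mle
                      ```

                      If the fixed-effects F test rejects while the default RE output still shows sigma_u = 0, the zero is more likely a by-product of the variance-component calculation than evidence that the cities do not differ.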
                      Kind regards,
                      Carlo
                      (Stata 19.0)
