  • First difference regression and fixed effect regression

    Hello
    I am trying to estimate a regression on panel data covering eight years to investigate the relationship between crime and migration. My regression equation is as follows, where crime is the number of registered violent crimes (dependent variable), migrants is the number of asylum seekers (independent variable and main variable of interest), and X represents four possible control variables, all measured in region i in year t.



    My questions are:
    1) Papers that estimate fixed-effects or first-difference regressions often report year dummies and/or region dummies. Other papers list them in the tables as fixed effects rather than dummies. What is the difference between, e.g., a year dummy and a year fixed effect?

    2) How important is it to include year dummies in such an estimation? Or should one include both year dummies and region dummies?



  • #2
    Lucca:
    1) there's no practical difference between year dummies and year fixed effects (however, in panel data regression the groupwise effect you investigate is that of -panelid-). That said, in Stata creating year dummies by hand has been superseded by -fvvarlist- notation.
    2) you can include both -i.year- and -i.country-. After regression you can test their joint statistical significance via -testparm-.
    Kind regards,
    Carlo
    (Stata 19.0)
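
    A minimal sketch of point 2 on simulated data. The variables y, x, region and year are hypothetical stand-ins, not the poster's dataset. The i. prefix creates the dummies on the fly, and -testparm- tests each set jointly.

    ```stata
    * Simulated balanced panel: 10 hypothetical regions observed for 8 years
    clear
    set seed 12345
    set obs 80
    generate region = ceil(_n/8)
    bysort region: generate year = 2010 + _n
    generate x = rnormal()
    generate y = 0.5*x + 0.1*(year - 2010) + rnormal()

    * i.year and i.region expand into dummies (one base category each is omitted)
    regress y x i.year i.region

    * Joint tests of the year dummies and of the region dummies
    testparm i.year
    testparm i.region
    ```

    This is the dummy-variable regression in levels. Whether the region dummies still belong in a first-difference regression is a separate question.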



    • #3
      Thank you very much, Carlo. A first-difference regression with only year dummies works, but as soon as I add i.Region for region dummies, I get an error message ("Region: string variables may not be used as factor variables"). Is there an error in my code? (Regression 1 works, but regression 2 does not.)

      Code:
      ***** Preparing Data
      sort Region Year
      egen panel_id = group(Region)
      sort panel_id Year
      tsset panel_id Year
      
      gen adult_pop = Pop-Pop_0_14
      gen asylum_pop = (asylum/adult_pop)
      gen a8_pop = (EUcum/adult_pop)
      gen young_share = (Pop_15_24/adult_pop)
      gen benefit_claimants = Benefit/adult_pop
      gen lnpop = ln(adult_pop)
      gen a8_iv = EU_8_IV/adult_pop
      gen viol_crime_rate = Violence/adult_pop
      
      sort panel_id Year
      by panel_id: egen avg_adult_pop = mean(adult_pop)
      gen trend = Year-2010
      
      
      ****First-Difference Regression
      ***1)
      regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year [aw=adult_pop], vce(cluster panel_id)
      test D.asylum_pop=D.a8_pop
      
      
      ***2)
      regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year i.Region [aw=adult_pop], vce(cluster panel_id)
      test D.asylum_pop=D.a8_pop



      • #4
        Lucca:
        you cannot have a variable in -string- format as a predictor: you should -destring- it first, as you can see from the following toy example:
        Code:
        . set obs 1
        number of observations (_N) was 0, now 1
        
        . g Region="1" in 1
        
        . list
        
             +--------+
             | Region |
             |--------|
          1. |      1 |
             +--------+
        
        . destring Region, replace
        Region: all characters numeric; replaced as byte
        
        . list
        
             +--------+
             | Region |
             |--------|
          1. |      1 |
             +--------+
        .
        Kind regards,
        Carlo
        (Stata 19.0)
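
        Note that -destring- only converts strings made up of digits. If Region holds names rather than numbers, a numeric identifier is needed instead. The sketch below assumes that case; the region names "A", "B" and "C" are hypothetical. It compares -encode- with the -egen, group()- approach already used for panel_id in #3.

        ```stata
        * Toy data with non-numeric region names (hypothetical values)
        clear
        input str8 Region
        "B"
        "A"
        "C"
        "A"
        end

        * encode maps each distinct string to an integer (alphabetical order)
        * and attaches the original strings as value labels
        encode Region, generate(region_id)

        * egen group() assigns integer codes in the same sorted order
        egen region_grp = group(Region)

        list Region region_id region_grp, nolabel
        ```

        Either numeric variable can then be used with the i. prefix.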



        • #5
          Besides Carlo's helpful advice, region dummies should not be included in a first-difference regression anyway. Differencing has the same effect as demeaning: it already removes the region fixed effects if you have set region as your panel id.



          • #6
            And as Professor Jeffrey Wooldridge keeps on repeating, you should distinguish between the model and the technique used to estimate it.
            The model contains the region dummies. When you take the first difference to estimate the model, each dummy becomes one minus one or zero minus zero, which is zero in either case, so the dummies drop out.
            This is making the same point as Wouter Wakker in a different way.
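
            The argument in #5 and #6 can be written out in a short derivation. The symbols are illustrative assumptions, not the equation from #1: \alpha_i is the region effect, \lambda_t the year effect, and x_{it} stands in for all regressors.

            ```latex
            \begin{align*}
            y_{it}        &= \beta\, x_{it} + \alpha_i + \lambda_t + \varepsilon_{it} \\
            y_{i,t-1}     &= \beta\, x_{i,t-1} + \alpha_i + \lambda_{t-1} + \varepsilon_{i,t-1} \\
            \Delta y_{it} &= \beta\, \Delta x_{it} + (\lambda_t - \lambda_{t-1}) + \Delta\varepsilon_{it}
            \end{align*}
            ```

            The region effect \alpha_i cancels when the second line is subtracted from the first, while the year dummies in the differenced regression pick up \lambda_t - \lambda_{t-1}. Adding region dummies to the differenced equation would therefore not restore the region effects; it would instead amount to allowing a separate linear trend for each region in the levels model.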



            • #7
              Wouter and Eric are obviously correct.
              I forgot to mention in my previous reply that every time-invariant predictor is wiped out by the -fe- machinery.
              With -xtreg, fe-, things might be different if some panel unit changes region during the 8-year timespan (i.e., -region- is no longer time-invariant).
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Thank you so much for your help. Yes, I chose region as panel_id.

                If I understand correctly: when I run the following panel regression in first differences and include only "i.Year" in the code, the regression has year fixed effects, and I no longer need to write "i.Region" because I already chose region as panel_id, so the region fixed effects are already accounted for.
                Am I right?


                Code:
                ******* Panel Settings
                
                sort Region Year
                egen panel_id = group(Region)
                sort panel_id Year
                tsset panel_id Year
                
                
                *** First Difference Regression
                
                regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year [aw=adult_pop], robust
                Last edited by Lucca Mancini; 11 Jul 2019, 07:24.



                • #9
                  Correct.
                  However, I do not think that you should -xtset- or -tsset- your data before running First Difference Regression.
                  Last edited by Carlo Lazzaro; 11 Jul 2019, 09:34.
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Carlo: D. is a time series operator so will not work without tsset or xtset. Also from a technical point of view, a first difference regression uses the differences of the variables within each panel, so if you have 15 observations per panel in levels, you will have 14 per panel in FD. In other words, to calculate these differences within panels Stata needs to know the time and panelvar.

                    Lucca: Your code looks all right, although I'm not familiar with weighted regressions, so I cannot comment on the weights. Also, be aware that -robust- after -reg- is not the same as -robust- after -xtreg, fe-.
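
                    A small simulated illustration of the point about D.: the panel and time variables must be declared first, and each panel loses its first observation after differencing. The variables id, t and x are hypothetical.

                    ```stata
                    * Two hypothetical panels with 15 observations each
                    clear
                    set seed 2019
                    set obs 30
                    generate id = ceil(_n/15)
                    bysort id: generate t = _n
                    generate x = rnormal()

                    * Declare the panel structure; D. stops with an error if this step is skipped
                    xtset id t

                    * D.x is x minus its previous-period value within the same panel
                    generate dx = D.x

                    * The first period of each panel has no predecessor, so dx is missing there
                    count if missing(dx)
                    ```

                    With 15 observations per panel in levels, 14 per panel remain in first differences, so the count of missing values equals the number of panels.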



                    • #11
                      Wouter is correct.
                      Admittedly, I seldom use time-series operators, so I had forgotten that -tsset-ting the data beforehand is mandatory for them to work.
                      On the difference between the -robust- option in -regress- and in -xtreg-, I recall a really interesting thread prompted by some points raised by daniel klein: https://www.statalist.org/forums/for...ls-assumptions
                      Last edited by Carlo Lazzaro; 11 Jul 2019, 09:43.
                      Kind regards,
                      Carlo
                      (Stata 19.0)



                      • #12
                        Thank you so much for your helpful replies. Regarding the robust standard errors, I have now noticed that I could also use vce(cluster Region), which clusters the standard errors at the regional level. With "robust", the standard errors are larger than with vce(cluster Region). But I don't know which one is better. What exactly does vce(cluster Region) mean, and in which cases should one use it?

                        Code:
                          
                        * Option A: heteroskedasticity-robust standard errors
                        regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year [aw=adult_pop], robust

                        * Option B: standard errors clustered by region
                        regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year [aw=adult_pop], vce(cluster Region)



                        • #13
                          Lucca:
                          you should go with the clustered version (-vce(cluster Region)-), as you did.
                          Kind regards,
                          Carlo
                          (Stata 19.0)
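
                          As background to the question in #12: -robust- allows for heteroskedasticity but still treats observations as independent, whereas -vce(cluster ...)- also allows errors to be correlated across years within the same region. Differenced errors within a region are usually correlated over time, which is one reason to cluster. The sketch below uses simulated data with hypothetical variables id, t, x and y, and only shows how to compare the two sets of standard errors; it does not reproduce the pattern reported in #12.

                          ```stata
                          * Simulated panel: 30 hypothetical regions, 8 periods each
                          clear
                          set seed 42
                          set obs 240
                          generate id = ceil(_n/8)
                          bysort id: generate t = _n
                          generate x = rnormal()
                          generate y = 0.5*x + rnormal()
                          xtset id t

                          * Heteroskedasticity-robust standard errors
                          regress D.y D.x i.t, vce(robust)
                          scalar se_robust = _se[D.x]

                          * Standard errors clustered by panel unit
                          regress D.y D.x i.t, vce(cluster id)
                          scalar se_cluster = _se[D.x]

                          * Same coefficient, two different standard errors
                          display se_robust "  " se_cluster
                          ```

                          The coefficients are identical under both options; only the standard errors change. Cluster-robust standard errors rely on having a reasonably large number of clusters, so with only a few regions they should be read with some caution.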
