  • Including year dummies in linear regression model

    Hi all,

    I have a panel dataset from 2003-2015 for 120 firms. I now want to include year dummies as control variables.

    I am pretty new to Stata and kind of lost. I'd be really grateful if someone could help with the code.

    Thanks in advance.

    Lena

  • #2
    Welcome to the Stata Forum / Statalist,

    You may wish to read the FAQ. There you’ll find information on how to share data/command/output in the forum.

    That said, I wish to make two comments. First, depending on your aims, it may be unnecessary to include year dummies once you -xtset- the data. Second, in case you do need them, you won’t need to create the year dummies yourself: the ‘i.’ prefix before the variable tells Stata to treat it as a categorical variable.
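
    A minimal sketch of the two steps described above. It uses Stata's nlswork example dataset as a stand-in for the firm panel, so idcode, year, ln_wage and tenure are placeholders rather than Lena's actual variables:
    Code:
    * load an example panel (stand-in for the firm data)
    webuse nlswork, clear
    * declare the panel variable (idcode) and the time variable (year)
    xtset idcode year
    * i.year enters one indicator per year and drops a base year automatically
    xtreg ln_wage tenure i.year, fe
    With firm data, the same pattern would presumably apply with the firm identifier in place of idcode and the year variable in place of year.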

    Hopefully that helps.
    Best regards,

    Marcos


    • #3
      Lena:
      Marcos gave helpful hints.
      By including -i.year-, some years might be omitted due to collinearity with the panel -timevar-.
      That said, after running your panel data regression you may want to test whether -i.year- as a whole is statistically significant via:
      Code:
      testparm i.year
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Hi Marcos,

        thanks for your reply. I am writing my thesis and trying to find out whether media attention (measured by the number of articles published) has an influence on sales, and whether this relationship is moderated by the journalist being female. My thesis advisor told me it would be useful to include year dummies to capture the influence of time trends.

        That being said, I have xtset my data.

        Best,

        Lena


        • #5
          Lena:
          as per FAQ, please note that posting what you typed and what Stata gave you back (via CODE delimiters, please) will increase your chances of getting helpful replies.
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Hi Carlo,
            thanks for your help.
            Please see the error message I get when using your code:

            Code:
            testparm i.CY
            no such variables;
            the specified varlist does not identify any testable coefficients
            r(111);
            Note: CY stands for corporate year and is our year variable

            Do you have any suggestions?
            Best
            Lena


            • #7
              Lena:
              please look at the following example:
              Code:
              . use "http://www.stata-press.com/data/r14/nlswork.dta", clear
              (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
              
              . xtset idcode year
                     panel variable:  idcode (unbalanced)
                      time variable:  year, 68 to 88, but with gaps
                              delta:  1 unit
              
              . xtreg ln_wage i.race i.year
              
              Random-effects GLS regression                   Number of obs     =     28,534
              Group variable: idcode                          Number of groups  =      4,711
              
              R-sq:                                           Obs per group:
                   within  = 0.1058                                         min =          1
                   between = 0.0975                                         avg =        6.1
                   overall = 0.0907                                         max =         15
              
                                                              Wald chi2(16)     =    3310.99
              corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
              
              ------------------------------------------------------------------------------
                   ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                      race |
                    black  |  -.1279049   .0128852    -9.93   0.000    -.1531594   -.1026504
                    other  |   .0900274   .0537654     1.67   0.094     -.015351    .1954057
                           |
                      year |
                       69  |    .085839   .0123757     6.94   0.000      .061583     .110095
                       70  |   .0702121   .0115599     6.07   0.000     .0475552     .092869
                       71  |   .1200453   .0114213    10.51   0.000     .0976599    .1424307
                       72  |   .1329282   .0117421    11.32   0.000     .1099142    .1559422
                       73  |   .1480458   .0113851    13.00   0.000     .1257314    .1703603
                       75  |   .1615023   .0112522    14.35   0.000     .1394483    .1835562
                       77  |   .2215681   .0112623    19.67   0.000     .1994945    .2436418
                       78  |   .2603374   .0115062    22.63   0.000     .2377857    .2828891
                       80  |   .2685209    .011652    23.04   0.000     .2456834    .2913585
                       82  |   .2858463   .0113927    25.09   0.000      .263517    .3081756
                       83  |   .3132819    .011535    27.16   0.000     .2906736    .3358902
                       85  |   .3656784    .011431    31.99   0.000      .343274    .3880827
                       87  |   .3814745   .0113629    33.57   0.000     .3592036    .4037454
                       88  |   .4370321   .0113003    38.67   0.000     .4148839    .4591802
                           |
                     _cons |     1.4612   .0109536   133.40   0.000     1.439732    1.482669
              -------------+----------------------------------------------------------------
                   sigma_u |  .36492114
                   sigma_e |  .30294584
                       rho |  .59200363   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              
              . testparm i.year
              
               ( 1)  69.year = 0
               ( 2)  70.year = 0
               ( 3)  71.year = 0
               ( 4)  72.year = 0
               ( 5)  73.year = 0
               ( 6)  75.year = 0
               ( 7)  77.year = 0
               ( 8)  78.year = 0
               ( 9)  80.year = 0
               (10)  82.year = 0
               (11)  83.year = 0
               (12)  85.year = 0
               (13)  87.year = 0
               (14)  88.year = 0
              
                         chi2( 14) = 3202.52
                       Prob > chi2 =    0.0000
              
              .
              Kind regards,
              Carlo
              (Stata 19.0)


              • #8
                You first need to add i.CY to the list of regressors when using regress (or whichever other command you are using to estimate the model).
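
                The order matters because -testparm- works on the coefficients of the most recently estimated model, which is why it reports r(111) when that model does not contain the year indicators. A minimal sketch, using the nlswork example dataset as a stand-in (ln_wage, tenure and year are placeholders for the actual variables):
                Code:
                webuse nlswork, clear
                * estimate first, with i.year among the regressors
                regress ln_wage tenure i.year
                * testparm can now find the year coefficients in the fitted model
                testparm i.year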

                Edit: Carlo was faster.
                https://www.kripfganz.de/stata/


                • #9
                  Thank you so much, Carlo and Sebastian!
                  Now it worked out.

                  My results are the following:

                  Code:
                   ( 1)  2004.CY = 0
                   ( 2)  2005.CY = 0
                   ( 3)  2006.CY = 0
                   ( 4)  2007.CY = 0
                   ( 5)  2008.CY = 0
                   ( 6)  2009.CY = 0
                   ( 7)  2010.CY = 0
                   ( 8)  2011.CY = 0
                   ( 9)  2012.CY = 0
                   (10)  2013.CY = 0
                   (11)  2014.CY = 0
                   (12)  2015.CY = 0
                  
                         F( 12,  2462) =   26.70
                              Prob > F =    0.0000
                  Is it correct that I can reject the null hypothesis that the coefficients for all years are jointly equal to zero (since Prob > F is < 0.05), and that I therefore need time fixed effects in this case?
                  So I do need to add year dummies, right?

                  That leads me back to my original question: how do I generate year dummies?

                  Thank you!
                  Best,
                  Lena


                  • #10
                    Lena:
                    what -testparm- is telling you is that you'd better include -i.CY- as a predictor.
                    I do not understand your last question: you have already generated year dummies via -i.CY-.
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Hello Carlo,

                      my procedure was the following:
                      first I entered the command
                      Code:
                      xtreg sales_annual articles_WSJ_annual i.CY, fe
                      then, I entered
                      Code:
                      testparm i.CY
                      Does that mean that I have already included i.CY as a predictor and have therefore generated year dummies?

                      Sorry if this seems a stupid question.

                      Thank you
                      best, Lena
                      Last edited by Chelsea Ludwig; 14 Dec 2017, 04:10.


                      • #12
                        Lena:
                        yes, your intuition is correct.
                        Actually, via factor variable notation (see -help fvvarlist-) Stata has generated the year dummies for you (that is, -i.CY-) and omitted 2003 automatically to avoid the so-called "dummy variable trap".
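
                        A short sketch of how the omitted base year can be changed, and of how separate dummy variables could be created if they are ever needed outside a model. It again uses the nlswork example dataset as a stand-in; the yr_ prefix is a hypothetical name chosen for illustration:
                        Code:
                        webuse nlswork, clear
                        xtset idcode year
                        * default: the lowest year in the data is the omitted base
                        xtreg ln_wage tenure i.year, fe
                        * ib88. makes 1988 the omitted base year instead
                        xtreg ln_wage tenure ib88.year, fe
                        * only if stand-alone dummies are really needed: creates yr_1, yr_2, ...
                        tabulate year, generate(yr_)
                        For estimation, the factor-variable form is usually enough, so the last line is optional.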
                        Kind regards,
                        Carlo
                        (Stata 19.0)


                        • #13
                          Thank you so much for your help Carlo!


                          • #14
                            Dear Carlo, I have a somewhat similar question. I'm running a logistic regression using the logit command. My thesis supervisor told me to run the regression as described below. I have set up the variables in Stata, but I am struggling with operationalizing the dummy variables. How do I make sure that Stata recognizes that the year dummy variables are connected to the correct year? Say the ROA in 2013 is 0.78 and that year dummy is set to 1 (the event occurred in that year); how do I make sure that Stata includes the 0.78 of that year? Should I add interaction variables for every year dummy so that Stata computes 0.78*1 (for example)?
                            Below you can find my input in Stata:

                            . logit international roa1_w leverage_w dummy_2019 dummy_2020 dummy_2018 dummy_2017 dummy_2016 dummy_2015 dummy_2014 dummy_2013 dummy_2012 ln_turnover_2020 ln_turnover_2019 ln
                            > _turnover_2018 ln_turnover_2017 ln_turnover_2016 ln_turnover_2015 ln_turnover_2014 ln_turnover_2013 ln_turnover_2012

                            note: dummy_2012 omitted because of collinearity.
                            Iteration 0: log likelihood = -569.21942
                            Iteration 1: log likelihood = -556.42306
                            Iteration 2: log likelihood = -556.37253
                            Iteration 3: log likelihood = -556.3725

                            Logistic regression                                        Number of obs =    822
                                                                                       LR chi2(19)   =  25.69
                                                                                       Prob > chi2   = 0.1389
                            Log likelihood = -556.3725                                 Pseudo R2     = 0.0226

                            ----------------------------------------------------------------------------------
                               international | Coefficient  Std. err.      z     P>|z|    [95% conf. interval]
                            -----------------+----------------------------------------------------------------
                                      roa1_w |   -.2747988   .1533583    -1.79   0.073   -.5753755    .0257779
                                  leverage_w |    .2657568   .3724137     0.71   0.475   -.4641606    .9956743
                                  dummy_2019 |   -.4517241    .344697    -1.31   0.190   -1.127318    .2238696
                                  dummy_2020 |   -.5865182   .3407913    -1.72   0.085   -1.254457    .0814205
                                  dummy_2018 |   -.4733611   .3476267    -1.36   0.173   -1.154697    .2079747
                                  dummy_2017 |    -.502913   .3517165    -1.43   0.153   -1.192265    .1864386
                                  dummy_2016 |   -.2641382   .3785688    -0.70   0.485   -1.006119    .4778431
                                  dummy_2015 |    .0760428   .5423545     0.14   0.888   -.9869525    1.139038
                                  dummy_2014 |    .2204868   .5088472     0.43   0.665   -.7768355    1.217809
                                  dummy_2013 |   -.8334708   .4160443    -2.00   0.045   -1.648903   -.0180389
                                  dummy_2012 |           0  (omitted)
                            ln_turnover_2020 |    .0934439   .0347541     2.69   0.007    .0253271    .1615607
                            ln_turnover_2019 |   -.0525974   .0493168    -1.07   0.286   -.1492566    .0440618
                            ln_turnover_2018 |   -.0077357   .0489943    -0.16   0.875   -.1037627    .0882913
                            ln_turnover_2017 |    .0031539   .0489214     0.06   0.949   -.0927303    .0990381
                            ln_turnover_2016 |    .0155531   .0524566     0.30   0.767   -.0872599    .1183661
                            ln_turnover_2015 |    .0327029   .0555374     0.59   0.556   -.0761485    .1415543
                            ln_turnover_2014 |    .0117654   .0500905     0.23   0.814   -.0864101    .1099409
                            ln_turnover_2013 |   -.0437958   .0510383    -0.86   0.391   -.1438291    .0562376
                            ln_turnover_2012 |   -.0155094   .0281762    -0.55   0.582   -.0707337    .0397148
                                       _cons |    .1889482   .3890296     0.49   0.627   -.5735358    .9514321
                            ----------------------------------------------------------------------------------



                            Info provided by professor:

                            Logit model:

                            DV: 1=international 0=domestic

                            IV: R&D intensity
                            • Winsorize measure
                            • If missing, assume 0
                            Controls:
                            • Financial characteristics of the focal firm
                            • Industry (SIC 2 digit) and year dummies
                            • Do not include controls for the alliance
                            Table:
                            • Column 1 with just controls and year dummies
                            • Column 2 adding R&D intensity
                            • Column 3 adding industry dummies
                            logit international x1 x2 …

                            I'm really hoping to see what your valuable expertise suggests and otherwise, how I should rephrase my question.

                            Best regards,

                            Thijs Titulaer


                            • #15
                              Thijs:
                              welcome to this forum.
                              First off, please read the FAQ and act on it to post more effectively (using CODE delimiters
                              on a routine basis would help interested listers to discover what's the matter with your data).
                              That said:
                              1) time and industry dummies are more manageable when each set is stored in a single categorical variable, one for year and one for industry (you can then use -label- to distinguish years and industries). See also -help fvvarlist-.
                              2) I'm not that supportive of winsorizing variables, as data are what they are. If your supervisor fears so-called outliers then, mistakes in data entry aside, they are simply a matter of fact.
                              3) I'm totally unsupportive of replacing missing values with 0 (or 1, or whatever), because missing is missing. It's relevant to diagnose the mechanism according to which data are missing and whether their missingness is informative or not.
                              4) you can check the joint significance of -year- post -logit- via -testparm-.
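
                              Points 1) and 4) can be sketched as follows, using the nlswork example dataset as a stand-in: union plays the role of the binary outcome, and year and ind_code play the roles of the year and industry variables. None of these are Thijs's actual variables. The sketch assumes one row per firm-year with a single year variable; if the data are in fact stored with one column per year (as names like ln_turnover_2013 might suggest), they would first need to be brought into that long layout, for instance with -reshape long-.
                              Code:
                              webuse nlswork, clear
                              * binary outcome with year and industry entered as factor variables
                              logit union tenure i.year i.ind_code
                              * joint test of the year indicators
                              testparm i.year
                              * joint test of the industry indicators
                              testparm i.ind_code
                              In the long layout, each row's covariate values and its year indicator belong to the same firm-year, so no extra interaction terms are needed just to link a value to its year.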
                              Kind regards,
                              Carlo
                              (Stata 19.0)
