Panel Regression Help!!

Katherine Paradise

Join Date: Feb 2023

Posts: 14
#1


26 Feb 2023, 20:27

Hi all!!

Firstly, thank you for reading/helping! I'm working on a project looking at CEO personal campaign contributions before the 2002 election, to see whether they changed their stock price (pre/post 6-month average). I've used dataex to show the format of the data; after some dataset transformation/manipulation, this is the final result I have.

What would be the best regression to use? Regular reg, or should I use areg? I was originally going to use "reg changestockprice totalrep if post == 1 i.companyid, robust", but I also want to loop through other variables, like totaldem and towinners, for example. How would I go about writing a loop for that, and then making a table that appends all of those regressions together?

Apologies for the longer post. I'm kind of stumped as to where to go from here, and also how to fix the collinearity problem I ran into when I ran the first regression mentioned above. Any and all help would be so so appreciated!! Thank you!
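The loop-and-table part of the question can be sketched roughly as follows. This is only an illustration: the variable names are taken from the post, the stored-model names m_totalrep and m_totaldem are hypothetical, and it says nothing about whether this specification is the right one. A string variable such as towinner would need -destring- before it could join the list.

```stata
* Sketch only: loop over regressors, store each model, then tabulate them
estimates clear
foreach v of varlist totalrep totaldem {
    * observations with missing changestockprice are dropped automatically
    regress changestockprice `v', vce(robust)
    * m_`v' is a hypothetical name for the stored results
    estimates store m_`v'
}
* one column per stored model: coefficients, standard errors, N and R-squared
estimates table m_totalrep m_totaldem, b(%9.4f) se stats(N r2)
```

Storing each model under its own name keeps the loop and the table separate, so more regressors can be added to the -foreach- list without touching the regression line.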
Tags: foreach, panel, panel data, regression, Suggestion
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

27 Feb 2023, 00:36

Katherine:
1) as per FAQ, please do not post screenshots but use -dataex- and -CODE- delimiters to share an example/excerpt of your dataset, what you typed and what Stata gave you back. Thanks.
That said:
2) why use -regress- or -areg- as your first options when -xtreg- is available (I assume that your regressand is continuous)?
3) how could interested listers help you out if you do not provide them with what described in 1)?
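
To make point 2) concrete, here is a minimal sketch comparing -areg- and -xtreg, fe-. It assumes the variable names from the data excerpt posted later in this thread (companyid, post, _2002_); the two commands should report the same coefficient on post, while -xtreg- adds panel-specific output.

```stata
* Sketch: variable names are assumed from the data excerpt posted later in the thread
xtset companyid post
* Firm fixed effects absorbed as a set of dummies
areg _2002_ i.post, absorb(companyid)
* Same within estimator, with panel output (sigma_u, sigma_e, rho)
xtreg _2002_ i.post, fe
```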

Kind regards,
Carlo
(Stata 19.0)
Katherine Paradise

Join Date: Feb 2023

Posts: 14
#3

27 Feb 2023, 09:40

Thank you Carlo, so sorry!! Here is the dataex. areg was my first thought just due to the panel nature, but I'm unfamiliar with the benefits of xtreg; could you please assist? Thanks!

* Example generated by -dataex-. For more info, type help dataex
clear
input str43 corpname str8 ticker float(companyid totaldem totalrep) str9(toincumbs towinner tolosers) double _2002_ byte post float changestockprice
"Apple Inc" "AAPL" 1 2300 0 "0" "0" "0" 57.79999923706055 1 .
"Apple Inc" "AAPL" 1 2300 0 "0" "0" "0" 51.54999923706055 2 -6.25
"Amerisourcebergen Corp" "ABC" 2 3667 10750 "16417" "11417" "1500" 54.5099983215332 1 .
"Amerisourcebergen Corp" "ABC" 2 3667 10750 "16417" "11417" "1500" 48.90000152587891 2 -5.609997
"Abbott Laboratories" "ABT" 3 15450 266740 "86600" "88800" "18200" 50.2599983215332 1 .
"Abbott Laboratories" "ABT" 3 15450 266740 "86600" "88800" "18200" 43.65000152587891 2 -6.609997
"Archer-Daniels-Midland Co" "ADM" 4 2000 0 "43500" "1000" "0" 53.54999923706055 1 .
"Archer-Daniels-Midland Co" "ADM" 4 2000 0 "43500" "1000" "0" 49.7400016784668 2 -3.8099976
"Automatic Data Processing" "ADP" 5 700 4300 "1000" "200" "500" 51.54999923706055 1 .
"Automatic Data Processing" "ADP" 5 700 4300 "1000" "200" "500" 59.47999954223633 2 7.93
"Ameren" "AEE" 6 11450 14200 "12750" "11500" "6250" 53.23714229038784 1 .
"Ameren" "AEE" 6 11450 14200 "12750" "11500" "6250" 48.370000566755024 2 -4.867142
"American Electric Power Co Inc" "AEP" 7 28820 83020 "70700" "59150" "18800" 53.23714229038784 1 .
"American Electric Power Co Inc" "AEP" 7 28820 83020 "70700" "59150" "18800" 15.5 2 -37.73714
end

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712
#4

27 Feb 2023, 10:16

Katherine:
considering numerical variables only (that is, discarding those in -string- format, which are not suitable for Stata procedures that require numerical values unless they are first converted with -destring-):

Code:

. xtset companyid post

Panel variable: companyid (strongly balanced)
 Time variable: post, 1 to 2
         Delta: 1 unit

. xtreg _2002_ totaldem totalrep i.post, fe vce(cluster companyid)
note: totaldem omitted because of collinearity.
note: totalrep omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =         14
Group variable: companyid                       Number of groups  =          7

R-squared:                                      Obs per group:
     Within  = 0.2827                                         min =          2
     Between =      .                                         avg =        2.0
     Overall = 0.1611                                         max =          2

                                                F(1,6)            =       2.18
corr(u_i, Xb) = 0.0000                          Prob > F          =     0.1901

                              (Std. err. adjusted for 7 clusters in companyid)
------------------------------------------------------------------------------
             |               Robust
      _2002_ | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    totaldem |          0  (omitted)
    totalrep |          0  (omitted)
      2.post |  -8.136325   5.507594    -1.48   0.190    -21.61292    5.340273
       _cons |   53.44918   2.753797    19.41   0.000     46.71088    60.18748
-------------+----------------------------------------------------------------
     sigma_u |  7.1817043
     sigma_e |  9.8995374
         rho |  .34481657   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Four caveats:
1) you cannot get away with panel data regression if you do not invest a (relevant) amount of your time to learn the goal and the spirit of the game (start from the -xtreg- entry and related references);
2) the -fe- estimator is the first choice in this kind of analysis, but wipes out all time-invariant variables;
3) the evidence of a panel effect should not be taken for granted;
4) cluster-robust standard errors (that I invoked to show you this option) work well if you have at least 30 panels in your dataset (see
https://cameron.econ.ucdavis.edu/res...5_February.pdf).
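
Caveat 2) can be checked directly on the data. The sketch below assumes the string contribution variables contain only digits, so that -destring- converts them cleanly; it then uses -xtsum- to show which variables have no within-firm variation.

```stata
* Sketch: assumes toincumbs, towinner and tolosers hold only numeric characters
destring toincumbs towinner tolosers, replace
xtset companyid post
* A within standard deviation of 0 means the variable is constant over time
* within each firm, so -xtreg, fe- cannot estimate its coefficient
xtsum totaldem totalrep toincumbs towinner tolosers
```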

Kind regards,
Carlo
(Stata 19.0)


Katherine Paradise

Join Date: Feb 2023

Posts: 14
#5

27 Feb 2023, 11:20

Thank you Carlo! I greatly appreciate your assistance (apologies again, as I am new to the program/platform). Would dropping the first instance of each company (that has identical information for totaldem and totalrep) fix the collinearity problem? Or, would that no longer make it a panel dataset, as there would be one observation per company? Then if I modified the data that way, would I run a regular "reg" instead of xtreg?

Very, very grateful for your help!!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#6

27 Feb 2023, 12:18

Katherine:
1) omitting the first observation has two consequences: a) you're actually making up your original dataset (and this is not scientific); b) you're turning a panel dataset into a cross-sectional one (via a)): this way, you end up with a made-up sample that has nothing to do with the original one.
2) the -fe- estimator wipes out time-invariant variables. We should live with that or, if appropriate, switch to the -re- estimator (that has other possible drawbacks though). See -hausman-.

As an aside, the point 2) implies that there's evidence of a panel-wise effect in your dataset. If this were not the case, you should switch to a pooled OLS.
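
A minimal sketch of the -re- versus pooled OLS comparison mentioned in the aside, assuming the variable names from the data excerpt. It uses -xttest0- (the Breusch-Pagan Lagrange multiplier test) after -xtreg, re-, which is one possible check rather than the only one.

```stata
xtset companyid post
* Random-effects model: time-invariant contribution variables are retained
xtreg _2002_ totaldem totalrep i.post, re
* Breusch-Pagan LM test: H0 is Var(u_i) = 0, i.e. pooled OLS is adequate
xttest0
* Pooled OLS alternative, with standard errors clustered on firm
regress _2002_ totaldem totalrep i.post, vce(cluster companyid)
```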

Kind regards,
Carlo
(Stata 19.0)
Katherine Paradise

Join Date: Feb 2023

Posts: 14
#7

27 Feb 2023, 12:39

Thank you again Carlo!!

I am pretty confused as to what to do with my dataset at this point. In my research question, I am hoping to see whether how much an individual CEO donated to Republicans/Democrats changed the six-month average stock price pre/post election. I understand that the totalrep and totaldem amounts are the same in both periods because they are the amounts donated prior to the election; they have no time element, so they are removed for collinearity. How would I be able to observe the effect for my question? Just really lost at this point, but again am grateful for your assistance.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#8

28 Feb 2023, 00:27

Katherine:
assuming that -_2002_- is your dependent variable, from your data excerpt there's no evidence of a within-panel variation led by the -timevar- -post-.
In addition:
1) I doubt that your data support the evidence of a panel-wise effect (as sigma_e>sigma_u), but it may be the effect of non-default standard errors with such a limited number of panels;
2) I'd try to collect other time-varying predictors and see whether (or not) something changes in your results.
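
One way to look at point 1) is to refit the model with default standard errors, sketched below under the same variable-name assumptions. With default standard errors the -xtreg, fe- output footer includes the F test that all u_i = 0, and the variance components can be read back from the stored results.

```stata
xtset companyid post
* Default (non-robust) standard errors, so the footer reports
* the F test that all u_i = 0
xtreg _2002_ i.post, fe
* Variance components stored by -xtreg, fe-
display "sigma_u = " e(sigma_u) "  sigma_e = " e(sigma_e) "  rho = " e(rho)
```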

Kind regards,
Carlo
(Stata 19.0)