Modelling problem with Pooled Cross-sectional/Panel Data

Federico Mancuso

Join Date: Apr 2020
Posts: 2

01 Apr 2020, 12:24

Dear Statalist Users,

I have a data set covering 500 firms and 550 mergers, with years ranging from 1996 to 2010.

Basically for each merger, I have an acquirer and observations for the acquirer firm from 5 years before to 5 years after the merger.

I want to predict a_sales_{t+1,i} using independent variables including: a_sales_{t,i}, a_sales_{t-1,i}, a_market_{t-1,i}, purpose1_str_i
Here i indexes the firm and t is the year in which the merger occurred, namely merger_year; that is, t = merger_year_i.

Each merger has a unique id sdcdealno, each firm has a unique id a_cusip6. As you can imagine, the same acquirer id a_cusip6 can appear for different mergers.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 sdcdealno str6 a_cusip6 int merger_year double year str17(purpose1_str purpose2_str) double(a_market a_sales)
"1000844020" "500453" 2000 1995 "synergy"           "none"                        0  512.248
"1000844020" "500453" 2000 1996 "synergy"           "none"                       14  579.096
"1000844020" "500453" 2000 1997 "synergy"           "none"                        0   632.47
"1000844020" "500453" 2000 1998 "synergy"           "none"                       16  330.872
"1000844020" "500453" 2000 1999 "synergy"           "none"                       16  335.249
"1000844020" "500453" 2000 2000 "synergy"           "none"                        7  358.463
"1000844020" "500453" 2000 2001 "synergy"           "none"                        0  282.613
"1000844020" "500453" 2000 2002 "synergy"           "none"                       20  286.704
"1000844020" "500453" 2000 2003 "synergy"           "none"                        9  438.292
"1000844020" "500453" 2000 2004 "synergy"           "none"                        8  458.377
"1000844020" "500453" 2000 2005 "synergy"           "none"                        0  685.946
"1018256020" "882508" 2001 1996 "strengthen"        "none"                     3406    13128
"1018256020" "882508" 2001 1997 "strengthen"        "none"                        0     9940
"1018256020" "882508" 2001 1998 "strengthen"        "none"                      795     9750
"1018256020" "882508" 2001 1999 "strengthen"        "none"                      805     8460
"1018256020" "882508" 2001 2000 "strengthen"        "none"                       43     9468
"1018256020" "882508" 2001 2001 "strengthen"        "none"                    808.5    11860
"1018256020" "882508" 2001 2002 "strengthen"        "none"        946.8333129882813     8201
"1018256020" "882508" 2001 2003 "strengthen"        "none"       2022.6666259765625     8383
"1018256020" "882508" 2001 2004 "strengthen"        "none"                        0     9834
"1018256020" "882508" 2001 2005 "strengthen"        "none"                        0    12580
"1018256020" "882508" 2001 2006 "strengthen"        "none"                        1    13392
"1019582020" "920355" 2002 1997 "product_extension" "technology"                  2  790.175
"1019582020" "920355" 2002 1998 "product_extension" "technology"                  0  859.799
"1019582020" "920355" 2002 1999 "product_extension" "technology"                 12 1017.271
"1019582020" "920355" 2002 2000 "product_extension" "technology"                  0 1155.134
"1019582020" "920355" 2002 2001 "product_extension" "technology"                 12 1387.677
"1019582020" "920355" 2002 2002 "product_extension" "technology"                 22  1483.32
"1019582020" "920355" 2002 2003 "product_extension" "technology"                 40  1920.97
"1019582020" "920355" 2002 2004 "product_extension" "technology"                  9 2126.853
end

My doubt concerns how to model this data conceptually and in Stata.

It is not a proper pooled cross-sectional data set, since the same firm can appear both in the range 1996-2006 and in the range 2000-2010 if it makes acquisitions in different years. Thus, the assumption of independent samples is violated.

Nor is it a balanced panel data set, because I do not observe each firm over the entire period (1996 to 2010), only within the t-5 to t+5 window around each merger.
For some mergers an observation, say t-5, is missing. Yet I don't think the definition of unbalanced panel data applies either, because even if I dropped those mergers, the data set would still not be balanced in the strict sense.

So:

Q1: How should I state in Stata the type of data I have?
xtset does not seem the most appropriate, for the reasons mentioned above.
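As a rough way to inspect the structure before declaring anything, a minimal sketch using the variables from the -dataex- example is given below. The name deal_id is hypothetical, and treating the merger (rather than the firm) as the panel unit is an assumption of this sketch, not an established recommendation.

```stata
* Sketch only: deal_id is a hypothetical variable name
* Surplus copies > 0 means a firm-year appears in more than one merger window
duplicates report a_cusip6 year
* Exits with an error unless merger-year uniquely identifies observations
isid sdcdealno year
* xtset needs a numeric panel variable, so map the string deal id to integers
egen deal_id = group(sdcdealno)
xtset deal_id year
* Shows the pattern of observed years and gaps per merger
xtdescribe
```

If isid passes, each merger window behaves like an (unbalanced) panel of its own, which is what xtset needs for the lag and lead operators to work.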

Q2: What regression model should I use? FE would eliminate time-invariant variables such as purpose1_str.
Perhaps either RE or plain OLS, in both cases including the lagged independent variables or, alternatively, the lead of the dependent variable.

Thank you in advance for your suggestions

Tags: categorical, panel data, pooled cross section, regression, Time Series

Federico Mancuso

Join Date: Apr 2020

Posts: 2
#2

02 Apr 2020, 10:15

For users who may bump into this question in the future

Looking around a bit more, I understood that what I have is a so-called rolling window model.
One possibility is to generate lagged independent variables, after first declaring the panel with:

Code:

xtset id time

Then

Code:

* xtset already defines the panel and time order, so L. needs no by-prefix
gen xlag = L.x

Finally, regress with either:

Code:

* In my case I created merger = year - merger_year, so merger == 1
* selects the observations of y one year after the merger
reg y x xlag if merger == 1

Or:

Code:

* Alternatively, one can use asreg (ssc install asreg)
ssc install rangestat
rangestat (reg) y x xlag, interval(time -5 5)

I have not tried them yet, though, so I cannot say how they differ.
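For the variables in the -dataex- example from my first post, the lag-then-regress route might look roughly like the sketch below. The names deal_id, firm_id, event_time, f_a_sales, l_a_sales, l_a_market and purpose1 are hypothetical, and clustering the standard errors by firm is an assumption I make because the same acquirer can appear in several mergers. With only the three mergers shown in the example there are far too few observations for a meaningful fit, so it is meant for the full data set.

```stata
* Sketch only: all newly generated variable names are hypothetical
* Numeric ids for the merger (panel unit) and the acquirer (cluster unit)
egen deal_id = group(sdcdealno)
egen firm_id = group(a_cusip6)
xtset deal_id year
* Event time: 0 in the merger year, 1 one year after, and so on
gen event_time = year - merger_year
* Lead of the dependent variable and lags of the regressors within each merger
gen f_a_sales  = F.a_sales
gen l_a_sales  = L.a_sales
gen l_a_market = L.a_market
* Turn the string purpose into a labelled numeric so it can enter as i.purpose1
encode purpose1_str, gen(purpose1)
* One observation per merger: a_sales at t+1 on values at t and t-1
regress f_a_sales a_sales l_a_sales l_a_market i.purpose1 if event_time == 0, vce(cluster firm_id)
```

Because the regression keeps a single row per merger, purpose1 is not dropped the way it would be under FE.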

Furthermore, I am pretty sure my question about using FE or RE was quite beginner-level, since it seems obvious that they do not apply to a rolling window data set!

Anyway, any thoughts from more experienced people on the overall issue would be appreciated.