  • Modelling problem with Pooled Cross-sectional/Panel Data

    Dear Statalist users,

    I have a data set of 500 firms and 550 mergers, covering 1996 to 2010.

    Basically for each merger, I have an acquirer and observations for the acquirer firm from 5 years before to 5 years after the merger.

    I want to predict a_sales_{t+1,i} using independent variables including a_sales_{t,i}, a_sales_{t-1,i}, a_market_{t-1,i}, and purpose1_str_{i},
    where i is a firm and t is the year in which the merger occurred, namely merger_year. So year_{i} = t if year_{i} = merger_year_{i}.

    Each merger has a unique id, sdcdealno, and each firm has a unique id, a_cusip6. As you can imagine, the same acquirer id a_cusip6 can appear in different mergers.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 sdcdealno str6 a_cusip6 int merger_year double year str17(purpose1_str purpose2_str) double(a_market a_sales)
    "1000844020" "500453" 2000 1995 "synergy"           "none"                        0  512.248
    "1000844020" "500453" 2000 1996 "synergy"           "none"                       14  579.096
    "1000844020" "500453" 2000 1997 "synergy"           "none"                        0   632.47
    "1000844020" "500453" 2000 1998 "synergy"           "none"                       16  330.872
    "1000844020" "500453" 2000 1999 "synergy"           "none"                       16  335.249
    "1000844020" "500453" 2000 2000 "synergy"           "none"                        7  358.463
    "1000844020" "500453" 2000 2001 "synergy"           "none"                        0  282.613
    "1000844020" "500453" 2000 2002 "synergy"           "none"                       20  286.704
    "1000844020" "500453" 2000 2003 "synergy"           "none"                        9  438.292
    "1000844020" "500453" 2000 2004 "synergy"           "none"                        8  458.377
    "1000844020" "500453" 2000 2005 "synergy"           "none"                        0  685.946
    "1018256020" "882508" 2001 1996 "strengthen"        "none"                     3406    13128
    "1018256020" "882508" 2001 1997 "strengthen"        "none"                        0     9940
    "1018256020" "882508" 2001 1998 "strengthen"        "none"                      795     9750
    "1018256020" "882508" 2001 1999 "strengthen"        "none"                      805     8460
    "1018256020" "882508" 2001 2000 "strengthen"        "none"                       43     9468
    "1018256020" "882508" 2001 2001 "strengthen"        "none"                    808.5    11860
    "1018256020" "882508" 2001 2002 "strengthen"        "none"        946.8333129882813     8201
    "1018256020" "882508" 2001 2003 "strengthen"        "none"       2022.6666259765625     8383
    "1018256020" "882508" 2001 2004 "strengthen"        "none"                        0     9834
    "1018256020" "882508" 2001 2005 "strengthen"        "none"                        0    12580
    "1018256020" "882508" 2001 2006 "strengthen"        "none"                        1    13392
    "1019582020" "920355" 2002 1997 "product_extension" "technology"                  2  790.175
    "1019582020" "920355" 2002 1998 "product_extension" "technology"                  0  859.799
    "1019582020" "920355" 2002 1999 "product_extension" "technology"                 12 1017.271
    "1019582020" "920355" 2002 2000 "product_extension" "technology"                  0 1155.134
    "1019582020" "920355" 2002 2001 "product_extension" "technology"                 12 1387.677
    "1019582020" "920355" 2002 2002 "product_extension" "technology"                 22  1483.32
    "1019582020" "920355" 2002 2003 "product_extension" "technology"                 40  1920.97
    "1019582020" "920355" 2002 2004 "product_extension" "technology"                  9 2126.853
    end
    My question is how to model these data, both conceptually and in Stata.

    It is not a proper pooled cross-sectional data set, since the same firm can appear in both the 1996-2006 range and the 2000-2010 range if it makes acquisitions in different years. Thus, the assumption of independent samples is violated.

    It is not a balanced panel data set either, because I don't have observations for each firm over the entire period (1996 to 2010) but only for the t-5 to t+5 range.
    For some mergers an observation is also missing, say at t-5. Yet I don't think the definition of unbalanced panel data applies, because even if I dropped those mergers, the data set would still not be balanced in strict terms.
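
    A minimal sketch of how the overlap described above could be checked. The rows below are made-up toy data (deal ids D1 and D2, firm AAAAAA, and the sales figures are hypothetical), and n_windows and firm_id are illustrative names that do not exist in the original data set:

    Code:
    * toy data, NOT real: one firm with two deals whose windows overlap
    clear
    input str10 sdcdealno str6 a_cusip6 int(merger_year year) double a_sales
    "D1" "AAAAAA" 2000 1999 10
    "D1" "AAAAAA" 2000 2000 12
    "D1" "AAAAAA" 2000 2001 15
    "D2" "AAAAAA" 2001 2000 12
    "D2" "AAAAAA" 2001 2001 15
    "D2" "AAAAAA" 2001 2002 18
    end
    * number of deal windows that contain each firm-year
    bysort a_cusip6 year: gen n_windows = _N
    tabulate n_windows
    * deal and year together should identify every row; isid exits with an error if not
    isid sdcdealno year
    * firm and year do not, so declaring the firm as the panel unit is rejected
    egen firm_id = group(a_cusip6)
    capture noisily xtset firm_id year
    display "xtset return code: " _rc
    Any firm-year with n_windows above 1 belongs to more than one merger window, which is exactly the case where the firm cannot serve as the panel identifier.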

    So:
    Q1: How should I state in Stata the type of data I have?
    xtset does not seem appropriate, for the reasons mentioned above.

    Q2: What regression model should I use? FE would eliminate time-invariant variables such as purpose1_str.
    Perhaps either RE or plain OLS, in both cases specifying lagged independent variables or, alternatively, a lead of the dependent variable.
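
    One possible way to set this up, offered as a sketch under assumptions rather than a recommendation: treat the deal, not the firm, as the panel unit, since sdcdealno and year together identify each row. xtset accepts unbalanced panels and panels with gaps, and the L. and F. operators simply return missing where the neighbouring year is absent. The toy rows and the names deal_id, a_sales_lag, a_sales_lead and purpose1 are hypothetical:

    Code:
    * toy data, NOT real: two deals by the same firm
    clear
    input str10 sdcdealno str6 a_cusip6 int(merger_year year) str10 purpose1_str double a_sales
    "D1" "AAAAAA" 2000 1999 "synergy"    10
    "D1" "AAAAAA" 2000 2000 "synergy"    12
    "D1" "AAAAAA" 2000 2001 "synergy"    15
    "D2" "AAAAAA" 2001 2000 "strengthen" 12
    "D2" "AAAAAA" 2001 2001 "strengthen" 15
    "D2" "AAAAAA" 2001 2002 "strengthen" 18
    end
    * xtset needs a numeric panel variable, so map the string deal id to integers
    egen deal_id = group(sdcdealno)
    xtset deal_id year
    * lead and lag are taken within each deal, in year order
    gen a_sales_lead = F.a_sales
    gen a_sales_lag  = L.a_sales
    * string categories must be numeric before they can enter as i.purpose1
    encode purpose1_str, gen(purpose1)
    list sdcdealno year a_sales a_sales_lag a_sales_lead purpose1, sepby(deal_id)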
    Thank you in advance for your suggestions

  • #2
    For users who may come across this question in the future:

    Looking around a bit more, I understood that what I have is a so-called rolling window model.
    One possibility is simply to generate lagged independent variables after running:

    Code:
    xtset id time
    Then

    Code:
    * after xtset, the lag operator L. already works within each id, in time order
    gen xlag = L.x
    Finally, regress with either:

    Code:
    reg y x xlag if merger==1
    * in my case merger = year - merger_year, so merger==1 restricts the regression to observations of y one year after the merger
    Or:

    Code:
    ssc install rangestat
    rangestat (reg) y x xlag, interval(time -5 5)
    
    *alternatively one can use
    ssc install asreg
    I have not tried either of them yet, though, so I cannot say how they differ.
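
    A minimal end-to-end sketch of the first option (xtset, then a regression restricted to one event year), run on simulated data so that it is self-contained. Everything here is an assumption for illustration: the numbers are random noise, deal_id, firm_id, event_time and purpose1 are hypothetical names, and clustering the standard errors by firm is one possible way to allow for the same acquirer appearing in several deals, not something established in this thread:

    Code:
    * simulated data, NOT real: 40 deals, 20 firms with two deals each
    clear
    set seed 12345
    set obs 40
    gen deal_id     = _n
    gen firm_id     = ceil(_n/2)
    gen merger_year = 2000 + mod(_n, 5)
    gen purpose1    = 1 + mod(_n, 3)          // stand-in for an encoded purpose1_str
    * 11 rows per deal: merger_year-5 through merger_year+5
    expand 11
    bysort deal_id: gen year = merger_year - 6 + _n
    gen a_market = 20*runiform()
    gen a_sales  = 500 + 50*rnormal()
    * the deal is the panel unit, so each deal-year is unique
    xtset deal_id year
    gen event_time = year - merger_year
    * sales at t+1 on sales at t and t-1, market at t-1, and deal purpose, using the merger-year rows only
    regress F.a_sales a_sales L.a_sales L.a_market i.purpose1 if event_time == 0, vce(cluster firm_id)
    Conditioning on event_time == 0 with F.a_sales as the dependent variable selects the same observations as shifting every variable one year and conditioning on the year after the merger. Since the data are noise, the coefficients themselves mean nothing.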

    Furthermore, I am pretty sure my question about using either FE or RE was quite beginner level, since it seems obvious that they do not apply to a rolling window data set!

    Anyway, any thoughts from more experienced people on the overall issue would be appreciated.
