How to handle 3-dimensional ("panel") data?

Rolf Miller

Join Date: Feb 2019
Posts: 11

20 Feb 2019, 05:50

Hi,

I have a dataset of individual daily investor trading data. In total, there are about 1 million observations containing about 40,000 distinct investors with on average 25 trades each.
The data is 3-dimensional in the sense that there is a time variable date (in days), an investorID variable and a stockID variable.

Let's say I would like to investigate the effect of some exogenous day- and stock-specific signal (like an analyst forecast or a news announcement on that particular stock) on the volume traded by each investor per day per stock.

Example of the data for 2 investorIDs:

Code:

clear
input float date int stockID double(investorID volume) float signal
17591 128 1   13 0
17591 449 1   80 0
17885  61 1   80 0
17885 686 1   60 1
17896 449 1  350 0
17896 752 1   80 0
18155 743 1  250 0
18851 760 1 1000 1
16502 775 2   50 0
16628 698 2   50 0
17021 625 2   13 0
17021 625 2   37 0
17554 775 2  100 0
17793 585 2   50 0
17793 752 2   50 0
17805 752 2   50 0
17815  61 2   50 0
17815 585 2  100 1
17815 585 2  100 1
17821  75 2   50 0
17821 591 2   50 0
17821 752 2  100 0
18522  61 2   50 0
18913  61 2   50 0
18913 760 2  200 0
end
format %td date

I tried the following two approaches:

1.) I collapsed the data by summing the trading volume per day per stockID. This eliminates the investorID dimension (all the volume in one stock on one day is aggregated) and leaves panel data with stockID as the panel variable and date as the time variable.
This gives me about 500 groups, one per stockID. If I run

Code:

xtset stockID date
xtreg volume signal CONTROLS, cluster(stockID) fe

the expected effect of signal on volume does not show up:

Code:

Fixed-effects (within) regression               Number of obs      =    753191
Group variable: stockID                       Number of groups   =       451

R-sq:  within  = 0.0615                         Obs per group: min =        10
       between = 0.3930                                        avg =    1479.0
       overall = 0.1741                                        max =      1878

                                                F(120,489)         =      7.86
corr(u_i, Xb)  = -0.0281                        Prob > F           =    0.0000

                                (Std. Err. adjusted for 451 clusters in stockID)
--------------------------------------------------------------------------------
               |               Robust
        volume |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        signal |   -2.11476    2.80901    -0.75   0.452    -7.620337    3.390816
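
For readers following along, the collapse step in approach 1 might look roughly like the sketch below. It uses a few rows from the example data above. The choice of (max) for signal is an assumption (it presumes signal is constant within a stock-day); it is not taken from the original post.

```stata
* Sketch of approach 1: one row per stock and day.
* Rows are a subset of the example data posted above.
clear
input float date int stockID double(investorID volume) float signal
17815  61 2  50 0
17815 585 2 100 1
17815 585 2 100 1
17821 752 2 100 0
17896 752 1  80 0
17896 449 1 350 0
end
format %td date

* Sum volume over all trades in a stock on a day.
* Assumption: signal does not vary within a stock-day, so max() just carries it over.
collapse (sum) volume (max) signal, by(stockID date)

* Each (stockID, date) pair is now unique, so xtset accepts it.
xtset stockID date
list, sepby(stockID)
```

The investorID variable is gone after the collapse, which is exactly the loss of information described in approach 2 below.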

2.) Actually, I do not want to sum volume across investors. Therefore, I tried to handle the 3 dimensions by collapsing the data by date, investorID and stockID, so that the resulting dataset contains summed volume per investor, per stock, per day (some investors trade a specific stock multiple times per day, which is why I had to do this).
Then I run

Code:

egen grouping = group(investorID stockID)
xtset grouping date
xtreg volume signal CONTROLS, cluster(grouping) fe

In this case I get the following, with a lower R-squared and, of course, a very high number of groups relative to the number of observations.

Code:

Fixed-effects (within) regression               Number of obs      =    854643
Group variable: grouping                        Number of groups   =    367014

R-sq:  within  = 0.0222                         Obs per group: min =         1
       between = 0.0368                                        avg =       2.3
       overall = 0.0344                                        max =       257

                                                F(117,368003)      =     33.31
corr(u_i, Xb)  = -0.2855                        Prob > F           =    0.0000

                            (Std. Err. adjusted for 367014 clusters in grouping)
--------------------------------------------------------------------------------
               |               Robust
        volume |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        signal |   15.64934   3.922797     3.99   0.000     7.960774    23.33791
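
A minimal sketch of the data preparation in approach 2, again on a subset of the example rows. The variable n_obs and the use of (max) for signal are assumptions added for illustration. The final count shows how many investor-stock pairs have only one observation; such pairs have no within-pair variation, so they cannot help identify the effect of signal in a fixed-effects regression. That is worth checking when "Obs per group: min" is 1 and the average is close to 2.

```stata
* Sketch of approach 2: one row per investor, stock and day.
* Rows are a subset of the example data posted above.
clear
input float date int stockID double(investorID volume) float signal
17021 625 2  13 0
17021 625 2  37 0
17793 752 2  50 0
17805 752 2  50 0
17815 585 2 100 1
17815 585 2 100 1
17896 752 1  80 0
end
format %td date

* Sum repeated trades by the same investor in the same stock on the same day.
* Assumption: signal is constant within such a cell.
collapse (sum) volume (max) signal, by(date investorID stockID)

* One panel unit per investor-stock pair.
egen long grouping = group(investorID stockID)
xtset grouping date

* n_obs (hypothetical helper) = number of days observed for each pair.
bysort grouping: gen long n_obs = _N

* Observations that sit in single-observation pairs.
count if n_obs == 1
```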

I am no expert on panel data regressions. Is it "common"/acceptable to have such a high number of groups in panel data? Is there a better approach that copes with my issue?

Any comments are very welcome. Thank you!

Last edited by Rolf Miller; 20 Feb 2019, 05:56.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17739
#2

20 Feb 2019, 06:05

Rolf:
perhaps a different approach would be to reduce the dimensions from 3 to 2 by classifying stocks into industries via a categorical variable and using it as a predictor:

Code:

xtset investors date
xtreg volume signal CONTROLS i.stock, cluster(investors) fe

Kind regards,
Carlo
(Stata 19.0)
Rolf Miller

Join Date: Feb 2019

Posts: 11
#3

20 Feb 2019, 06:25

Originally posted by Carlo Lazzaro

Rolf:
perhaps a different approach would be to reduce the dimensions from 3 to 2 by classifying stocks into industries via a categorical variable and using it as a predictor:

Code:

xtset investors date
xtreg volume signal CONTROLS i.stock, cluster(investors) fe

Dear Carlo,
Thanks for your input.
I think this approach is difficult as

Code:

xtset investorID date

would require that I aggregate positions across stockIDs, which is not possible as the signal variable is stock-specific.
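
The problem can be reproduced on the first four rows of the example data. This is only a sketch: it shows that (investorID, date) does not identify a row, and that xtset is expected to refuse the declaration for that reason. Note that on 17885 the two stocks traded by investor 1 carry different values of signal, so the rows cannot simply be merged.

```stata
* First four rows of the example data: investor 1 trades
* two different stocks on each of two days.
clear
input float date int stockID double(investorID volume) float signal
17591 128 1  13 0
17591 449 1  80 0
17885  61 1  80 0
17885 686 1  60 1
end
format %td date

* Report how often each (investorID, date) combination occurs.
duplicates report investorID date

* xtset is expected to reject repeated time values within a panel;
* capture noisily shows the message without stopping a do-file.
capture noisily xtset investorID date
display "xtset return code: " _rc
```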
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17739
#4

20 Feb 2019, 07:19

Rolf:
I see the issue.
Second try: can't you group stocks that are similar in some respect and create a categorical variable to be included as a predictor on the right-hand side of your regression equation?

Kind regards,
Carlo
(Stata 19.0)
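
One possible reading of this second suggestion is sketched below. Everything about the grouping is an assumption: the variable industry and its construction from stockID are arbitrary placeholders, since the thread does not say how stocks should be classified. The sketch declares only the panel variable (no time variable), which xtreg, fe accepts and which avoids the repeated-dates problem. Clustering is left out because the toy data contain just two investors.

```stata
* Subset of the example data posted above (15 trades, 2 investors).
clear
input float date int stockID double(investorID volume) float signal
17591 128 1   13 0
17591 449 1   80 0
17885  61 1   80 0
17885 686 1   60 1
17896 449 1  350 0
17896 752 1   80 0
18155 743 1  250 0
18851 760 1 1000 1
16502 775 2   50 0
16628 698 2   50 0
17021 625 2   13 0
17793 585 2   50 0
17815 585 2  100 1
17821  75 2   50 0
18913 760 2  200 0
end
format %td date

* HYPOTHETICAL classification: a real analysis would use an actual
* industry or sector code; this rule is an arbitrary placeholder.
gen byte industry = 1 + mod(stockID, 3)

* Declare only the panel variable, so several trades per investor-day are allowed.
xtset investorID

* Investor fixed effects, stock category as a predictor.
* With real data one would add CONTROLS and clustered standard errors.
xtreg volume signal i.industry, fe
```

The estimates from these 15 rows are meaningless; the point is only the structure of the commands.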