Panel data regression for each id pair

Yifei Liu

Join Date: Aug 2022
Posts: 26

19 Oct 2022, 12:11

Dear Statalist,

I have an unbalanced panel dataset with around 500 IDs, each with eight years of hourly observations on two variables, output and lagged output. It is a fairly large dataset.

What I want to do is run two regressions for each ID pair, one for each of two time windows (hours 00-07 and hours 08-23). Something like:

Code:

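* one regression per time window (0 = hours 00-07, 1 = hours 08-23)
forvalues time = 0/1 {
    regress output_i output_j lagoutput if timewindow == `time'
}

where output_i and lagoutput belong to ID i, output_j belongs to ID j, observations are hourly, and i-j is an ID pair.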

I face two difficulties:
1. My data are not structured for this task. As shown below, each row is one ID (say i) at one hour, not one ID pair (say i-j). I could restructure by merging every other ID j onto each i by time, but the data are large: the current dataset has around 26,000,000 rows, and the number of observations would explode after restructuring (up to roughly 26,000,000 × 499 ≈ 13 billion rows). Is there a way around this? Or do I have to construct a separate dataset for each i-j pair, one by one? If so, how should I do that?
2. I think I will need to run the regressions in a loop. How can I store the regression results from each iteration, or at least the coefficients I am interested in?

I appreciate any comments. Thanks a lot!

Below is a snapshot of what my data look like:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double output str10 time double id float lagoutput
                 0 "2018031900"  7         0
                 0 "2018031900"  8         0
                 0 "2018031900" 10         0
  1089.30615234375 "2018031900" 11 1095.8276
                 0 "2018031901"  7         0
 78.58380126953125 "2018031901"  8  79.20284
                 0 "2018031901" 10         0
1092.7843017578125 "2018031901" 11 1094.5234
end
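
The timewindow variable used in the regression is not part of this example. A minimal sketch of how it could be derived, assuming time is always a 10-character string of the form YYYYMMDDHH (the variable name hour is introduced here for illustration only):

Code:

* Assumption: characters 9-10 of time hold the hour (00-23)
generate byte hour = real(substr(time, 9, 2))
* timewindow = 0 for hours 00-07, 1 for hours 08-23
generate byte timewindow = (hour >= 8) if !missing(hour)

With the example rows above, hours 00 and 01 would both fall in timewindow 0.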

I think I need the data to be like below in order to run the regression:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double output_i str10 time double id_i float lagoutput double output_j double id_j
0 "2018031900"  7         0 0 8 
0 "2018031900"  7         0 0 10
0 "2018031900"  7         0 1089.30615234375 11
0 "2018031900"  8         0 0 7 
0 "2018031900"  8         0 0 10
0 "2018031900"  8         0 1089.30615234375 11
0 "2018031900" 10         0 0 7
0 "2018031900" 10         0 0 8    
0 "2018031900" 10         0 1089.30615234375 11
1089.30615234375 "2018031900" 11 1095.8276 0 7
1089.30615234375 "2018031900" 11 1095.8276 0 8
1089.30615234375 "2018031900" 11 1095.8276 0 10         
0 "2018031901"  7         0 78.58380126953125 8
0 "2018031901"  7         0 0 10
0 "2018031901"  7         0 1092.7843017578125 11
78.58380126953125 "2018031901"  8  79.20284 0 7
78.58380126953125 "2018031901"  8  79.20284 0 10
78.58380126953125 "2018031901"  8  79.20284 1092.7843017578125 11
0 "2018031901" 10         0 0 7
0 "2018031901" 10         0 78.58380126953125 8 
0 "2018031901" 10         0 1092.7843017578125 11
1092.7843017578125 "2018031901" 11 1094.5234 0 7                
1092.7843017578125 "2018031901" 11 1094.5234 78.58380126953125 8
1092.7843017578125 "2018031901" 11 1094.5234 0 10          
end
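
A rough sketch of one way to avoid building this full pairwise dataset: construct each i-j pair on the fly from the long data and post the coefficient of interest to a results file. It is only an outline under several assumptions: the long data are saved as panel.dta (a hypothetical file name), they already contain a 0/1 timewindow variable, and time is unique within id. The results file name pair_results and its variable names are likewise made up for illustration. Re-reading the file for every pair would be slow on 26,000,000 rows, so this shows the shape of the loop, not a tuned solution.

Code:

* Sketch only: panel.dta and pair_results.dta are hypothetical file names
use id using panel, clear
levelsof id, local(ids)

tempname results
tempfile side_i
postfile `results' double id_i double id_j byte timewindow ///
    double b_output_j double se_output_j long n using pair_results, replace

foreach i of local ids {
    * outcome side: observations for ID i
    use id time output lagoutput timewindow if id == `i' using panel, clear
    drop id
    rename output output_i
    save `side_i', replace

    foreach j of local ids {
        if `i' == `j' continue
        * regressor side: observations for ID j, matched to i by time
        use id time output if id == `j' using panel, clear
        drop id
        rename output output_j
        merge 1:1 time using `side_i', keep(match) nogenerate

        forvalues w = 0/1 {
            * capture so that a pair with no usable observations does not stop the loop
            capture regress output_i output_j lagoutput if timewindow == `w'
            if _rc == 0 {
                post `results' (`i') (`j') (`w') (_b[output_j]) (_se[output_j]) (e(N))
            }
        }
    }
}
postclose `results'

Afterwards, "use pair_results, clear" would give one row per ID pair and time window, holding the coefficient and standard error on output_j and the number of observations used.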

Tags: data, loop, panel data, Regression Output, unbalance panel data
