Dear Statalist,
I have an unbalance panel dataset, which includes around 500 IDs and each ID has eight years of hourly observations for variable outputs and variable lag-outputs. It is a fairly large dataset.
What I want to do is run two regressions for each ID-pair in two time window (hour 00-07 and hour 08-23). Something like:
where t is hour and i-j is an ID-pair.
I face two difficulties now:
1. My data is not structured as what should be for this task. See below, one row of data is for one ID(say i) now instead of for one ID-pair(say i-j). I can change the structure by merging j to each i by time. But this poses a difficulty because the data is large. The current data has around 26,000,000 rows. If I restructure the data, the numbers of observation will explode. Is there any way this can be worked out? Or do I have to construct separate data one-by-one for each i-j pair. If so, how should I do that?
2. I think I will need to run multiple regressions in loop. How can I store the regressions results from each loop? Or how should I store the coefficient I am interested from each loop?
I appreciate every comments. Thanks a lot!
Below is a snap of what my data look like:

I think I need the data to be like below in order to run the regression:
I have an unbalance panel dataset, which includes around 500 IDs and each ID has eight years of hourly observations for variable outputs and variable lag-outputs. It is a fairly large dataset.
What I want to do is run two regressions for each ID-pair in two time window (hour 00-07 and hour 08-23). Something like:
Code:
forval time=0/1{reg output_it output_jt lag-output_it if timewindow==`time'
I face two difficulties now:
1. My data is not structured as what should be for this task. See below, one row of data is for one ID(say i) now instead of for one ID-pair(say i-j). I can change the structure by merging j to each i by time. But this poses a difficulty because the data is large. The current data has around 26,000,000 rows. If I restructure the data, the numbers of observation will explode. Is there any way this can be worked out? Or do I have to construct separate data one-by-one for each i-j pair. If so, how should I do that?
2. I think I will need to run multiple regressions in loop. How can I store the regressions results from each loop? Or how should I store the coefficient I am interested from each loop?
I appreciate every comments. Thanks a lot!
Below is a snap of what my data look like:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input double output str10 time double id float lagoutput 0 "2018031900" 7 0 0 "2018031900" 8 0 0 "2018031900" 10 0 1089.30615234375 "2018031900" 11 1095.8276 0 "2018031901" 7 0 78.58380126953125 "2018031901" 8 79.20284 0 "2018031901" 10 0 1092.7843017578125 "2018031901" 11 1094.5234 end
I think I need the data to be like below in order to run the regression:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input double output_i str10 time double id_i float lagoutput double output_j double id_j 0 "2018031900" 7 0 0 8 0 "2018031900" 7 0 0 10 0 "2018031900" 7 0 1089.30615234375 11 0 "2018031900" 8 0 0 7 0 "2018031900" 8 0 0 10 0 "2018031900" 8 0 1089.30615234375 11 0 "2018031900" 10 0 0 7 0 "2018031900" 10 0 0 8 0 "2018031900" 10 0 1089.30615234375 11 1089.30615234375 "2018031900" 11 1095.8276 0 7 1089.30615234375 "2018031900" 11 1095.8276 0 8 1089.30615234375 "2018031900" 11 1095.8276 0 10 0 "2018031901" 7 0 78.58380126953125 8 0 "2018031901" 7 0 0 10 0 "2018031901" 7 0 1092.7843017578125 11 78.58380126953125 "2018031901" 8 79.20284 0 7 78.58380126953125 "2018031901" 8 79.20284 0 10 78.58380126953125 "2018031901" 8 79.20284 1092.7843017578125 11 0 "2018031901" 10 0 0 7 0 "2018031901" 10 0 78.58380126953125 8 0 "2018031901" 10 0 1092.7843017578125 11 1092.7843017578125 "2018031901" 11 1094.5234 0 7 1092.7843017578125 "2018031901" 11 1094.5234 78.58380126953125 8 1092.7843017578125 "2018031901" 11 1094.5234 0 10 end