Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel data regression for each id pair

    Dear Statalist,

    I have an unbalance panel dataset, which includes around 500 IDs and each ID has eight years of hourly observations for variable outputs and variable lag-outputs. It is a fairly large dataset.

    What I want to do is run two regressions for each ID-pair in two time window (hour 00-07 and hour 08-23). Something like:
    Code:
    forval time=0/1{reg output_it output_jt lag-output_it if timewindow==`time'
    where t is hour and i-j is an ID-pair.

    I face two difficulties now:
    1. My data is not structured as what should be for this task. See below, one row of data is for one ID(say i) now instead of for one ID-pair(say i-j). I can change the structure by merging j to each i by time. But this poses a difficulty because the data is large. The current data has around 26,000,000 rows. If I restructure the data, the numbers of observation will explode. Is there any way this can be worked out? Or do I have to construct separate data one-by-one for each i-j pair. If so, how should I do that?
    2. I think I will need to run multiple regressions in loop. How can I store the regressions results from each loop? Or how should I store the coefficient I am interested from each loop?

    I appreciate every comments. Thanks a lot!

    Below is a snap of what my data look like:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double output str10 time double id float lagoutput
                     0 "2018031900"  7         0
                     0 "2018031900"  8         0
                     0 "2018031900" 10         0
      1089.30615234375 "2018031900" 11 1095.8276
                     0 "2018031901"  7         0
     78.58380126953125 "2018031901"  8  79.20284
                     0 "2018031901" 10         0
    1092.7843017578125 "2018031901" 11 1094.5234
    end
    Click image for larger version

Name:	Screen Shot 2022-10-19 at 12.57.02 PM.png
Views:	1
Size:	176.5 KB
ID:	1685980


    I think I need the data to be like below in order to run the regression:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double output_i str10 time double id_i float lagoutput double output_j double id_j
    0 "2018031900"  7         0 0 8 
    0 "2018031900"  7         0 0 10
    0 "2018031900"  7         0 1089.30615234375 11
    0 "2018031900"  8         0 0 7 
    0 "2018031900"  8         0 0 10
    0 "2018031900"  8         0 1089.30615234375 11
    0 "2018031900" 10         0 0 7
    0 "2018031900" 10         0 0 8    
    0 "2018031900" 10         0 1089.30615234375 11
    1089.30615234375 "2018031900" 11 1095.8276 0 7
    1089.30615234375 "2018031900" 11 1095.8276 0 8
    1089.30615234375 "2018031900" 11 1095.8276 0 10         
    0 "2018031901"  7         0 78.58380126953125 8
    0 "2018031901"  7         0 0 10
    0 "2018031901"  7         0 1092.7843017578125 11
    78.58380126953125 "2018031901"  8  79.20284 0 7
    78.58380126953125 "2018031901"  8  79.20284 0 10
    78.58380126953125 "2018031901"  8  79.20284 1092.7843017578125 11
    0 "2018031901" 10         0 0 7
    0 "2018031901" 10         0 78.58380126953125 8 
    0 "2018031901" 10         0 1092.7843017578125 11
    1092.7843017578125 "2018031901" 11 1094.5234 0 7                
    1092.7843017578125 "2018031901" 11 1094.5234 78.58380126953125 8
    1092.7843017578125 "2018031901" 11 1094.5234 0 10          
    end
    Click image for larger version

Name:	Screen Shot 2022-10-19 at 1.04.20 PM.png
Views:	1
Size:	618.7 KB
ID:	1685981
Working...
X