Hi,
I often find myself working with large data files in long format.
Typically, I am looking to tag a unique row and then fill that value to the rest of the panel, which I have been doing with egen:
Code:
by panel, sort: egen filled = min(tag)
However, this is painfully slow across 10 million rows, whereas a piece of code from a Nick Cox presentation is much faster:
Code:
gen filled = .
bysort panel (tag): replace filled = min(filled[_n-1], tag)
On 10 million rows, egen takes 40 seconds and the bysort/replace version takes 20 seconds.
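For anyone who wants to reproduce the comparison, here is a minimal sketch on simulated data. The data layout (10 rows per panel, tag non-missing on exactly one row per panel) and the variable name filled_egen are assumptions for illustration only, so timings on real data may differ.
Code:
* Simulated data: 10 million rows, 10 rows per panel (assumed layout)
clear
set obs 10000000
gen long panel = ceil(_n / 10)
* tag is non-missing on one row per panel and missing elsewhere
bysort panel: gen tag = panel if _n == 1
timer clear
* Approach 1: egen
timer on 1
by panel, sort: egen filled_egen = min(tag)
timer off 1
* Approach 2: sort tag within panel, then carry the value forward
timer on 2
gen filled = .
bysort panel (tag): replace filled = min(filled[_n-1], tag)
timer off 2
* Report elapsed seconds and check that both approaches agree
timer list
assert filled == filled_egen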
I would have thought they would be doing similar things, so why the time difference?
Or does anyone have any faster suggestions?
bw
Adrian