Hello,
Imagine you have a payment schedule based on the number of widgets you produce. In the code below, this schedule is laid out in the "lookup" dataset. If you produce 0-3 widgets, you get paid 500; if you produce 3.0001-6, you get paid 600; and so on.
I have a second dataset with the number of widgets produced by each individual (the "detail" dataset, where value = number of widgets produced), and I want to fill in output for each individual from the payment schedule. How might I do this in a memory-efficient way? Solutions that loop over individual observations don't work well in my case, since I'm dealing with roughly 100k individuals.
Thank you!!
Code:
// prep example lookup dataset
clear
set obs 3
gen range_lo = .
gen range_hi = .
gen output = .
replace range_lo = 0 in 1
replace range_lo = 3 in 2
replace range_lo = 6 in 3
replace range_hi = 3 in 1
replace range_hi = 6 in 2
replace range_hi = 10 in 3
replace output = 500 in 1
replace output = 600 in 2
replace output = 800 in 3
tempfile lookup
save `lookup'

// prep example detail dataset
clear
set obs 1000
gen value = _n/100
gen output = .
