I have a couple hundred thousand lines, each of which represents a contract with a certain dollar amount, start date, and end date. I've successfully created a single daily time series of the data, dividing the dollar amounts into equal parts for each day each contract lasts (so if a specific contract lasts for an eight day period, each of these eight days gets 1/8th the total dollar amount in the daily time series). The code I wrote for this is:
gen day = _n
gen double daily_amount = 0
forvalues i = 1/`=_N' {
replace daily_amount = daily_amount + (total_dollar_amount[`i']/(end_date[`i'] - start_date[`i'] + 1)) if day >= start_date[`i'] & end_date[`i'] >= day
}
However, I also need to redo this process by creating separate time series using only data from each city the contracts can place in. There are over 3,000 cities. The following code works:
forvalues j = 1/3489 {
gen double daily_amount_`j' = 0
gen temp_`j' = 0
replace temp_`j' = federal_action_obligation if city_num == `j'
forvalues i = 1/`=_N' {
replace daily_amount_`j' = daily_amount_`j' + (temp_`j'[`i']/(end_date[`i'] - start_date[`i'] + 1)) if day >= start_date[`i'] & end_date[`i'] >= day
}
drop temp_`j'
}
But, unsurprisingly, the double loop ensures the process takes an incredibly long amount of time. Letting it run for a few hours and extrapolating, it seems like it would take at least three months to finish. This obviously isn't viable.
Is there a faster way of doing this? Perhaps by incorporating by or bysort into the variable generation process?
Thank you.
gen day = _n
gen double daily_amount = 0
forvalues i = 1/`=_N' {
replace daily_amount = daily_amount + (total_dollar_amount[`i']/(end_date[`i'] - start_date[`i'] + 1)) if day >= start_date[`i'] & end_date[`i'] >= day
}
However, I also need to redo this process by creating separate time series using only data from each city the contracts can place in. There are over 3,000 cities. The following code works:
forvalues j = 1/3489 {
gen double daily_amount_`j' = 0
gen temp_`j' = 0
replace temp_`j' = federal_action_obligation if city_num == `j'
forvalues i = 1/`=_N' {
replace daily_amount_`j' = daily_amount_`j' + (temp_`j'[`i']/(end_date[`i'] - start_date[`i'] + 1)) if day >= start_date[`i'] & end_date[`i'] >= day
}
drop temp_`j'
}
But, unsurprisingly, the double loop ensures the process takes an incredibly long amount of time. Letting it run for a few hours and extrapolating, it seems like it would take at least three months to finish. This obviously isn't viable.
Is there a faster way of doing this? Perhaps by incorporating by or bysort into the variable generation process?
Thank you.
Comment