Dear All,
I posted earlier today but did not get a response. So, probably I did not do a good job posting my question. I will try again.
I have a panel data at the individual (N)-week level. I have 14 weeks/ waves - 7 before and 7 after an intervention. The 10 percent sample, which is not balanced, looks as follows:
I want to run a program to calculate the mean square prediction error for panels of varying lengths.
But its not running. It does nothing. I will greatly appreciate some help please.
Sincerely,
Sumedha.
I posted earlier today but did not get a response. So, probably I did not do a good job posting my question. I will try again.
I have a panel data at the individual (N)-week level. I have 14 weeks/ waves - 7 before and 7 after an intervention. The 10 percent sample, which is not balanced, looks as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str10 npi int year float week int userTRA "J338339LLR" 2014 4 0 "J338339LLR" 2014 6 0 "J338339LLR" 2014 7 0 "J33833J3J3" 2014 2 0 "J33833J99R" 2014 2 1 "J33833JOLJ" 2014 4 0 "J33833NF9L" 2014 5 0 "J33833R8F8" 2014 1 0 "J33833RLFF" 2014 7 0 "J33833RO8R" 2014 2 0 "J338383FRV" 2014 7 0 "J338383R89" 2014 2 0 "J33838FR9R" 2014 6 0 "J33838LJ8R" 2014 3 0 "J33838LVFO" 2014 6 1 "J33838RFNL" 2014 1 1 "J338393FOR" 2014 7 0 "J338398J88" 2014 6 0 "J3383998JF" 2014 4 0 "J338399JRF" 2014 5 0 "J338399N33" 2014 2 1 "J338399V3R" 2014 7 0 "J33839F99O" 2014 6 1 "J33839F99O" 2014 7 0 "J33839FNL3" 2014 5 0 "J33839JFRV" 2014 6 0 "J33839JLRL" 2014 5 0 "J33839NNOL" 2014 6 0 "J33839O383" 2014 4 0 "J33839O8R8" 2014 2 0 "J33839OR8R" 2014 6 2 "J3383F33NJ" 2014 2 0 "J3383F38JN" 2014 4 0 "J3383F988V" 2014 2 0 "J3383FN3VR" 2014 2 1 "J3383FNFNL" 2014 1 0 "J3383FNFNL" 2014 5 0 "J3383FR8L9" 2014 2 0 "J3383FROOF" 2014 2 0 "J3383FVRVO" 2014 5 0 "J3383J3983" 2014 1 0 "J3383J3JV8" 2014 3 1 "J3383J88FO" 2014 3 0 "J3383J8RJV" 2014 3 0 "J3383J8RJV" 2014 4 0 "J3383J8VFV" 2014 5 0 "J3383JONVF" 2014 5 0 "J3383JRLJ8" 2014 2 1 "J3383JRLJ8" 2014 7 1 "J3383L3VJV" 2014 7 0 "J3383L88NV" 2014 5 0 "J3383LF888" 2014 1 0 "J3383LFR3J" 2014 7 0 "J3383LJJFO" 2014 2 0 "J3383LL9RN" 2014 6 1 "J3383LLN8N" 2014 5 0 "J3383LLVFJ" 2014 1 0 "J3383LLVFJ" 2014 5 0 "J3383LRVO8" 2014 2 0 "J3383LVFOR" 2014 7 0 "J3383LVN93" 2014 3 0 "J3383N83R8" 2014 5 0 "J3383N888L" 2014 2 0 "J3383N9LFJ" 2014 5 0 "J3383NL93R" 2014 2 0 "J3383NLV8O" 2014 7 1 "J3383NNJFV" 2014 1 0 "J3383NNJFV" 2014 6 1 "J3383NO3RF" 2014 6 0 "J3383NVJNJ" 2014 4 0 "J3383O3LVV" 2014 7 0 "J3383OFJJL" 2014 5 0 "J3383ON8LO" 2014 3 1 "J3383OOO9F" 2014 1 2 "J3383OORLN" 2014 1 0 "J3383ORLLO" 2014 2 0 "J3383OVN8F" 2014 4 0 "J3383OVRF3" 2014 6 0 "J3383R3F9N" 2014 2 0 "J3383RF9O9" 2014 6 0 "J3383RF9O9" 2014 7 0 "J3383RNV9R" 2014 1 0 "J3383ROLLV" 2014 6 0 "J3383V3OOO" 2014 4 0 "J3383V3OOO" 2014 7 0 "J3383V83FL" 2014 6 0 "J3383V83N9" 2014 7 0 "J3383V9J89" 2014 3 0 "J3383VL398" 2014 1 1 "J3383VL398" 2014 2 1 "J338F8FJFV" 2014 4 0 "J338FLNVF3" 2014 5 0 "J338FR9RLL" 2014 3 0 "J338FRR3LF" 2014 6 0 "J338FRV38F" 2014 5 0 "J338J33FOV" 2014 3 0 "J338J33RNO" 2014 5 1 "J338J38JR3" 2014 5 1 "J338J398ON" 2014 1 1 "J338J39OV3" 2014 5 0 end
For prediction, I want to iteratively leave out one individual each time (drop all waves of this one individual) and then use the estimates from the remaining sample to predict outcome for the individual who was left out. I repeat this one-by-one for each individual in the panel.
Then I add the prediction errors for all and store in a matrix. Next, I want to repeat this exercise for different panel lengths. So, I re-do the iterative exercise by leaving out one observation each time and calculating the prediction error for it using the estimates calculated for the remaining observations for pre-intervention panels of 7, 6, 5, 4, 3 and 2 weeks.
The idea being that I want to optimize the panel length by minimizing the MSPE for the 7 week period prior to the intervention. The program is as follows:Code:
/*NOTES: cllr_crossval The goal is to estimate the bandwidth that minimizes the IMSE of a local linear regression. A grid search is used and estimation is based on the cllr program described above. Arguments outcome: a stata variable containing the dependent variable x: a stata variable containing the independent variable start: a hardcoded number or local variable defining start of a sequence candidate bandwidths step: a hardcoded number or local variable defining the stepsize of the sequence of candidate bandwidth stop: a hardcoded number or local variable defining the end of a sequence of candidate bandwidths. sub: a stata variable set to 1 if the observation should be included in the analysis Returns A stata matrix and set of stata variables that contain the estimated IMSE for each candidate bandwidth. */ sort npi gen N=_n if npi[_n]~=npi[_n-1] bysort npi: egen maxN=max(N) replace N=maxN if N==. bysort N week: gen counter=_n drop if counter>1 xtset N week gen outcome = userTRA gen x = week capture program drop cllr_crossval program define cllr_crossval set more off args outcome x start step stop sub narrowsub tempvar cx ew e2 e2n local stop = 7 local start = 1 local step = 1 *make a matrix to store the estimated IMSE local size = ((`stop' - `start')/`step')+1 matrix M = J(`size', 3, .) *Iterate over candidate bandwidths local count = 0 forvalues h = `start'(`step')`stop'{ *increment counter local count = `count' + 1 *store location on the bandwidth grid matrix M[`count', 1] = `h' *initialize the residual variable gen `e2' = . *Iterate over observations forvalues i = 1(1)`N'{ capture quietly reghdfe /*regress*/ `outcome' `x' if _n~=`i' & week=<`h', absorb(npi) replace `e2' = (`outcome' - _b[_cons])^2 in `i' } *compute IMSE for the candidate bandwidth su `e2' matrix M[`count',2] = r(mean) drop `e2' } matrix list M svmat M end
Sincerely,
Sumedha.
Comment