My understanding has always been that any form of "stepwise elimination of insignificant variables" invalidates subsequent inference (especially standard-error estimates and p-values). This would be because stepwise procedures pick up random patterns in the data and understate the number of parameters that were effectively estimated. I am trying to write a simulation in Stata to illustrate that point, but I am failing.
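For intuition, here is a stripped-down toy version of the selection effect I have in mind (this is not my actual simulation below, and it uses a crude "keep the single best of 10 noise regressors" rule rather than stepwise): the smallest of 10 null p-values, read at face value, rejects at the 5% level far more often than 5% of the time.

* Toy sketch of the selection problem (not the simulation further down):
* keep only the "best" of 10 pure-noise regressors and read its p-value
* at face value.
clear all
set seed 2
forvalues r = 1/400 {
    clear
    qui set obs 100
    gen y = rnormal()                      // pure-noise outcome, no real effects
    local best = 1
    forvalues i = 1/10 {
        gen x`i' = rnormal()
        qui reg y x`i'
        qui testparm x`i'
        local p`i' = r(p)
        if `p`i'' < `p`best'' local best = `i'
    }
    mat pmin = nullmat(pmin) \ (`p`best'')   // smallest of the 10 p-values
}
clear
svmat pmin
count if pmin1 < .05
di "nominal 5% test rejects in " r(N) " of 400 replications"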
My generated data consists of a random treatment indicator, a zero treatment effect, and 10 potential control variables, of which 4 actually enter the outcome. I use "stepwise" to pick the control variables for the ATE regression. My expectation is that if I repeat the process 1000 times and store the p-values for the average treatment effect, I end up with a grossly non-uniformly distributed set of p-values.
The resulting histogram looks as if the p-values are nicely and uniformly distributed between 0 and 1 (maybe with a tiny kink in the higher range). This is despite using a rather small sample of only 100 observations.

Is there something in my data-generating process that is responsible for this? Does 'real' experimental data come with features that make these issues more pronounced? How could I adapt my simulation to illustrate the issues further, while retaining random treatment assignment?
Code:
set seed 1
cap mat drop p
set matsize 2000
forvalues x = 1/2000 {
    clear
    // generate data with no treatment effect
    qui set obs 100
    gen treatment = runiform() > .5
    forvalues i = 1/10 {
        gen x`i' = rnormal()
    }
    gen y = x1 + 1/3*x2 + 1/9*x3 + 1/27*x4 + rnormal()
    // use stepwise to pick controls to be used in the ATE regression
    // (lockterm1 keeps treatment in the model; controls are eliminated at pr(0.1))
    qui stepwise, pr(0.1) lockterm1: reg y treatment x*
    // store the p-value for the treatment coefficient
    qui testparm treatment
    mat p = nullmat(p) \ r(p)
    di "." _cont
}
clear
svmat p
hist p1, bin(5)
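As an aside, a more formal uniformity check than the 5-bin histogram could be appended after svmat (just a sketch, not something in the run above): a finer histogram, and a one-sample Kolmogorov-Smirnov test against the U(0,1) CDF, which for p1 is simply p1 itself.

* possible additional checks after svmat (not part of the run above)
hist p1, bin(20)        // finer bins make departures from uniformity easier to see
ksmirnov p1 = p1        // one-sample K-S test against the U(0,1) CDF, i.e. F(p1) = p1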