The reviewers told me to use a better simulation, so a better simulation I shall use. Consider the following data-generating process, which is very similar to the Prop 99 setup.
Currently, cigarette sales are a function of covariates (population, price, income levels, alcohol use) plus noise. Perhaps this is reasonable, perhaps not. How would I generate these outcomes under a linear factor model, however? That is, how would I program time-invariant factor loadings that affect the cigarette sales of all units similarly, and time-varying factors that affect them differently? It seems like they generate the factor loadings here... but how would I generate the time components?
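To be concrete about what I mean by a linear factor model, here is the form usually written down in the synthetic control literature. The notation is my assumption and is not taken from the code in this post: $Z_i$ stands in for the observed covariates, $\lambda_t$ for the time-varying common factors, and $\mu_i$ for the time-invariant factor loadings.

```latex
% Untreated outcome for unit i in period t (notation assumed, not from the code in this post)
Y_{it} = \delta_t + \theta_t' Z_i + \lambda_t' \mu_i + \varepsilon_{it}
% \delta_t         : time effect common to all units
% Z_i              : observed covariates, with time-varying coefficients \theta_t
% \lambda_t        : 1 x F vector of unobserved common factors (vary over time)
% \mu_i            : F x 1 vector of factor loadings (fixed over time)
% \varepsilon_{it} : idiosyncratic noise
```

So the piece I am missing is, I think, the $\lambda_t' \mu_i$ term: one draw per time period for the factors, multiplied by one draw per unit for the loadings.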
Code:
clear *
set obs 38
set seed 1066
// 38 units
egen id = seq(), f(1) t(38)
cls
expand 40
// 40 time periods
qbys id: g time = _n+1969
keep if inrange(time,1970,2000)
xtset id time, g
su `r(timevar)', mean
loc yearmin =r(min)

// Generate population data
bys id: g population = runiformint(4000000,40000000) if time ==`yearmin'
replace pop=L1.pop+rnormal(100000,50000) if time>`yearmin'

// Generate income data
bys id: g income = runiformint(20000,40000) if time ==`yearmin'
replace income=L1.income+rnormal(1000,500) if time>`yearmin'
replace income = ln((income/pop)*100000)

// Generate proportion of alcohol drinkers data
bys id: g growth = runiformint(10000,40000) if time ==`yearmin'
replace growth=L1.growth+rnormal(2,10) if time>`yearmin'
replace growth = (growth/pop)*100
replace pop = ln(population)
g pop2 = exp(pop)

// Generate price data
bys id: g price = runiformint(27.3,42.2) if time ==`yearmin'
replace price=L1.price+rnormal(8,1.5) if time>`yearmin'
cls

//!! Generate cigarette sales per capita data
bys id: g cigsale = ((pop2)*rbeta(1,1900)-(price*40)-(income*2)-(growth*(.4))) if time == `yearmin'
bys id: replace cigsale = (cigsale/pop2)*100000
bys id: replace cigsale = cigsale+150
bys id: replace cig=L1.cig-rnormal(2,4) if time >`yearmin'