Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Simulating a Linear Factor Model

    The reviewers told me to use a better simulation, so a better simulation I shall use. Consider the following data-generating process, very very similar to Prop 99's setup

    Code:
    clear *
    
    set obs 38
    set seed 1066
    
    // 38 units
    
    
    egen id = seq(), f(1) t(38)
    
    cls
    expand 40
    
    
    // 40 time periods
    
    qbys id: g time = _n+1969
    
    keep if inrange(time,1970,2000)
    
    xtset id time, g
    
    su `r(timevar)', mean
    
    loc yearmin =r(min)
    
    // Generate population data
    
    bys id: g population = runiformint(4000000,40000000) if time ==`yearmin'
    
    replace pop=L1.pop+rnormal(100000,50000) if time>`yearmin'
    
    
    // Generate income data
    
    
    bys id: g income = runiformint(20000,40000) if time ==`yearmin'
    
    replace income=L1.income+rnormal(1000,500) if time>`yearmin'
    
    
    replace income = ln((income/pop)*100000)
    
    // Generate proportion of alcohol drinkers data
    
    bys id: g growth = runiformint(10000,40000) if time ==`yearmin'
    
    replace growth=L1.growth+rnormal(2,10) if time>`yearmin'
    
    
    replace growth = (growth/pop)*100
    
    replace pop = ln(population)
    
    g pop2 = exp(pop)
    
    // Generate price data
    
    
    bys id: g price = runiformint(27.3,42.2) if time ==`yearmin'
    
    replace price=L1.price+rnormal(8,1.5) if time>`yearmin'
    
    cls
    
    //!! Generate cigarette sales per capita data
    
    
    bys id: g cigsale = ((pop2)*rbeta(1,1900)-(price*40)-(income*2)-(growth*(.4))) if time == `yearmin'
    
    bys id: replace cigsale = (cigsale/pop2)*100000
    
    bys id: replace cigsale = cigsale+150
    
    bys id: replace cig=L1.cig-rnormal(2,4) if time >`yearmin'
    At current, cigarette sales are a function of covariates: population, price, income levels, alcohol use, and noise- perhaps this is reasonable, perhaps not. How would I generate these outcomes, however, under a linear-factor model? That is, how would I program time-invariant factor loadings that affect the cigarette sales of all units similarly, and time-varying factors that affect the same differently? It seems like they generate the factor loadings here... but how would I generate the time-components?

  • #2
    EDIT: Found it!!!!!!
    Code:
    clear *
    {
    set obs 38
    set seed 1066
    
    
    egen id = seq(), f(1) t(38)
    
    cls
    
    
    generate u_i1 = 19 // latent factor lodings
    
    generate u_i2 = 4 // latent factor lodings
    
    g u_i3 = 7.2 // latent factor lodings
    
    expand 40
    
    qbys id: g time = _n+1969
    
    keep if inrange(time,1970,2000)
    
    bys id: g u_t = runiform()
    
    
    xtset id time, g
    
    su `r(timevar)', mean
    
    loc yearmin =r(min)
    
    // Generate population data
    
    bys id: g population = runiformint(8000000,40000000) if time ==`yearmin'
    
    replace pop=L1.pop+rnormal(100000,50000) if time>`yearmin'
    
    
    // Generate income data
    
    
    bys id: g income = runiformint(20000,40000) if time ==`yearmin'
    
    replace income=L1.income+rnormal(1000,500) if time>`yearmin'
    
    
    replace income = ln((income/pop)*100000)
    
    // Generate proportion of alcohol drinkers data
    
    bys id: g growth = runiformint(10000,40000) if time ==`yearmin'
    
    replace growth=L1.growth+rnormal(2,10) if time>`yearmin'
    
    
    replace growth = (growth/pop)*100
    
    replace pop = ln(population)
    
    g pop2 = exp(pop)
    
    // Generate price data
    
    
    bys id: g price = runiformint(27.3,42.2) if time ==`yearmin'
    
    replace price=L1.price+rnormal(8,1.5) if time>`yearmin'
    
    cls
    }
    //!! Generate cigarette sales per capita data
    
    
    bys id: g cigsale = floor(((pop2)*rbeta(1,1900) ///
        -(price*30)+ ///
        (income*20)- ///
        (growth*(.9))) + ///
        ((u_i1*u_i2*u_i3)*u_t) + rnormal(45000,6500)) if time == `yearmin'
        
    
    {
    bys id: replace cigsale = (cigsale/pop2)*100000
    
    bys id: replace cig=L1.cig-rnormal(3,4) if time >`yearmin'
    cls
    
    tempvar storename id2
    
    loc entity State
    
    
    g `storename' = "`entity' "
    
    egen `id2' = concat(`storename' id)
    
    labmask id, values(`id2')
    
    cls
    
    loc int_time = 1989
    qui xtset
    local lbl: value label `r(panelvar)'
    
    loc unit ="`entity' 10":`lbl'
    
    g treated = cond(id==`unit' & time >= `int_time',1,0)
    
    lab var treated "Anti-Smoking Policy"
    
    
    clonevar cigte = cig
    
    replace cigte = cigsale - (cigsale * .2)+runiform(10,2) if id ==`unit' & time >=`int_time'
    cls
    qui levelsof id, loc(units)
    loc a: word count `units'
    forval i=1/`a'{
        local lcolors " `lcolors' plot`i'opts(lcol(gs10))"
    }
    loc treat = `a'-1
    
    qui xtset
    xtline cigte,  overlay ///
        `lcolors' ///
        addplot((connected cigsale `r(timevar)' if `r(panelvar)' ==`unit', ///
            connect(L) lcolor(black) mcol(none) lwidth(thick))) ///
        legend(off) ///order(1 "Donors" `treat' "Cali") ring(0) pos(7)
        yti("Cigarette Sales per Capita") ///
        tline(`int_time')
    It does require a bit of playing with to ensure the DGP is realistic, but this is how you simulate a linear factor model with 3 factor loadings.
    Last edited by Jared Greathouse; 03 Dec 2022, 10:24.

    Comment

    Working...
    X