Effective sample size using Poisson with exposure variable

    We have claims data for hundreds of millions of people and want to look at how emergency department (ED) visit diagnoses changed over different time periods. We want to report the percent change from baseline in the rate of ED visits with diagnosis_i per 1,000 person-months of insurance enrollment at time_j.
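
    For concreteness, the quantity we want to report would be computed roughly like this (a minimal sketch on a hypothetical period-level dataset; visits, pmonths, and time are made-up variable names, not our actual data):

    Code:
    * hypothetical data: one row per time period, with
    *   visits  = number of ED visits with diagnosis_i in the period
    *   pmonths = person-months of insurance enrollment in the period
    gen rate = 1000 * visits / pmonths
    * percent change from the baseline (time==1) rate
    summarize rate if time == 1, meanonly
    gen pct_change = 100 * (rate - r(mean)) / r(mean)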

    With the related problem of case mix (percent of ED visits with diagnosis_i at time_j), the calculations were easier because there are fewer ED visits than people with coverage. With our computing resources (which are fixed), we can just barely manage an ED visit-level dataset, but we don't have enough RAM for a person-month dataset.

    The grad student I'm working with was using a Poisson model for case mix with standard errors clustered at the patient level, and we were discussing how to manage the person-month calculations. My suggestion was a Poisson model where the dependent variable is the total number of ED visits with diagnosis_i at time_j and the exposure variable is person_months_coverage_j. With this approach we can't account for repeated observations of the same people, but my guess was that the impact would be minimal because most of the diagnoses we're looking at are unlikely to recur in the same patient over our study period. Still, I wanted to check that this wouldn't be a huge issue, so I created a very simple simulation comparing the standard errors from the summary-dataset approach vs. the person-level analysis. I simulated two types of conditions with different correlation structures: one that is rare and can't recur in later periods (appendicitis) and one that is more common and is more likely to occur at time_j if it occurred at time_(j-1) [MI/heart attack].
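
    To spell out the summary-level model I have in mind (a sketch with hypothetical variable names, not our actual code): Stata's exposure() option enters as a log offset, so the exponentiated time coefficients are incidence-rate ratios relative to the baseline period, and the percent change from baseline is just 100*(IRR - 1).

    Code:
    * hypothetical summary dataset: one row per time period, with
    *   n_visits = total ED visits with diagnosis_i at time_j
    *   pmonths  = person-months of coverage at time_j
    poisson n_visits i.time, exposure(pmonths) irr
    * 1.time is the baseline; the IRRs on 2.time and 3.time are the rate ratios vs. baseline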

    I used three approaches to the calculation in my little simulation: Poisson with clustered standard errors, population-averaged/GEE Poisson, and the summary analysis with no correction to the standard errors. The results are the same to the third decimal place, so I'm happy with the summary approach, but the grad student is concerned about a regression with 3 observations and 3 estimated coefficients (a constant plus two time-period IRRs), which goes against all the rules we teach in econometrics. Given the results, I suspect the effective sample size in the summary regression is similar to that in the individual-level analyses, but I have no idea how to explain it. I asked two outstanding econometricians on Twitter, but neither had any suggestions.



    Note on code: I used Ben Jann's eststo and esttab to display the results.

    Code:
    clear
    * simulate 3,000,000 people observed for 3 time periods
    set obs 3000000
    gen id=_n
    gen t=3
    expand t
    bysort id : gen time=_n
    drop t
    * rare condition that cannot recur in later periods (appendicitis)
    gen appendicitis=0
    replace appendicitis=1 if runiform()<.0006 & time==1
    replace appendicitis=1 if runiform()<.0009 & time==2
    replace appendicitis=1 if runiform()<.0003 & time==3
    
    xtset id time
    
    * appendicitis cannot recur: zero out later-period draws for anyone with a prior case
    replace appendicitis=0 if L.appendicitis==1
    replace appendicitis=0 if L2.appendicitis==1
    
    * more common condition that can recur (MI)
    gen mi=0
    replace mi=1 if runiform()<.01 & time==1
    replace mi=1 if runiform()<.013 & time==2
    replace mi=1 if runiform()<.011 & time==3
    
    * a prior-period MI raises the probability of an MI in the current period (the draw is made twice)
    replace mi=1 if L.mi==1 & runiform()<.15
    replace mi=1 if L.mi==1 & runiform()<.15
    
    
    * appendicitis: person-period Poisson with clustered SEs, then GEE with exchangeable correlation
    poisson appendicitis i.time, vce(cluster id) irr
    eststo a_cluster
    xtgee appendicitis i.time, family(poisson) link(log) corr(exchangeable) vce(robust) eform
    eststo a_gee
    preserve
    
    * collapse to one row per period: event count plus total person-periods as the exposure
    contract time appendicitis
    reshape wide _freq, i(time) j(appendicitis)
    gen exposure=_freq0+_freq1
    rename _freq1 appendicitis
    
    * summary-level Poisson with the exposure offset (no SE correction)
    poisson appendicitis i.time, exposure(exposure) irr
    eststo a_summary
    restore
    
    * MI: same three approaches
    poisson mi i.time, vce(cluster id) irr
    eststo m_cluster
    xtgee mi i.time, family(poisson) link(log) corr(exchangeable) vce(robust) eform
    eststo m_gee 
    preserve
    
    * collapse MI to period-level counts and exposure
    contract time mi
    reshape wide _freq, i(time) j(mi)
    gen exposure=_freq0+_freq1
    rename _freq1 mi
    
    poisson mi i.time, exposure(exposure) irr
    eststo m_summary
    restore
    
    
    esttab a_cluster a_gee a_summary, b(3) ci(3) eform mtitles drop(1.time)
    esttab m_cluster m_gee m_summary, b(3) ci(3) eform mtitles drop(1.time)