  • Collapse Command, Complex Survey Design, and Difference-in-Differences Estimations

    Dear all,

    I am using Behavioral Risk Factor Surveillance System (BRFSS) data from 2011 to 2017.

    I would like to employ a difference-in-differences estimation approach. For a graphical analysis I would like to compute averages over time across the treatment group and control group. In particular, I would like to create a graph to examine whether there are parallel trends during the pre-treatment period and to examine what happens after the treatment in both groups.

    Without survey data, the appropriate commands would be

    collapse (mean) y, by(treatment year)

    twoway connected y year, by(treatment)

    y: dependent variable
    treatment: treatment group dummy variable
    year: year variable

    However, the BRFSS is a complex survey that uses stratification, clustering, and weights for sampling.

    This is why I use the following svyset command in Stata.

    svyset [pweight=_llcpwt], strata(_ststr) psu(_psu)

    My question is: how can I produce collapsed data that show the unconditional mean of y over time for each group? It would also be great to add confidence intervals to these graphs.

    I would be very glad if you could help me. I could not find an answer to my question. I have seen papers that use the BRFSS data and produce these graphs, but I do not know how they did that.

    Thank you very much!

    Catherine


    Last edited by Catherine Colson; 27 Sep 2018, 14:30. Reason: adding tags

  • #2
    You can't do this with collapse, but you can plot the mean and confidence limits from svy: mean. In the following example with the auto data, rep78 stands for year and foreign stands for treatment. You will first need to download Roger Newson's xsvmat command from SSC (ssc install xsvmat). This is an enhanced version of the built-in svmat command; it converts matrices (in this case r(table)) to Stata data sets. The variable "b" contains the means, and "ll" and "ul" contain the lower and upper confidence limits. Note the loop that cycles through the 0/1 levels of foreign; substitute your treatment levels for these.
    Code:
    set more off
    sysuse auto, clear
    drop if rep78==.
    tempfile tm
    recode rep78 1/3=1 4=2 5=3
    svyset _n [pw = gear]
    save `tm', replace
    
    forvalues i = 0/1 {
    use `tm', clear
    svy, subpop(if foreign==`i'): mean  mpg  , over(rep78)
    matrix list r(table)
    xsvmat, from(r(table)') rownames(rname) names(col) norestore
    gen foreign =`i'
    gen rep78 = real(rname)
    tempfile t`i'
    save `t`i'', replace
    }
    
    use `t0', clear
    append using `t1'
    label define foreign  0 "Domestic" 1 "Foreign"
    label values foreign foreign
    rename b mean
    twoway connect ll mean ul rep78 , by(foreign) xlabel(1 2 3) saving(g01, replace)
    graph export g01.png, replace
    [Image: g01.png]

    Last edited by Steve Samuels; 29 Sep 2018, 19:54.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      The code above is not correct for your problem. In particular, the code employs subpop() to restrict the analysis to each category of foreign (the stand-in for treatment). The use of over() applies the subpop option to each category of rep78, the stand-in for year. However, the subpop option is required only when group membership is known from information gathered at interview for each randomly selected subject. The effect of unnecessarily using subpop is usually to increase standard errors and widen confidence intervals.

      Year and treatment for the BRFSS, however, are not ascertained at interview. Rather, they are fixed by design: the survey is done in every year and every state. I surmise from the reference to difference-in-differences that states are the treatment units, with treatment occurring in specific years. Therefore the correct approach is to restrict svy: mean to specific states and treatment-years with if expressions. The resulting code is simpler than that above.
      Code:
      set more off
      sysuse auto, clear
      /* Create year, treatment, PSU, and stratum variables */
      drop if rep78==.
      rename rep78 year
      rename foreign treat
      egen strat_id = group(year treat)
      rename trunk psu
      recode year 1/3=1 4=2 5=3
      
      svyset psu [pw = weight], strata(strat_id)
      tempfile tm
      save `tm', replace
      
      /* Apply svy: mean to each year and treatment category */
      forvalues i = 1/3 {
      forvalues j = 0/1 {
          use `tm', clear
          qui svy: mean  mpg if year==`i' & treat==`j'
          xsvmat, from(r(table)')  names(col) norestore
          gen year=`i'
          gen treat =`j'
          tempfile t`i'`j'
          save `t`i'`j'', replace
      }
      }
      /* Append the t_ij files  */
      clear        // empty the data in memory so the last t_ij file is not appended twice
      tempfile t0  // data set to hold the t_ij files
      save `t0', emptyok replace
      use `t0', clear
      forvalues i = 1/3 {
      forvalues j = 0/1 {
          append using `t`i'`j''
      }
      }
      label define treat  0 "Control" 1 "Treated"
      label values treat treat
      rename b mean
      sort treat year
      list   treat year mean ll ul, sepby(treat)
      twoway connect ll mean ul year , by(treat) xlabel(1 2 3) saving(g01, replace)
      graph export g01.png, replace
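A possible adaptation to the BRFSS variable names from #1, offered as an untested sketch rather than a definitive implementation. It assumes a pooled 2011-2017 file is already in memory with y, treatment (coded 0/1), and year, as described in #1. It swaps xsvmat for the built-in postfile to collect the results and overlays both groups on one graph; the names memhold, results, and T are illustrative only.

```stata
* Sketch only: assumes y, treatment (0/1), and year (2011-2017) exist
* in the pooled BRFSS data in memory, as described in post #1.
svyset [pweight=_llcpwt], strata(_ststr) psu(_psu)

tempname memhold
tempfile results
postfile `memhold' year treatment mean ll ul using `results'

forvalues yr = 2011/2017 {
    forvalues g = 0/1 {
        * Restrict with an if expression, following the approach in this post
        quietly svy: mean y if year==`yr' & treatment==`g'
        matrix T = r(table)
        * Rows of r(table): 1 = b (mean), 5 = ll, 6 = ul
        post `memhold' (`yr') (`g') (T[1,1]) (T[5,1]) (T[6,1])
    }
}
postclose `memhold'

* Load the collected estimates and overlay both groups on one graph
use `results', clear
twoway (rcap ll ul year if treatment==0)        ///
       (connected mean year if treatment==0)    ///
       (rcap ll ul year if treatment==1)        ///
       (connected mean year if treatment==1),   ///
       legend(order(2 "Control" 4 "Treated"))   ///
       ytitle("Weighted mean of y") xlabel(2011(1)2017)
```

Because postfile writes to a separate file, the survey data stay in memory throughout the loop and do not need to be reloaded. If the if restriction leaves a stratum with a single PSU, the standard errors and confidence limits may come back missing for that cell.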
      Last edited by Steve Samuels; 01 Oct 2018, 18:34.
      Steve Samuels
      Statistical Consulting

      Stata 14.2



      • #4
        How can I import BRFSS data (ASCII) into Stata?



        • #5
          Hi Michael,

          You can import BRFSS data into Stata by going to, for example, the page for the 2016 data: https://www.cdc.gov/brfss/annual_data/annual_2016.html
          Then click "2016 BRFSS Data (SAS Transport Format)" under Data Files.
          Unzip the file and save it in a folder you will be using.

          Then in Stata, type

          import sasxport "(your working directory)\LLCP2016XPT\LLCP2016", clear
          svyset [pweight=_llcpwt], strata(_ststr) psu(_psu)

          The second line declares the survey design (sampling weights, strata, and PSUs) so that svy: commands produce properly weighted estimates. I hope this is helpful! It took me way too long to figure this out on my own.
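For readers on newer versions: as far as I know, Stata 16 and later document this importer as import sasxport5, which reads Version 5 transport files. A sketch using the same placeholder path as above, on the assumption that the BRFSS download is a Version 5 file:

```stata
* Stata 16 or later: same import, using the command's newer name.
* The path is the placeholder from the post above; substitute your own folder.
import sasxport5 "(your working directory)\LLCP2016XPT\LLCP2016", clear

* Declare the survey design exactly as before
svyset [pweight=_llcpwt], strata(_ststr) psu(_psu)
```

Running svydescribe afterwards is a quick way to check that the strata and PSU counts look plausible.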
