  • Balance Table using Ranksum and Chi2

    Hi Everybody!

    I'm trying to produce a balance table for two samples. The variables are either continuous (e.g., an outcome variable ranging from 0 to 6) or binary (e.g., socio-economic characteristics such as gender). For another project I wrote code that compares the two samples with a t-test and produces a nice table (see code below). Now I'd like the same table, but reporting the p-value from a rank-sum test (ranksum) for the continuous variables and the p-value from a chi-squared test for the binary variables.

    Can anyone help, or has anyone done this before?
    Thanks a lot for your help!

    Code:
    // set globals
    global nonbin        outcome_1 outcome_2 outcome_3
    global bin            gender tertiary_education convicted
    global testvar1        $nonbin $bin
    
    ** Sample 1
    estpost sum $testvar1 if rep==0
    matrix cgmean=e(mean)
    matrix cgsd=e(sd)
    local cgN = e(N)
    
    ** Sample 2
    estpost sum $testvar1 if rep==1
    matrix tgmean=e(mean)
    matrix tgsd=e(sd)
    local tgN = e(N)
    
    ** joint exclusion test of all regressors
    reg rep $testvar1
    test $testvar1
    scalar pvalue = (int(r(p)*1000)/1000)
    local p1 = pvalue
    
    ** difference in samples
    gen     cg = rep==0 
    replace cg = .     if rep==.
    estpost ttest $testvar1, by(cg)
    eststo cg_tg 
    estadd scalar pvalue: cg_tg
    estadd matrix cgmean: cg_tg 
    estadd matrix cgsd: cg_tg
    estadd matrix tgmean: cg_tg 
    estadd matrix tgsd: cg_tg
    
    global varlist1        outcome_1 outcome_2 outcome_3 gender tertiary_education convicted
    
    esttab cg_tg using "$tables/rcomparison.tex", replace ///
            prehead("\begin{tabular}{lcccccc}" /// // 
            "%" "\toprule" "%" /// 
            " & \multicolumn{2}{c}{Sample 1} & \multicolumn{2}{c}{Sample 2} &         &  \\" ///
            " & \multicolumn{2}{c}{N = `cgN'} & \multicolumn{2}{c}{N = `tgN'} &         &  \\" ///
            "   \cmidrule(r){2-3}  \cmidrule(r){4-5}  " ///
            " & Mean & SD & Mean & SD & \it{p-val}        & \# obs \\" /// 
            "%" "\vspace{-0.3cm}" "%" ) ///
            cells("cgmean(fmt(3) label((1))) cgsd(par fmt(2) label((2))) tgmean(fmt(3) label((3))) tgsd(par fmt(2) label((4))) p(star fmt(2) label((6))) count(fmt(0) label((7)))") ///
            nonumber noobs star(* 0.1 ** .05 *** 0.01) /// 
            keep($varlist1) ///
            order($varlist1) /// //     
            varlabels($labellist1) ///
            postfoot( /// //                "Joint test (\it{p}-value)     &        & & & & `p1'  \\" ///
            "\bottomrule" ///     
            " \addlinespace " ///
            "\end{tabular}" ///
            "\parbox{\textwidth}{\textit{Notes:} Joint significance test (p-value) = `p1'. * \(p<0.10\), ** \(p<0.05\), *** \(p<0.01\)}")

  • #2
    I wouldn't use a t-test for balance analysis: its result depends on sample size, which is not the issue when assessing balance. Use standardized differences instead. The user-written command covbal will compute them.
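
    For reference, here is a minimal sketch of both approaches discussed in this thread: the rank-sum and chi-squared p-values asked for in the first post, and hand-computed standardized differences as suggested in this reply. It assumes the variable names and the 0/1 sample indicator rep from the first post. The matrix names pval and stddiff and the scalars m0 and v0 are illustrative, not part of the original code. The rank-sum p-value is computed from the stored z statistic (normal approximation). The standardized difference uses the sample variances for all variables, which for binary variables differs slightly from the p(1-p) form.

    ```stata
    * Sketch only: assumes the variables and the 0/1 indicator rep from the first post
    local nonbin   outcome_1 outcome_2 outcome_3
    local bin      gender tertiary_education convicted
    local testvars `nonbin' `bin'
    local k : word count `testvars'

    * Row vectors with one column per variable (illustrative names)
    matrix pval    = J(1, `k', .)
    matrix stddiff = J(1, `k', .)
    matrix colnames pval    = `testvars'
    matrix colnames stddiff = `testvars'

    local i = 0
    foreach v of local testvars {
        local ++i

        * p-value: rank-sum test for continuous, Pearson chi-squared for binary
        if `: list v in nonbin' {
            quietly ranksum `v', by(rep)
            matrix pval[1, `i'] = 2 * normal(-abs(r(z)))   // two-sided, normal approximation
        }
        else {
            quietly tabulate `v' rep, chi2
            matrix pval[1, `i'] = r(p)
        }

        * standardized difference: (mean1 - mean0) / sqrt((var1 + var0) / 2)
        quietly summarize `v' if rep == 0
        scalar m0 = r(mean)
        scalar v0 = r(Var)
        quietly summarize `v' if rep == 1
        matrix stddiff[1, `i'] = (r(mean) - m0) / sqrt((r(Var) + v0) / 2)
    }

    matrix list pval
    matrix list stddiff
    ```

    Because both matrices are row vectors with the variable names as column names, they have the same shape as cgmean in the first post. Adding them with estadd matrix and referring to them in cells() should therefore work the same way, though that step is untested here.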
