Statalist - Forums for Discussing Stata

nbreg DiD: pooled post-period AME outside range of year-specific AMEs from separate model specifications

Shin Lee — Wed, 24 Jun 2026 20:50:52 GMT

I am running a difference-in-differences analysis with a negative binomial model and report three sets of estimates: a Year 1 effect, a Year 2+ effect, and a separate pooled post-period effect from an independent regression. In several subgroups, the pooled estimate falls outside the range defined by the Year 1 and Year 2+ estimates. Is this expected behavior when the pooled and period-specific estimates come from separate model specifications with different treatment indicators? Are there published examples in the health policy literature where this pattern is documented or discussed?

New -mixedpower- package for calculating power and sample size analytically for linear mixed and marginal models available from SSC

Matthew Burnell — Wed, 24 Jun 2026 18:45:05 GMT

Dear Statalist users,

With thanks to Kit Baum, I would like to introduce a new package, mixedpower. As the title suggests, the program of the same name calculates power and sample size analytically for linear mixed models, typically for planning an RCT with longitudinal continuous outcomes.

Code:

 ssc install mixedpower

There is flexibility in specifying the treatment effects, even allowing user-specified functions of the schedule time list (which needn't actually represent time), as well as the random effects or within-subject error structure. These aspects can even differ between treatment and control groups.

All variance parameter inputs may be entered manually or read in 'automatically' from a 'suitable' mixed model in memory, which saves time and reduces the risk of a mistake.

One may calculate power/sample size accounting for both a) the amount of longitudinal data collected at a given timepoint due to staggered recruitment and b) dropout, simultaneously. Dropout rates may differ between control and treatment groups, as can the allocation ratio.

One may also estimate power in situations where the nature of the treatment effect is mis-specified. For example, what is the power when you assume a proportionate slope effect for the treatment group in the analysis model, if the true treatment effect in fact changes non-linearly over time? mixedpower will also provide the resulting slope effect estimate.

Additional programs in the package calculate power and sample size for 1) a multivariate mixed model (mvmixedpower), for when you might want to synthesise multiple continuous outcomes to increase power, especially for an interim analysis, and 2) a mixed model for 'directly measured' difference data (dmmixedpower). These programs come with fewer features.

The help files are unapologetically extensive and come with lots of examples. Here are some of them:

1) First load Stata's pig dataset, create a fake treatment group, and shift week so that it starts at time zero. If we fit an unstructured marginal model, sometimes known as a mixed model for repeated measures (MMRM), on the first 5 measures (if week<=4), we may calculate power for a similarly scheduled trial but with n=100, testing just the final difference effect (which equals 2) whilst automatically loading all the variance parameters.

Code:

webuse pig , clear
gen trt=id>=25
replace week=week-1
mixed weight i.week i.week#1.trt if week<=4 || id: ,nocons resid(unstr, t(week)) reml
mixedpower, trtspec(factor) sched(0 1 2 3 4) altcont(factor) diff(0.5(0.5)2) lctest(0 0 0 1) n(100) marginal errxt(auto) nohead

This will give the following output, including the mixed model syntax for the implied analysis.

Code:

Mixed model syntax:
constraint 1 _b[0.time#1.trt]=0
mixed depvar i.time   i.time#1.trt   , constraints(1) || id_level2: , nocons resid(unstructured , t(time))
-------------------------------------------------------------------------------------------------------------------------

Calculating power for a 2-level mixed model with factor treatment effect parameterisation:
  visit schedule      = 0 1 2 3 4
  treatment effect(s) = 0.5 1 1.5 2
  alpha               = 0.050
  total sample size   = 100
  n in control arm    = 50
  n in treatment arm  = 50
  power               = 0.9147

2) An example emphasising the generalisability of mixedpower by recreating Stata's own power command for a cluster-randomised trial (output suppressed):

Code:

power twomeans 0 0.4, power(0.9) m1(25) m2(25) sd(2) rho(0.1) kratio(2)
mixedpower, schedule(1(1)25) trtspec(intercept) altcont(noslope) difference(0.4) marginal errxt(input(exchangeable 4 0.1)) power(0.9) arat(1 2)
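
A hedged reading of how the inputs of the two commands seem to correspond (an assumption based on the option values shown, not confirmed against the help file): sd(2) corresponds to a total variance of 4, rho(0.1) to the exchangeable correlation of 0.1, the cluster size of 25 to the 25 entries of schedule(1(1)25), and kratio(2) to arat(1 2). Under that reading, the variance of a single cluster mean works out as:

```latex
\operatorname{Var}(\bar{y}_{\text{cluster}})
  = \frac{\sigma^{2}\,[\,1 + (m-1)\rho\,]}{m}
  = \frac{4 \times [\,1 + 24 \times 0.1\,]}{25}
  = \frac{4 \times 3.4}{25}
  = 0.544
```

The factor 3.4 is the usual design effect for clusters of size 25 with an intracluster correlation of 0.1.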

3) An example incorporating partial follow-up due to both staggered recruitment and dropout. Note, the output will also give the number of subjects reaching each visit of the schedule list:

Code:

mixedpower, trtspec(slope) schedule(0 1 2 3 4) diff(0.5) cov(10, 1\1, 2) error(10) n(500) alpha(0.1) strec(0.05 0.1 0.15 0.2 0.4) drop(0.1 0.05 0.05 0.05 0.75)

with output...

Code:

-------------------------------------------------------------------------------------------------------------------------
Mixed model syntax:
mixed depvar c.time   c.time#1.trt    || id_level2: time , cov(unstr)  resid(independent , t(time))
-------------------------------------------------------------------------------------------------------------------------

Calculating power for a 2-level mixed model with slope treatment effect parameterisation:
  visit schedule      = 0 1 2 3 4
  treatment effect(s) = 0.5
  alpha               = 0.100
  total sample size   = 500
  n in control arm    = 250
  n in treatment arm  = 250
  power               = 0.7718

Table of control and treatment group numbers (rounded) reaching each visit:
           |  visit 1  visit 2  visit 3  visit 4  visit 5
           |  time=0   time=1   time=2   time=3   time=4
-----------+-------------------------------------------------
control    |  225      191      159      120      75  
treatment  |  225      191      159      120      75

4) An example employing user-supplied functions, where linear slopes are assumed for both groups but the disease progression in fact shows an 'early decline' with a long plateau from about year 2 to the end, and the treatment effect is actually proportional to this complex function of time. This example recreates a result from a simulation study by Morgan et al.**

Code:

mixedpower, trtspec(slope) sched(0(1)5) diff(-0.05) cov(0.5, .0354\.0354, 0.01) error(0.15) n(230) actualcont(user(-5*exp(-2*x)+5)) cbeta(6 0.2) actualtrt(user(-5*exp(-2*x)+5)) nosyn

Output:

Code:

Calculating power for a 2-level mixed model with slope treatment effect parameterisation but with actual user treatment effect:
  visit schedule      = 0 1 2 3 4 5
  treatment effect(s) = -0.05
  alpha               = 0.050
  total sample size   = 230
  n in control arm    = 115
  n in treatment arm  = 115
  power               = 0.5287

Please feel free to ask questions about the package, either here or by email (including any bugs spotted).

Matthew Burnell
MRC Centre of Research Excellence in Clinical Trial Innovation
University College London
London, UK
m.burnell@ucl.ac.uk

** Katy E. Morgan, Ian R. White, Chris Frost. How important is the linearity assumption in a sample size calculation for a randomised controlled trial where treatment is anticipated to affect a rate of change? BMC Medical Research Methodology (2023) 23:274 https://doi.org/10.1186/s12874-023-02093-2

Data Cleaning

aima khan — Tue, 23 Jun 2026 11:37:28 GMT

I am trying to run a panel regression. The variables in my sample have different numbers of non-missing observations, ranging from 17,000 to 24,000. I want to use as many observations as possible but keep the number of observations the same across all the models I test. I tried the drop command, but it reduces the sample to only a few thousand observations. How do I ensure the same number of observations across models while keeping the sample as large as possible?
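
A minimal sketch of one common approach, under stated assumptions: flag the observations that are non-missing on every variable used in any of the models, then fit each model on that flagged sample. The nlswork dataset and the variables ln_wage, age, tenure and union are stand-ins for illustration only; they are not the poster's data.

```stata
* Sketch only: nlswork and these variables stand in for the actual data
webuse nlswork, clear
xtset idcode year

* Count missing values across every variable used in ANY of the models
egen byte nmiss = rowmiss(ln_wage age tenure union)

* Flag observations that are complete on all of those variables
generate byte common = (nmiss == 0)
count if common

* Fit every model on the same flagged sample
xtreg ln_wage age if common, fe
xtreg ln_wage age tenure union if common, fe
```

The common sample can never be larger than the number of observations that are jointly complete on all the variables, so a variable with many gaps that overlaps poorly with the others will shrink it sharply.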

collect all results into a single table using collect

Debbie Burke — Mon, 22 Jun 2026 15:11:11 GMT

I am using Stata 18.5 for Windows.

I have the following code. I want to collect all the results of the loop into a single stacked table, but I only get the result of the last iteration.
I've tried multiple variations of the code and cannot figure it out.

collect clear
svyset [pweight=weightvar]

foreach var in var list gender married etc... {

collect _r_b _r_ci: svy: mean hours_worked, over (year_group `var')

collect style cell result [_r_b _r_ci], nformat (%4.1f)
collect layout (`var'#result) (year_group#smdset)
}

collect style head cmdset, title(hide) level(hide)
collect preview

syntax model GBTM

Tommaso Salvitti — Mon, 22 Jun 2026 05:34:39 GMT

Good morning, everybody. I have 4 variables measured at 3 time points: 12, 18, and 24 months. Is the syntax below for choosing the GBTM model correct? I have 106 adults (at least two measurements per outcome).
Censoring limits were defined outcome by outcome from the observed empirical range, with a small margin beyond the extremes, because no unambiguous theoretical limits were available for these standardized variables.
Given the data structure and the criteria of parsimony, stability, and interpretability, with three time points the search was limited to polynomial orders 0 and 1, without exploring quadratic terms. Although a quadratic specification is technically possible with three time points, it is often poorly informative and potentially unstable, especially with multiple outcomes and a small sample size. What do you think? Thanks in advance to everybody.

Code:

****************************************************
* GBTM MULTI-OUTCOME (4 outcomes, cnorm)
* FINAL OPERATIONAL VERSION
* CORRECT VERSION: pass uses OCC_pp, also checks TotProb and adds diagnostic entropy
* Consistent with: Klijn + Nagin multitrajectory + recent review
*
* LOGIC:
* STEP 0  = preliminary univariate exploration of individual outcomes
* STEP 1  = choice of K in the multi-outcome model with equal initial order
* STEP 2  = fixed K, structured comparison of all plausible 0/1 models
* STEP 2B = refit/inspection of finalist models
*
* CRITERIA:
* - BIC = primary criterion
* - APPA / OCC_pp / minP / minTotProb / mismatch = adequacy/support criteria
* - relative entropy = additional diagnostic of assignment clarity; NOT included in the pass
* - absolute number of groups = descriptive; does NOT qualify
* - DELTABIC <= 2 = competing models
* - final decision = BIC + parsimony + interpretability + classification diagnostics
* - with 3 time points and MAXORDER=1, the possible orders are 0/1
****************************************************

clear all
set more off
set seed 12345
set sortseed 12345

cd "C:\Users\xxxxxxxxxxx\Desktop\LCA_prova"

global DATAFILE "databasex.dta"
global IDVAR    "id"

****************************************************
* OUTCOME
****************************************************
global VAR1 "var1_12 var1_18 var1_24"
global VAR2 "var2_12 var2_18 var2_24"
global VAR3 "var3_12 var3_18 var3_24"
global VAR4 "var4_12 var4_18 var4_24"


****************************************************
* CNORM RANGE - OPTIMIZED ON EMPIRICAL DATA
* Expanded outward to ensure numerical stability and avoid artificial clipping
****************************************************
global MIN1 -8
global MAX1  6
global MIN2 -9
global MAX2 13
global MIN3 -15
global MAX3  11
global MIN4 -3
global MAX4  9

****************************************************
* SEARCH
****************************************************
global MAXK       2
global MAXORDER   1
global STARTORDER 1
global NREFIT     5

****************************************************
* ADEQUACY THRESHOLDS
****************************************************
global THR_MINP       0.05   // minimum assigned proportion of the group
global THR_MINTOTPROB 0.05   // minimum estimated proportion from posterior probabilities
global THR_MINAPP     0.70   // minimum average posterior probability
global THR_MINOCC     5      // minimum OCC_pp
global THR_MAXMIS     0.05   // maximum mismatch

global DELTABIC    2

****************************************************
* PROGRAM: create times
****************************************************
capture program drop make_time
program define make_time
    capture drop t1 t2 t3
    gen t1 = 0
    gen t2 = 1
    gen t3 = 2
end

****************************************************
* PROGRAM: post-traj statistics
****************************************************
capture program drop gbtm_stats
program define gbtm_stats, rclass
    syntax , K(integer)

    capture drop Mp countG counter APP p n d OCC TotProb mismatch d_pp OCC_pp SD_post __sdtmp

    gen double Mp = 0
    foreach pr of varlist _traj_ProbG* {
        replace Mp = `pr' if `pr' > Mp
    }

    sort _traj_Group
    by _traj_Group: gen countG  = _N
    by _traj_Group: gen counter = _n
    by _traj_Group: egen double APP = mean(Mp)

    gen double p = countG/_N

    gen double TotProb = .
    forvalues gg = 1/`k' {
        quietly summarize _traj_ProbG`gg', meanonly
        replace TotProb = r(mean) if _traj_Group == `gg'
    }

    gen double mismatch = abs(TotProb - p)

    gen double OCC = .
    gen double OCC_pp = .
    if `k' > 1 {
        gen double n = APP/(1-APP)
        gen double d = p/(1-p)
        replace OCC = n/d
        gen double d_pp = TotProb/(1-TotProb)
        replace OCC_pp = n/d_pp
    }
    else {
        replace OCC    = 999
        replace OCC_pp = 999
    }

    * SAFEGUARD: avoid a Stata error if a group contains a single record (SD cannot be computed)
    gen double SD_post = .
    forvalues gg = 1/`k' {
        capture by _traj_Group: egen double __sdtmp = sd(_traj_ProbG`gg') if _traj_Group == `gg'
        if !_rc {
            replace SD_post = __sdtmp if _traj_Group == `gg'
            drop __sdtmp
        }
    }

    * Relative entropy (0-1)
    tempvar __hsum __plnp
    local entropy = 1
    if `k' > 1 {
        gen double `__hsum' = 0
        forvalues gg = 1/`k' {
            gen double `__plnp' = cond(_traj_ProbG`gg' > 0, _traj_ProbG`gg' * ln(_traj_ProbG`gg'), 0)
            replace `__hsum' = `__hsum' + `__plnp'
            drop `__plnp'
        }
        quietly summarize `__hsum', meanonly
        local entropy = 1 + (r(sum) / (_N * ln(`k')))
    }

    preserve
        keep if counter == 1
        quietly summarize APP, meanonly
        local minAPP  = r(min)
        local meanAPP = r(mean)
        quietly summarize p, meanonly
        local minP = r(min)
        quietly summarize TotProb, meanonly
        local minTotProb = r(min)
        quietly summarize mismatch, meanonly
        local maxMismatch = r(max)
        quietly summarize OCC, meanonly
        local minOCC = r(min)
        quietly summarize OCC_pp, meanonly
        local minOCCpp = r(min)
    restore

    local pass = (`minP' >= $THR_MINP) & ///
                 (`minTotProb' >= $THR_MINTOTPROB) & ///
                 (`minAPP' >= $THR_MINAPP) & ///
                 (`minOCCpp' >= $THR_MINOCC) & ///
                 (`maxMismatch' <= $THR_MAXMIS)

    return scalar minAPP      = `minAPP'
    return scalar meanAPP     = `meanAPP'
    return scalar minP        = `minP'
    return scalar minTotProb  = `minTotProb'
    return scalar maxMismatch = `maxMismatch'
    return scalar entropy     = `entropy'
    return scalar minOCC      = `minOCC'
    return scalar minOCCpp    = `minOCCpp'
    return scalar pass        = `pass'
end

****************************************************
* TEMPORARY FILES
****************************************************
tempfile phase0tmp step1tmp step2tmp finalists4 step2ranked

****************************************************
* PHASE 0: PRELIMINARY UNIVARIATE EXPLORATION
****************************************************
tempname h0
capture postclose `h0'
postfile `h0' str8 outcome int K str20 orders ///
    double ll aic bic minAPP minOCC minOCCpp minP minTotProb maxMismatch entropy pass ///
    using `phase0tmp', replace

forvalues vv = 1/4 {
    forvalues k = 1/$MAXK {
        use "$DATAFILE", clear
        sort $IDVAR, stable
        make_time
        local indep t1 t2 t3

        local oo ""
        forvalues g = 1/`k' {
            local oo "`oo' $STARTORDER"
        }
        local oo : list retok oo

        quietly capture traj, ///
            var(${VAR`vv'}) indep(`indep') order(`oo') model(cnorm) min(${MIN`vv'}) max(${MAX`vv'})
        if _rc continue

        quietly gbtm_stats, k(`k')
        post `h0' ("VAR`vv'") (`k') ("`oo'") (e(ll)) (e(AIC)) (e(BIC_n_subjects)) ///
            (r(minAPP)) (r(minOCC)) (r(minOCCpp)) ///
            (r(minP)) (r(minTotProb)) (r(maxMismatch)) (r(entropy)) (r(pass))
    }
}
postclose `h0'

use `phase0tmp', clear
save phase0_univariate_scan_4var.dta, replace

****************************************************
* STEP 1: choice of K in multi-outcome
****************************************************
tempname h1
capture postclose `h1'
postfile `h1' ///
    str5 stage int K str20 o1 str20 o2 str20 o3 str20 o4 ///
    int group nG ///
    double p TotProb APP OCC OCC_pp mismatch SD_post ///
    double ll aic bic minAPP meanAPP minOCC minOCCpp minP minTotProb maxMismatch entropy pass ///
    using `step1tmp', replace

forvalues k = 1/$MAXK {
    use "$DATAFILE", clear
    sort $IDVAR, stable
    make_time
    local indep t1 t2 t3

    local o1 ""
    local o2 ""
    local o3 ""
    local o4 ""
    forvalues g = 1/`k' {
        local o1 "`o1' $STARTORDER"
        local o2 "`o2' $STARTORDER"
        local o3 "`o3' $STARTORDER"
        local o4 "`o4' $STARTORDER"
    }
    local o1 : list retok o1
    local o2 : list retok o2
    local o3 : list retok o3
    local o4 : list retok o4

    quietly capture traj, multgroups(`k') ///
        var1($VAR1) indep1(`indep') order1(`o1') model1(cnorm) min1($MIN1) max1($MAX1) ///
        var2($VAR2) indep2(`indep') order2(`o2') model2(cnorm) min2($MIN2) max2($MAX2) ///
        var3($VAR3) indep3(`indep') order3(`o3') model3(cnorm) min3($MIN3) max3($MAX3) ///
        var4($VAR4) indep4(`indep') order4(`o4') model4(cnorm) min4($MIN4) max4($MAX4)
    if _rc continue

    quietly gbtm_stats, k(`k')
    local ll  = e(ll)
    local aic = e(AIC)
    local bic = e(BIC_n_subjects)
    local minAPP      = r(minAPP)
    local meanAPP     = r(meanAPP)
    local minOCC      = r(minOCC)
    local minOCCpp    = r(minOCCpp)
    local maxMismatch = r(maxMismatch)
    local entropy     = r(entropy)
    local minP        = r(minP)
    local minTotProb  = r(minTotProb)
    local pass        = r(pass)

    forvalues gg = 1/`k' {
        quietly summarize countG if _traj_Group == `gg', meanonly
        local nG = r(mean)
        quietly summarize p if _traj_Group == `gg', meanonly
        local pg = r(mean)
        quietly summarize TotProb if _traj_Group == `gg', meanonly
        local tpg = r(mean)
        quietly summarize APP if _traj_Group == `gg', meanonly
        local appg = r(mean)
        quietly summarize OCC if _traj_Group == `gg', meanonly
        local occg = r(mean)
        quietly summarize OCC_pp if _traj_Group == `gg', meanonly
        local occppg = r(mean)
        quietly summarize mismatch if _traj_Group == `gg', meanonly
        local misg = r(mean)
        
        local sdg = .
        quietly count if _traj_Group == `gg'
        if r(N) > 1 {
            quietly summarize SD_post if _traj_Group == `gg', meanonly
            local sdg = r(mean)
        }

        post `h1' ("STEP1") (`k') ("`o1'") ("`o2'") ("`o3'") ("`o4'") ///
            (`gg') (`nG') (`pg') (`tpg') (`appg') (`occg') (`occppg') (`misg') (`sdg') ///
            (`ll') (`aic') (`bic') (`minAPP') (`meanAPP') (`minOCC') (`minOCCpp') ///
            (`minP') (`minTotProb') (`maxMismatch') (`entropy') (`pass')
    }
}
postclose `h1'

use `step1tmp', clear
egen byte tagmodel = tag(K o1 o2 o3 o4)
keep if tagmodel
drop tagmodel
save step1_kselection_4var.dta, replace

gsort -pass -bic
count if pass == 1
if r(N) > 0 {
    keep if pass == 1
    gsort -bic
}
else {
    gsort -bic
}
quietly summarize K in 1, meanonly
local BESTK = r(min)
di as result "Selected K = `BESTK'"

****************************************************
* STEP 2: STRUCTURED SEARCH (Safe Combinatorial Logic)
****************************************************
tempname h2
capture postclose `h2'
postfile `h2' ///
    str5 stage int K str20 o1 str20 o2 str20 o3 str20 o4 ///
    int group nG ///
    double p TotProb APP OCC OCC_pp mismatch SD_post ///
    double ll aic bic minAPP meanAPP minOCC minOCCpp minP minTotProb maxMismatch entropy pass ///
    using `step2tmp', replace

local k = `BESTK'
local base = $MAXORDER + 1
local ncomb = `base'^`k'

forvalues i1 = 1/`ncomb' {
    local o1 ""
    forvalues g = 1/`k' {
        local div   = `base'^(`k' - `g')
        local digit = mod(int((`i1' - 1)/`div'), `base')
        local o1 "`o1' `digit'"
    }
    local o1 : list retok o1

    forvalues i2 = 1/`ncomb' {
        local o2 ""
        forvalues g = 1/`k' {
            local div   = `base'^(`k' - `g')
            local digit = mod(int((`i2' - 1)/`div'), `base')
            local o2 "`o2' `digit'"
        }
        local o2 : list retok o2

        forvalues i3 = 1/`ncomb' {
            local o3 ""
            forvalues g = 1/`k' {
                local div   = `base'^(`k' - `g')
                local digit = mod(int((`i3' - 1)/`div'), `base')
                local o3 "`o3' `digit'"
            }
            local o3 : list retok o3

            forvalues i4 = 1/`ncomb' {
                local o4 ""
                forvalues g = 1/`k' {
                    local div   = `base'^(`k' - `g')
                    local digit = mod(int((`i4' - 1)/`div'), `base')
                    local o4 "`o4' `digit'"
                }
                local o4 : list retok o4

                use "$DATAFILE", clear
                sort $IDVAR, stable
                make_time
                local indep t1 t2 t3

                quietly capture traj, multgroups(`k') ///
                    var1($VAR1) indep1(`indep') order1(`o1') model1(cnorm) min1($MIN1) max1($MAX1) ///
                    var2($VAR2) indep2(`indep') order2(`o2') model2(cnorm) min2($MIN2) max2($MAX2) ///
                    var3($VAR3) indep3(`indep') order3(`o3') model3(cnorm) min3($MIN3) max3($MAX3) ///
                    var4($VAR4) indep4(`indep') order4(`o4') model4(cnorm) min4($MIN4) max4($MAX4)
                if _rc continue

                quietly gbtm_stats, k(`k')
                local ll  = e(ll)
                local aic = e(AIC)
                local bic = e(BIC_n_subjects)
                local minAPP      = r(minAPP)
                local meanAPP     = r(meanAPP)
                local minOCC      = r(minOCC)
                local minOCCpp    = r(minOCCpp)
                local maxMismatch = r(maxMismatch)
                local entropy     = r(entropy)
                local minP        = r(minP)
                local minTotProb  = r(minTotProb)
                local pass        = r(pass)

                forvalues gg = 1/`k' {
                    quietly summarize countG if _traj_Group == `gg', meanonly
                    local nG = r(mean)
                    quietly summarize p if _traj_Group == `gg', meanonly
                    local pg = r(mean)
                    quietly summarize TotProb if _traj_Group == `gg', meanonly
                    local tpg = r(mean)
                    quietly summarize APP if _traj_Group == `gg', meanonly
                    local appg = r(mean)
                    quietly summarize OCC if _traj_Group == `gg', meanonly
                    local occg = r(mean)
                    quietly summarize OCC_pp if _traj_Group == `gg', meanonly
                    local occppg = r(mean)
                    quietly summarize mismatch if _traj_Group == `gg', meanonly
                    local misg = r(mean)
                    
                    local sdg = .
                    quietly count if _traj_Group == `gg'
                    if r(N) > 1 {
                        quietly summarize SD_post if _traj_Group == `gg', meanonly
                        local sdg = r(mean)
                    }

                    post `h2' ("STEP2") (`k') ("`o1'") ("`o2'") ("`o3'") ("`o4'") ///
                        (`gg') (`nG') (`pg') (`tpg') (`appg') (`occg') (`occppg') (`misg') (`sdg') ///
                        (`ll') (`aic') (`bic') (`minAPP') (`meanAPP') (`minOCC') (`minOCCpp') ///
                        (`minP') (`minTotProb') (`maxMismatch') (`entropy') (`pass')
                }
            }
        }
    }
}
postclose `h2'

use `step2tmp', clear
save step2_models_4var.dta, replace

egen byte tagmodel = tag(K o1 o2 o3 o4)
keep if tagmodel
drop tagmodel
keep if K == `BESTK'

count if pass == 1
if r(N) > 0 {
    keep if pass == 1
}

gsort -bic
quietly summarize bic, meanonly
local bestbic = r(max)
keep if bic >= (`bestbic' - $DELTABIC)

gsort -bic -minAPP -minOCCpp maxMismatch
gen rank_finalista = _n
save `step2ranked', replace
save finalists_step2_4var.dta, replace

count
local NFINAL = r(N)
di as result "Number of finalist models within DeltaBIC = `NFINAL'"
list rank_finalista K o1 o2 o3 o4 bic minAPP minOCCpp maxMismatch entropy minP minTotProb pass, noobs

****************************************************
* STEP 2B: refit of the finalist models
****************************************************
local NINSPECT = cond(`NFINAL' < $NREFIT, `NFINAL', $NREFIT)
forvalues i = 1/`NINSPECT' {
    use finalists_step2_4var.dta, clear
    local CK  = K[`i']
    local CO1 = o1[`i']
    local CO2 = o2[`i']
    local CO3 = o3[`i']
    local CO4 = o4[`i']

    capture log close candlog
    log using "candidate4_`i'_K`CK'.smcl", replace name(candlog)

    use "$DATAFILE", clear
    sort $IDVAR, stable
    make_time
    local indep t1 t2 t3

    traj, multgroups(`CK') ///
        var1($VAR1) indep1(`indep') order1(`CO1') model1(cnorm) min1($MIN1) max1($MAX1) ///
        var2($VAR2) indep2(`indep') order2(`CO2') model2(cnorm) min2($MIN2) max2($MAX2) ///
        var3($VAR3) indep3(`indep') order3(`CO3') model3(cnorm) min3($MIN3) max3($MAX3) ///
        var4($VAR4) indep4(`indep') order4(`CO4') model4(cnorm) min4($MIN4) max4($MAX4)

    di as result "BIC = " e(BIC_n_subjects)
    di as result "AIC = " e(AIC)
    di as result "LL  = " e(ll)

    log close candlog
}

****************************************************
* LEAD CANDIDATE ACCORDING TO PRE-SPECIFIED CRITERIA
****************************************************
use `step2ranked', clear
gen byte _pick = (_n == 1)
quietly summarize K if _pick, meanonly
local FK = r(min)
levelsof o1 if _pick, local(FO1) clean
levelsof o2 if _pick, local(FO2) clean
levelsof o3 if _pick, local(FO3) clean
levelsof o4 if _pick, local(FO4) clean
drop _pick

di as result "LEAD CANDIDATE:"
di as result "K      = `FK'"
di as result "order1 = `FO1'"
di as result "order2 = `FO2'"
di as result "order3 = `FO3'"
di as result "order4 = `FO4'"

use "$DATAFILE", clear
sort $IDVAR, stable
make_time
local indep t1 t2 t3

traj, multgroups(`FK') ///
    var1($VAR1) indep1(`indep') order1(`FO1') model1(cnorm) min1($MIN1) max1($MAX1) ///
    var2($VAR2) indep2(`indep') order2(`FO2') model2(cnorm) min2($MIN2) max2($MAX2) ///
    var3($VAR3) indep3(`indep') order3(`FO3') model3(cnorm) min3($MIN3) max3($MAX3) ///
    var4($VAR4) indep4(`indep') order4(`FO4') model4(cnorm) min4($MIN4) max4($MAX4)

di as result "Final BIC = " e(BIC_n_subjects)
di as result "Final AIC = " e(AIC)
di as result "Final LL  = " e(ll)

****************************************************
* FINAL STATISTICS OF THE SELECTED MODEL
****************************************************
quietly gbtm_stats, k(`FK')

di as result "Final minAPP      = " r(minAPP)
di as result "Final meanAPP     = " r(meanAPP)
di as result "Final minP        = " r(minP)
di as result "Final minTotProb  = " r(minTotProb)
di as result "Final minOCC      = " r(minOCC)
di as result "Final minOCCpp    = " r(minOCCpp)
di as result "Final maxMismatch = " r(maxMismatch)
di as result "Final entropy     = " r(entropy)
di as result "Final pass        = " r(pass)
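
For readers following the gbtm_stats program, the diagnostics it computes can be written out as below. This is a transcription of what the code above does, not a statement of any package's official definitions; the notation is introduced here for illustration, with \hat{p}_{ij} the posterior probability of subject i for group j (_traj_ProbGj), g_i the assigned group (_traj_Group), n_j the number of subjects assigned to group j, N the number of subjects, and K the number of groups.

```latex
\begin{align*}
p_j &= \frac{n_j}{N}, \qquad
\mathrm{TotProb}_j = \frac{1}{N}\sum_{i=1}^{N}\hat{p}_{ij}, \qquad
\mathrm{APP}_j = \frac{1}{n_j}\sum_{i:\,g_i=j}\max_{k}\hat{p}_{ik} \\[4pt]
\mathrm{OCC}_j &= \frac{\mathrm{APP}_j/(1-\mathrm{APP}_j)}{p_j/(1-p_j)}, \qquad
\mathrm{OCC\_pp}_j = \frac{\mathrm{APP}_j/(1-\mathrm{APP}_j)}{\mathrm{TotProb}_j/(1-\mathrm{TotProb}_j)} \\[4pt]
\mathrm{mismatch}_j &= \left|\mathrm{TotProb}_j - p_j\right| \\[4pt]
E &= 1 + \frac{\sum_{i=1}^{N}\sum_{k=1}^{K}\hat{p}_{ik}\ln\hat{p}_{ik}}{N\ln K} \\[4pt]
\#\{\text{Step 2 models}\} &= \left[(\mathrm{MAXORDER}+1)^{K}\right]^{4}
  = \left(2^{2}\right)^{4} = 256 \quad \text{if } K = 2 \text{ is selected}
\end{align*}
```

The last line follows from the four nested loops: each outcome gets one order (0 or 1) per group, and the four outcomes are crossed.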

Error "break" without pressing break

Dirk Enzmann — Sun, 21 Jun 2026 20:41:37 GMT

Why does the following syntax stop with --Break-- r(1) although I don't press Break:

Code:

clear 
cap frame drop original 
frame create original 
frame change original 
clear 
input int(vp fp fn vn) 
 41  51 43 276 
 31  60 49 269 
 35  70  9 187 
 30  71 30 179 
 10  15 21 236 
185  27 63  75 
 29  53 53 683 
 27  43  4  99 
213 881 32 600 
 32  54 62 581 
end 
scalar nstudies = _N 
 
frame change default 
forvalues i = 1/`=scalar(nstudies)' { 
   frame original: scalar vp = vp[`i'] 
   frame original: scalar fp = fp[`i'] 
   frame original: scalar fn = fn[`i'] 
   frame original: scalar vn = vn[`i'] 
 
   clear 
   input x1 x2 freq 
      1 1 . 
      1 0 . 
      0 1 . 
      0 0 . 
   end 
   replace freq = scalar(vp) if _n==1 
   replace freq = scalar(fp) if _n==2 
   replace freq = scalar(fn) if _n==3 
   replace freq = scalar(vn) if _n==4 
    
   tab2 x1 x2 [fw=freq] 
}

Here is a part of the result:

Code:

. forvalues i = 1/`=scalar(nstudies)' {
  2.    frame original: scalar vp = vp[`i']
  3.    frame original: scalar fp = fp[`i']
  4.    frame original: scalar fn = fn[`i']
  5.    frame original: scalar vn = vn[`i']
  6. 
.    clear
  7.    input x1 x2 freq
  8.       1 1 .
  9.       1 0 .
 10.       0 1 .
 11.       0 0 .
 12.    end
--Break--
r(1);

If I move the part starting with -input- outside of -forvalues-, the syntax runs without a break (but then I can't loop through the data of the studies in the frame "original"):

Code:

frame change default 
forvalues i = 1/`=scalar(nstudies)' { 
   frame original: scalar vp = vp[`i'] 
   frame original: scalar fp = fp[`i'] 
   frame original: scalar fn = fn[`i'] 
   frame original: scalar vn = vn[`i'] 
 
   clear 
}    
   input x1 x2 freq 
      1 1 . 
      1 0 . 
      0 1 . 
      0 0 . 
   end 
   replace freq = scalar(vp) if _n==1 
   replace freq = scalar(fp) if _n==2 
   replace freq = scalar(fn) if _n==3 
   replace freq = scalar(vn) if _n==4 
    
   tab2 x1 x2 [fw=freq]
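
One possible workaround, sketched under the assumption (suggested by the log above) that it is -input- inside the loop that triggers the break: build the four-row layout with -set obs- and -generate- instead. Only the first three studies are included to keep the sketch short.

```stata
* Sketch: same loop, with -set obs- and -generate- replacing -input-
frame change default
clear
capture frame drop original
frame create original
frame change original
input int(vp fp fn vn)
 41  51 43 276
 31  60 49 269
 35  70  9 187
end
scalar nstudies = _N

frame change default
forvalues i = 1/`=scalar(nstudies)' {
    * Copy the i-th study's cell counts into scalars
    frame original: scalar vp = vp[`i']
    frame original: scalar fp = fp[`i']
    frame original: scalar fn = fn[`i']
    frame original: scalar vn = vn[`i']

    * Build the 2x2 layout without -input-
    clear
    set obs 4
    generate byte x1 = inlist(_n, 1, 2)    // rows 1 and 2 have x1 = 1
    generate byte x2 = inlist(_n, 1, 3)    // rows 1 and 3 have x2 = 1
    generate int freq = .
    replace freq = scalar(vp) if _n == 1
    replace freq = scalar(fp) if _n == 2
    replace freq = scalar(fn) if _n == 3
    replace freq = scalar(vn) if _n == 4

    tab2 x1 x2 [fw=freq]
}
```

The rows come out in the same order as in the -input- version: (1,1), (1,0), (0,1), (0,0).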

CSDID Omitted results.

Passakorn Tapasanan — Thu, 18 Jun 2026 10:40:45 GMT

Hello,

I am trying to use the csdid command (Callaway & Sant'Anna 2021) with a monthly panel dataset but all ATT(g,t) coefficients are returned as zero (omitted) and I cannot identify the cause.

**Setup**
- Stata 16.1
- csdid and drdid installed from SSC
- Panel data: borrower-level monthly observations
- 6 treated cohorts entering the program in Jan–Jun 2025 (cohort_num 1–6)
- 1 never-treated control group (cohort_num 0)
- Time variable: sequential integer months (2024m7 = 1, 2024m8 = 2, ... 2025m11 = 17)
- Data spans 2024m7 to 2025m11 (17 months total)

**Variable construction**

gen int ym_csdid = int(ym) - 773 // sequential: 2024m7=1 to 2025m11=17
gen int entry_ym_csdid = 0
replace entry_ym_csdid = int(ym(2025, cohort_num)) - 773 if cohort_num >= 1 & cohort_num <= 6
// Results in: cohort1=7, cohort2=8, cohort3=9, cohort4=10, cohort5=11, cohort6=12

xtset panel_id ym_csdid
// panel variable: panel_id (unbalanced)
// time variable: ym_csdid, 1 to 17, delta 1 unit
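
A quick arithmetic check of the month offsets, assuming ym holds standard Stata %tm values (months since 1960m1, which is 0): the stated mapping of 2024m7 to 1 and of cohort 1 to 7 corresponds to an offset of 773.

```stata
* Stata monthly dates count months since 1960m1 (= 0)
display ym(2024, 7)          // 774
display ym(2025, 11)         // 790
display ym(2024, 7) - 773    // 1
display ym(2025, 1) - 773    // 7
display ym(2025, 6) - 773    // 12
```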

**tab ym_csdid entry_ym_csdid (0.1% sample by cohort and control, ~5,000 obs)**

Sequential |
    month: |
 2024m7=1, |
 2024m8=2, |                  entry_ym_csdid
       ... |      0      7      8      9     10     11     12 |   Total
-----------+--------------------------------------------------+--------
         1 |    191     24     29     22     16     12     10 |     304
         2 |    191     24     29     22     16     12     10 |     304
         3 |    191     24     29     22     16     12     10 |     304
         4 |    191     24     29     22     16     12     10 |     304
         5 |    190     24     29     22     16     12     10 |     303
         6 |    188     24     29     22     16     12     10 |     301
         7 |    184     24     29     22     16     12     10 |     297
         8 |    182     24     29     22     16     12     10 |     295
         9 |    182     24     29     22     16     12     10 |     295
        10 |    179     24     29     22     16     12     10 |     292
        11 |    178     24     29     22     16     12     10 |     291
        12 |    176     24     29     22     16     12     10 |     289
        13 |    176     23     29     22     16     12     10 |     288
        14 |    175     23     29     22     16     12     10 |     287
        15 |    175     23     29     22     16     12     10 |     287
        16 |    172     23     29     22     16     12     10 |     284
        17 |    172     23     29     22     16     12     10 |     284
-----------+--------------------------------------------------+--------
     Total |  3,093    403    493    374    272    204    170 |   5,009

**Command used**

csdid repayment, ivar(panel_id) time(ym_csdid) gvar(entry_ym_csdid) method(dripw) notyet

**Result**

All ATT(g,t) coefficients are 0 (omitted) across all 6 groups and all time periods. The number of observations used is about 4,700. No x marks appeared (so no estimation step failed), but there are no estimates either.

Number of obs = 4,694
Outcome model : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g7 |
t_1_2 | 0 (omitted)
t_2_3 | 0 (omitted)
t_3_4 | 0 (omitted)
t_4_5 | 0 (omitted)
t_5_6 | 0 (omitted)
t_6_7 | 0 (omitted)
t_6_8 | 0 (omitted)
t_6_9 | 0 (omitted)
t_6_10 | 0 (omitted)
t_6_11 | 0 (omitted)
t_6_12 | 0 (omitted)
t_6_13 | 0 (omitted)
t_6_14 | 0 (omitted)
t_6_15 | 0 (omitted)
t_6_16 | 0 (omitted)
t_6_17 | 0 (omitted)
-------------+----------------------------------------------------------------
(and so on for the remaining groups)

I have also tried:
- never instead of notyet
- Running without any covariates
- Converting time and gvar to integer storage type (was float before)
- Using the original %tm integer values (774–790) instead of sequential 1–17
- Reducing to 0.1% stratified sample by cohort

None of these resolved the issue.

For reference, I confirmed csdid works correctly on the mpdta example dataset on a separate machine (Stata 14), producing real estimates with no omissions.

The tab pattern looks similar to the working mpdta example. I cannot identify what structural difference in my data is causing all cells to be omitted.

Any help would be greatly appreciated.

Thank you.

medsem with gsem

Ylenia Curci — Thu, 18 Jun 2026 08:08:13 GMT

Hello, I want to run a mediation analysis, but I need to use sampling weights in my estimation. SEM does not accept weights, and GSEM seems not to allow mediation. Any hints? Thank you.

New addlegend package available from SSC

Ben Jann — Thu, 18 Jun 2026 07:43:09 GMT

Thanks to Kit Baum, a new package called addlegend is available from SSC. To install, type:

Code:

. ssc install addlegend, replace

addlegend is a utility to create a do-it-yourself legend and add it to a twoway graph. In contrast to Stata's legend() option, addlegend can combine multiple symbols in a single legend key. Here's an example:

Code:

sysuse auto
twoway (sc mpg turn, msize(large) ms(Oh)) ///
       (sc mpg turn, msize(large) ms(X) pstyle(p1)) ///
       (lfit mpg turn, pstyle(p2))
addlegend, y(50) x(105) margin(r=40): ///
       (Oh X) "Mileage (mpg)", msize(large) ///
    || (line) "Fitted values"

See github.com/benjann/addlegend for some further examples.
ben

Old grammar mac def

Chen Samulsion — Thu, 18 Jun 2026 07:17:23 GMT

Dear Stata users,

I have been looking at an old and worn user-written command, -unitroot-, written in 1992 or even earlier. One piece of its code raises an error. I report the code and the set trace on output below. What is mac def? And how can I bypass this error in versions later than Stata 10? Thank you very much.

Code:

    local rc=_rc
        quietly use `tmpfile', clear
    capture erase `tmpfile'
        mac def S_FN "`dsn'"
    error `rc'
/*    Rest of program, SRB 6/17/92    */
    local j=0
    while (`j'<=`lags') {
        local j=`j'+1
        mac def S_`j' = `tau`j''
    }

Code:

  - local rc=_rc
  - quietly use `tmpfile', clear
  = quietly use ......\Temp\ST_e28_000004.tmp, clear
  - capture erase `tmpfile'
  = capture erase ......\Temp\ST_e28_000004.tmp
  - mac def S_FN "`dsn'"
  = mac def S_FN ""
  - error `rc'
  = error 198
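
A note on the syntax, offered as a sketch and not as a tested fix for -unitroot-. As far as I know, mac def abbreviates macro define, the older spelling of what current do-files usually write as global. In the trace above, the mac def line itself runs; the 198 is re-issued by error `rc' from the return code saved in local rc=_rc, so it originates in an earlier captured command that is not shown here. The macro names MY_FN and MY_1 and their contents are placeholders invented for illustration.

Code:

* modern spelling of: mac def MY_FN "mydata.dta"
global MY_FN "mydata.dta"
* modern spelling of: mac def MY_1 = 3.14 (the expression is evaluated first)
global MY_1 = 3.14
* global macros are read back with a leading dollar sign
display "$MY_FN"
display $MY_1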

geoplot: select or clip

Ben Jann — Thu, 18 Jun 2026 07:14:37 GMT

I received the following private message on Statalist:

In geoplot, I have a map of Brazil (from the IBGE) with 26 states and 5500 municipalities. [...] I want to do my analysis for one state at a time. When I load the map, I have the whole country, but I want to eliminate (clip) all the states except the one I'm working on. How can I do that?

Here's my answer:

To plot just one state, simply use the if qualifier in geoplot. Example:

Code:

local url http://fmwww.bc.edu/repec/bocode/i/
geoframe create regions `url'Italy-RegionsData.dta, id(id) coord(xcoord ycoord) shp(Italy-RegionsCoordinates.dta)
geoplot (area regions if region=="Umbria", fcolor(AntiqueWhite)) ///
        (label regions region if region=="Umbria") ///
        , tight

Alternatively, you can also use geoframe select to create a frame that contains the data of that state only and then use this frame in geoplot. Example:

Code:

frame regions: geoframe select if region=="Umbria", into(Umbria)
geoplot (area Umbria, fcolor(AntiqueWhite)) ///
        (label Umbria region) ///
        , tight

(same result as above)

If you want to generate a plot that clips the surrounding states, rather than omitting them, you can use geoframe rclip to create a frame with the clipped data. Example:

Code:

frame regions: geoframe query bbox if region=="Umbria", pad(30)
frame regions: geoframe rclip r(limits), into(Umbria2)
geoplot (area Umbria2, fcolor(AntiqueWhite*.5)) ///
        (area Umbria2 if region=="Umbria", fcolor(AntiqueWhite)) ///
        (label Umbria2 region if region=="Umbria", psty(p2)) ///
        , tight background(water)

ben

Update of nb_adjust (SSC)

Dirk Enzmann — Wed, 17 Jun 2026 20:15:01 GMT

Thanks to Kit Baum an update of nb_adjust (version 2.13) is available on SSC.

nb_adjust identifies and adjusts (or removes) outliers in a count variable, assuming that the values follow a negative binomial distribution. For more information, see the corresponding help file.

In the previous version, specifying a seed did not guarantee identical results, because the seed also has to be passed to set sortseed. This has been fixed. In addition, a reference to an example of using nb_adjust has been added to the help file.

By the way: It would be helpful if a note were added to the Stata documentation for set seed and mata: rseed() stating that there may be situations in which set sortseed is also required to ensure reproducible results. See also the posts by Brendan Halpin "Setting random seed is not enough?"
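
The point about the two seeds can be illustrated with a minimal sketch. It does not call nb_adjust; the auto dataset, the seed value 12345, and the variable name u are chosen purely for illustration, and it assumes a Stata version that supports set sortseed.

Code:

sysuse auto, clear
* fixes the random-number stream used by runiform()
set seed 12345
* fixes how sort orders observations that tie on the sort key
set sortseed 12345
* foreign has many ties, so the row order within each group depends on the sort seed
sort foreign
* draws are assigned in the current row order
generate double u = runiform()
list make foreign u in 1/5

Running the block twice should list the same cars with the same draws. Without the set sortseed line, the draws themselves repeat but the car that receives each draw may differ between runs.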

Frames with collapse

Susan Bondy — Wed, 17 Jun 2026 13:59:42 GMT

I'm new to v19 but have used Stata's collapse for years (and also SAS proc something output to create a new, separate dataset with the summary values).
I know frames will do this (very excited), but I can't find the perfect self-help training tutorial or video.

Q1) Can anyone recommend a frames tutorial or video with examples like this (similar to proc freq output=..., or collapse to a new dataset)?
Q2) If anyone is feeling generous, would you offer a simple code example of using collapse to create a summary dataset, but then keep the original data in memory and the summary set in a new frame?
I'll 'recognize' the code when I see it and be very grateful.
Sue
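
For Q2, one possible pattern is sketched below. It is not taken from any tutorial; it assumes Stata 16 or later (where frames are available), uses the auto dataset as stand-in data, and the frame name summary and the variable names mean_price and n are made up for illustration.

Code:

sysuse auto, clear
* copy the data into a new frame; the current (default) frame keeps the original
frame copy default summary, replace
* collapse inside the new frame only
frame summary: collapse (mean) mean_price=price (count) n=price, by(foreign)
frame summary: list
* the default frame still holds all original observations
describe, short

When only a few variables are needed for the summary, frame put with the into() option is an alternative to copying the whole dataset.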

SPSIV - Synthetic Instrumental Variables for Spatial Regression without External Instruments

Manh Hoang Ba — Wed, 17 Jun 2026 03:02:41 GMT

Dear Statalist members,

I would like to introduce spsiv, a new Stata command for generating synthetic instrumental variables (SIV) for use in spatial regression models with endogenous variables.
This command implements the aggregated IV method of Le Gallo & Páez (2013) and Fingleton (2023), providing instruments that are strongly correlated with the endogenous regressors while still meeting standard IV requirements. spsiv supports both cross-sectional and panel data setups and can be used in conjunction with commands such as spivreg, spivregress, xtdpd, and xtabond2.
Furthermore, SIV can also be used in conventional regressions with endogenous regressors, provided the endogenous variable follows some spatial correlation scheme. This specification has also been used in Fingleton (2023).

Thanks to Prof. Kit Baum, the command is already available on SSC and can be installed by running the command:

Code:

ssc install spsiv

All comments, suggestions, and bug reports are welcome.

References:

Fingleton, B. (2022). Estimating dynamic spatial panel data models with endogenous regressors using synthetic instruments. Journal of Geographical Systems, 25, Article 1. https://doi.org/10.1007/s10109-022-00397-3
Le Gallo, J., & Páez, A. (2013). Using synthetic variables in instrumental variable estimation of spatial series models. Environment and Planning A, 45(9), 2227-2242.

Here are some examples:

Code:

    * Cross-sectional data
        copy https://www.stata-press.com/data/r19/homicide1990.dta ., replace
        copy https://www.stata-press.com/data/r19/homicide1990_shp.dta ., replace
        use homicide1990, clear
        spset
        spmat idistance m _CX _CY, id(_ID) dfunction(dhaversine) replace
        spsiv ln_population ln_pdensity gini, m(m) a(0.1)

    * Panel data
        copy https://www.stata-press.com/data/r19/homicide_1960_1990.dta ., replace
        copy https://www.stata-press.com/data/r19/homicide_1960_1990_shp.dta . , replace
        use homicide_1960_1990, clear
        xtset _ID year
        spset
        preserve
        keep if year==1990
        spmat idistance m _CX _CY, id(_ID) dfunction(dhaversine) replace
        restore
        spsiv ln_population ln_pdensity gini if year==1990, m(m) a(0.1)
        spsiv ln_population ln_pdensity gini, m(m) a(0.1)

And the results:

Code:

. use homicide1990, clear 
(S.Messner et al.(2000), U.S southern county homicide rates in 1990)

. spset 

      Sp dataset: homicide1990.dta
Linked shapefile: homicide1990_shp.dta
            Data: Cross sectional
 Spatial-unit ID: _ID
     Coordinates: _CX, _CY (planar)

. spmat idistance m _CX _CY, id(_ID) dfunction(dhaversine) replace 

. spsiv ln_population ln_pdensity gini, m(m) a(0.1) 
(S.Messner et al.(2000), U.S southern county homicide rates in 1990)

Correlation between X and synthetic intrumental variables
------------------------------------------------
Variable (X)ln_populationln_pdensity     gini
------------------------------------------------
Correlation    0.7498      0.7833      0.7985
------------------------------------------------

. copy https://www.stata-press.com/data/r19/homicide_1960_1990.dta ., replace
(file homicide_1960_1990.dta not found)

. copy https://www.stata-press.com/data/r19/homicide_1960_1990_shp.dta . , replace
(file homicide_1960_1990_shp.dta not found)

. use homicide_1960_1990, clear 
(S.Messner et al.(2000), U.S southern county homicide rate in 1960-1990)

. xtset _ID year 

Panel variable: _ID (strongly balanced)
 Time variable: year, 1960 to 1990, but with gaps
         Delta: 1 unit

. spset 

      Sp dataset: homicide_1960_1990.dta
Linked shapefile: homicide_1960_1990_shp.dta
            Data: Panel
 Spatial-unit ID: _ID
         Time ID: year (see xtset)
     Coordinates: _CX, _CY (planar)

. preserve 

. keep if year==1990 
(4,236 observations deleted)

. spmat idistance m _CX _CY, id(_ID) dfunction(dhaversine) replace 

. restore 

. spsiv ln_population ln_pdensity gini if year==1990, m(m) a(0.1) 
(S.Messner et al.(2000), U.S southern county homicide rate in 1960-1990)

Correlation between X and synthetic intrumental variables
------------------------------------------------
Variable (X)ln_populationln_pdensity     gini
------------------------------------------------
Correlation    0.7498      0.7833      0.7985
------------------------------------------------

. spsiv ln_population ln_pdensity gini, m(m) a(0.1) 
(S.Messner et al.(2000), U.S southern county homicide rate in 1960-1990)

Correlation between X and synthetic intrumental variables
------------------------------------------------
Variable (X)ln_populationln_pdensity     gini
------------------------------------------------
Correlation    0.7315      0.7789      0.8418
------------------------------------------------

How to obtain the sample size used by csdid2?

Frank Huang — Wed, 17 Jun 2026 02:23:04 GMT

Dear all:

I am using csdid2 (net install csdid2, from("https://friosavila.github.io/stpackages")) and would like to know how to obtain the sample size used in the estimation. Is there an option or stored result that reports the number of observations actually used by csdid2? Thank you.

Code:

*ssc install csdid
*ssc install drdid
*net install csdid2, from("https://friosavila.github.io/stpackages")

*ssc install frause

frause mpdta, clear

csdid lemp, ivar(countyreal) time(year) gvar(first)
estat event
estat simple

csdid2 lemp, ivar(countyreal) tvar(year) gvar(first)
estat event
estat simple