Statalist - Forums

Command multproc keeps returning "p ambiguous abbreviation"

Kay Vachova — Mon, 08 Jun 2026 14:38:07 GMT

Dear all,

My analysis is very similar to the second example in "SMILEPLOT: Stata module to create plots for use with multiple significance tests", but I keep running into the same error.

This is the code I'm using:
logistic event2 sodium_5 pot_5 calc_5 mag_5 pho_5 iron_5 copper_5 zinc_5 chlo_5 retinol_5 carotene_5 vitd_5 thiamin_5 ribo_5 niacin_5 trypto_5 vitc_5 vite_5 pyrid_5 b12_5 folic_tot_5 panto_5 biotin_5 iodine_5 mn_5 retequ_5 se_5 sfa_5 n6pufa_5 n3pufa_5 mfa_5 pufa_5 transfa_5 cholest_5

parmest,format(estimate min95 max95 %8.2f p %8.1e) list(,)

multproc, method(simes) puncor(.05)

But every time I run this code, I get the error message "p ambiguous abbreviation". I've also tried multproc, pvalue(p) method(simes) puncor(.05), as seen in other examples, but that doesn't resolve the issue: the error message then becomes "p ambiguous abbreviation (error in option pvalue())".

I feel like I've tried everything and I'd appreciate any help, thank you.
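
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: without norestore, parmest lists its results and then restores the original data, so multproc runs on the original dataset, where "p" is an ambiguous abbreviation of pot_5, pho_5, pyrid_5, panto_5 and pufa_5. A minimal sketch under that assumption, with the covariate list shortened for illustration:

```stata
* Sketch only: same model as in the post, shortened to three covariates
logistic event2 sodium_5 pot_5 calc_5

* norestore keeps the parmest results (one row per parameter) in memory,
* replacing the original data, so a variable named exactly p exists
parmest, norestore format(estimate min95 max95 %8.2f p %8.1e) list(,)

* multproc now finds the P-value variable p in the results dataset
multproc, pvalue(p) method(simes) puncor(.05)
```

Because norestore overwrites the data in memory, the original dataset should be saved beforehand if it is needed again.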

Anderson-Rubin P-Values and Standard Errors

Zachariah Rutledge — Sun, 07 Jun 2026 15:59:48 GMT

Hello,
I am working on a revision to a manuscript where the editor asked us to report Anderson-Rubin p-values as a robustness test. When obtaining Anderson-Rubin p-values, will the standard errors also change (i.e., should my standard errors be different from the standard two-stage least squares estimates)?

save commands

Narong Kul — Fri, 05 Jun 2026 18:41:54 GMT

Can someone show me how to use saved commands that vary by the value of one variable (e.g. facilitykey)? See below. Thank you.

sum iss if facilitykey==, detail
tab iss_15 if facilitykey==
tab died if facilitykey==
tab died if iss_15==1 & facilitykey==
sum hlos if facilitykey==, detail
sum age if facilitykey==, detail
tab age_65 if facilitykey==
tab gender if facilitykey==
tab penetr if facilitykey==
tab gcs_8 if facilitykey==
tab eddispo if iss_15==1 & facilitykey==
tab level if facilitykey==
tab teachingstatus if facilitykey==
tab bedsize if facilitykey==
tab transfu if facilitykey==
tab hcontrol if facilitykey==
tab hhcontrol if facilitykey==
tab angio if facilitykey==
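
One way to reuse a block like this is to wrap it in a small program that takes the facility key as an argument. The sketch below is only an illustration: the program name facreport and the key value 101 are made up, only some of the commands are shown, and facilitykey is assumed to be numeric.

```stata
* Hypothetical helper (name and argument are illustrative, not from the post).
* Assumes facilitykey is numeric; for a string key, compare with "`key'" instead.
capture program drop facreport
program define facreport
    args key
    * Detailed summaries for the continuous variables
    foreach v in iss hlos age {
        summarize `v' if facilitykey == `key', detail
    }
    * One-way tables for the categorical variables
    foreach v in iss_15 died age_65 gender penetr gcs_8 {
        tabulate `v' if facilitykey == `key'
    }
    * Tables restricted to severely injured patients
    tabulate died if iss_15 == 1 & facilitykey == `key'
    tabulate eddispo if iss_15 == 1 & facilitykey == `key'
end

* Usage: run the whole block for one facility (101 is a made-up key value)
facreport 101
```

Saving the program definition in a do-file and running it once per session makes facreport available for any facility key.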

xtabond2/xtdpdgmm: Estimation of dynamic simultaneous system of equations using GMM

lakhi narayan — Fri, 05 Jun 2026 13:49:43 GMT

Hi,
I want to estimate the effect of X on Y in the formal sector, and then the effect of Y in the formal sector on Y in the informal sector. For instance, what is the effect of TFP (which is endogenous) on employment in the formal sector, and what is the effect of formal-sector employment (due to the TFP change) on employment in the informal sector? More specifically,

[system of equations: image not preserved in the source]

I have industry-region panel data with four time points and approximately 3,500 observations. Can I estimate this system of equations simultaneously by GMM using xtabond2 or xtdpdgmm? Do I need to manually stack the instruments for the equations in order to estimate the system simultaneously? However, I do not think this approach will reveal the mechanism through which Y in the formal sector affects Y in the informal sector. My objective is to estimate the system jointly so that I can identify the transmission mechanism, namely:

TFP in the formal sector → Employment in the formal sector → Employment in the informal sector.

problem including never-treated with Sun & Abraham(2021) eventstudyinteract

Anna DSouza — Thu, 04 Jun 2026 21:08:52 GMT

I am using the IW estimate and the eventstudyinteract command from Sun and Abraham (2021) with 'treated' (variable turns on when unit is treated) and 'never_treated' units, and with pre and post variables, as seen:

Code:

eventstudyinteract DEPVAR month_pre6 month_pre5 month_pre4 month_pre3 month_pre2 month_post0 month_post1 month_post2 month_post3 month_post4 month_post5 month_post6, cohort(treated) control_cohort(never_treated) absorb(i.m_y##i.gov hhid) vce(cluster Code)

Since the estimator uses never-treated units, I assumed that they would be included in the estimation sample. However, only treated units are in the estimation sample.

Is this because there are three steps in the estimation procedure and the last step creates a weighted sum of the first-step estimates, which already incorporate the never-treated units? I can't think of another reason that the never-treated units would not be included in the estimation sample. In their paper and in other papers that use the estimator, the authors are clear that never-treated units can be part of the control group.

Any feedback would be much appreciated.

Psacalc to estimate unobservable ability in regression capturing effect of formal vocational training

Sayoree Gooptu — Thu, 04 Jun 2026 19:02:50 GMT

I am regressing the binary variable NEET (not in education, employment, or training) on past vocational training, including controls and state fixed effects. Since the vocational training variable can be affected by ability, I want to assess selection on unobservables using the criterion of Oster (2019).

Code:

regress neet_t365_july i.urban i.male mpce i.unmarried i.religion i.social_group i.state i.edulevel##i.pastvoc365 [pweight = pop_weight] if majorstate==1, vce(cluster fsu)

psacalc beta 1.pastvoc365, delta(1) rmax(`rmax_val')

* 4. Calculate the breakdown delta (the selection required to drive the treatment effect to zero)
psacalc delta 1.pastvoc365, rmax(`rmax_val')

Is the process correct? How do I interpret the results? I am getting a negative delta.

Testing parallel trends for continuous treatment

Myat Thida Win — Thu, 04 Jun 2026 06:57:00 GMT

Hello, I am trying to understand how I could show parallel trends for a continuous treatment. I have a standard DiD with treatment intensity instead of a binary treatment. As treatment intensity is zero in the pre-period, it is not clear to me how to show the parallel trend. I would really appreciate your help.
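
One common way to look at pre-trends in this setting is an event-study regression that interacts each unit's eventual intensity with period dummies. The sketch below rests entirely on assumptions: all variable names (y, dose, id, year) and the years are hypothetical, dose is taken to be the intensity a unit eventually receives (fixed within unit), and treatment is assumed to start in 2020 for everyone.

```stata
* Hypothetical variable names throughout (not from the post):
*   y     outcome
*   dose  eventual treatment intensity, constant within unit
*   id    panel unit;  year  calendar year;  treatment starts in 2020
xtset id year

* Year-specific slopes on dose, with the last pre-period (2019) as base.
* The main effect of dose is absorbed by the unit fixed effects.
xtreg y ib2019.year##c.dose, fe vce(cluster id)

* Joint test that the pre-period dose-by-year slopes are zero
test 2017.year#c.dose 2018.year#c.dose
```

Pre-period slopes close to zero would be consistent with outcomes of high- and low-intensity units evolving similarly before treatment, which is the continuous-dose analogue of the usual pre-trend check.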

Latest (03 June 2026) updates to StataNow 19

Kristin MacDonald (StataCorp) — Wed, 03 Jun 2026 17:05:01 GMT

A new update to StataNow 19 is now available (as of 03 June 2026). This update includes new features for financial statistics and more. You can read about all of the new features in our latest blog post.

CIVREG: A Stata package for synthetic (coplanar) instrumental variables estimation

Manh Hoang Ba — Wed, 03 Jun 2026 10:36:55 GMT

Dear Statalist users,

I would like to announce civreg, a new Stata package for instrumental variables estimation based on the Synthetic Instrumental Variables (SIV) methodology.

The package implements the approach proposed by Dzhumashev and Tursunalieva (2025), which addresses endogeneity without requiring external instruments. Instead, valid instruments are constructed directly from the observed data by exploiting the coplanar structure of the regression system and a data-driven Dual Tendency (DT) condition. The method identifies both the synthetic instrument and the direction of endogeneity from the data itself.

To install, type:

Code:

net install civreg, from("https://raw.githubusercontent.com/ManhHB94/civreg/main/")

Key features of civreg include:

Estimation without requiring external instruments.
Support for models with or without exogenous control regressors.
Support for both homoskedastic and heteroskedastic SIV identification procedures.
Automatic determination of the direction of endogeneity.
Compatibility with fixed-effects and two-way fixed-effects settings for panel data.

The example below replicates the SIV results presented in Table 1 of Dzhumashev and Tursunalieva (2025):

Code:

. webuse mroz, clear

. civreg hours (lwage = ) educ age kidslt6 kidsge6 nwifeinc , hete(0) reps(49) small rcode

------------------------------------------------------------------------------
Coplanar instrumental variables (CIV) regression
------------------------------------------------------------------------------
Dual Tendency:  Homoscedastic
Effects:        None
Reference:      Dzhumashev and Tursunalieva (2025)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =      428
                                                      F(  6,   421) =    18.26
                                                      Prob > F      =   0.0000
Total (centered) SS     =  257311019.9                Centered R2   =  -1.3833
Total (uncentered) SS   =    983895094                Uncentered R2 =   0.3767
Residual SS             =  613254452.5                Root MSE      =     1207

------------------------------------------------------------------------------
       hours | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       lwage |    1369.47   138.4851     9.89   0.000     1097.261    1641.678
        educ |  -159.1529   30.84987    -5.16   0.000    -219.7919   -98.51395
         age |  -10.44129    8.83987    -1.18   0.238    -27.81706    6.934493
     kidslt6 |   -225.613   160.0919    -1.41   0.159     -540.292    89.06597
     kidsge6 |  -55.12937   49.49267    -1.11   0.266    -152.4129    42.15416
    nwifeinc |  -8.687533   5.853077    -1.48   0.138    -20.19243    2.817362
       _cons |   2396.561   544.0511     4.41   0.000     1327.166    3465.956
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic):         167.557
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):              270.853
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):           0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         lwage
Included instruments: educ age kidslt6 kidsge6 nwifeinc
Excluded instruments: civ_lwage
------------------------------------------------------------------------------

Comments, suggestions, and bug reports are welcome.

Reference:
Dzhumashev, R., Tursunalieva, A. 2025. A synthetic instrumental variable method: Using the dual tendency condition for coplanar instruments. https://doi.org/10.48550/arXiv.2512.17301.

Performing multiple imputation regression with pre-imputed variables (IPUMS INCIMP1-5)

Lily Brusic — Tue, 02 Jun 2026 17:34:00 GMT

Hello All,

I am using the multiply imputed variables incimp1 through incimp5 for my logit regression. I am wondering if anyone is familiar with doing this in Stata and could provide feedback on my code.

It successfully generated logit regression output, but I wanted to check that this is an appropriate use of these commands, since this is my first time using them. The Stata documentation is mainly geared toward data you impute yourself, not data where the imputations are already provided.

Some of the variables used in the logit regression are cleaned and named differently than the IPUMS website, so I am hoping you can tell by looking at the code:

Code:

 /* 
The variables incimp1-5 are imputed based on incfam97on2. I copied incfam97on2 to
incimp so its name matches the imputed variables, and set missing-data codes to missing.
 */
gen incimp=incfam97on2
replace incimp=. if incfam97on2>95

* Tell Stata which variables are imputed and which variable they were imputed from

mi import wide, imputed(incimp=incimp1 incimp2 incimp3 incimp4 incimp5) clear

* Check conversion - "incimp" is noted as being imputed
mi describe

* Run logit regression using multiple imputation
mi estimate: logit CVD i.birthcohort10 c.age c.age#c.age i.female i.incimp if insample==1

I had previously posted on IPUMS forum, and they directed me to Statalist: https://forum.ipums.org/t/stata-logi...=lilian_brusic

Suggestion about my model

Huseyin Gormus — Tue, 02 Jun 2026 00:19:25 GMT

Hi everyone,
I am an undergraduate economics student working on this model. I am posting here not just to get answers, but genuinely to learn and test my own understanding. Any feedback, criticism, or suggestions are welcome.
The primary objective of this model is to isolate and quantify the effect of meteorological drought on annual barley production. ΔCultivatedArea is included strictly as a control variable.
The empirical model is specified as follows:

PRODUCTION_t = β0 + β1·SPEI_7_t + β2·ΔCultivatedArea_t + ε_t
Where:
n = 26 (due to differencing of CultivatedArea)
t = year
PRODUCTION: Annual barley production (tonnes)
SPEI_7: 7-month SPEI index for August
ΔCultivatedArea: First difference of barley cultivated area (hectares)

What are the steps I should follow, in order, to properly estimate and validate this model?

So far I have completed the following steps:

ADF Unit Root Tests
Pearson Correlation Matrix (Multicollinearity Check)
OLS Estimation
Breusch-Godfrey Test (Autocorrelation)
Breusch-Pagan-Godfrey Test (Heteroskedasticity)
Jarque-Bera and Shapiro-Wilk Tests (Normality of Residuals)
Ramsey RESET Test (Functional Form)

OLS Time Series: Sufficient Diagnostics or Missing Steps

Huseyin Gormus — Mon, 01 Jun 2026 22:57:28 GMT

Hi everyone, I am an undergraduate economics student working on this model. I am posting here not just to get answers, but genuinely to learn and test my own understanding of the methodology I applied. Any feedback, criticism, or suggestions are welcome. I want to understand where I might be wrong. The primary objective of this model is to isolate and quantify the effect of meteorological drought, measured by the SPEI_7 index, on annual barley production. ΔCultivatedArea is included strictly as a control variable to prevent the drought coefficient from absorbing the effect of physical land changes, not as a variable of independent interest.

Here is my setup

Model: Production_t = β0 + β1·SPEI_7_t + β2·ΔCultivatedArea_t + ε_t (n = 26, due to differencing)

Where:

PRODUCTION: Annual barley production (tonnes)
SPEI_7: 7-month SPEI index for August
ΔCultivatedArea: First difference of barley cultivated area

Steps followed:

ADF unit root tests (intercept for PRODUCTION and SPEI_7; intercept+trend for CultivatedArea due to visible deterministic trend)
First-differenced CultivatedArea to achieve stationarity
Pearson correlation matrix to check multicollinearity (r = -0.081 between SPEI_7 and ΔCultivatedArea)
OLS estimation
Breusch-Godfrey test for autocorrelation (lag=1)
Breusch-Pagan-Godfrey test for heteroskedasticity
Jarque-Bera and Shapiro-Wilk tests for normality of residuals
Ramsey RESET test for functional form (F p=0.8856)

Results:
SPEI_7: β=874,320, p=0.0021 (significant at 1%)
ΔCultivatedArea: β=1.983, p=0.0188 (significant at 5%)
R²=0.453, Adjusted R²=0.401, F p=0.0014
All diagnostic tests passed (no autocorrelation, no heteroskedasticity, normality satisfied, correct functional form).

MY QUESTIONS:

Two of the diagnostic tests produced borderline results that I would like to highlight:

1. Breusch-Godfrey Test (Autocorrelation)

Chi-Square p = 0.0691
F p = 0.0874
Both values exceed the 0.05 threshold, so the null hypothesis of no autocorrelation cannot be rejected. However, the margin is relatively narrow. I am wondering whether this should be a concern or whether it is simply a consequence of the small sample size (n=26).

2. Shapiro-Wilk Test (Normality of Residuals)

p = 0.0532
The null hypothesis of normality cannot be rejected, but the p-value is only marginally above the 0.05 threshold. Again, I suspect this may be related to the limited number of observations.

With only n=26 observations, ADF unit root tests are known to have low power. Is there a more appropriate test for this sample, and should I run both for robustness?

While I argue that SPEI_7 is strictly exogenous, the same argument does not hold for ΔCultivatedArea, as annual planting decisions may be correlated with omitted socioeconomic variables such as input costs or government subsidies. However, since the correlation between SPEI_7 and ΔCultivatedArea is negligible (r=-0.081, p=0.73), I argue that even if the ΔCultivatedArea coefficient is biased, this does not contaminate the SPEI_7 estimate. Is this reasoning valid, or should I be more concerned about the potential endogeneity of ΔCultivatedArea?

SPSS Setup file

Euslaner — Mon, 01 Jun 2026 16:21:36 GMT

I have downloaded several SPSS setup files. Is there a straightforward way to open them in Stata? The manual doesn't give me an answer.

Ric Uslaner

DRLATE: Module for doubly robust estimation of LATE and LATT

Jeff Wooldridge — Mon, 01 Jun 2026 15:51:16 GMT

Hello Statalisters! I would like to announce that drlate, a Stata module for doubly robust estimation of the local average treatment effect (LATE) and the local average treatment effect on the treated (LATT), is now available in SSC.
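
Since the module is on SSC, installation should follow the usual SSC pattern. The lines below are a sketch; the help-file name is assumed to match the command name.

```stata
* Install the package from SSC, then open its help file
ssc install drlate
help drlate
```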

drlate provides several estimators of the LATE and LATT, including inverse-probability-weighted regression adjustment (IPWRA), inverse probability weighting (IPW), augmented inverse probability weighting (AIPW), and regression adjustment (RA). Outcome and treatment models can be specified as linear, logistic, or Poisson regressions. The instrument propensity score is specified as a logistic regression (logit) and can be estimated by maximum likelihood (ML), covariate balancing propensity score (CBPS), or inverse probability tilting (IPT). The instrument must be binary.

The estimators implemented by drlate are described in the paper by Słoczyński, Uysal, and Wooldridge, "Doubly Robust Estimation of Local Average Treatment Effects Using Inverse Probability Weighted Regression Adjustment," available here: https://arxiv.org/abs/2208.01300. We expect to have a new draft of this paper in the next few months.

"Dr. Late" can provide a cure when the treatment is confounded, you want to allow heterogeneity, and you want to take functional form seriously!

Let us know if you have comments or suggestions about the package.

Unexpected result from test command used with xtgee

Simon Cousens — Sun, 31 May 2026 15:56:16 GMT

I am using xtgee to estimate risk ratios for a data set with repeated measures. I want to test an interaction between two factors, one with 2 levels (arm1) and the other with 3 levels (indication), i.e. 2 interaction parameters. I had expected that the test command would yield a chi-squared statistic (2 df) equal to the difference between the Wald chi-squared statistics for the models with and without the interaction. That isn't what I get. The Wald chi2 (5 df) for the model with the interaction is 10.21. The Wald chi2 (3 df) for the model without the interaction is 8.44, a difference of 1.77. The chi-squared value produced by the test command is 1.60. Can anyone explain the difference? Output below.

. xtgee prim_outcome i.arm1##i.indication if arm1!=3, family(binomial) link(log) corr(exch) vce(robust) eform

GEE population-averaged model                       Number of obs    =  6,749
Group variable: participan~m                        Number of groups =  6,287
Family: Binomial                                    Obs per group:
Link: Log                                                        min =      1
Correlation: exchangeable                                        avg =    1.1
                                                                 max =      4
                                                    Wald chi2(5)     =  10.21
Scale parameter = 1                                 Prob > chi2      = 0.0696

                                                    (Std. err. adjusted for clustering on participantnum)
-----------------------------------------------------------------------------------------------------------------
                                                |             Semirobust
                                   prim_outcome |     exp(b)   std. err.      z    P>|z|     [95% conf. interval]
------------------------------------------------+----------------------------------------------------------------
                                           arm1 |
                                     Dexa 4x6mg |   .8965813   .1559133    -0.63   0.530     .6376286    1.260699
                                                |
                                     indication |
            Preterm labor with intact membranes |   .6763322   .1428367    -1.85   0.064     .4470871    1.023123
                               Planned delivery |   .9138782   .1248478    -0.66   0.510     .6992025    1.194466
                                                |
                                arm1#indication |
 Dexa 4x6mg#Preterm labor with intact membranes |   1.168356   .3464885     0.52   0.600     .6533442    2.089335
                    Dexa 4x6mg#Planned delivery |   1.278983   .2505847     1.26   0.209     .8711508    1.877745
                                                |
                                          _cons |   .1040274   .0124361   -18.93   0.000     .0822981     .131494
-----------------------------------------------------------------------------------------------------------------

. test 2.arm1#2.indication 2.arm1#3.indication

 ( 1)  2.arm1#2.indication = 0
 ( 2)  2.arm1#3.indication = 0

           chi2(  2) =    1.60
         Prob > chi2 =    0.4493

. xtgee prim_outcome i.arm1 i.indication if arm1!=3, family(binomial) link(log) corr(exch) vce(robust) eform

GEE population-averaged model                       Number of obs    =  6,749
Group variable: participan~m                        Number of groups =  6,287
Family: Binomial                                    Obs per group:
Link: Log                                                        min =      1
Correlation: exchangeable                                        avg =    1.1
                                                                 max =      4
                                                    Wald chi2(3)     =   8.44
Scale parameter = 1                                 Prob > chi2      = 0.0378

                                         (Std. err. adjusted for clustering on participantnum)
------------------------------------------------------------------------------------------------------
                                     |             Semirobust
                        prim_outcome |     exp(b)   std. err.      z    P>|z|     [95% conf. interval]
-------------------------------------+----------------------------------------------------------------
                                arm1 |
                          Dexa 4x6mg |   1.083794   .0823057     1.06   0.289     .9339095    1.257734
                                     |
                          indication |
 Preterm labor with intact membranes |   .7305089   .1081895    -2.12   0.034     .5464637    .9765394
                    Planned delivery |   1.035454   .1013354     0.36   0.722     .8547268    1.254394
                                     |
                               _cons |   .0945754    .009167   -24.33   0.000      .078212    .1143624
------------------------------------------------------------------------------------------------------
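
For context, here is a sketch of why the two numbers need not agree. It uses standard Wald-test algebra, not anything stated in the post, and the symbols are introduced here for illustration: R selects the two interaction coefficients, the subscript F denotes the fit with the interaction, and R the fit without it.

```latex
% Wald test of the two interaction terms, using only the full-model fit:
W_{\text{test}} = (R\hat\beta_F)^{\top}\bigl(R\,\hat V_F\,R^{\top}\bigr)^{-1}(R\hat\beta_F) = 1.60

% Model chi-squared statistics: all non-constant coefficients jointly zero,
% each computed from its own fit and its own robust variance matrix:
W_F = \hat\beta_F^{\top}\hat V_F^{-1}\hat\beta_F = 10.21, \qquad
W_R = \hat\beta_R^{\top}\hat V_R^{-1}\hat\beta_R = 8.44, \qquad
W_F - W_R = 1.77
```

The difference W_F - W_R combines estimates and variance matrices from two separate fits, so it is not itself a Wald statistic for the interaction. Differencing across nested models is a property of likelihood-ratio statistics, and GEE is not a likelihood-based method, so there is no reason to expect 1.77 and 1.60 to coincide.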