Statalist

convergence error in mixed effect model for partially nested design

Hiro farabi — Thu, 19 Feb 2026 11:37:29 GMT

Dear Statalist members,

I am analyzing a partially nested design in a cost-effectiveness study (cost and QALY outcomes). The data are in long format, with the outcome variable y (containing both cost and QALY), distinguished by a binary variable type (e.g., 0 = cost, 1 = QALY), and group for arm (0 = control, 1 = intervention).

Key features of the design:

Several sites include both intervention and control participants (so sites are crossed with group).
Clinicians (identified by cid) are only present in the intervention arm (clustering via clinicians only applies to intervention).
Individuals (pid) are in both arms.
Variances differ between arms (heteroscedastic by design).
I need to account for correlation between cost and QALY at both the cluster (clinician/cid) level and the individual (pid) level.

My current command is:

mixed y i.type i.type#i.group || site: || cid:group || pid:, nocons ml residuals(ind, t(type) by(group)) nolog

This produces a convergence error (typically something like "convergence not achieved").

What is the recommended syntax for handling a partially nested design where clustering (cid) only applies to one arm?
How can I properly model the correlation between cost and QALY at the cluster and individual levels?

Log with zero output - But in Stochastic Frontier Analysis

Armando Martins — Thu, 19 Feb 2026 00:52:57 GMT

Hi everyone,

I'm trying to estimate a stochastic frontier model (in Stata 14), but I've run into an issue: some units in my sample produce zero output.

Since the most common SFA specifications rely on log-linearized production functions (e.g., Cobb-Douglas, translog), I'm not sure how to properly handle these zero-output observations.

I've read the Chen and Roth (2024) review, but it was not clear to me whether their solutions apply to SFA models.

- Is there a recommended way to incorporate zero-output units in SFA?

- Would a two-stage approach make sense here? That is, first modeling the probability of positive output, and then estimating the stochastic frontier conditional on positive output?

- If so, are there references or best practices on implementing this in an SFA framework?

Thanks!

ANN: unicefdata — Access 748+ UNICEF indicators from Stata (also R and Python)

JPAzevedo — Wed, 18 Feb 2026 22:27:05 GMT

Dear Statalist,

I am happy to announce the release of unicefdata v2.2.0, a Stata module for downloading UNICEF indicator data directly from the UNICEF SDMX Data Warehouse.

unicefdata is part of the unicefData trilingual library — the same indicators, the same API, the same metadata, available in R, Python, and Stata. If you work across languages or collaborate with people who do, everyone gets the same data with the same command logic.

The package covers 748+ indicators across 69 dataflows, spanning child mortality, nutrition, immunization, education, WASH, child protection, HIV/AIDS, early childhood development, and more. You probably already know many of these indicators from UNICEF's data.unicef.org — this package lets you pull them into Stata with a single command.

What can you do with it?

1. Search and discover indicators without leaving Stata

. unicefdata, search(stunting)
. unicefdata, search(mortality) dataflow(CME)
. unicefdata, flows
. unicefdata, info(CME_MRY0T4)

Indicators are organized into tiers — Tier 1 (verified and downloadable) shows by default. Use showtier2, showtier3, or showall if you want to explore further.

2. Download data — the dataflow figures itself out

. unicefdata, indicator(CME_MRY0T4) countries(ALB USA BRA) year(2015:2023) clear
You do not need to know which SDMX dataflow an indicator belongs to. The package resolves it automatically. If you do know, you can specify it with dataflow().

3. Disaggregation filters

. unicefdata, indicator(NT_ANT_HAZ_NE2) sex(_T M F) clear
. unicefdata, indicator(NT_ANT_HAZ_NE2) wealth(Q1 Q5 _T) clear
. unicefdata, indicator(NT_ANT_HAZ_NE2) sex(M F) wealth(Q1 Q5) residence(U R) clear

Filter by sex, wealth quintile, residence (urban/rural), age group, and maternal education. Filters work at the API level — the query downloads only what you ask for.

4. Output formats

. unicefdata, indicator(CME_MRY0T4) format(wide) clear // years as columns
. unicefdata, indicator(CME_MRY0T4 CME_MRM0) format(wide_indicators) clear // indicators as columns
. unicefdata, indicator(CME_MRY0T4) latest clear // most recent per country
. unicefdata, indicator(CME_MRY0T4) mrv(3) clear // 3 most recent values
. unicefdata, indicator(NT_ANT_HAZ_NE2) year(2015) circa clear // nearest available year

5. Self-documenting datasets

Downloaded datasets now embed provenance as Stata char characteristics — indicator codes, dataflow, version, timestamp. The data remembers where it came from, even after you save and reopen it.

6. Metadata sync

. unicefdata_sync, verbose // check metadata freshness
. unicefdata_refresh_all, verbose // full refresh from UNICEF API

The package ships with YAML metadata files and warns you if they get stale (>30 days). A single command refreshes everything.

Under the hood:

The Stata package includes 63 automated tests across 16 families, covering data downloads, discovery, sync, transformations, edge cases, cross-platform consistency, error handling, and deterministic offline tests. The test suite follows Gould's (2001) certification methodology with rcof return-code verification.

Cross-platform alignment is a first-class concern. The R, Python, and Stata implementations share the same YAML metadata, the same indicator registry, and the same filtering logic. A validation pipeline compares outputs across all three languages.

Installation:

* From SSC (stable)
ssc install unicefdata

* From GitHub
net install unicefdata, ///
    from("https://raw.githubusercontent.com/unicef-drp/unicefData/main/stata/ssc") replace

* First-time setup (installs metadata files)
unicefdata_setup, replace

Resources:

GitHub: https://github.com/unicef-drp/unicefData
Examples: 7 do-files covering quick start through advanced features
R and Python versions in the same repository
Bug reports: https://github.com/unicef-drp/unicefData/issues

Acknowledgments

I would like to thank Kit Baum for the SSC upload. I am also grateful to Lucas Rodrigues, Yang Liu, and Karen Avanesian at UNICEF for their technical contributions and feedback, and to Yves Jaques, Alberto Sibileau, and Daniele Olivotti for designing and maintaining the UNICEF SDMX data warehouse that makes this package possible.

Best,

Joao Pedro Azevedo

ANN: wbopendata v18 — Celebrating 15 Years of World Bank Open Data Access from Stata

JPAzevedo — Wed, 18 Feb 2026 22:17:30 GMT

Dear Statalist,

I am pleased to announce the release of wbopendata v18, a major update to the Stata module for accessing the World Bank Open Data API.

This release also marks 15 years since wbopendata was first released in February 2011 — almost as old as my daughters, which I can hardly believe. What started as a simple bridge between Stata and the newly launched World Bank Open Data Initiative has grown into a tool used by researchers, students, and policy analysts worldwide. I am grateful to everyone who has used it, reported bugs, and contributed over the years.

This is the first announcement I am making on Statalist since v16 in July 2020. A lot has changed.

What's New Since v16 (July 2020)

By the numbers:

Metric	v16 (July 2020)	v18
Indicators	~16,000	29,323
Data sources	~45	71
Countries/regions	~260	296
.ado files	6	34
Automated tests	0	89
Metadata format	89 .sthlp files	2 YAML files

1. Discovery commands — browse the data catalog offline from Stata

After an initial sync, you can search, browse, and explore the entire World Bank data catalog without a network connection:

. wbopendata, search(learning+poverty)
. wbopendata, info(SE.LPV.PRIM)
. wbopendata, sources
. wbopendata, alltopics

Search supports multi-keyword queries, wildcards (NY.GDP.*), regex patterns, and filters by source, topic, or field. On Stata 16+, results return in under half a second after the first call.

2. YAML metadata architecture

The 89 per-indicator .sthlp help files have been replaced by two compact YAML files containing full metadata for all 29,323 indicators. This makes the package smaller, faster to update, and easier to maintain.

3. Redesigned sync system

. wbopendata, sync // safe preview (dry run)
. wbopendata, sync detail // detailed breakdown
. wbopendata, sync replace // apply changes
. wbopendata, sync replace force // force full re-download

The sync command defaults to a safe dry-run preview. The replace keyword is an explicit safety gate.

4. Self-documenting datasets (char metadata)

Every downloaded dataset now embeds provenance information as Stata char characteristics — indicator codes, query parameters, timestamps, and version — following the pattern established by Drukker (2006) in freduse. This metadata persists across save/use cycles. Suppress with nochar.

5. Country context by default

Downloads now automatically include region, income level, admin region, and lending type variables (8 additional variables). Suppress with nobasic.

6. Graph-ready metadata

New linewrap(), maxlength(), and linewrapformat() options format indicator names and descriptions for use in graph titles and notes. New return values provide dynamic subtitles with country counts and average data year.

7. Community bug fixes

Thanks to @lucaslindoso, @daniel-klein, @ckrf, @randrescastaneda, and @zhaowill for bug reports and contributions that improved latest, country metadata, URL construction, and varlist handling.

8. Quality assurance

The package now includes 89 automated tests across 17 categories, including offline deterministic tests using CSV fixtures following Gould's (2001) certification methodology.

Installation

* From SSC (stable)
ssc install wbopendata, replace

* From GitHub (latest)
net install wbopendata, from("https://raw.githubusercontent.com/jpazvd/wbopendata/main") replace

Resources

GitHub: https://github.com/jpazvd/wbopendata
Examples gallery, FAQ, and full documentation available in the repository
Bug reports and feature requests: https://github.com/jpazvd/wbopendata/issues

Acknowledgments

I would like to thank Kit Baum for maintaining the SSC Archive and for uploading wbopendata over all these years. The SSC infrastructure has been essential for making user-written Stata packages accessible to the community, and Kit's sustained dedication to this service is deeply appreciated.

Best,

Joao Pedro Azevedo

Interpreting confidence intervals of predictive margins after logit

Wes King — Wed, 18 Feb 2026 16:34:56 GMT

Hello all,

I have a logistic regression model including an interaction term between two categorical variables (sex and race). I am using population representative data. The output (slightly edited for readability) is pasted below. My question is: How do I interpret the 95% CIs that cross 0? My understanding is that marginal predictions after logistic regression are interpreted as predicted probabilities, so I was expecting to see CIs constrained to be between 0 and 1.

Code:

. margins sex#race4, vce(unconditional)

Predictive margins

Number of strata =  76                            Number of obs   =    2,389
Number of PSUs   = 843                            Population size =    2,352.5017
Design df       =    767

Expression: Pr(outcome), predict()

        
                           |            Linearized
                           |     Margin   std. err.      t    P>|t|     [95% conf. interval]
---------------------------+----------------------------------------------------------------
      0. Male   #1. White  |   .0477874   .0066719     7.16   0.000     .0346899    .0608848
      0. Male   #2. Black  |   .0869118   .0247047     3.52   0.000      .038415    .1354086
   0. Male   #3. Hispanic  |   .0127917   .0088384     1.45   0.148    -.0045588    .0301421
0. Male   #4. Other/Multi  |   .0740847   .0329425     2.25   0.025     .0094165     .138753
       1. Female#1. White  |   .0840895   .0160033     5.25   0.000      .052674     .115505
       1. Female#2. Black  |   .0266466   .0174556     1.53   0.127    -.0076199     .060913
    1. Female#3. Hispanic  |   .1230033   .0744384     1.65   0.099    -.0231239    .2691304
 1. Female#4. Other/Multi  |   .0486798   .0288685     1.69   0.092    -.0079909    .1053504

Thank you
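
A note on the arithmetic behind the table above: the limits shown are margin ± t × std. err. on the probability scale, so nothing keeps the lower limit above 0 when a margin is small relative to its standard error. One common workaround, sketched here as plain scalar arithmetic and not as something margins does by itself, is to build the interval on the logit scale with the delta method and back-transform it. The scalar names are hypothetical; the numbers are copied from the Male#Hispanic row and the design df of 767.

```stata
* Hypothetical scalar names; values copied from the Male#Hispanic row above
scalar marg    = .0127917
scalar se_marg = .0088384
scalar tcrit   = invttail(767, 0.025)        // design df = 767

* Limits on the probability scale, as reported in the table (lower limit < 0)
scalar lo_raw = marg - tcrit*se_marg
scalar hi_raw = marg + tcrit*se_marg
display %9.7f lo_raw "  " %9.7f hi_raw

* Delta method: SE of logit(p) is SE(p) / (p*(1-p)); then back-transform the limits
scalar se_lgt = se_marg / (marg*(1 - marg))
scalar lo_lgt = invlogit(logit(marg) - tcrit*se_lgt)
scalar hi_lgt = invlogit(logit(marg) + tcrit*se_lgt)
display %9.7f lo_lgt "  " %9.7f hi_lgt
```

The back-transformed interval is asymmetric around the margin and stays inside (0, 1) by construction. A negative lower limit in the original table does not mean the model predicts negative probabilities, only that the symmetric approximation is poor for that cell.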

Replicate margins results in a coefplot graph

Federico Requena — Wed, 18 Feb 2026 16:24:22 GMT

Dear all,

I am a new Stata user (my background is in R), and I need to make some graphs with coefplot (SSC). I want to replicate the margins values after a logit regression. I have read that I can use the post option of margins, but for other reasons I would like to use transform() in coefplot (SSC):

Code:

sysuse nlsw88, clear

logit union grade i.south
margins i.south


Predictive margins                                       Number of obs = 1,876
Model VCE: OIM

Expression: Pr(union), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       south |
  Not south  |    .297394   .0138465    21.48   0.000     .2702553    .3245327
      South  |   .1732535   .0134751    12.86   0.000     .1468428    .1996643
------------------------------------------------------------------------------

I have read that I should use:

margins i.south, post

But I would like to use something like:

Code:

coefplot (., transform(* = (exp(@)/( 1+exp(@))))), drop(_cons) mlabel baselevels

coefplot (., transform(* = invlogit(@))), drop(_cons) mlabel citype(logit) baselevels

The values do not match those from margins:

[Two coefplot graphs, not reproduced here]

Thanks
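
A possible reason for the mismatch, offered as a sketch: after logit, coefplot reads the log-odds coefficients in e(b), so transform() applies invlogit() to a coefficient, and that is not the same quantity as a predictive margin, which averages predicted probabilities over the sample. The route below is the post approach mentioned in the question, using the same data and model.

```stata
sysuse nlsw88, clear
logit union grade i.south

* invlogit() of a log-odds coefficient is not a predicted probability
display invlogit(_b[1.south])

* post replaces e(b) with the predictive margins, which coefplot (SSC) then plots
margins i.south, post
coefplot, mlabel
```

After margins, post the plotted points are the margins themselves, so no transform() is needed.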

2026 Northern European Stata Conference: Call for presentations

Bjarte Aagnes — Wed, 18 Feb 2026 15:00:08 GMT

Call for presentations

The 2026 Northern European Stata Conference will be held in Oslo, Norway at Oslo Cancer Cluster Innovation Park on Thursday 24 September 2026. The conference will start at 10:00, with registration from 09:30, and end at 18:00 (CEST).

This conference will provide Stata users with the opportunity to exchange ideas, experiences, and information on new applications of Stata. Representatives from StataCorp will attend and host an open panel discussion, so you can share your questions and feedback directly with Stata developers. Anyone interested in using Stata is welcome. No level of expertise is assumed for presenters or attendees.

Presentation guidelines

Presentations could focus on a topic of interest, including, but not limited to:

Using Stata for modeling and analysis
Community-contributed commands
Using Stata for data management
Using Stata for graphics
Teaching Stata or teaching statistics with Stata
Use of Stata in specific fields: Applications and criticisms
Software comparisons
Python or Java integration
AI tools for Stata development

If you are interested in presenting, please email your abstract to the scientific committee. Indicate whether you would like to give

a 20-minute talk (followed by a 10-minute discussion);
a 10-minute talk (followed by a 5-minute discussion); or
some other presentation.

Include your name and affiliation. If your presentation has multiple authors, please identify the presenter. Presenters will be asked to provide the organizers with a copy of the presentation and any programs or datasets, where applicable, so that the materials can be posted on the StataCorp website and in the Stata Users Group RePEc archive.

Abstracts should be sent no later than June 15, 2026 to StataConferenceOslo@kreftregisteret.no.

The scientific organizers look forward to hearing from you with presentation offers, or to discuss the suitability of a potential contribution.

Scientific committee

Paul C Lambert, PhD (Chair), Cancer Registry of Norway at Norwegian Institute of Public Health and Karolinska Institutet.
Arne Risa Hole, PhD, Universitat Jaume.
Christopher James Rose, PhD, Centre for Epidemic Interventions Research & Cluster for Health and Social Care Interventions, Norwegian Institute of Public Health.
Morten W. Fagerland, PhD, Oslo Centre for Biostatistics and Epidemiology (University of Oslo and Oslo University Hospital).
Peter Hedström, PhD, Linköping University.

General chairs

Bjarte Aagnes, Cancer Registry of Norway at Norwegian Institute of Public Health.
Samuel Mossberg, Metrika Consulting AB

Contact: StataConferenceOslo@kreftregisteret.no

Organizers

The 2026 Northern European Stata Conference is jointly organized by Metrika Consulting AB, the official distributor of Stata in the Nordic and Baltic countries, the Cancer Registry of Norway at the Norwegian Institute of Public Health, and Oslo Centre for Biostatistics and Epidemiology (University of Oslo and Oslo University Hospital).

Information on conference and registration

Conference web page: http://www.statanordic.com/stata-conference-2026

Problem with csdid

chris maizano — Wed, 18 Feb 2026 09:24:52 GMT

Hi, I'm trying to run a staggered DID in which treatment is assigned at the group level (CCC_No). For context, some CCC_No have subgroups, but most do not (I mention this because I don't know whether it is one of the reasons for the problem). My unit of observation is the household, and the data come from a repeated cross-sectional survey. I built the unbalanced panel dataset by grouping respondents by city. A CCC_No usually covers just one city but sometimes more than one.

I'm trying to run this command for the staggered DID, but I am getting this error:

. csdid reliability ///
> educ8 H05_AGE HSIZE Min_Charge, ///
> ivar(CCC_No) ///
> time(WaveYear) ///
> gvar(cohort_strict) ///
> method(dripw) ///
> vce(cluster CCC_No)
repeated time values within panel
r(451);

end of do-file

r(451);

I am also including here a tabulation of the observations of CCC_No per WaveYear. As you can see, there are instances where there are no respondents under a CCC_No across some of the survey waves and I am also not sure if this could also be the culprit:

WaveYear
CCC_No 2019 2020 2022 2024 Total

1 406 404 394 399 1,603
2 380 365 390 406 1,541
4 46 37 14 15 112
5 402 425 412 402 1,641
6 413 408 459 422 1,702
7 372 344 389 386 1,491
11 898 1,036 1,208 1,216 4,358
12 29 53 61 23 166
13 0 35 52 0 87
17 214 250 260 273 997
18 289 332 395 372 1,388
19 324 398 371 391 1,484
21 0 24 35 0 59
23 23 87 35 73 218
24 0 0 27 40 67
25 67 75 39 82 263
26 335 359 383 386 1,463
27 21 4 0 24 49
29 28 0 65 84 177
30 222 228 250 308 1,008
32 58 55 121 50 284
33 42 35 64 77 218
35 91 80 120 101 392
36 0 26 41 0 67
39 62 0 43 50 155
41 48 35 38 27 148
47 53 34 45 31 163
49 33 37 0 14 84
50 68 53 26 53 200
51 33 24 32 50 139
53 0 43 0 40 83
61 54 14 80 88 236
65 44 61 71 57 233
69 28 28 62 27 145
70 16 11 19 33 79
71 344 372 419 411 1,546
83 29 63 93 48 233
90 46 19 0 11 76
92 62 0 23 22 107
102 35 0 18 24 77
105 13 18 22 14 67
106 0 24 46 13 83
107 85 31 88 79 283
116 45 56 82 23 206
117 0 15 0 22 37
123 65 67 93 121 346
124 39 63 44 26 172
129 28 46 0 26 100
141 50 10 81 34 175
147 100 87 144 78 409
149 50 23 33 33 139
156 62 122 43 86 313
158 53 20 40 14 127
163 10 15 32 29 86
165 0 16 15 18 49
173 29 13 75 97 214
175 134 99 162 83 478
179 28 37 31 46 142
183 7 88 0 14 109
184 62 10 49 18 139
221 23 50 29 33 135
243 33 54 39 59 185
247 58 44 48 51 201
250 21 106 13 39 179
252 13 20 0 28 61
284 74 104 86 114 378
288 13 0 35 0 48
291 102 58 95 40 295
297 15 0 44 19 78
317 0 19 11 18 48
322 317 386 396 419 1,518
324 107 101 114 137 459
328 19 27 18 0 64
330 137 139 161 150 587
333 0 29 26 0 55
343 17 23 0 0 40
370 343 381 379 375 1,478
386 14 15 15 22 66
407 0 22 0 32 54
530 0 22 23 16 61
533 15 32 16 0 63
564 22 16 14 14 66
571 13 61 0 84 158
574 9 51 0 8 68
577 8 0 4 20 32
596 15 20 64 15 114
694 23 42 47 57 169

Total 7,886 8,536 9,311 9,140 34,873

Your inputs are very much appreciated. Thank you.
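
On the r(451) message: ivar() is expected to identify the unit that is observed repeatedly, and CCC_No takes the same value for many households within a wave, hence "repeated time values within panel". A minimal sketch, assuming households are not re-interviewed across waves (so there is no panel identifier) and assuming, per the csdid help file, that omitting ivar() requests the repeated cross-section estimators and that cluster() sets the clustering variable:

```stata
* Shows that CCC_No-WaveYear pairs are not unique, which is what triggers r(451)
duplicates report CCC_No WaveYear

* Repeated cross-section call: no ivar(); cluster at the treatment level
csdid reliability educ8 H05_AGE HSIZE Min_Charge, ///
    time(WaveYear) gvar(cohort_strict) method(dripw) cluster(CCC_No)
```

Whether clustering on CCC_No is adequate with this many small groups is a separate question that this sketch does not address.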

Changing colour of bar graphs for one group

Ashani Abayasekara — Wed, 18 Feb 2026 03:33:17 GMT

Hi all,

I'm using the following code to graph a figure by two sets of groupings - 'alt' and 'lcclass'. How do I edit the code to display the bars for class 2 in red? I want all 4 bars under class 1 in blue and the four under class 2 in red.

Thank you.

Code:

graph hbar (mean) dth_hub_uptake, over(alt) over(lcclass) title(`"TH hub is available for mixed and TH visits (Basecase = not available)"', size(medsmall)) ytitle(`"Change in predicted uptake relative to base level"') ytitle(, size(small))


Problem Using collect with spregress

Michael Evangelist — Tue, 17 Feb 2026 19:21:58 GMT

I would like to export the results of a spatial error lag model using collect in Stata 19. Example 1 below illustrates that when I restrict the number of coefficients shown in the collect table that I lose the estimated error term even though the term (e.mpg) is included in the table layout and is also a level of the colname dimension. Example 2 shows that e.mpg is included in the table layout when I make no attempt to restrict the variables displayed.

I'm not sure if this is a bug or feature of collect. I would welcome any advice on how to get the error term to display when restricting the variables shown in the table layout.

EDIT: I added an Example 3 below with a workaround using autolevels. However, it's still not clear to me why Example 1 fails. Genuinely curious if there's a bug or something I'm missing about the syntax.

Code:

clear all
spset, clear

* Import Auto Dataset
sysuse auto, clear

* Generate ID
gen id = _n

* Generate Longitude and Latitude Coordinates
gen _CX =  40 + rnormal()
gen _CY = -70 + rnormal()

* Set the Data for Spatial Analysis
spset id, coord(_CX _CY) coordsys(latlong, miles)

* Create a Spatial Matrix Based on Inverse Distance
spmatrix create idistance W

* Estimate Regression Model
collect _r_b _r_se, tag(model[1]): spregress mpg price weight, gs2sls errorlag(W)
collect _r_b _r_se, tag(model[2]): spregress mpg price weight, ml vce(robust) errorlag(W)

* Example 1: Select Model Variables (e.mpg not displayed)
collect layout (colname[price _cons e.mpg var(e.mpg)]#result[_r_b _r_se]) (model)

* Show that e.mpg is in the colname dimension
collect levelsof colname

* Example 2: All Model Variables
collect layout (colname#result[_r_b _r_se]) (model)

* Example 3: Workaround using autolevels
collect style autolevels colname price _cons e.mpg var(e.mpg)
collect layout (colname#result[_r_b _r_se]) (model)

Identify consecutive weeks around missing survey date values

Blaise Baker — Tue, 17 Feb 2026 17:04:55 GMT

Building on a prior question (https://www.statalist.org/forums/for...rows-for-dates), I am now trying to figure out a somewhat convoluted task. I have panel survey data over 30 days and am trying to figure out how to deal with cases where respondents missed a day or two of the survey.

My end goal is to recover as many complete weeks as possible, where week is defined simply as a 7-day period (not tied to any specific days of the week).

So, what I want is to create a new variable called 'week' that denotes complete weeks, i.e., all consecutive 7 day stretches where there are no missing surveys.

Here's some data for two respondents, one of whom gets interrupted by a missing survey in the first week, another in the third week:

Code:

clear
input int pptid str9 survey_date float surveynum float num_survresponses str9 first_survey_date str9 last_survey_date float span_days float surv_missing
100 "24feb2019" 1 29 "24feb2019" "22mar2019" 30 0
100 "25feb2019" 2 29 "24feb2019" "22mar2019" 30 0
100 "26feb2019" 3 29 "24feb2019" "22mar2019" 30 0
100 "27feb2019" 4 29 "24feb2019" "22mar2019" 30 0
100 "28feb2019" 5 29 "24feb2019" "22mar2019" 30 0
100 "28feb2019" 6 29 "24feb2019" "22mar2019" 30 0
100 "28feb2019" 7 29 "24feb2019" "22mar2019" 30 0
100 "28feb2019" 8 29 "24feb2019" "22mar2019" 30 0
100 "01mar2019" 9 29 "24feb2019" "22mar2019" 30 0
100 "02mar2019" 10 29 "24feb2019" "22mar2019" 30 0
100 "03mar2019" 11 29 "24feb2019" "22mar2019" 30 0
100 "04mar2019" 12 29 "24feb2019" "22mar2019" 30 0
100 "05mar2019" 13 29 "24feb2019" "22mar2019" 30 0
100 "06mar2019" 14 29 "24feb2019" "22mar2019" 30 0
100 "07mar2019" 15 29 "24feb2019" "22mar2019" 30 0
100 "08mar2019" 16 29 "24feb2019" "22mar2019" 30 0
100 "09mar2019" 17 29 "24feb2019" "22mar2019" 30 0
100 "10mar2019" 18 29 "24feb2019" "22mar2019" 30 0
100 "11mar2019" 19 29 "24feb2019" "22mar2019" 30 0
100 "12mar2019" 20 29 "24feb2019" "22mar2019" 30 0
100 "13mar2019" . 29 "24feb2019" "22mar2019" 30 1
100 "14mar2019" 21 29 "24feb2019" "22mar2019" 30 0
100 "15mar2019" 22 29 "24feb2019" "22mar2019" 30 0
100 "16mar2019" 23 29 "24feb2019" "22mar2019" 30 0
100 "17mar2019" 24 29 "24feb2019" "22mar2019" 30 0
100 "18mar2019" 25 29 "24feb2019" "22mar2019" 30 0
100 "19mar2019" 26 29 "24feb2019" "22mar2019" 30 0
100 "20mar2019" 27 29 "24feb2019" "22mar2019" 30 0
100 "21mar2019" 28 29 "24feb2019" "22mar2019" 30 0
100 "22mar2019" 29 29 "24feb2019" "22mar2019" 30 0
200 "29apr2019" 1 29 "29apr2019" "27may2019" 30 0
200 "30apr2019" 2 29 "29apr2019" "27may2019" 30 0
200 "01may2019" 3 29 "29apr2019" "27may2019" 30 0
200 "02may2019" 4 29 "29apr2019" "27may2019" 30 0
200 "03may2019" 5 29 "29apr2019" "27may2019" 30 0
200 "04may2019" 6 29 "29apr2019" "27may2019" 30 0
200 "05may2019" . 29 "29apr2019" "27may2019" 30 1
200 "06may2019" 7 29 "29apr2019" "27may2019" 30 0
200 "07may2019" 8 29 "29apr2019" "27may2019" 30 0
200 "08may2019" 9 29 "29apr2019" "27may2019" 30 0
200 "09may2019" 10 29 "29apr2019" "27may2019" 30 0
200 "10may2019" 11 29 "29apr2019" "27may2019" 30 0
200 "11may2019" 12 29 "29apr2019" "27may2019" 30 0
200 "12may2019" 13 29 "29apr2019" "27may2019" 30 0
200 "13may2019" 14 29 "29apr2019" "27may2019" 30 0
200 "14may2019" 15 29 "29apr2019" "27may2019" 30 0
200 "15may2019" 16 29 "29apr2019" "27may2019" 30 0
200 "16may2019" 17 29 "29apr2019" "27may2019" 30 0
200 "17may2019" 18 29 "29apr2019" "27may2019" 30 0
200 "18may2019" 19 29 "29apr2019" "27may2019" 30 0
200 "19may2019" 20 29 "29apr2019" "27may2019" 30 0
200 "20may2019" 21 29 "29apr2019" "27may2019" 30 0
200 "21may2019" 22 29 "29apr2019" "27may2019" 30 0
200 "22may2019" 23 29 "29apr2019" "27may2019" 30 0
200 "23may2019" 24 29 "29apr2019" "27may2019" 30 0
200 "24may2019" 25 29 "29apr2019" "27may2019" 30 0
200 "25may2019" 26 29 "29apr2019" "27may2019" 30 0
200 "26may2019" 27 29 "29apr2019" "27may2019" 30 0
200 "27may2019" 29 29 "29apr2019" "27may2019" 30 0
end

And to help clarify what I'm looking for, here's what that new 'week' variable would ideally look like for id=100:

Code:

clear
input int pptid str9 survey_date float surveynum float num_survresponses str9 first_survey_date str9 last_survey_date float span_days float surv_missing float week
100 "24feb2019" 1 29 "24feb2019" "22mar2019" 30 0 1
100 "25feb2019" 2 29 "24feb2019" "22mar2019" 30 0 1
100 "26feb2019" 3 29 "24feb2019" "22mar2019" 30 0 1
100 "27feb2019" 4 29 "24feb2019" "22mar2019" 30 0 1
100 "28feb2019" 5 29 "24feb2019" "22mar2019" 30 0 1
100 "28feb2019" 6 29 "24feb2019" "22mar2019" 30 0 1
100 "28feb2019" 7 29 "24feb2019" "22mar2019" 30 0 1
100 "28feb2019" 8 29 "24feb2019" "22mar2019" 30 0 2
100 "01mar2019" 9 29 "24feb2019" "22mar2019" 30 0 2
100 "02mar2019" 10 29 "24feb2019" "22mar2019" 30 0 2
100 "03mar2019" 11 29 "24feb2019" "22mar2019" 30 0 2
100 "04mar2019" 12 29 "24feb2019" "22mar2019" 30 0 2
100 "05mar2019" 13 29 "24feb2019" "22mar2019" 30 0 2
100 "06mar2019" 14 29 "24feb2019" "22mar2019" 30 0 2
100 "07mar2019" 15 29 "24feb2019" "22mar2019" 30 0 .
100 "08mar2019" 16 29 "24feb2019" "22mar2019" 30 0 .
100 "09mar2019" 17 29 "24feb2019" "22mar2019" 30 0 .
100 "10mar2019" 18 29 "24feb2019" "22mar2019" 30 0 .
100 "11mar2019" 19 29 "24feb2019" "22mar2019" 30 0 .
100 "12mar2019" 20 29 "24feb2019" "22mar2019" 30 0 .
100 "13mar2019" . 29 "24feb2019" "22mar2019" 30 1 .
100 "14mar2019" 21 29 "24feb2019" "22mar2019" 30 0 3
100 "15mar2019" 22 29 "24feb2019" "22mar2019" 30 0 3
100 "16mar2019" 23 29 "24feb2019" "22mar2019" 30 0 3
100 "17mar2019" 24 29 "24feb2019" "22mar2019" 30 0 3
100 "18mar2019" 25 29 "24feb2019" "22mar2019" 30 0 3
100 "19mar2019" 26 29 "24feb2019" "22mar2019" 30 0 3
100 "20mar2019" 27 29 "24feb2019" "22mar2019" 30 0 3
100 "21mar2019" 28 29 "24feb2019" "22mar2019" 30 0 .
100 "22mar2019" 29 29 "24feb2019" "22mar2019" 30 0 .
end
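
One way to number the complete weeks, sketched on made-up data so it runs on its own: split each respondent's record into runs of answered days separated by missed surveys, count position within each run, and keep a 7-day block only if the whole block fits inside the run. The variables day, spell, pos, len and complete are introduced for illustration, and the sketch assumes one row per respondent per calendar day, including the rows flagged by surv_missing.

```stata
* Made-up data: one respondent, 20 calendar days, surveys missed on days 9 and 17
clear
set obs 20
generate int pptid = 100
generate int day = _n
generate byte surv_missing = inlist(day, 9, 17)

* spell goes up by one at every missed survey, so answered days between two misses share a value
bysort pptid (day): generate spell = sum(surv_missing)

* pos counts answered days within a spell; len is the spell's total of answered days
bysort pptid spell (day): generate pos = sum(!surv_missing)
by pptid spell: generate len = pos[_N]

* keep a 7-day block only if all 7 of its days fit inside the spell
generate byte complete = !surv_missing & 7*ceil(pos/7) <= len

* number the complete blocks 1, 2, ... within respondent
bysort pptid (day): generate week = sum(complete & mod(pos, 7) == 1)
replace week = . if !complete
list day surv_missing week, sepby(spell)
```

In this toy example days 1 to 7 become week 1, days 10 to 16 become week 2, and the leftover days (8, 18 to 20) and the missed days stay missing. With real data, day would be replaced by the numeric survey date.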

Fill in missing rows for dates

Blaise Baker — Tue, 17 Feb 2026 01:50:41 GMT

Below, I have some survey data where we wanted respondents to answer for 30 straight days. Of course, many respondents missed a handful of days here and there.

Right now, we just have a counter for each survey someone took. But, what I want to do is identify a respondent's non-response date(s), add that row back into the dataset, and include some flag for "nonresponse" or something like that as well, so I can identify these rows later. How can I do this?

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 pptid float(survey_date surveynum)
"100" 21769 1
"100" 21770 2
"100" 21771 3
"100" 21772 4
"100" 21773 5
"100" 21774 6
"100" 21775 7
"100" 21776 8
"100" 21777 9
"100" 21778 10
"100" 21779 11
"100" 21781 12
"100" 21782 13
"100" 21783 14
"100" 21784 15
"100" 21785 16
"100" 21786 17
"100" 21787 18
"100" 21788 19
"100" 21789 20
"100" 21790 21
"100" 21791 22
"100" 21793 23
"100" 21794 24
"100" 21795 25
"100" 21796 26
"100" 21797 27
"100" 21798 28
"200" 21806 1
"200" 21807 2
"200" 21808 3
"200" 21809 4
"200" 21810 5
"200" 21811 6
"200" 21812 7
"200" 21813 8
"200" 21814 9
"200" 21815 10
"200" 21816 11
"200" 21817 12
"200" 21818 13
"200" 21819 14
"200" 21820 15
"200" 21821 16
"200" 21822 17
"200" 21824 18
"200" 21825 19
"200" 21827 20
"200" 21828 21
"200" 21829 22
"200" 21830 23
"200" 21831 24
"200" 21832 25
"200" 21833 26
"200" 21834 27
"200" 21835 28
"300" 21869 1
"300" 21870 2
"300" 21871 3
"300" 21872 4
"300" 21873 5
"300" 21874 6
"300" 21875 7
"300" 21876 8
"300" 21877 9
"300" 21878 10
"300" 21879 11
"300" 21880 12
"300" 21881 13
"300" 21882 14
"300" 21883 15
"300" 21884 16
"300" 21885 17
"300" 21886 18
"300" 21887 19
"300" 21888 20
"300" 21889 21
"300" 21890 22
"300" 21891 23
"300" 21892 24
"300" 21893 25
"300" 21895 26
"300" 21896 27
"300" 21897 28
"300" 21898 29
end
format %td survey_date
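
One way to add the skipped dates back, sketched on a small made-up subset so it runs on its own: declare the data as a panel and let tsfill insert a row for every date missing between a respondent's first and last survey. The variable names id and nonresponse are introduced for illustration.

```stata
clear
input str3 pptid float(survey_date surveynum)
"100" 21769 1
"100" 21770 2
"100" 21772 3
"200" 21806 1
"200" 21808 2
end
format %td survey_date

destring pptid, generate(id)          // tsset needs a numeric panel identifier
tsset id survey_date
tsfill                                // one new row per skipped date within each respondent's span
generate byte nonresponse = missing(surveynum)
replace pptid = string(id) if missing(pptid)   // new rows have an empty string id
list, sepby(id)
```

Note that tsfill only fills gaps between a respondent's first and last observed dates, so days missed at the very start or end of the 30-day window are not added by this sketch.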

Generalized ordered logit model

Luis Mijares Castaneda — Mon, 16 Feb 2026 22:58:34 GMT

Hello, I have a question about model selection. I am using the gologit2 command and getting the results below. Each variable individually meets the parallel lines (PL) assumption, but the overall model fails it. My data are survey weighted; the models are shown below. I have three outcome variables along with the controls. Should I use an mlogit model or continue with the parallel lines assumption? Because I'm using survey data, alternative tests of model fit such as LR or AIC/BIC are not available, and I'm not sure what other test I can use for the PL assumption.

Code:

Step  17: All explanatory variables meet the pl assumption

F( 48,  1292) =    2.30
            Prob > F =    0.0000

An insignificant test statistic indicates that the final model
does not violate the proportional odds/ parallel lines assumption

Code:


svyset [pw=weights]

local outcomes  dental_visit oral_problems food_avoidence

foreach outcome of local outcomes {
    
    local contrls i.imprisson c.age i.biosex i.maritalstatus_alt i.income ///
             i.employment_alt i.education c.householdsize i.race_alt ///
             i.military_branch
    
    
gsvy: gologit2 `outcome' `contrls', autofit(0.001) difficult waldforce or

    
    
}

Loop etable with external value test (GOF)

Rodrigo Badilla — Mon, 16 Feb 2026 22:09:47 GMT

Hi all,

I am having a problem building tables with an external test value (GOF) and etable in a loop.

I always get leftover results from a previous table. I can partially solve this with clear results. Example (I expect only 2 results):

Code:

webuse nhanes2l, clear

putdocx clear
putdocx begin
clear results

etable, column(index) mstat(chi2_gof=(r(chi2)), label(GOF χ²)) mstat(p_gof=(r(p)), label(GOF p-value) nformat(%6.4f)) showstars showstarsnote

forvalues i = 1/2 {

   logit diabetes bpsystol age weight i.region if sex==`i'
   estat gof
   etable, append
}

putdocx collect
putdocx save test, replace

[Screenshot of the resulting table, not reproduced here]

With the same commands but without clear results, the last table is repeated.

Code:

webuse nhanes2l, clear

putdocx clear
putdocx begin

etable, column(index) mstat(chi2_gof=(r(chi2)), label(GOF χ²)) mstat(p_gof=(r(p)), label(GOF p-value) nformat(%6.4f)) showstars showstarsnote

forvalues i = 1/2 {

   logit diabetes bpsystol age weight i.region if sex==`i'
   estat gof
   etable, append
}

putdocx collect
putdocx save test, replace

[Screenshot of the resulting table, not reproduced here]

Handling Highly Skewed Independent Variable: Oil Rents (% of GDP) in Cross-Country Analysis

Said Karaca — Mon, 16 Feb 2026 17:14:07 GMT

Hello Stata community,

I am working on a project analyzing the effect of oil rents (% of GDP) on trade. Oil rents are my independent variable, and trade is measured using a scale ranging from 0 to 100.

I am facing a problem with the distribution of oil rents. Since I am conducting a cross-sectional analysis, I have 121 observations in total. However, when I include control variables and restrict the analysis to variables without missing values, the number of observations drops to 98.

The distribution of oil rents is highly right-skewed, with many countries in the 0–10% range. Because this is a global analysis, I do not want to exclude these countries, as doing so would reduce my sample size significantly. At the same time, I am concerned about whether it is academically sound to continue the analysis with such skewed data.

I tried a log transformation, but it caused two issues:

The significance of oil rents in my analysis disappeared.
For countries with oil rents below 1%, the log transformation produces negative values.

I would greatly appreciate your advice on how to deal with this issue and possible Stata solutions to handle the skewed distribution while keeping all observations in the analysis.

Thank you in advance!
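
On the log transformation: negative logs for values below 1 are expected and are not a problem in themselves, but ln() is undefined at 0, so any country with zero oil rents would drop out of the estimation sample. A small sketch with made-up values compares ln() with the inverse hyperbolic sine, asinh(), which is defined at 0. The variable names and numbers are hypothetical, and whether either transformation suits the analysis is a modelling choice this sketch does not settle.

```stata
* Made-up oil rent values (% of GDP), for illustration only
clear
input float oil_rents
0
0.4
5
42
end

generate ln_oil  = ln(oil_rents)      // missing at 0, negative below 1
generate ihs_oil = asinh(oil_rents)   // defined at 0, never negative for values >= 0
list
```

A skewed regressor does not by itself violate regression assumptions, so leaving oil rents untransformed and checking for influential observations is also a defensible option.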