<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
	<channel>
		<title>Statalist</title>
		<link>https://www.statalist.org/forums/</link>
		<description>vBulletin Forums</description>
		<language>en</language>
		<lastBuildDate>Fri, 22 May 2026 18:06:36 GMT</lastBuildDate>
		<generator>vBulletin</generator>
		<ttl>60</ttl>
		<image>
			<url>images/misc/rss.png</url>
			<title>Statalist</title>
			<link>https://www.statalist.org/forums/</link>
		</image>
		<item>
			<title>Can you stack multiple JWDID regressions?</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786157-can-you-stack-multiple-jwdid-regressions</link>
			<pubDate>Wed, 20 May 2026 08:45:50 GMT</pubDate>
			<description>Hi all!  
 
I find myself in a very specific situation. I am evaluating a policy, and I only have the treated units. My identification strategy...</description>
			<content:encoded><![CDATA[Hi all! <br />
<br />
I find myself in a very specific situation. I am evaluating a policy, and I only have the treated units. My identification strategy relies on comparing units treated at time g to units treated at time g'&gt;g, so I use not-yet-treated units as controls. Because these units selected into the treatment and entered it at different times, I have to use IPW to rebalance the treated and the not-yet-treated firms. This would sound like a job for csdid, but for one of my specifications I need to construct the control sample in the following way: not-yet-treated units enter the pool of controls only if they have Y=0 until time g (the treatment time of the currently treated cohort). This goes on for every cohort, so every treated group gets rebalanced against its own later-treated groups of units. So I have a cohort-anchored filter: for cohort g, keep control units with Σ_{t&lt;g} Y = 0. This cannot be implemented automatically in csdid.<br />
<br />
After the cohort specific IPW step, for each cohort, I use jwdid:<br />
<br />
<b><i>How I use jwdid</i></b><i>.</i> Because the filter is g-specific, I run jwdid (ETWFE, method(reg), without the never option, so not-yet-treated are the controls) separately for each cohort g, each on its own cohort-anchored sub-panel. From each run we keep only the focal cohort's ATT(g,t), and then aggregate ATT(g,t) across cohorts into an overall ATT and an event study, using cohort-size weights. Basically I stack multiple ETWFE estimations. <br />
<br />
<b><i>The issue.</i></b> The per-cohort jwdid runs are not independent: the same later-cohort and never-treated firms serve as controls in multiple cohort runs. The analytic aggregate standard error combines the per-cohort jwdid SEs assuming independence across cohorts, and this appears to understate the true SE — a unit-level block bootstrap (resampling firms and re-running the whole pipeline) yields SEs roughly 1.7–2× larger.<br />
<br />
<i>Question.</i> Given this per-cohort jwdid design with a cohort-specific sample filter and manual cross-cohort aggregation, is a firm-level block bootstrap the appropriate inference, or is there a correct analytic / influence-function-based standard error for the aggregated ATT that we should use instead? <br />
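<br />
For reference, the firm-level block bootstrap described above can be sketched as follows. This is only a minimal sketch under stated assumptions: <i>stacked_att</i> is a hypothetical wrapper program, <i>firm_id</i>, <i>year</i> and <i>y</i> are assumed variable names, and the placeholder body stands in for the actual cohort filter, IPW step, per-cohort jwdid runs and aggregation.<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">* Sketch only: stacked_att, firm_id, year and y are assumed names.
capture program drop stacked_att
program define stacked_att, rclass
    version 14
    * ... cohort filter, IPW, per-cohort jwdid and cohort-size-weighted
    *     aggregation go here, using boot_id (not firm_id) as the unit id ...
    summarize y, meanonly          // placeholder for the aggregated ATT
    return scalar att = r(mean)
end

* boot_id gives every resampled copy of a firm its own identifier,
* so a firm drawn twice is treated as two distinct panels.
generate long boot_id = firm_id
xtset boot_id year

bootstrap att = r(att), reps(499) seed(12345) ///
    cluster(firm_id) idcluster(boot_id): stacked_att</pre>
</div>Because every replication reruns the whole pipeline on resampled firms, the resulting SE reflects both the reuse of the same control firms across cohort runs and the estimation of the IPW weights.<br />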
<br />
<br />
Thank you !!<br />
<br />
<br />
<br />
<br />
 ]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Salvatore Fittizio</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786157-can-you-stack-multiple-jwdid-regressions</guid>
		</item>
		<item>
			<title>Overidentification test after endogenous treatment effects model</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786140-overidentification-test-after-endogenous-treatment-effects-model</link>
			<pubDate>Mon, 18 May 2026 09:31:55 GMT</pubDate>
			<description>I am using the endogenous treatment effects model -eteffects-. Since I am using two excluded variables to model the selection equation, the reviewer...</description>
			<content:encoded><![CDATA[I am using the endogenous treatment effects model -eteffects-. Since I am using two excluded variables to model the selection equation, the reviewer is asking for an overidentification test. From what I understand, this is not possible with -eteffects-. <br />
<br />
Can I conduct a Hansen test? If not, what is the workaround, or possible response to the reviewer? I am already reporting the result of the endogeneity test.]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Parul Gupta</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786140-overidentification-test-after-endogenous-treatment-effects-model</guid>
		</item>
		<item>
			<title>New version of mmqreg on SSC (v2.5) -- adds version statement and ships polished mmqregplot companion</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786137-new-version-of-mmqreg-on-ssc-v2-5-adds-version-statement-and-ships-polished-mmqregplot-companion</link>
			<pubDate>Sun, 17 May 2026 17:47:59 GMT</pubDate>
			<description>Thanks to Kit Baum, an update of the -mmqreg- package (originally 
by Fernando Rios-Avila) is now available from SSC, as v2.5. 
 
-mmqreg- estimates...</description>
			<content:encoded><![CDATA[<br />
Thanks to Kit Baum, an update of the -mmqreg- package (originally<br />
by Fernando Rios-Avila) is now available from SSC, as v2.5.<br />
<br />
-mmqreg- estimates quantile regressions via the Method of Moments<br />
approach of Machado and Santos Silva (2019). Version 2.4<br />
(released last week) added a &quot;jknife&quot; option implementing the<br />
decomposed split-panel jackknife bias correction for short<br />
panels, and a companion plot command -mmqregplot-.<br />
<br />
WHAT'S NEW IN v2.5<br />
------------------<br />
<br />
1. Explicit &quot;version 13&quot; statement added to both mmqreg.ado and<br />
mmqregplot.ado. (Thanks to Kit for catching that v2.4 was<br />
relying on the caller's current version setting.)<br />
<br />
2. -mmqregplot- has been polished significantly (now mmqregplot<br />
v2.1):<br />
<br />
- Bug fix. The previous version produced a conformability<br />
error (r(503)) on multi-quantile plots after the per-<br />
quantile loop completed. The qtile path is now computed<br />
in a single multi-quantile -mmqreg- call (q(numlist) nols)<br />
and the plot reads e(b) once, locating each (quantile,<br />
variable) entry by matching (coleq, colname). No more<br />
row-stacking, no metadata loss. About 17x faster on a<br />
17-point quantile grid as a side benefit.<br />
<br />
- Two small pre-existing bugs fixed: a duplicate &quot;replace&quot;<br />
in the location/scale sub-routine call (r(198)), and a<br />
misspelled &quot;nopts()&quot; option in the histogram-KDE feplot<br />
branch (should be &quot;normopts()&quot;, r(198)).<br />
<br />
- New options for display and disk-saving:<br />
showall -- draw every panel as its own named graph<br />
(no hidden nodraw), with a final combined<br />
summary as the last graph<br />
keepgraphs -- keep individual panels in memory after<br />
combining<br />
nocombine -- skip the graph combine step entirely<br />
saving(prefix) -- save every panel plus the combined<br />
figure as .gph files<br />
gformat(formats) -- additionally export to png, pdf, eps,<br />
jpg, etc.<br />
<br />
- All panels are now issued as named graphs:<br />
mmqp1, mmqp2, ... -- quantile-path panels (one per var)<br />
mmqloc -- location coefplot<br />
mmqsca -- scale coefplot<br />
mmqfe -- fixed-effects panel<br />
mmqcombined -- final combined figure<br />
<br />
so users can revisit or re-export any panel after the fact<br />
(e.g. -graph display mmqloc-).<br />
<br />
- absorb() is rebuilt from e(fevlist) automatically, so the<br />
plotted path matches the user's actual FE-absorbed model.<br />
<br />
3. Updated help files. -help mmqregplot- now documents the full<br />
option set, the named-graph convention, and disk-saving usage,<br />
with new worked examples.<br />
<br />
INSTALLATION<br />
------------<br />
<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">.
ssc install mmqreg, replace</pre>
</div>or, for users who already have it installed:<br />
<br />
. adoupdate mmqreg, update<br />
<br />
To see the new features in action:<br />
<br />
. webuse nlswork, clear<br />
. xtset idcode year<br />
. mmqreg ln_w age ttl_exp tenure not_smsa south, ///<br />
absorb(idcode) q(25 50 75) jknife<br />
. mmqregplot age ttl_exp tenure, quantile(10(10)90) ///<br />
ols label showall saving(&quot;wage_qpath&quot;) gformat(png)<br />
<br />
BACKWARD COMPATIBILITY<br />
----------------------<br />
All v2.4 (and v2.3) syntax and behavior is preserved. The new<br />
mmqregplot options and the &quot;version 13&quot; statement are additive.<br />
<br />
REFERENCES<br />
----------<br />
Dhaene, G. and Jochmans, K. (2015). Split-panel jackknife<br />
estimation of fixed-effect models. Review of Economic Studies,<br />
82(3), 991-1030.<br />
Machado, J.A.F. and Santos Silva, J.M.C. (2019). Quantiles via<br />
moments. Journal of Econometrics, 213(1), 145-173.<br />
<br />
Feedback and bug reports are very welcome.<br />
<br />
Best regards,<br />
<br />
Dr Merwan Roudane<br />
Researcher in Applied Econometrics<br />
<a href="mailto:merwanroudane920@gmail.com">merwanroudane920@gmail.com</a><br />
<br />
(in collaboration with Fernando Rios-Avila, <a href="mailto:friosa@gmail.com">friosa@gmail.com</a>,<br />
original mmqreg author, Levy Economics Institute)]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Merwan Roudane</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786137-new-version-of-mmqreg-on-ssc-v2-5-adds-version-statement-and-ships-polished-mmqregplot-companion</guid>
		</item>
		<item>
			<title>New SSC package: rdstagger — staggered regression discontinuity with spillover diagnostics</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786136-new-ssc-package-rdstagger-—-staggered-regression-discontinuity-with-spillover-diagnostics</link>
			<pubDate>Sun, 17 May 2026 15:36:03 GMT</pubDate>
			<description>Dear Statalist members, 
 
I am pleased to announce that rdstagger is now available from SSC for Stata 14.1 and above. 
 
rdstagger provides tools...</description>
			<content:encoded><![CDATA[Dear Statalist members,<br />
<br />
I am pleased to announce that <b>rdstagger</b> is now available from SSC for Stata 14.1 and above.<br />
<br />
<b>rdstagger</b> provides tools for staggered regression-discontinuity designs, where treatment is assigned by a running variable crossing a cutoff and treatment timing varies across cohorts or groups.<br />
<br />
The package includes:<br />
<br />
<b>rdstagger_sim</b> — simulate staggered RD panel data<br />
<b>rdstagger</b> — estimate cohort-by-period ATT(g,t) effects<br />
<b>rdstagger_agg</b> — aggregate effects into event-study, group, calendar-time, or overall summaries<br />
<b>rdstagger_pretest</b> — run pre-treatment falsification tests<br />
<b>rdstagger_plot</b> — produce event-study plots<br />
<b>rdstagger_spillover</b> — examine possible spillover effects using near- and far-control comparisons<br />
<br />
Installation: <br />
 ssc install rdstagger, replace  <br />
Example: <br />
 rdstagger_sim, n(400) periods(8) cohorts(3) seed(42)<br />
 rdstagger y x, cutoff(0) gvar(g) tvar(period) idvar(id) bw(1.5)<br />
 rdstagger_pretest, method(both)<br />
 rdstagger_agg, type(dynamic)<br />
 rdstagger_plot, name(event_study)<br />
 rdstagger_spillover<br />
Help: <br />
 help rdstagger  <br />
GitHub:<br />
<a href="https://github.com/causalfragility-lab/rdstagger-Stata14" target="_blank">https://github.com/causalfragility-l...tagger-Stata14</a><br />
<br />
Feedback, suggestions, and bug reports are very welcome.<br />
<br />
Best regards,<br />
Subir Hait<br />
Michigan State University<br />
ORCID: <a href="https://orcid.org/0009-0004-9871-9677" target="_blank">https://orcid.org/0009-0004-9871-9677</a><br />
 ]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Subir Hait</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786136-new-ssc-package-rdstagger-—-staggered-regression-discontinuity-with-spillover-diagnostics</guid>
		</item>
		<item>
			<title>Mixed model specification for neuroimaging data</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786130-mixed-model-specification-for-neuroimaging-data</link>
			<pubDate>Sun, 17 May 2026 06:41:11 GMT</pubDate>
			<description>Hi everyone, 
 
I am working on a longitudinal neuroimaging dataset and I would appreciate some advice regarding the most appropriate analytical...</description>
			<content:encoded><![CDATA[Hi everyone,<br />
<br />
I am working on a longitudinal neuroimaging dataset and I would appreciate some advice regarding the most appropriate analytical strategy.<br />
<br />
I have cortical thickness measurements from 68 different brain regions for each subject.<br />
<br />
The data are in long format, where each row represents:<br />
<br />
subject (idd)<br />
cortical area (area_id)<br />
timepoint<br />
cortical thickness (a)<br />
<br />
Each subject was measured at two timepoints only: baseline and follow-up<br />
<br />
The follow-up interval is variable across subjects and is represented by the variable &quot;Months_between_MRI&quot;<br />
<br />
At baseline Months_between_MRI is always 0<br />
<br />
The main clinical predictors are:<br />
<br />
&quot;sclerosi&quot; (presence of sclerosis)<br />
&quot;resistant&quot; (drug resistance)<br />
&quot;toniclonic&quot; (presence of seizures)<br />
<br />
Additional covariates:<br />
<br />
Age<br />
Gender<br />
ICV<br />
Epilepsy_duration<br />
<br />
A data subset with only 7 out of 68 brain areas is the following:

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">* Example generated by -dataex-. For more info, type help dataex
clear
input float idd int Months_between_MRI byte area_id double a byte(Age Gender) double(Epilepsy_duration Onset_years) float(sclerosi resistant toniclonic)
1  0 1  2.45 30 0  2 28 0 0 1
1  0 2 2.406 30 0  2 28 0 0 1
1  0 3 2.544 30 0  2 28 0 0 1
1  0 4 1.987 30 0  2 28 0 0 1
1  0 5 2.917 30 0  2 28 0 0 1
1  0 6 2.667 30 0  2 28 0 0 1
1 44 1 2.362 33 0  5 28 0 0 1
1 44 2 2.408 33 0  5 28 0 0 1
1 44 3 2.521 33 0  5 28 0 0 1
1 44 4  1.95 33 0  5 28 0 0 1
1 44 5 2.677 33 0  5 28 0 0 1
1 44 6  2.59 33 0  5 28 0 0 1
2  0 1 2.253 36 0 15 21 1 1 1
2  0 2 1.969 36 0 15 21 1 1 1
2  0 3 2.105 36 0 15 21 1 1 1
2  0 4 1.668 36 0 15 21 1 1 1
2  0 5 1.083 36 0 15 21 1 1 1
2  0 6 1.683 36 0 15 21 1 1 1
2 72 1 2.414 42 0 21 21 1 1 1
2 72 2 2.708 42 0 21 21 1 1 1
2 72 3 2.163 42 0 21 21 1 1 1
2 72 4  2.06 42 0 21 21 1 1 1
2 72 5 1.344 42 0 21 21 1 1 1
2 72 6 1.872 42 0 21 21 1 1 1
3  0 1  2.08 34 0 16 18 0 1 1
3  0 2 2.637 34 0 16 18 0 1 1
3  0 3 2.274 34 0 16 18 0 1 1
3  0 4 1.796 34 0 16 18 0 1 1
3  0 5  .944 34 0 16 18 0 1 1
3  0 6   1.3 34 0 16 18 0 1 1
3 38 1 2.039 37 0 19 18 0 1 1
3 38 2 2.516 37 0 19 18 0 1 1
3 38 3 2.092 37 0 19 18 0 1 1
3 38 4 2.457 37 0 19 18 0 1 1
3 38 5  .982 37 0 19 18 0 1 1
3 38 6 2.094 37 0 19 18 0 1 1
4  0 1 2.035 44 0  2 42 0 0 0
4  0 2 1.936 44 0  2 42 0 0 0
4  0 3 2.137 44 0  2 42 0 0 0
4  0 4 1.374 44 0  2 42 0 0 0
4  0 5   .83 44 0  2 42 0 0 0
4  0 6 1.563 44 0  2 42 0 0 0
4 36 1 1.786 47 0  5 42 0 0 0
4 36 2 1.805 47 0  5 42 0 0 0
4 36 3 1.993 47 0  5 42 0 0 0
4 36 4 2.314 47 0  5 42 0 0 0
4 36 5 1.433 47 0  5 42 0 0 0
4 36 6 1.628 47 0  5 42 0 0 0
5  0 1 2.002 49 0 15 34 0 1 1
5  0 2 1.783 49 0 15 34 0 1 1
5  0 3 2.322 49 0 15 34 0 1 1
5  0 4 1.403 49 0 15 34 0 1 1
5  0 5  .945 49 0 15 34 0 1 1
5  0 6  1.45 49 0 15 34 0 1 1
5 18 1 2.468 51 0 17 34 0 1 1
5 18 2  1.99 51 0 17 34 0 1 1
5 18 3 1.985 51 0 17 34 0 1 1
5 18 4 2.149 51 0 17 34 0 1 1
5 18 5  .942 51 0 17 34 0 1 1
5 18 6 1.599 51 0 17 34 0 1 1
6  0 1 2.484 25 0  5 20 0 0 1
6  0 2 3.064 25 0  5 20 0 0 1
6  0 3 2.445 25 0  5 20 0 0 1
6  0 4 1.973 25 0  5 20 0 0 1
6  0 5 3.265 25 0  5 20 0 0 1
6  0 6  2.84 25 0  5 20 0 0 1
6 66 1 2.072 31 0 11 20 0 0 1
6 66 2 2.731 31 0 11 20 0 0 1
6 66 3 1.656 31 0 11 20 0 0 1
6 66 4 2.286 31 0 11 20 0 0 1
6 66 5 1.459 31 0 11 20 0 0 1
6 66 6  1.88 31 0 11 20 0 0 1
7  0 1  2.11 24 0 11 13 0 1 1
7  0 2 2.386 24 0 11 13 0 1 1
7  0 3 2.184 24 0 11 13 0 1 1
7  0 4  1.28 24 0 11 13 0 1 1
7  0 5  .905 24 0 11 13 0 1 1
7  0 6 1.741 24 0 11 13 0 1 1
7 34 1 2.135 27 0 14 13 0 1 1
7 34 2 2.514 27 0 14 13 0 1 1
7 34 3 1.857 27 0 14 13 0 1 1
7 34 4 2.569 27 0 14 13 0 1 1
7 34 5   1.4 27 0 14 13 0 1 1
7 34 6 1.964 27 0 14 13 0 1 1
8  0 1 2.523 15 1  0 15 0 0 0
8  0 2 3.226 15 1  0 15 0 0 0
8  0 3 2.501 15 1  0 15 0 0 0
8  0 4 2.098 15 1  0 15 0 0 0
8  0 5 3.217 15 1  0 15 0 0 0
8  0 6 2.716 15 1  0 15 0 0 0
8 19 1 2.299 16 1  1 15 0 0 0
8 19 2 2.448 16 1  1 15 0 0 0
8 19 3 2.307 16 1  1 15 0 0 0
8 19 4 2.347 16 1  1 15 0 0 0
8 19 5 2.289 16 1  1 15 0 0 0
8 19 6 2.462 16 1  1 15 0 0 0
9  0 1 1.985 32 0 21 11 1 1 1
9  0 2 2.497 32 0 21 11 1 1 1
9  0 3 1.735 32 0 21 11 1 1 1
9  0 4 1.562 32 0 21 11 1 1 1
end</pre>
</div><br />
<br />
The cortical areas have very different absolute thickness scales, so using raw thickness values makes interpretation difficult across regions.<br />
<br />
Initially, I tried <b>z-scoring within area</b> and longitudinal mixed models such as:<br />
<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">mixed z_area  i.sclerosi##c.Months_between_MRI i.resistant##c.Months_between_MRI  i.toniclonic##c.Months_between_MRI  Age Gender ICV Epilepsy_duration || idd: || area_id:</pre>
</div>And then:<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">margins sclerosi#resistant#toniclonic, ///
    at(Months_between_MRI=(0 30 60 120))
marginsplot, xdimension(Months_between_MRI)</pre>
</div>However:<br />
<br />
interpretation became less intuitive, plotting trajectories was difficult and I am unsure whether this is the best strategy given that I only have two timepoints.<br />
<br />
Current idea<br />
<br />
I am now considering computing percentage change from baseline pct_change = 100*(followup - baseline)/baseline<br />
<br />
and then analyzing it by keeping only the follow-up observation.<br />
<br />
This would produce one observation per subject-area combination representing the rate of cortical thinning.<br />
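<br />
A minimal sketch of that computation in the long format shown above, using the variable names from the -dataex- listing (idd, area_id, Months_between_MRI, a). The annualized version is an added assumption, included only as one possible way to account for the variable follow-up interval:<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">* Sketch only; variable names follow the -dataex- listing above.
* Baseline thickness = value at the first visit (Months_between_MRI == 0).
bysort idd area_id (Months_between_MRI): generate double baseline_area = a[1]
generate double pct_change = 100 * (a - baseline_area) / baseline_area

* Assumption, not in the original plan: percentage change per year.
generate double pct_per_year = pct_change / (Months_between_MRI / 12)

* Keep one row per subject-area pair (the follow-up visit).
keep if Months_between_MRI &gt; 0</pre>
</div>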
<br />
Then fitting something like:<br />
<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">mixed pct_change i.sclerosi i.responder  i.toniclonic  c.baseline_area Age Gender ICV Epilepsy_duration  || area_id:</pre>
</div><br />
Does this approach seem statistically reasonable given only two timepoints, variable follow-up duration, and many cortical regions per subject?<br />
<br />
In the end... <b>Which analytical strategy would you suggest?</b> My main aim is to identify which subjects, in terms of hippocampal sclerosis, drug resistance, and presence of tonic-clonic seizures, show the greatest cortical thickness reduction over time.<br />
Thanks for your time!<br />
Gianfranco]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Gianfranco Di Gennaro</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786130-mixed-model-specification-for-neuroimaging-data</guid>
		</item>
		<item>
			<title>Data Visualization in Sociology (Kieran Healy and James Moody)</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786126-data-visualization-in-sociology-kieran-healy-and-james-moody</link>
			<pubDate>Sun, 17 May 2026 01:42:53 GMT</pubDate>
			<description>Recommend a review article about Data Visualization in Sociology https://www.annualreviews.org/content/journals/10.1146/annurev-soc-071312-145551...</description>
			<content:encoded><![CDATA[I recommend a review article, <b><i>Data Visualization in Sociology</i></b>: <a href="https://www.annualreviews.org/content/journals/10.1146/annurev-soc-071312-145551" target="_blank">https://www.annualreviews.org/conten...-071312-145551</a> There are some interesting observations in it.<br />
<br />
<div class="bbcode_container">
	<div class="bbcode_quote">
		<div class="quote_container">
			<div class="bbcode_quote_container vb-icon vb-icon-quote-large"></div>
			
				Given the power of statistical visualization, then, it is puzzling that quantitative sociology is so often practiced without visual referents. One need only compare a recent issue of the <i>American Sociological Review</i> or the <i>American Journal of Sociology</i> to Science, <i>Nature</i>, or the <i>Proceedings of the National Academy of Science</i> to see the radical difference in visual acuity. It is common for the premier journals in sociology to publish articles with many tables, but no figures. The opposite is true in the premier natural science journals. There, a key figure is often the heart of the article. In <i>Nature</i>, for example, the online table of contents includes a thumbnail of the central figure to serve as the link to the rest of the paper.
			
		</div>
	</div>
</div><div class="bbcode_container">
	<div class="bbcode_quote">
		<div class="quote_container">
			<div class="bbcode_quote_container vb-icon vb-icon-quote-large"></div>
			
				A common complaint about Tufte's work is that there are so few direct instructions. Busy cooks want a cookbook, not a picture of a fantastic meal. The tendency for the codification of data visualization to vacillate between overly abstract maxims and overly specific examples is characteristic of any craft where a practical sense of how to proceed—a taste or feeling for the right choice—matters for successful execution. A long-standing and plausible response to the problem is to have the designer make many of the judicious choices in advance and then embed them for users in the default settings of graphics applications. Given that graphical software aimed at regular users has been around for several decades now, however, these efforts have proven less successful than initially hoped. In the foreword to the new edition of Semiology of Graphics, Howard Wainer (2010, p. xi) reflects on the hope he and others once felt that easy-to-use graphical tools and software would lead to better general practice by way of smarter defaults. But, he argues, this has not happened. In the end, high quality graphical presentation requires crafting a deliberately designed message rather than accepting the pre-established setting. Recent theoretical work explicitly recognizes the limits of relying on defaults.
			
		</div>
	</div>
</div><div class="bbcode_container">
	<div class="bbcode_quote">
		<div class="quote_container">
			<div class="bbcode_quote_container vb-icon vb-icon-quote-large"></div>
			
				To many working statisticians, infographics are the descendants of Tufte's Ducks—those “self-promoting graphics” where “the overall design purveys Graphical Style rather than quantitative information” (Tufte 1983, p. 116). The contemporary infographic in its pure form is a supercharged megaduck incorporating not only the bells and whistles derided by Tufte but far more besides, such as a spurious quasi-narrative structure, pictographic sequencing, or excessive dynamic elements. Gelman &amp; Unwin (2013) discuss Infovis-style work from a statistical point of view. They argue that most infographics do not meet the standards normally demanded of statistical visualizations, but they concede that sometimes the goals of the latter are not those of the former.
			
		</div>
	</div>
</div>]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Chen Samulsion</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786126-data-visualization-in-sociology-kieran-healy-and-james-moody</guid>
		</item>
		<item>
			<title>Creating a histogram that is both grouped and stacked, with another variable that is represented by a line</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786122-creating-a-histogram-that-is-both-grouped-and-stacked-with-another-variable-that-is-represented-by-a-line</link>
			<pubDate>Sat, 16 May 2026 21:03:02 GMT</pubDate>
			<description><![CDATA[Hello Stata people; 
 
  I'm using version 13.1 of Stata while working with this dataset: 
 
 
 
* Example generated by -dataex-. To install: ssc...]]></description>
			<content:encoded><![CDATA[     Hello Stata people;<br />
<br />
  I'm using version 13.1 of Stata while working with this dataset:<br />
<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">* Example generated by -dataex-. To install: ssc install dataex
clear
input int anne float(impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen) long totaldesrecettesfiscalesenmillio
2022   4   5  8.2 10.2 15.5 28.7 28.4 35448
2023 2.5 4.8  9.4 10.2 16.8 27.7 28.6 39200
2024   3   5   10  9.5 15.9 26.9 29.6 41755
2025 1.5   5 12.6  9.6 15.6   27 28.7 44523
2026 1.3   5 12.9  9.6 15.4   27 28.5 47773
end
format %ty anne</pre>
</div>  This data shows the evolution and structure of tax revenues for a given economy. The first 7 variables are expressed in %, the 8th is expressed in monetary units. The observations are annual. <br />
<br />
  The goal is to draw a histogram that is both grouped by year and stacked for the first 7 variables, with the last variable (expressed in monetary units) drawn as a line, all in the same graph.<br />
<br />
  Is there a solution to this please?<br />
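<br />
One possible approach, sketched here with the variable names from the -dataex- listing above: build cumulative shares and overlay -twoway bar- plots from tallest to shortest so they look stacked, then add the total as a line on a second y axis. The legend labels are assumed English renderings of the variable names, and the bar width is an arbitrary choice.<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">* Sketch only. Cumulative shares, so that overlaid bars look stacked.
generate double c1 = impotssurlessocitsptroliresen
generate double c2 = c1 + droitsdedouaneen
generate double c3 = c2 + impotssurlessocitsnonptroliresen
generate double c4 = c3 + droitsdeconsommationen
generate double c5 = c4 + diversimpotsetdroitsen
generate double c6 = c5 + taxesurvaleurajouteen
generate double c7 = c6 + impotsurlerevenuen

* Tallest bar first, shorter bars layered on top; total on a second axis.
twoway (bar c7 anne, barwidth(0.6)) (bar c6 anne, barwidth(0.6))    ///
       (bar c5 anne, barwidth(0.6)) (bar c4 anne, barwidth(0.6))    ///
       (bar c3 anne, barwidth(0.6)) (bar c2 anne, barwidth(0.6))    ///
       (bar c1 anne, barwidth(0.6))                                 ///
       (connected totaldesrecettesfiscalesenmillio anne, yaxis(2)), ///
       ytitle(&quot;Share of tax revenue (%)&quot;)                           ///
       ytitle(&quot;Total tax revenue&quot;, axis(2)) xtitle(&quot;&quot;)              ///
       legend(order(1 &quot;Income tax&quot; 2 &quot;VAT&quot; 3 &quot;Other taxes and duties&quot; ///
              4 &quot;Consumption duties&quot; 5 &quot;Corporate tax (non-oil)&quot;    ///
              6 &quot;Customs duties&quot; 7 &quot;Corporate tax (oil)&quot;            ///
              8 &quot;Total revenue (right axis)&quot;) cols(2))</pre>
</div>-graph bar, stack- draws the stacked bars directly but cannot be combined with a line plot, which is why this sketch uses -twoway- instead.<br />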
<br />
     With many thanks!]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Aziz Essouaied</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786122-creating-a-histogram-that-is-both-grouped-and-stacked-with-another-variable-that-is-represented-by-a-line</guid>
		</item>
		<item>
			<title>New package on SSC: -harreg- — HAR inference for time-series regression</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786115-new-package-on-ssc-harreg-—-har-inference-for-time-series-regression</link>
			<pubDate>Sat, 16 May 2026 09:06:36 GMT</pubDate>
			<description>A new package, harreg, is now available from SSC, and thanks to Kit Baum for posting. To install: 
 
ssc install harreg 
harreg implements HAR...</description>
			<content:encoded><![CDATA[A new package, <span style="font-family:courier new">harreg</span>, is now available from SSC, and thanks to Kit Baum for posting. To install:<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">ssc install harreg</pre>
</div><span style="font-family:courier new">harreg</span> implements HAR (heteroskedasticity- and autocorrelation-robust) inference for time-series regressions, using the procedures recommended in <a href="https://doi.org/10.1080/07350015.2018.1506926" target="_blank">Lazarus, Lewis, Stock, and Watson (2018, JBES)</a>. The package supports a range of long-run variance estimators, including the recommended equal-weighted cosine estimator (the default) and the Newey-West kernel estimator, and all tests use fixed-b critical values. Compared to the standard approach implemented by <span style="font-family:courier new">newey</span>, this inference approach <a href="https://doi.org/10.3982/ECTA15404" target="_blank">improves</a> on size control (reducing false rejection rates), with bandwidth rules chosen to optimize the size-power tradeoff. The companion postestimation command <span style="font-family:courier new">harwald</span> performs HAR Wald tests of linear restrictions on the coefficients.<br />
<br />
For details and worked examples, see the help files:<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">help harreg
help harwald</pre>
</div>The package is joint work with Daniel J. Lewis (UCL). Comments, suggestions, and bug reports are more than welcome.]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Eben Lazarus</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786115-new-package-on-ssc-harreg-—-har-inference-for-time-series-regression</guid>
		</item>
		<item>
			<title><![CDATA[single_treatment_graphs can't find variable]]></title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786113-single_treatment_graphs-can-t-find-variable</link>
			<pubDate>Fri, 15 May 2026 17:22:33 GMT</pubDate>
			<description>I am working on using synth to run a synthetic control method analysis using state-year data. I am using various predictor variables to predict a...</description>
			<content:encoded><![CDATA[I am working on using synth to run a synthetic control method analysis using state-year data. I am using various predictor variables to predict a synthetic state's unemployment rate after a treatment. When using synth on the original data, everything works great and outputs the results as expected. However, all of the weight is placed on South Carolina as the predictor state. Naturally, I removed South Carolina and ran the exact same synth, synth_runner, and single_treatment_graphs commands on a donor pool of 19 states. For some reason, although it works fine with 20 states, once I drop SC, single_treatment_graphs is unable to find the &quot;unemp&quot; variable that it is predicting. It says it is ambiguously abbreviated, but using &quot;set varabbrev off&quot; to avoid abbreviation confusion results in it saying &quot;variable unemp not found&quot; rather than ambiguous abbreviation. I also had it tabulate unemp the line before single_treatment_graphs (after synth and synth_runner) and it returns a list of all unemp values and then immediately says &quot;variable unemp not found&quot;. Any advice or suggestions would be greatly appreciated as I am very confused.<br />
<br />
I will show the whole code below this, the first section works perfectly fine and produces the treatment graph.  It is only in the second section, after preserving and dropping SC that the error occurs.  It also completes both synth and synth_runner without issue, it is the single_treatment_graphs that causes the error of &quot;variable unemp not found&quot; when abbreviations are off, and &quot;unemp ambiguous abbreviation&quot; when abbreviations are turned on.<br />
Exact code:<br />
clear all<br />
cd &quot;*********************&quot;<br />
<br />
<br />
//////////////////////////////////////////////////////////////////////<br />
<br />
// Install synth<br />
ssc install synth<br />
<br />
<br />
//Import data<br />
use synth_ca.dta<br />
<br />
//Ensure data is a time series<br />
sort state year<br />
tsset state year<br />
<br />
<br />
<br />
//Run synth<br />
synth unemp unemp(2000(1)2013) LFP poverty inc(2008,2013) college(2006,2013) ///<br />
,trunit(2) trperiod(2014) fig<br />
<br />
<br />
//Return important information about treated and synthetic variables<br />
ereturn list<br />
matlist e(Y_synthetic)<br />
matlist e(Y_treated) - e(Y_synthetic)<br />
<br />
// Locate synth_runner<br />
findit synth_runner<br />
// help synth_runner<br />
<br />
<br />
//Run synth_runner<br />
synth_runner unemp LFP poverty inc(2008,2013) college(2006,2013) unemp(2000(1)2013), trunit(2) trperiod(2014) gen_vars<br />
<br />
<br />
//Generate treatment graph<br />
single_treatment_graphs<br />
<br />
<br />
//////////////Remove South Carolina and redo the synth////////////////////////<br />
<br />
clear all<br />
use synth_ca.dta<br />
set varabbrev off<br />
rename observation_date obs<br />
<br />
preserve<br />
drop if state==16<br />
<br />
//Run synth again<br />
synth unemp unemp(2000(1)2013) LFP poverty inc(2008,2013) college(2006,2013) ///<br />
,trunit(2) trperiod(2014) fig<br />
<br />
<br />
//Return important information about treated and synthetic variables<br />
ereturn list<br />
matlist e(Y_synthetic)<br />
matlist e(Y_treated) - e(Y_synthetic)<br />
<br />
// Locate synth_runner<br />
findit synth_runner<br />
// help synth_runner<br />
<br />
<br />
//Run synth_runner<br />
synth_runner unemp LFP poverty inc(2008,2013) college(2006,2013) unemp(2000(1)2013), trunit(2) trperiod(2014) gen_vars<br />
<br />
tabulate unemp<br />
<br />
//Generate treatment graph<br />
single_treatment_graphs<br />
<br />
restore<br />
 ]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Thomas Jendrejack</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786113-single_treatment_graphs-can-t-find-variable</guid>
		</item>
		<item>
			<title>Stata 18 graph is glitchy compared to Stata 17</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786093-stata-18-graph-is-glitchy-compared-to-stata-17</link>
			<pubDate>Thu, 14 May 2026 16:41:09 GMT</pubDate>
			<description>Hi, 
I encountered a weird glitch in Stata today. I was trying to rerun the code for a graph I made a few months ago in Stata 18. The code runs, but...</description>
			<content:encoded><![CDATA[Hi,<br />
I encountered a weird glitch in Stata today. I was trying to rerun the code for a graph I made a few months ago in Stata 18. The code runs, but the graph looks... weird: <a href="filedata/fetch?filedataid=1786096">attached image</a>.<br />
<br />
By contrast, I can run the code in Stata 17 and it generates the graph as desired: <a href="filedata/fetch?filedataid=1786099">attached image</a><br />
<br />
What is going on here? Any help would be much appreciated!<br />
<br />
I have checked that my Stata 18 is up to date.<br />
<br />
<br />
Data and code are provided below:<br />
<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">* Example generated by -dataex-. For more info, type help dataex
clear
input str26 condition double(meanpr_av_ba_grad_2124_age15 meanpr_av_ba_grad_2124_age9 meanpr_av_hs_grad6_age15 meanpr_av_hs_grad6_age9) float order
&quot;always on Ind. (gr4) mover&quot;  3  3 53 51 1
&quot;moved off&quot;                  19  2 83 51 4
&quot;moved on&quot;                    3 19 52 83 3
&quot;never on Ind. (gr4) mover&quot;  22 22 86 85 2
end</pre>
</div><br />
* Set scheme<br />
set scheme s1mono<br />
    <br />
* Prep data for plotting<br />
rename *, lower<br />
rename mean mean<br />
reshape wide mean, i(cond) j(var) string<br />
drop if cond == &quot;full sample&quot;<br />
gen order = 1 if cond == &quot;always on Ind. (gr4) mover&quot;<br />
replace order = 2 if cond == &quot;never on Ind. (gr4) mover&quot;<br />
replace order = 3 if cond == &quot;moved on&quot;<br />
replace order = 4 if cond == &quot;moved off&quot;<br />
<br />
* Multiply by 100<br />
foreach var in meanpr_av_ba_grad_2124_age15 meanpr_av_ba_grad_2124_age9 meanpr_av_hs_grad6_age15 meanpr_av_hs_grad6_age9 {<br />
    replace `var' = `var' * 100<br />
    format `var' %9.0g<br />
}<br />
<br />
*** Generate bar plots<br />
* hs_grad6<br />
graph bar meanpr_av_hs_grad6_age9 meanpr_av_hs_grad6_age15, ///<br />
    over(cond, ///<br />
        relabel(1 &quot;Always-on Ind. mover&quot; 4 &quot;Never-on Ind. mover&quot; 3 &quot;Moved onto res.&quot; 2 &quot;Moved off res.&quot;) ///<br />
        sort(order) label(labsize(small))) ///<br />
    legend(order(1 &quot;Average age 9 nbhd.&quot; 2 &quot;Average age 15 nbhd.&quot;)) ///<br />
    title(&quot;High school completion rates (%) in origin and destination nbhd. by mover type&quot;, size(medsmall)) ///<br />
    blabel(bar)  ///<br />
    ylabel(0 &quot;0%&quot; 20 &quot;20%&quot; 40 &quot;40%&quot; 60 &quot;60%&quot; 80 &quot;80%&quot;, angle(0)) ///<br />
    plotregion(margin(t=5))<br />
 ]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Noah Spencer</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786093-stata-18-graph-is-glitchy-compared-to-stata-17</guid>
		</item>
		<item>
			<title>import fred stops working randomly</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786087-import-fred-stops-working-randomly</link>
			<pubDate>Thu, 14 May 2026 15:10:50 GMT</pubDate>
			<description>Hello, 
 
I am using StataNow 19.5, MP. 
 
I am automating the import of various series from FRED like treasury yields or labor market indicators....</description>
			<content:encoded><![CDATA[<span style="font-size:12px">Hello,<br />
<br />
I am using StataNow 19.5, MP.<br />
<br />
I am automating the import of various series from FRED like treasury yields or labor market indicators. For some reason the import fred command stops working randomly and gives me the &quot;I/O error&quot;. Here's my code:<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">import fred DGS1MO DGS3MO DGS6MO DGS1 DGS2 DGS3 DGS5 DGS7 DGS10 DGS20 DGS30, clear

Summary
--------------------------------------------------------------------------------------------------------------
Series ID                    Nobs    Date range                Frequency
--------------------------------------------------------------------------------------------------------------
DGS1MO                       6197    2001-07-31 to 2026-05-12  Daily
DGS3MO                       11173   1981-09-01 to 2026-05-12  Daily
DGS6MO                       11173   1981-09-01 to 2026-05-12  Daily
DGS1                         16075   1962-01-02 to 2026-05-12  Daily
DGS2                         12483   1976-06-01 to 2026-05-12  Daily
DGS3                         16075   1962-01-02 to 2026-05-12  Daily
DGS5                         16075   1962-01-02 to 2026-05-12  Daily
DGS7                         14205   1969-07-01 to 2026-05-12  Daily
DGS10                        16075   1962-01-02 to 2026-05-12  Daily
DGS20                        14386   1962-01-02 to 2026-05-12  Daily
DGS30                        12305   1977-02-15 to 2026-05-12  Daily
--------------------------------------------------------------------------------------------------------------
# of series imported: 11
   highest frequency: Daily
    lowest frequency: Daily
r; t=16.58 10:44:29

import fred DGS1MO DGS3MO DGS6MO DGS1 DGS2 DGS3 DGS5 DGS7 DGS10 DGS20 DGS30, clear
I/O error
r(691); t=0.20 10:44:51

import fred LNU01000000 LNU02000000 LNU03000000 UNRATENSA PAYNSA, clear
I/O error
r(691); t=12.41 11:00:28</pre>
</div></span>
I checked online and found that the cause may be Stata's temp folder running out of space, but that is not the case here (plenty of space on drive C). I also wrote directly to Stata tech support, and they could not provide a solution either, saying that the error may be caused by a firewall or antivirus run by my work administrator. I doubt that is the issue, since I also use the getsymbols command to import stock price data and have never encountered the I/O error with it.<br />
<br />
Has anyone run into this issue? Is there a fix to this or am I missing something?<br />
<br />
Thanks!<br />
<br />
 ]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Marian Manic</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786087-import-fred-stops-working-randomly</guid>
		</item>
		<item>
			<title>normalize() interpreted by Stata as a variable in oaxaca_rif</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786081-normalize-interpreted-by-stata-as-a-variable-in-oaxaca_rif</link>
			<pubDate>Thu, 14 May 2026 09:51:00 GMT</pubDate>
			<description><![CDATA[I'm using Stata 19 on Windows 11. My dataset has 53,525 observations and 152 variables. 
I'm running an Oaxaca decomposition of gender gap in...]]></description>
			<content:encoded><![CDATA[I'm using Stata 19 on Windows 11. My dataset has 53,525 observations and 152 variables.<br />
I'm running an Oaxaca decomposition of gender gap in agriculture.<br />
Some variables of interest are groups of mutually exclusive dummy variables, for which I would like to use the normalize() option. Specifically (see code below), these are (ote_dum1-ote_dum8), (y2019-y2023), and (montagna collina pianura).<br />
With the oaxaca command (Jann, 2008), the normalize() option works smoothly and I get results with the normalized coefficients.<br />
By contrast, with oaxaca_rif (Rios-Avila, Stata Journal 202), the estimation is not performed and I get:<br />
variable normalize not found<br />
r(111);<br />
This seems strange, since the Rios-Avila article says &quot;The option normalize() and aggregation of subset of variables using the syntax ( name: varlist) are also possible&quot;.<br />
<br />
Thank you for any hint on where the problem is.<br />
Here is the code:<br />
<br />

<div class="bbcode_container">
	<div class="bbcode_description">Code:</div>
	<pre class="bbcode_code">oaxaca log_rt_az_real log_capfond_real no_capfondiario log_cap_ag_fiss_real  no_capagrar_fisso ///
log_cap_agr_c_real no_capagrar_circol log_ore_sal no_oresalariate log_ore_fam  inherited  ///
anni_stud eta  eta2 zsva_d sau_irr_perc bio  normalize(ote_dum1 ote_dum2 ote_dum3  ote_dum4 ///
ote_dum5 ote_dum6 ote_dum7 ote_dum8)  normalize(montagna collina pianura) Abruzzo Alto_Adige ///
Basilicata Calabria Campania Emilia_Romagna Friuli_VG Lazio Liguria Lombardia Marche Molise ///
Puglia Sardegna Sicilia Toscana Trentino Umbria Valle_Aosta Veneto normalize(y2019 y2020 y2021 ///
y2022 y2023) , by (fem) vce (robust) noisily relax

oaxaca_rif  log_rt_az_real log_capfond_real no_capfondiario log_cap_ag_fiss_real  no_capagrar_fisso ///
log_cap_agr_c_real no_capagrar_circol log_ore_sal no_oresalariate log_ore_fam  inherited ///
anni_stud eta  eta2 zsva_d sau_irr_perc bio  normalize(ote_dum1 ote_dum2 ote_dum3  ote_dum4 ///
ote_dum5 ote_dum6 ote_dum7 ote_dum8)  normalize(montagna collina pianura) Abruzzo Alto_Adige ///
Basilicata Calabria Campania Emilia_Romagna Friuli_VG Lazio Liguria Lombardia Marche Molise ///
Puglia Sardegna Sicilia Toscana Trentino Umbria Valle_Aosta Veneto normalize(y2019 y2020 y2021 ///
y2022 y2023), by (fem) rif(q (.25)) noisily relax</pre>
</div> ]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Alessandro Corsi</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786081-normalize-interpreted-by-stata-as-a-variable-in-oaxaca_rif</guid>
		</item>
		<item>
			<title>Imputation conditional on simultaneous missing values</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786075-imputation-conditional-on-simultaneous-missing-values</link>
			<pubDate>Wed, 13 May 2026 13:59:27 GMT</pubDate>
			<description>Hello, 
 
I have a 13-variable dataset (factor model) that is split into (factor 1: v1-v9) 9 and (factor 2: v10-v13) 4 variables, respectively. If a...</description>
			<content:encoded><![CDATA[Hello,<br />
<br />
I have a 13-variable dataset (factor model) that is split into (factor 1: v1-v9) 9 and (factor 2: v10-v13) 4 variables, respectively. If a subject has missing values (MV) in all 13 variables, they are excluded. If a subject has up to 2 MV on factor 1 or 1 MV on factor 2, the mean score for that factor is imputed for the missing items. If a subject has more than 2 MV on factor 1 and 1 MV on factor 2, they are excluded. I've been looking at approaches using multiple imputation, but I am wondering how I can code all of this at once (i.e., as one big rule or a sequence of logical rules) in Stata. Thanks in advance!]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Miguel Simancas</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786075-imputation-conditional-on-simultaneous-missing-values</guid>
		</item>
		<item>
			<title>Merging yearly datasets into one panel dataset</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786058-merging-yearly-datasets-into-one-panel-dataset</link>
			<pubDate>Mon, 11 May 2026 19:22:12 GMT</pubDate>
			<description>Hello all, 
 
I am currently writing a research project where I investigate the effects of parental leave on employment. For this I have access to...</description>
			<content:encoded><![CDATA[Hello all,<br />
<br />
I am currently writing a research project where I investigate the effects of parental leave on employment. For this I have access to yearly data regarding the variables of interest.<br />
However, I can only download the data one year at a time, which makes merging/appending a bit tricky. I tried to append the first year (2008) with the other years, but it got very messy.<br />
Every survey year has a unique variable name per category, e.g. income2011 for income in 2011 or workinghours2012 for the number of hours worked per week in 2012, and so on for every variable of interest from 2008 to 2025. When I appended the datasets, I ended up with a dataset with 5000 different variables. Does anyone have an idea how to combine these datasets into one clean dataset? For example, I would like to combine the yearly income variables into a single income variable. I pasted part of the code I used below; I am using Stata 18.5 on Windows.<br />
<br />
I hope someone can help me,<br />
Diederick<br />
<br />
<br />
use cw12e_EN_1.0p.dta, clear<br />
gen wave = 2012<br />
tempfile d12<br />
save `d12'<br />
<br />
****************************************************<br />
* 2013<br />
****************************************************<br />
use cw13f_EN_1.0p.dta, clear<br />
gen wave = 2013<br />
tempfile d13<br />
save `d13'<br />
<br />
****************************************************<br />
* 2014<br />
****************************************************<br />
use cw14g_EN_1.0p.dta, clear<br />
gen wave = 2014<br />
tempfile d14<br />
save `d14'<br />
<br />
use `d12', clear<br />
append using `d13'<br />
append using `d14'<br />
 ]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Diederick Heijne</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786058-merging-yearly-datasets-into-one-panel-dataset</guid>
		</item>
		<item>
			<title>Sequential survival analysis with imputed panel data</title>
			<link>https://www.statalist.org/forums/forum/general-stata-discussion/general/1786049-sequential-survival-analysis-with-imputed-panel-data</link>
			<pubDate>Sat, 09 May 2026 22:23:20 GMT</pubDate>
			<description><![CDATA[Hello, 
 
I'm trying to do sequential cox proportional hazards models (5 total using one outcome and then another 5 using another outcome, so 10...]]></description>
			<content:encoded><![CDATA[Hello,<br />
<br />
I'm trying to fit sequential Cox proportional hazards models (5 using one outcome and another 5 using a second outcome, so 10 in total) on imputed data (5 imputations). The data are structured as panel data and split into 2-year time blocks using stsplit. There are approximately 80 variables and 2.4 million observations in the analytic sample; I imputed the variables with the most missing data. There are ~600 events for the first set of 5 models and ~2200 for the other set of 5 models. When I run the models, I get inflated p-values (0.4-0.9) and very wide confidence intervals. I have a hunch as to why, but I would appreciate input on whether it is due to the type of model I'm using given the number of variables, events, and sample size, or due to something else. Additionally, these models are clustered at the state level. I appreciate any insight you can provide.]]></content:encoded>
			<category domain="https://www.statalist.org/forums/forum/general-stata-discussion/general">General</category>
			<dc:creator>Leah Greene</dc:creator>
			<guid isPermaLink="true">https://www.statalist.org/forums/forum/general-stata-discussion/general/1786049-sequential-survival-analysis-with-imputed-panel-data</guid>
		</item>
	</channel>
</rss>
