Dear Statalist,
I am happy to announce the release of unicefdata v2.2.0, a Stata module for downloading UNICEF indicator data directly from the UNICEF SDMX Data Warehouse.
unicefdata is part of the unicefData trilingual library — the same indicators, the same API, the same metadata, available in R, Python, and Stata. If you work across languages or collaborate with people who do, everyone gets the same data with the same command logic.
The package covers 748+ indicators across 69 dataflows, spanning child mortality, nutrition, immunization, education, WASH, child protection, HIV/AIDS, early childhood development, and more. You probably already know many of these indicators from UNICEF's data.unicef.org — this package lets you pull them into Stata with a single command.
What can you do with it?
1. Search and discover indicators without leaving Stata
. unicefdata, search(stunting)
. unicefdata, search(mortality) dataflow(CME)
. unicefdata, flows
. unicefdata, info(CME_MRY0T4)
Indicators are organized into tiers. Only Tier 1 (verified and downloadable) is shown by default; use showtier2, showtier3, or showall to explore further.
2. Download data — the dataflow figures itself out
. unicefdata, indicator(CME_MRY0T4) countries(ALB USA BRA) year(2015:2023) clear
You do not need to know which SDMX dataflow an indicator belongs to. The package resolves it automatically. If you do know, you can specify it with dataflow().
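For completeness, here is the same download with the dataflow stated explicitly. This is a minimal sketch that assumes CME_MRY0T4 belongs to the CME dataflow (suggested by its prefix and by the search(mortality) dataflow(CME) example above) and that dataflow() combines freely with the other options shown.

```stata
* Same request, naming the dataflow explicitly.
* Assumption: CME_MRY0T4 lives in the CME dataflow (suggested by its prefix
* and by the search(mortality) dataflow(CME) example above).
unicefdata, indicator(CME_MRY0T4) dataflow(CME) countries(ALB USA BRA) year(2015:2023) clear
describe, short
```

Leaving dataflow() out should give the same result, since the package resolves it automatically.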
3. Disaggregation filters
. unicefdata, indicator(NT_ANT_HAZ_NE2) sex(_T M F) clear
. unicefdata, indicator(NT_ANT_HAZ_NE2) wealth(Q1 Q5 _T) clear
. unicefdata, indicator(NT_ANT_HAZ_NE2) sex(M F) wealth(Q1 Q5) residence(U R) clear
Filter by sex, wealth quintile, residence (urban/rural), age group, and maternal education. Filters work at the API level — the query downloads only what you ask for.
4. Output formats
. unicefdata, indicator(CME_MRY0T4) format(wide) clear                        // years as columns
. unicefdata, indicator(CME_MRY0T4 CME_MRM0) format(wide_indicators) clear    // indicators as columns
. unicefdata, indicator(CME_MRY0T4) latest clear                              // most recent per country
. unicefdata, indicator(CME_MRY0T4) mrv(3) clear                              // 3 most recent values
. unicefdata, indicator(NT_ANT_HAZ_NE2) year(2015) circa clear                // nearest available year
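To make the difference between latest, mrv(3), and circa concrete, here is a plain-Stata sketch of the three selection rules as described above, run on made-up toy data. The variable names iso3, year, and value are ours and need not match the package's output. This illustrates the semantics only, not unicefdata's implementation; in particular, breaking circa ties toward the earlier year is an assumption.

```stata
* Toy panel with made-up values (not UNICEF data)
clear
input str3 iso3 int year double value
"ALB" 2012 10
"ALB" 2014  9
"ALB" 2018  8
"ALB" 2020  7
"BRA" 2013 20
"BRA" 2016 18
"BRA" 2019 16
end

* mrv(3)-style rule: keep the 3 most recent observations per country
preserve
bysort iso3 (year): keep if _n > _N - 3
list, sepby(iso3) noobs
restore

* latest-style rule: keep the single most recent observation per country
preserve
bysort iso3 (year): keep if _n == _N
list, noobs
restore

* circa-style rule for target year 2015: keep the closest available year
* (ties go to the earlier year; an assumption made for this sketch)
generate dist = abs(year - 2015)
bysort iso3 (dist year): keep if _n == 1
list iso3 year value, noobs
```

With this toy panel the circa rule picks 2014 for ALB and 2016 for BRA.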
5. Self-documenting datasets
Downloaded datasets now embed provenance (indicator codes, dataflow, version, timestamp) as Stata characteristics (char). The data remembers where it came from, even after you save and reopen it.
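One way to inspect that provenance is to save the data, reopen it, and list the characteristics with Stata's own char list. The download call reuses options shown earlier; the characteristic names themselves are set by unicefdata, so none are assumed in this sketch.

```stata
* Download, save to a temporary file, and reopen
unicefdata, indicator(CME_MRY0T4) countries(ALB USA BRA) year(2015:2023) clear
tempfile u5mr
save `u5mr'
use `u5mr', clear

* List every characteristic attached to the dataset and its variables
char list
```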
6. Metadata sync
. unicefdata_sync, verbose          // check metadata freshness
. unicefdata_refresh_all, verbose   // full refresh from UNICEF API
The package ships with YAML metadata files and warns you when they are more than 30 days old. A single command, unicefdata_refresh_all, refreshes everything.
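The 30-day rule amounts to simple date arithmetic. The sketch below spells it out in plain Stata using a hypothetical last-sync date. It does not reproduce how unicefdata_sync reads its metadata timestamps, which this post does not describe.

```stata
* Hypothetical last-sync date (made-up value for illustration)
local synced = date("2020-01-01", "YMD")
* c(current_date) looks like "10 Jan 2025", hence the "DMY" mask
local today  = date("`c(current_date)'", "DMY")
local age    = `today' - `synced'

if `age' > 30 {
    display as text "metadata is `age' days old; consider running unicefdata_sync"
}
else {
    display as text "metadata is `age' days old; still fresh"
}
```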
Under the hood:
The Stata package includes 63 automated tests in 16 families, covering data downloads, discovery, sync, transformations, edge cases, cross-platform consistency, and error handling, plus a set of deterministic offline tests. The test suite follows Gould's (2001) certification methodology, with return codes verified by rcof.
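For readers new to certification scripts: each test either asserts a result or checks an exact return code with rcof. The sketch below shows that pattern on Stata's shipped auto data rather than on unicefdata, so it runs offline; it illustrates the style and is not an excerpt from the package's test suite.

```stata
* Start a certification run (cscript clears memory and prints a header)
cscript "rcof pattern sketch on auto data"
sysuse auto, clear

* A command that should succeed must return code 0
rcof "noisily summarize price" == 0

* A command that should fail must fail with the expected code:
* confirm variable on a nonexistent name exits with r(111)
rcof "noisily confirm variable no_such_var" == 111

* Results can be asserted directly as well
quietly count
assert r(N) == 74
```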
Cross-platform alignment is a first-class concern. The R, Python, and Stata implementations share the same YAML metadata, the same indicator registry, and the same filtering logic. A validation pipeline compares outputs across all three languages.
Installation:
* From SSC (stable)
ssc install unicefdata

* From GitHub
net install unicefdata, ///
    from("https://raw.githubusercontent.com/unicef-drp/unicefData/main/stata/ssc") replace

* First-time setup (installs metadata files)
unicefdata_setup, replace
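After installing, a minimal smoke test is to confirm that Stata can find the command and then run one of the discovery calls shown earlier. Note that which only checks the adopath; the info() call may need an internet connection or the metadata installed by unicefdata_setup.

```stata
* Confirm that Stata can find the installed command on the adopath
which unicefdata
* Smoke test with a discovery call used earlier in this post
unicefdata, info(CME_MRY0T4)
```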
Resources:
- GitHub: https://github.com/unicef-drp/unicefData
- Examples: 7 do-files covering quick start through advanced features
- R and Python versions in the same repository
- Bug reports: https://github.com/unicef-drp/unicefData/issues
I would like to thank Kit Baum for the SSC upload. I am also grateful to Lucas Rodrigues, Yang Liu, and Karen Avanesian at UNICEF for their technical contributions and feedback, and to Yves Jaques, Alberto Sibileau, and Daniele Olivotti for designing and maintaining the UNICEF SDMX data warehouse that makes this package possible.
Best,
Joao Pedro Azevedo
