Dear Statalisters,
We developed a new open, non-proprietary, multilingual, metadata-enriched, and zip-compressed data format for tabular data, the Open Data Format (ODF). It fulfills the requirements of the FAIR Guiding Principles for scientific data management and stewardship (Wilkinson et al., 2016). Data and metadata are organized in two separate files: the data are stored in CSV format and the metadata in XML. The upcoming DDI-Codebook 2.6 metadata schema serves as the basis for the specification of the XML metadata file in the ODF. More information regarding the specification can be found on GitLab.
To work with the Open Data Format in Stata, we developed the opendf package. It can be installed from SSC using:
Code:
ssc install opendf
The package is also hosted on GitHub and can be installed using:
Code:
net install opendf, from("https://opendataformat.github.io/stata-package-opendf/")
The package provides three main functions:
With the 'opendf read' function you can read an ODF data file.
Code:
* Read example data file from GitHub
. opendf read "https://opendataformat.github.io/stata-package-opendf/example_data/soep_data.zip"
You can display metadata for the dataset or a variable using 'opendf docu'.
Code:
* Display metadata for dataset
. opendf docu
Dataset: soep-core v38.1: bap
Label: Data from individual questionnaires 2010
Languages: de en (currently set: en)
Description: The data were collected as part of the SOEP-Core study using the questionnaire "Living in Germany - Survey 2010 on the social situation - Personal questionnaire for all. This questionnaire is addressed to the individual persons in the household. A view of the survey instrument can be found here: https://www.diw.de/documents/dokumentenarchiv/17/diw_01.c.369781.de/soepfrabo_personen_2010.pdf
URL: https://paneldata.org/soep-core/data/bap
Code:
* Display metadata for variable bap87
. opendf docu bap87
Label: Current Health
Description: Question: How would you describe your current health?
URL: https://paneldata.org/soep-core/data/bap/bap87
Variable Type: numeric
Value Labels en:
-2 : Does not apply
-1 : No Answer
1 : Very good
2 : Good
3 : Satisfactory
4 : Poor
5 : Bad
With the 'opendf write' function you can save the dataset as an ODF file.
Code:
* Save dataset as ODF
. opendf write "new_odf_file.zip", replace
Dataset successfully saved in opendf-format to /Users/[...]/new_odf_file.zip.
The package requires Stata 16 and a working Python integration. On Windows, it provides a function to download a portable Python installation that works with the opendf package ('opendf installpython'). The package also provides functions to generate ODF files from CSV files containing data and metadata, to help data providers generate data files in ODF ('opendf csv2zip').
For further information on all available function arguments and on the other functions in the opendf package, see the help files.
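As a rough illustration of the two-file layout described above (one CSV file for the data and one XML file for the metadata, packed into a single zip archive), here is a minimal Python sketch. It is not how the opendf package works internally. The member names data.csv and metadata.xml and the toy XML elements are assumptions made for this example only; real ODF metadata follows the DDI-Codebook-based specification.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Assumed member names inside the zip archive (hypothetical, for illustration).
DATA_NAME = "data.csv"
METADATA_NAME = "metadata.xml"


def build_example_archive() -> bytes:
    """Create a tiny in-memory zip that mimics the two-file layout."""
    csv_text = "id,bap87\n1,2\n2,-1\n"
    # Toy XML for illustration only; this is NOT the DDI-Codebook schema.
    xml_text = (
        "<dataset>"
        "<var name='bap87'><label lang='en'>Current Health</label></var>"
        "</dataset>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(DATA_NAME, csv_text)
        archive.writestr(METADATA_NAME, xml_text)
    return buffer.getvalue()


def read_archive(raw: bytes):
    """Return (rows, labels) from a zip holding one CSV and one XML file."""
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        # Read the data part as a list of dictionaries, one per CSV row.
        with archive.open(DATA_NAME) as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            rows = list(csv.DictReader(text))
        # Parse the metadata part and collect one label per variable.
        root = ET.fromstring(archive.read(METADATA_NAME))
    labels = {var.get("name"): var.findtext("label") for var in root.iter("var")}
    return rows, labels


rows, labels = read_archive(build_example_archive())
print(rows[0]["bap87"], labels["bap87"])
```

Keeping the data in plain CSV and the metadata in XML means both parts can be read with standard-library tools in most languages, which is the point of an open, non-proprietary format.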
References:
- Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., ... & Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 1-9.