
  • OPENDF: new Stata package to work with data in the Open Data Format

    Dear Statalisters,

    We developed a new open, non-proprietary, multilingual, metadata-enriched, and zip-compressed data format for tabular data: the Open Data Format (ODF). It fulfills the requirements of the FAIR Guiding Principles for scientific data management and stewardship (Wilkinson et al., 2016). Data and metadata are organized in two separate files: the data are stored in CSV format and the metadata in XML. The upcoming DDI-Codebook 2.6 metadata schema serves as the basis for the specification of the XML metadata file. More information on the specification can be found on GitLab.
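    Because an ODF file is a zip archive holding a CSV data file and an XML metadata file, its two parts can also be inspected with Stata's own file commands, without opendf. This is a minimal sketch, not part of the opendf package. It assumes that both files sit at the top level of the archive and that it is run in an otherwise empty working directory: files are picked up by extension, so any other CSV or XML files there would be listed too.

```stata
* Download the example ODF file (a plain zip archive) to the working directory
copy "https://opendataformat.github.io/stata-package-opendf/example_data/soep_data.zip" "soep_data.zip", replace

* Unpack it into the working directory
* (assumption: the archive does not nest its files in a subfolder)
unzipfile "soep_data.zip", replace

* Collect the data (CSV) and metadata (XML) file names by extension
local csvfiles : dir . files "*.csv"
local xmlfiles : dir . files "*.xml"
display `"Data file(s):     `csvfiles'"'
display `"Metadata file(s): `xmlfiles'"'

* Load the first CSV file as plain data; labels and descriptions from the XML are not applied
local csvfile : word 1 of `csvfiles'
import delimited using `"`csvfile'"', clear
describe, short
```

    Loaded this way, the variables carry no labels or descriptions, which is exactly what 'opendf read' adds by also parsing the XML file.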

    To work with the Open Data Format in Stata, we developed the opendf package. It can be installed from SSC using:
    Code:
    ssc install opendf
    The package is also hosted on GitHub and can be installed using:
    Code:
    net install opendf, from(https://opendataformat.github.io/stata-package-opendf/)
    The package provides three main functions:

    With the 'opendf read' function you can read an ODF data file.
    Code:
    * Read example data file from GitHub 
    . opendf read "https://opendataformat.github.io/stata-package-opendf/example_data/soep_data.zip"
    You can display metadata for the dataset or a variable using 'opendf docu'.

    Code:
    * Display metadata for dataset
    . opendf docu
    Dataset: soep-core v38.1: bap
    Label: Data from individual questionnaires 2010
    Languages: de en
    (currently set: en)
    Description: The data were collected as part of the SOEP-Core study using the questionnaire "Living in Germany - Survey 2010 on the social situation - Personal
    questionnaire for all. This questionnaire is addressed to the individual persons in the household. A view of the survey instrument can be found here:
    https://www.diw.de/documents/dokumentenarchiv/17/diw_01.c.369781.de/soepfrabo_personen_2010.pdf
    URL: https://paneldata.org/soep-core/data/bap
    To display the metadata for a single variable, add the variable name to 'opendf docu'.

    Code:
    * Display metadata for variable bap87
    . opendf docu bap87
    Label: Current Health
    Description: Question: How would you describe your current health?
    URL: https://paneldata.org/soep-core/data/bap/bap87
    Variable Type: numeric
    Value Labels en:
    -2 :  Does not apply
    -1 :  No Answer
    1 :  Very good
    2 :  Good
    3 :  Satisfactory
    4 :  Poor
    5 :  Bad
    Use 'opendf write' to save the dataset in memory as an ODF file.

    Code:
    * Save dataset as ODF
    . opendf write "new_odf_file.zip", replace
     Dataset successfully saved in opendf-format to /Users/[...]/new_odf_file.zip.
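    A simple check that a written file is usable is to read it back in and query a variable again. The sketch below reuses only the three commands shown above and the example variable bap87. It calls Stata's 'clear' before each 'opendf read' rather than assuming any option of 'opendf read' for replacing data already in memory.

```stata
* Start from an empty dataset and load the example ODF file
clear
opendf read "https://opendataformat.github.io/stata-package-opendf/example_data/soep_data.zip"

* Write it back out as a new ODF file
opendf write "new_odf_file.zip", replace

* Clear memory and read the file that was just written
clear
opendf read "new_odf_file.zip"

* The variable should still be there, and its metadata can be displayed again
confirm variable bap87
opendf docu bap87
```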
    The package requires Stata 16 or later with a working Python integration. For Windows, 'opendf installpython' downloads a portable Python installation that works with the opendf package. To help data providers generate data files in ODF, the package also provides 'opendf csv2zip', which creates ODF files from CSV files containing the data and the metadata.
    See the help files for further information on all available options and on the other functions in the opendf package.



    References:
    • Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., ... & Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 1-9.
    Questions and suggestions welcome.