Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging yearly datasets into one panel dataset

    Hello all,

    I am currently writing a research project where I investigate the effects of parental leave on employment. For this I have access to yearly data regarding the variables of interest.
    However, I can only download the data per year which makes merging/appending a bit tricky. I tried to append the first year (2008) with the other years but it got very messsy.
    Every survey year has an unique variable code per category e.g. income2011 to denote income in 2011 or workinghours2012 to denote the amount of hours worked per week in 2012 and so on for every variable of interest from 2008-2025. When I appended the datasets I eventually got a dataset with 5000 different variables. Does someone have an idea how to merge these datasets into one clean document? So for example I would prefer to merge all income variables per year into one income variable. I pasted a part of the code I used below and I am using Stata 18.5 on windows.

    I hope someone can help me,
    Diederick


    use cw12e_EN_1.0p.dta, clear
    gen wave = 2012
    tempfile d12
    save `d12'

    ************************************************** **
    * 2013
    ************************************************** **
    use cw13f_EN_1.0p.dta, clear
    gen wave = 2013
    tempfile d13
    save `d13'

    ************************************************** **
    * 2014
    ************************************************** **
    use cw14g_EN_1.0p.dta, clear
    gen wave = 2014
    tempfile d14
    save `d14'

    use `d12', clear
    append using `d13'
    append using `d14'

  • #2
    As you have a variable identifying the wave, all you have to do is to rename variables in each dataset.

    Code:
    ************************************************** **
    * 2013
    ************************************************** **
    use cw13f_EN_1.0p.dta, clear
    gen wave = 2013
    rename (*2013) (*)
    tempfile d13
    save `d13'
    
    ************************************************** **
    * 2014
    ************************************************** **
    use cw14g_EN_1.0p.dta, clear
    gen wave = 2014
    rename (*2014) (*)
    tempfile d14
    save `d14'
    
    use `d12', clear
    gen wave = 2012
    rename (*2012) (*)
    append using `d13'
    append using `d14'

    Comment


    • #3
      A more compact version:

      Code:
      local i 5
      forval y= 13/25{
          local ++i    
          local p: "`:word `i' of `c(alpha)''"
          use cw`y'`p'_EN_1.0p.dta, clear
          tempfile d`y'
          rename (*20`y') (*)
          save `d`y''
      }
      use `d12', clear
      gen wave = 2012
      rename (*2012) (*)
      forval y= 13/25{
          append using `d`y''
      }
      EDIT: I just noticed that the suffix following the year changes in your file names. If this follows a regular pattern, the highlighted code will resolve this, i.e., "f" corresponds to 2013, "g" to 2014 and so on. Otherwise, if there is no pattern, you may have to revert to the code in #1. Here is what the modification generates:

      Code:
      local i 5
      forval y=13/20{
        local ++i
        local p "`:word `i' of `c(alpha)''"  
        di "For year 20`y', the suffix is `p'"
      }
      Res.:

      Code:
      . local i 5
      
      . 
      . forval y=13/20{
        2. 
      .   local ++i
        3. 
      .   local p "`:word `i' of `c(alpha)''"  
        4. 
      .   di "For year 20`y', the suffix is `p'"
        5. 
      . }
      For year 2013, the suffix is f
      For year 2014, the suffix is g
      For year 2015, the suffix is h
      For year 2016, the suffix is i
      For year 2017, the suffix is j
      For year 2018, the suffix is k
      For year 2019, the suffix is l
      For year 2020, the suffix is m
      Last edited by Andrew Musau; 11 May 2026, 13:54.

      Comment


      • #4
        Thank you for the response for some variables however the unqiue identifier is situated in the middle like cw12e525 and cw13f525 which indicates occupation in 2012 and 2013. How would you correct for this if most of the variables are in this format?

        Comment


        • #5
          As long as there are no name clashes, you can just keep the text preceding the first digit. Here is an example:

          Code:
          clear
          input float (wave income2013 workinghours2013 cw13f525)
          . . . .
          end 
          
          list
          foreach var of varlist *{
              rename `var' `=ustrregexra("`var'", "\d.*$", "")'
          }
          list
          Res.:

          Code:
            
          . list
          
               +---------------------------------------+
               | wave   inc~2013   wor~2013   cw13f525 |
               |---------------------------------------|
            1. |    .          .          .          . |
               +---------------------------------------+
          
          . 
          . foreach var of varlist *{
            2. 
          .     rename `var' `=ustrregexra("`var'", "\d.*$", "")'
            3. 
          . }
            (all newnames==oldnames)
          
          . 
          . list
          
               +-------------------------------+
               | wave   income   workin~s   cw |
               |-------------------------------|
            1. |    .        .          .    . |
               +-------------------------------+
          
          .

          Comment


          • #6
            I had this problem recently. I gave the data dictionaries to claude code and had it produce a do file that homogenized the variable names.

            make sure you check it.

            Another way to peel off the variables you need from each year and rename them.

            Comment


            • #7
              If all are in this format (cw12e525 and cw13f525) then you could subinstr and pull off "12e" and "13f".

              Comment

              Working...
              X