Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • reshape chaotic text data to panel data

    Hi, I have the following text data with an unbalanced year. I want to reshape the text data into panel data. However, I have trouble dealing with the unbalanced original data. Any suggestion?


    Original format.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 ID str48 TXT
    "A" "2022(136)2021(145)2020(147)2019(89)"             
    "B" "2019(234)2018(177)2017(187)2016(244)"            
    "C" "2020(153)2019(184)2018(213)"                     
    "D" "2021(10)2020(12)2019(12)2018(18)2017(13)2016(14)"
    end

    The panel data that I want.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 ID int(Year Var1)
    "A" 2022 136
    "A" 2021 145
    "A" 2020 147
    "A" 2019  89
    "B" 2019 234
    "B" 2018 177
    "B" 2017 187
    "B" 2016 244
    "C" 2020 153
    "C" 2019 184
    "C" 2018 213
    "D" 2021  10
    "D" 2020  12
    "D" 2019  12
    "D" 2018  18
    "D" 2017  13
    "D" 2016  14
    end

  • #2
    I believe this does what you want:
    Code:
    // Create pairs of year/var strings
    replace TXT = subinstr(TXT,"(", " ", .)  
    split TXT, parse( ")" )
    // Obtain year and var as numbers
    forval i = 1/`=r(nvars)' {
       gen int year`i' = real(word(TXT`i',1))
       // "1" in "var1" could cause problems so different name
       gen YourVar`i' = real(word(TXT`i',2))
    }
    drop TXT*  // just a nuisance now
    //
    reshape long year YourVar, i(ID) j(j)
    drop if missing(year)
    drop j

    Comment

    Working...
    X