Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merge and reshape survey data (with ordinal and numerical variables) to long data

    Hello and thank you for reading -

    I am dealing with survey data from 7 years and I need to merge them using the participants ID (nomem_encr) as the key variable. But doing so I render a wide data set. For my Fixed effect analysis I need long data. Is it possible to add the other data sets with the same content of questions just named differently so that it becomes a long data?

    The variables for 2021 are ch21n001 ch21n002 and so on and for 2020 they are ch20m001 ch20m002 so following the same logic and order

    When I appended the data and tried to -reshape long ch, i(nomem_encr) j(year) string - it showed an error saying the number of observations are not identical.

    Excuse me for maybe not using the right format here in this forum this is my first question

    Thank you so much in advance

    dataex

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(nomem_encr ch21n_m ch21n001 ch21n002 ch21n003 ch21n004 ch21n005 ch21n006 ch21n007 ch21n008 ch21n009 ch21n010 ch21n011 ch21n012 ch21n013 ch21n014 ch21n015 ch21n016 ch21n017 ch21n018 ch21n020 ch21n021 ch21n022 ch21n023)
    800009 202111 1 66 . 2 2 . 7 7 . . 1 1 5 1 4 177  97 1 1 2 1 1
    800015 202111 1 59 1 3 3 7 5 . . . 1 1 4 1 4 169  77 1 1 1 1 1
    800058 202111 2 24 0 3 4 8 8 . . . 3 4 4 4 2 168  71 2 1 3 4 1
    800085 202112 1 44 1 3 4 6 5 . . . . . . . .   .   . . . . . .
    800100 202111 2 30 1 5 5 5 5 . . . 2 2 4 1 6 160  65 2 1 1 1 1
    800119 202111 2 71 0 3 3 . . 7 7 . 3 3 5 3 4 170  90 2 3 3 3 1
    800127 202111 2 38 1 1 1 0 0 . . . 6 5 2 5 1 163 115 1 5 5 5 1
    800131 202111 2 67 0 3 5 . 7 7 . . 2 2 5 2 4 166  62 1 1 1 1 1
    800161 202112 1 51 1 4 4 9 8 . . . 1 1 4 2 5 185  89 1 1 1 1 1
    end
    label values ch21n001 ch21n001
    label def ch21n001 1 "male", modify
    label def ch21n001 2 "female", modify
    label values ch21n003 ch21n003
    label def ch21n003 0 "has no paid job", modify
    label def ch21n003 1 "has paid job", modify
    label values ch21n004 ch21n004
    label def ch21n004 1 "poor", modify
    label def ch21n004 2 "moderate", modify
    label def ch21n004 3 "good", modify
    label def ch21n004 4 "very good", modify
    label def ch21n004 5 "excellent", modify
    label values ch21n005 ch21n005
    label def ch21n005 1 "considerably poorer", modify
    label def ch21n005 2 "somewhat poorer", modify
    label def ch21n005 3 "the same", modify
    label def ch21n005 4 "somewhat better", modify
    label def ch21n005 5 "considerably better", modify
    label values ch21n006 ch21n006
    label def ch21n006 0 "no chance at all", modify
    label values ch21n007 ch21n007
    label def ch21n007 0 "no chance at all", modify
    label values ch21n008 ch21n008
    label values ch21n009 ch21n009
    label values ch21n010 ch21n010
    label values ch21n011 ch21n011
    label def ch21n011 1 "never", modify
    label def ch21n011 2 "seldom", modify
    label def ch21n011 3 "sometimes", modify
    label def ch21n011 6 "continuously", modify
    label values ch21n012 ch21n012
    label def ch21n012 1 "never", modify
    label def ch21n012 2 "seldom", modify
    label def ch21n012 3 "sometimes", modify
    label def ch21n012 4 "often", modify
    label def ch21n012 5 "mostly", modify
    label values ch21n013 ch21n013
    label def ch21n013 2 "seldom", modify
    label def ch21n013 4 "often", modify
    label def ch21n013 5 "mostly", modify
    label values ch21n014 ch21n014
    label def ch21n014 1 "never", modify
    label def ch21n014 2 "seldom", modify
    label def ch21n014 3 "sometimes", modify
    label def ch21n014 4 "often", modify
    label def ch21n014 5 "mostly", modify
    label values ch21n015 ch21n015
    label def ch21n015 1 "never", modify
    label def ch21n015 2 "seldom", modify
    label def ch21n015 4 "often", modify
    label def ch21n015 5 "mostly", modify
    label def ch21n015 6 "continuously", modify
    label values ch21n018 ch21n018
    label def ch21n018 1 "yes", modify
    label def ch21n018 2 "no", modify
    label values ch21n020 ch21n020
    label def ch21n020 1 "not at all", modify
    label def ch21n020 3 "a bit", modify
    label def ch21n020 5 "very much", modify
    label values ch21n021 ch21n021
    label def ch21n021 1 "not at all", modify
    label def ch21n021 2 "hardly", modify
    label def ch21n021 3 "a bit", modify
    label def ch21n021 5 "very much", modify
    label values ch21n022 ch21n022
    label def ch21n022 1 "not at all", modify
    label def ch21n022 3 "a bit", modify
    label def ch21n022 4 "quite a lot", modify
    label def ch21n022 5 "very much", modify
    label values ch21n023 ch21n023
    label def ch21n023 1 "without any trouble", modify
    ------------------ copy up to and including the previous line ------------------

    Listed 9 out of 9 observations



    The data for 2020 looks like this

    . dataex

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(nomem_encr ch20m_m ch20m001 ch20m002 ch20m003 ch20m004 ch20m005 ch20m006 ch20m007 ch20m008 ch20m009 ch20m011 ch20m012 ch20m013 ch20m014 ch20m015 ch20m016 ch20m017 ch20m018) str244 ch20m019 double(ch20m020 ch20m021 ch20m022 ch20m023 ch20m024)
    800009 202011 1 65 1 3 3  8  7 7 . 1 1 6 1 3 177  95 1 "Slokdarmproblemen"                              1 1 1 1 1
    800015 202011 1 58 1 3 3  8  5 . . 1 1 5 1 4 169  78 1 "Hoge bloeddruk"                                 1 1 1 1 1
    800057 202012 1 45 1 4 2  5  5 . . 1 1 5 3 4 198 105 2 " "                                              1 1 1 1 1
    800058 202011 2 23 0 3 4  8  8 . . 3 3 5 4 3 168  72 2 " "                                              4 2 5 1 1
    800100 202012 2 29 1 5 5 10 10 . . 1 1 6 1 6 160  70 2 " "                                              1 1 1 1 1
    800119 202012 2 70 0 3 2  .  7 6 5 1 1 4 1 4 170  90 2 " "                                              2 2 2 1 1
    800127 202011 2 37 1 1 2  1  0 . . 4 5 2 5 2 163   1 1 "Bloedziekte"                                    5 5 3 1 1
    800131 202011 2 66 0 3 3  . 10 7 . 3 1 4 1 5 166  62 1 "astma"                                          1 1 1 1 2
    800161 202011 1 50 1 4 4  8  7 . . 1 1 5 1 5 185  86 1 "artrose linker knie na kruisbandletsel in 1988" 2 1 1 1 1
    end
    label values ch20m004 ch20m004
    label def ch20m004 1 "poor", modify
    label def ch20m004 3 "good", modify
    label def ch20m004 4 "very good", modify
    label def ch20m004 5 "excellent", modify
    label values ch20m005 ch20m005
    label def ch20m005 2 "somewhat poorer", modify
    label def ch20m005 3 "the same", modify
    label def ch20m005 4 "somewhat better", modify
    label def ch20m005 5 "considerably better", modify
    label values ch20m006 ch20m006
    label def ch20m006 10 "absolutely certain", modify
    label values ch20m007 ch20m007
    label def ch20m007 0 "no chance at all", modify
    label def ch20m007 10 "absolutely certain", modify
    label values ch20m008 ch20m008
    label values ch20m009 ch20m009
    label values ch20m011 ch20m011
    label def ch20m011 1 "never", modify
    label def ch20m011 3 "sometimes", modify
    label def ch20m011 4 "often", modify
    label values ch20m012 ch20m012
    label def ch20m012 1 "never", modify
    label def ch20m012 3 "sometimes", modify
    label def ch20m012 5 "mostly", modify
    label values ch20m013 ch20m013
    label def ch20m013 2 "seldom", modify
    label def ch20m013 4 "often", modify
    label def ch20m013 5 "mostly", modify
    label def ch20m013 6 "continuously", modify
    label values ch20m014 ch20m014
    label def ch20m014 1 "never", modify
    label def ch20m014 3 "sometimes", modify
    label def ch20m014 4 "often", modify
    label def ch20m014 5 "mostly", modify
    label values ch20m015 ch20m015
    label def ch20m015 2 "seldom", modify
    label def ch20m015 3 "sometimes", modify
    label def ch20m015 4 "often", modify
    label def ch20m015 5 "mostly", modify
    label def ch20m015 6 "continuously", modify
    label values ch20m018 ch20m018
    label def ch20m018 1 "yes", modify
    label def ch20m018 2 "no", modify
    label values ch20m020 ch20m020
    label def ch20m020 1 "not at all", modify
    label def ch20m020 2 "hardly", modify
    label def ch20m020 4 "quite a lot", modify
    label def ch20m020 5 "very much", modify
    label values ch20m021 ch20m021
    label def ch20m021 1 "not at all", modify
    label def ch20m021 2 "hardly", modify
    label def ch20m021 5 "very much", modify
    label values ch20m022 ch20m022
    label def ch20m022 1 "not at all", modify
    label def ch20m022 2 "hardly", modify
    label def ch20m022 3 "a bit", modify
    label def ch20m022 5 "very much", modify
    label values ch20m023 ch20m023
    label def ch20m023 1 "without any trouble", modify
    label values ch20m024 ch20m024
    label def ch20m024 1 "without any trouble", modify
    label def ch20m024 2 "with some trouble", modify
    ------------------ copy up to and including the previous line ------------------

    Listed 9 out of 9 observations

    .

  • #2
    Code:
    use 2021_data, clear
    rename ch21* *
    gen year = 2021
    tempfile holding
    save `holding'
    
    use 2020_data, clear
    gen year = 2020
    rename ch20* *
    
    append using `holding'
    -merge- is not a good idea in this situation; this is really a job for -append-. The problem is that the data sets have different variable names because the year has been made part of the variable name. So you have to -rename- the variables. The easiest way is to just chop off the ch20 or ch21 prefix. If the ch part of the name is actually meaningful and needs to be preserved, then -rename ch20* ch*- and the analogous for 21 will do that.

    What the code shown here doesn't do, but you should do, is verify that the variables with corresponding names (e.g. ch20m001 and ch21m001) in the two data sets are actually the responses to the same (or corresponding) questions in the survey, and that the responses are coded the same way in both data sets.
    Last edited by Clyde Schechter; 22 Sep 2022, 15:54.

    Comment


    • #3
      Thank you very much. I used exactly your code and I was able to append the data. Now the questions are named m001 etc for all of 2021 and n001 etc for 2020 But to be panel data is is useful to have the id's with the corresponding answers listed like :

      nomem_encr 001 002
      80009 m001 m002
      80009 n001 n002
      80052 m001 m002
      80052 n001 n002


      Right now I still have it in a format in which 80009 appears two times with no observation for the other year
      nomem_encr m002 m002 year n001 n002
      80009 answer answer 2021 - -
      80052 answer answer 2021 - -
      80009 - - 2020 answer answer
      90052 - - 2020 answer answer


      -reshape- says that the data is wide already and "year" already exists
      When I use -sort- it does not solve the issue rows do not include the answers given in the years as described above

      Unfortunately since I now appended the data its too big to use dataex
      Last edited by Fa Anaira; 22 Sep 2022, 17:59.

      Comment

      Working...
      X