merge and reshape survey data (with ordinal and numerical variables) to long data

Fa Anaira

Join Date: Dec 2021
Posts: 11

merge and reshape survey data (with ordinal and numerical variables) to long data

22 Sep 2022, 14:29

Hello and thank you for reading -

I am dealing with survey data from 7 years and I need to merge them using the participants ID (nomem_encr) as the key variable. But doing so I render a wide data set. For my Fixed effect analysis I need long data. Is it possible to add the other data sets with the same content of questions just named differently so that it becomes a long data?

The variables for 2021 are ch21n001 ch21n002 and so on and for 2020 they are ch20m001 ch20m002 so following the same logic and order

When I appended the data and tried to -reshape long ch, i(nomem_encr) j(year) string - it showed an error saying the number of observations are not identical.

Excuse me for maybe not using the right format here in this forum this is my first question

Thank you so much in advance

dataex

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(nomem_encr ch21n_m ch21n001 ch21n002 ch21n003 ch21n004 ch21n005 ch21n006 ch21n007 ch21n008 ch21n009 ch21n010 ch21n011 ch21n012 ch21n013 ch21n014 ch21n015 ch21n016 ch21n017 ch21n018 ch21n020 ch21n021 ch21n022 ch21n023)
800009 202111 1 66 . 2 2 . 7 7 . . 1 1 5 1 4 177  97 1 1 2 1 1
800015 202111 1 59 1 3 3 7 5 . . . 1 1 4 1 4 169  77 1 1 1 1 1
800058 202111 2 24 0 3 4 8 8 . . . 3 4 4 4 2 168  71 2 1 3 4 1
800085 202112 1 44 1 3 4 6 5 . . . . . . . .   .   . . . . . .
800100 202111 2 30 1 5 5 5 5 . . . 2 2 4 1 6 160  65 2 1 1 1 1
800119 202111 2 71 0 3 3 . . 7 7 . 3 3 5 3 4 170  90 2 3 3 3 1
800127 202111 2 38 1 1 1 0 0 . . . 6 5 2 5 1 163 115 1 5 5 5 1
800131 202111 2 67 0 3 5 . 7 7 . . 2 2 5 2 4 166  62 1 1 1 1 1
800161 202112 1 51 1 4 4 9 8 . . . 1 1 4 2 5 185  89 1 1 1 1 1
end
label values ch21n001 ch21n001
label def ch21n001 1 "male", modify
label def ch21n001 2 "female", modify
label values ch21n003 ch21n003
label def ch21n003 0 "has no paid job", modify
label def ch21n003 1 "has paid job", modify
label values ch21n004 ch21n004
label def ch21n004 1 "poor", modify
label def ch21n004 2 "moderate", modify
label def ch21n004 3 "good", modify
label def ch21n004 4 "very good", modify
label def ch21n004 5 "excellent", modify
label values ch21n005 ch21n005
label def ch21n005 1 "considerably poorer", modify
label def ch21n005 2 "somewhat poorer", modify
label def ch21n005 3 "the same", modify
label def ch21n005 4 "somewhat better", modify
label def ch21n005 5 "considerably better", modify
label values ch21n006 ch21n006
label def ch21n006 0 "no chance at all", modify
label values ch21n007 ch21n007
label def ch21n007 0 "no chance at all", modify
label values ch21n008 ch21n008
label values ch21n009 ch21n009
label values ch21n010 ch21n010
label values ch21n011 ch21n011
label def ch21n011 1 "never", modify
label def ch21n011 2 "seldom", modify
label def ch21n011 3 "sometimes", modify
label def ch21n011 6 "continuously", modify
label values ch21n012 ch21n012
label def ch21n012 1 "never", modify
label def ch21n012 2 "seldom", modify
label def ch21n012 3 "sometimes", modify
label def ch21n012 4 "often", modify
label def ch21n012 5 "mostly", modify
label values ch21n013 ch21n013
label def ch21n013 2 "seldom", modify
label def ch21n013 4 "often", modify
label def ch21n013 5 "mostly", modify
label values ch21n014 ch21n014
label def ch21n014 1 "never", modify
label def ch21n014 2 "seldom", modify
label def ch21n014 3 "sometimes", modify
label def ch21n014 4 "often", modify
label def ch21n014 5 "mostly", modify
label values ch21n015 ch21n015
label def ch21n015 1 "never", modify
label def ch21n015 2 "seldom", modify
label def ch21n015 4 "often", modify
label def ch21n015 5 "mostly", modify
label def ch21n015 6 "continuously", modify
label values ch21n018 ch21n018
label def ch21n018 1 "yes", modify
label def ch21n018 2 "no", modify
label values ch21n020 ch21n020
label def ch21n020 1 "not at all", modify
label def ch21n020 3 "a bit", modify
label def ch21n020 5 "very much", modify
label values ch21n021 ch21n021
label def ch21n021 1 "not at all", modify
label def ch21n021 2 "hardly", modify
label def ch21n021 3 "a bit", modify
label def ch21n021 5 "very much", modify
label values ch21n022 ch21n022
label def ch21n022 1 "not at all", modify
label def ch21n022 3 "a bit", modify
label def ch21n022 4 "quite a lot", modify
label def ch21n022 5 "very much", modify
label values ch21n023 ch21n023
label def ch21n023 1 "without any trouble", modify

------------------ copy up to and including the previous line ------------------

Listed 9 out of 9 observations

The data for 2020 looks like this

. dataex

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(nomem_encr ch20m_m ch20m001 ch20m002 ch20m003 ch20m004 ch20m005 ch20m006 ch20m007 ch20m008 ch20m009 ch20m011 ch20m012 ch20m013 ch20m014 ch20m015 ch20m016 ch20m017 ch20m018) str244 ch20m019 double(ch20m020 ch20m021 ch20m022 ch20m023 ch20m024)
800009 202011 1 65 1 3 3  8  7 7 . 1 1 6 1 3 177  95 1 "Slokdarmproblemen"                              1 1 1 1 1
800015 202011 1 58 1 3 3  8  5 . . 1 1 5 1 4 169  78 1 "Hoge bloeddruk"                                 1 1 1 1 1
800057 202012 1 45 1 4 2  5  5 . . 1 1 5 3 4 198 105 2 " "                                              1 1 1 1 1
800058 202011 2 23 0 3 4  8  8 . . 3 3 5 4 3 168  72 2 " "                                              4 2 5 1 1
800100 202012 2 29 1 5 5 10 10 . . 1 1 6 1 6 160  70 2 " "                                              1 1 1 1 1
800119 202012 2 70 0 3 2  .  7 6 5 1 1 4 1 4 170  90 2 " "                                              2 2 2 1 1
800127 202011 2 37 1 1 2  1  0 . . 4 5 2 5 2 163   1 1 "Bloedziekte"                                    5 5 3 1 1
800131 202011 2 66 0 3 3  . 10 7 . 3 1 4 1 5 166  62 1 "astma"                                          1 1 1 1 2
800161 202011 1 50 1 4 4  8  7 . . 1 1 5 1 5 185  86 1 "artrose linker knie na kruisbandletsel in 1988" 2 1 1 1 1
end
label values ch20m004 ch20m004
label def ch20m004 1 "poor", modify
label def ch20m004 3 "good", modify
label def ch20m004 4 "very good", modify
label def ch20m004 5 "excellent", modify
label values ch20m005 ch20m005
label def ch20m005 2 "somewhat poorer", modify
label def ch20m005 3 "the same", modify
label def ch20m005 4 "somewhat better", modify
label def ch20m005 5 "considerably better", modify
label values ch20m006 ch20m006
label def ch20m006 10 "absolutely certain", modify
label values ch20m007 ch20m007
label def ch20m007 0 "no chance at all", modify
label def ch20m007 10 "absolutely certain", modify
label values ch20m008 ch20m008
label values ch20m009 ch20m009
label values ch20m011 ch20m011
label def ch20m011 1 "never", modify
label def ch20m011 3 "sometimes", modify
label def ch20m011 4 "often", modify
label values ch20m012 ch20m012
label def ch20m012 1 "never", modify
label def ch20m012 3 "sometimes", modify
label def ch20m012 5 "mostly", modify
label values ch20m013 ch20m013
label def ch20m013 2 "seldom", modify
label def ch20m013 4 "often", modify
label def ch20m013 5 "mostly", modify
label def ch20m013 6 "continuously", modify
label values ch20m014 ch20m014
label def ch20m014 1 "never", modify
label def ch20m014 3 "sometimes", modify
label def ch20m014 4 "often", modify
label def ch20m014 5 "mostly", modify
label values ch20m015 ch20m015
label def ch20m015 2 "seldom", modify
label def ch20m015 3 "sometimes", modify
label def ch20m015 4 "often", modify
label def ch20m015 5 "mostly", modify
label def ch20m015 6 "continuously", modify
label values ch20m018 ch20m018
label def ch20m018 1 "yes", modify
label def ch20m018 2 "no", modify
label values ch20m020 ch20m020
label def ch20m020 1 "not at all", modify
label def ch20m020 2 "hardly", modify
label def ch20m020 4 "quite a lot", modify
label def ch20m020 5 "very much", modify
label values ch20m021 ch20m021
label def ch20m021 1 "not at all", modify
label def ch20m021 2 "hardly", modify
label def ch20m021 5 "very much", modify
label values ch20m022 ch20m022
label def ch20m022 1 "not at all", modify
label def ch20m022 2 "hardly", modify
label def ch20m022 3 "a bit", modify
label def ch20m022 5 "very much", modify
label values ch20m023 ch20m023
label def ch20m023 1 "without any trouble", modify
label values ch20m024 ch20m024
label def ch20m024 1 "without any trouble", modify
label def ch20m024 2 "with some trouble", modify

------------------ copy up to and including the previous line ------------------

Listed 9 out of 9 observations

.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30189
#2

22 Sep 2022, 15:50

Code:

use 2021_data, clear rename ch21* * gen year = 2021 tempfile holding save `holding' use 2020_data, clear gen year = 2020 rename ch20* * append using `holding'

-merge- is not a good idea in this situation; this is really a job for -append-. The problem is that the data sets have different variable names because the year has been made part of the variable name. So you have to -rename- the variables. The easiest way is to just chop off the ch20 or ch21 prefix. If the ch part of the name is actually meaningful and needs to be preserved, then -rename ch20* ch*- and the analogous for 21 will do that.

What the code shown here doesn't do, but you should do, is verify that the variables with corresponding names (e.g. ch20m001 and ch21m001) in the two data sets are actually the responses to the same (or corresponding) questions in the survey, and that the responses are coded the same way in both data sets.

Last edited by Clyde Schechter; 22 Sep 2022, 15:54.
Comment
Fa Anaira

Join Date: Dec 2021

Posts: 11
#3

22 Sep 2022, 17:41

Thank you very much. I used exactly your code and I was able to append the data. Now the questions are named m001 etc for all of 2021 and n001 etc for 2020 But to be panel data is is useful to have the id's with the corresponding answers listed like :

nomem_encr 001 002

80009 m001 m002

80009 n001 n002

80052 m001 m002

80052 n001 n002

Right now I still have it in a format in which 80009 appears two times with no observation for the other year

nomem_encr m002 m002 year n001 n002

80009 answer answer 2021 - -

80052 answer answer 2021 - -

80009 - - 2020 answer answer

90052 - - 2020 answer answer

-reshape- says that the data is wide already and "year" already exists
When I use -sort- it does not solve the issue rows do not include the answers given in the years as described above

Unfortunately since I now appended the data its too big to use dataex

Last edited by Fa Anaira; 22 Sep 2022, 17:59.
Comment

nomem_encr	001	002
80009	m001	m002
80009	n001	n002
80052	m001	m002
80052	n001	n002

nomem_encr	m002	m002	year	n001	n002
80009	answer	answer	2021	-	-
80052	answer	answer	2021	-	-
80009	-	-	2020	answer	answer
90052	-	-	2020	answer	answer

Announcement

merge and reshape survey data (with ordinal and numerical variables) to long data

Comment

Comment