How to locate observations according to the specific characters in their value labels?

Geralt Ji

Join Date: May 2020
Posts: 27

How to locate observations according to the specific characters in their value labels?

23 Jul 2021, 10:11

Hi statalists,

I'm currently dealing with a survey panel dataset. This dataset only simply lists the indices for each questions, sort by questions and I find it is tricky to directly analyze it. The first column is the description of questions, the 2nd the 3rd and 4th columns are the numeric answers for 2014/2015. My aim is to generate more variables (columns) for each question so that all answers for each country will be shown only in one row.

Firstly, I tried "encode Question, gen(id_question) " to converting first column into numeric values so that I can quickly find questions. But there is one more thing need to be considered that this dataset could include more questions in the future. The id of each question might be disrupted. So I wonder if there is any command that I can locate observations according to the characters in their labels and then put the values to according columns. And the description of some questions look very similar e.g. Number of ATMs per 100,00 adults / Number of ATMs per 1,000 km2. So I hope I can locate the label accurately.

Here is the dataset example:

Question	Country	Index_2014	Index_2015
Number of deposit accounts per 1,000 adults-Q1	A	1	7
Number of bank branches per 100,000 adults-Q2	A	2	8
Number of ATMs per 100,000 adults-Q3	A	3	9
Number of deposit accounts per 1,000 adults-Q1	B	4	10
Number of bank branches per 100,000 adults-Q2	B	5	11
Number of ATM2 per 1,000 km2-Q4	C	6	12

The output I expect (the columns after country column stand for each questions )

Country	deposit_2014	branch_2014	ATMadult_2014	ATMkm_2014	deposit_2015	branch_2015	ATMadult_2015	ATMkm_2015
A	1	2	3	.	7	8	9	.
B	4	5	.	.	10	11	.	.
C	.	.	.	6	.	.	.	12

Thanks in advance!

Tags: None

Ken Chui

Join Date: Aug 2014
Posts: 1058

23 Jul 2021, 11:02

I don't know if there is a way we can process "Question" and automatically create the variable names. But even there were one, it would not solve your problem of any future Question text being similar to even a finer level. That would trigger all your automatically generated variable names to change, and that'd mean all your previous analysis codes will be ruined.

I think the most stable approach is that you keep a crosswalk file between the question text and your desired variable name, like this:

Code:

clear
input str100 Question str20 Varname
"Number of deposit accounts per 1,000 adults-Q1" "deposit"
"Number of bank branches per 100,000 adults-Q2" "branch"
"Number of ATMs per 100,000 adults-Q3" "ATMadult"
"Number of ATM2 per 1,000 km2-Q4" "ATMkm"
end
save varname_crosswalk, replace

Then, we can process your data:

Code:

clear
input str100 Question     str2 Country     Index_2014     Index_2015
"Number of deposit accounts per 1,000 adults-Q1"     A     1     7
"Number of bank branches per 100,000 adults-Q2"     A     2     8
"Number of ATMs per 100,000 adults-Q3"     A     3     9
"Number of deposit accounts per 1,000 adults-Q1"     B     4     10
"Number of bank branches per 100,000 adults-Q2"     B     5     11
"Number of ATM2 per 1,000 km2-Q4"     C     6     12
end

merge m:1 Question using varname_crosswalk
drop _merge

egen item_num = group(Question)
reshape long Index_, i(Country item_num) j(Year)

reshape wide Index_ Question Varname, i(Country Year) j(item_num)

forval x = 1/4{
gsort Index_`x'
    local templab = Question`x'[1]
    label variable Index_`x' "`templab'"
    local tempvar = Varname`x'[1]
    rename Index_`x' `tempvar'
    drop Varname`x' Question`x'
}

I saw that you'd like reshape that back to wide again, check this page out to see how to carry over the variable labels: https://www.stata.com/support/faqs/d...after-reshape/. Or use something like the following, with the crosswalk file it should be each to compile this code:

Code:

order deposit branch ATMadult ATMkm
reshape wide deposit-ATMkm, i(Country) j(Year)

foreach x in 2014 2015{
    label variable ATMkm`x' "`x': Number of ATM2 per 1,000 km2-Q4"
}

Last edited by Ken Chui; 23 Jul 2021, 11:18.

Comment

Geralt Ji

Join Date: May 2020
Posts: 27

24 Jul 2021, 11:14

Originally posted by Ken Chui View Post

Code:

clear
input str100 Question str20 Varname
"Number of deposit accounts per 1,000 adults-Q1" "deposit"
"Number of bank branches per 100,000 adults-Q2" "branch"
"Number of ATMs per 100,000 adults-Q3" "ATMadult"
"Number of ATM2 per 1,000 km2-Q4" "ATMkm"
end
save varname_crosswalk, replace

Then, we can process your data:

Code:

clear
input str100 Question str2 Country Index_2014 Index_2015
"Number of deposit accounts per 1,000 adults-Q1" A 1 7
"Number of bank branches per 100,000 adults-Q2" A 2 8
"Number of ATMs per 100,000 adults-Q3" A 3 9
"Number of deposit accounts per 1,000 adults-Q1" B 4 10
"Number of bank branches per 100,000 adults-Q2" B 5 11
"Number of ATM2 per 1,000 km2-Q4" C 6 12
end

merge m:1 Question using varname_crosswalk
drop _merge

egen item_num = group(Question)
reshape long Index_, i(Country item_num) j(Year)

reshape wide Index_ Question Varname, i(Country Year) j(item_num)

forval x = 1/4{
gsort Index_`x'
local templab = Question`x'[1]
label variable Index_`x' "`templab'"
local tempvar = Varname`x'[1]
rename Index_`x' `tempvar'
drop Varname`x' Question`x'
}

Code:

order deposit branch ATMadult ATMkm
reshape wide deposit-ATMkm, i(Country) j(Year)

foreach x in 2014 2015{
label variable ATMkm`x' "`x': Number of ATM2 per 1,000 km2-Q4"
}

Thanks Ken Chui. Sorry for late reply. Your codes work perfectly and I get expected results.

Announcement

How to locate observations according to the specific characters in their value labels?

Comment

Comment