Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to locate observations according to the specific characters in their value labels?

    Hi statalists,

    I'm currently dealing with a survey panel dataset. This dataset only simply lists the indices for each questions, sort by questions and I find it is tricky to directly analyze it. The first column is the description of questions, the 2nd the 3rd and 4th columns are the numeric answers for 2014/2015. My aim is to generate more variables (columns) for each question so that all answers for each country will be shown only in one row.

    Firstly, I tried "encode Question, gen(id_question) " to converting first column into numeric values so that I can quickly find questions. But there is one more thing need to be considered that this dataset could include more questions in the future. The id of each question might be disrupted. So I wonder if there is any command that I can locate observations according to the characters in their labels and then put the values to according columns. And the description of some questions look very similar e.g. Number of ATMs per 100,00 adults / Number of ATMs per 1,000 km2. So I hope I can locate the label accurately.

    Here is the dataset example:
    Question Country Index_2014 Index_2015
    Number of deposit accounts per 1,000 adults-Q1 A 1 7
    Number of bank branches per 100,000 adults-Q2 A 2 8
    Number of ATMs per 100,000 adults-Q3 A 3 9
    Number of deposit accounts per 1,000 adults-Q1 B 4 10
    Number of bank branches per 100,000 adults-Q2 B 5 11
    Number of ATM2 per 1,000 km2-Q4 C 6 12

    The output I expect (the columns after country column stand for each questions )
    Country deposit_2014 branch_2014 ATMadult_2014 ATMkm_2014 deposit_2015 branch_2015 ATMadult_2015 ATMkm_2015
    A 1 2 3 . 7 8 9 .
    B 4 5 . . 10 11 . .
    C . . . 6 . . . 12
    Thanks in advance!

  • #2
    I don't know if there is a way we can process "Question" and automatically create the variable names. But even there were one, it would not solve your problem of any future Question text being similar to even a finer level. That would trigger all your automatically generated variable names to change, and that'd mean all your previous analysis codes will be ruined.

    I think the most stable approach is that you keep a crosswalk file between the question text and your desired variable name, like this:

    Code:
    clear
    input str100 Question str20 Varname
    "Number of deposit accounts per 1,000 adults-Q1" "deposit"
    "Number of bank branches per 100,000 adults-Q2" "branch"
    "Number of ATMs per 100,000 adults-Q3" "ATMadult"
    "Number of ATM2 per 1,000 km2-Q4" "ATMkm"
    end
    save varname_crosswalk, replace
    Then, we can process your data:

    Code:
    clear
    input str100 Question     str2 Country     Index_2014     Index_2015
    "Number of deposit accounts per 1,000 adults-Q1"     A     1     7
    "Number of bank branches per 100,000 adults-Q2"     A     2     8
    "Number of ATMs per 100,000 adults-Q3"     A     3     9
    "Number of deposit accounts per 1,000 adults-Q1"     B     4     10
    "Number of bank branches per 100,000 adults-Q2"     B     5     11
    "Number of ATM2 per 1,000 km2-Q4"     C     6     12
    end
    
    merge m:1 Question using varname_crosswalk
    drop _merge
    
    egen item_num = group(Question)
    reshape long Index_, i(Country item_num) j(Year)
    
    reshape wide Index_ Question Varname, i(Country Year) j(item_num)
    
    forval x = 1/4{
    gsort Index_`x'
        local templab = Question`x'[1]
        label variable Index_`x' "`templab'"
        local tempvar = Varname`x'[1]
        rename Index_`x' `tempvar'
        drop Varname`x' Question`x'
    }
    I saw that you'd like reshape that back to wide again, check this page out to see how to carry over the variable labels: https://www.stata.com/support/faqs/d...after-reshape/. Or use something like the following, with the crosswalk file it should be each to compile this code:

    Code:
    order deposit branch ATMadult ATMkm
    reshape wide deposit-ATMkm, i(Country) j(Year)
    
    foreach x in 2014 2015{
        label variable ATMkm`x' "`x': Number of ATM2 per 1,000 km2-Q4"
    }
    Last edited by Ken Chui; 23 Jul 2021, 11:18.

    Comment


    • #3
      Originally posted by Ken Chui View Post
      I don't know if there is a way we can process "Question" and automatically create the variable names. But even there were one, it would not solve your problem of any future Question text being similar to even a finer level. That would trigger all your automatically generated variable names to change, and that'd mean all your previous analysis codes will be ruined.

      I think the most stable approach is that you keep a crosswalk file between the question text and your desired variable name, like this:

      Code:
      clear
      input str100 Question str20 Varname
      "Number of deposit accounts per 1,000 adults-Q1" "deposit"
      "Number of bank branches per 100,000 adults-Q2" "branch"
      "Number of ATMs per 100,000 adults-Q3" "ATMadult"
      "Number of ATM2 per 1,000 km2-Q4" "ATMkm"
      end
      save varname_crosswalk, replace
      Then, we can process your data:

      Code:
      clear
      input str100 Question str2 Country Index_2014 Index_2015
      "Number of deposit accounts per 1,000 adults-Q1" A 1 7
      "Number of bank branches per 100,000 adults-Q2" A 2 8
      "Number of ATMs per 100,000 adults-Q3" A 3 9
      "Number of deposit accounts per 1,000 adults-Q1" B 4 10
      "Number of bank branches per 100,000 adults-Q2" B 5 11
      "Number of ATM2 per 1,000 km2-Q4" C 6 12
      end
      
      merge m:1 Question using varname_crosswalk
      drop _merge
      
      egen item_num = group(Question)
      reshape long Index_, i(Country item_num) j(Year)
      
      reshape wide Index_ Question Varname, i(Country Year) j(item_num)
      
      forval x = 1/4{
      gsort Index_`x'
      local templab = Question`x'[1]
      label variable Index_`x' "`templab'"
      local tempvar = Varname`x'[1]
      rename Index_`x' `tempvar'
      drop Varname`x' Question`x'
      }
      I saw that you'd like reshape that back to wide again, check this page out to see how to carry over the variable labels: https://www.stata.com/support/faqs/d...after-reshape/. Or use something like the following, with the crosswalk file it should be each to compile this code:

      Code:
      order deposit branch ATMadult ATMkm
      reshape wide deposit-ATMkm, i(Country) j(Year)
      
      foreach x in 2014 2015{
      label variable ATMkm`x' "`x': Number of ATM2 per 1,000 km2-Q4"
      }
      Thanks Ken Chui. Sorry for late reply. Your codes work perfectly and I get expected results.

      Comment

      Working...
      X