  • Labelling values by importing information from .csv file

    Hello,
    We collected survey data on over 12,000 respondents across several hundred districts. The districts, however, are stored as district IDs: 3-digit numbers ranging from 000 to 729. Excerpt below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long UID int district
    331785 557
    331798 556
    331799 566
    331745 556
    331811 564
    331818 560
    331803 549
    331825 553
    331824 556
    331829 556
    331832 566
    331836 577
    331837 567
    331822 556
    331839 573
    331813 577
    278950 546
    278952 575
    278965 575
    278966 575
    278971 575
    278974 563
    278982 575
    278987 576
    278990 547
    278991 577
    278979 562
    279010 577
    279012 576
    305539 577
    305507 562
    end
    I have, in a CSV file, the corresponding district name for each of these district IDs. How can I attach these names to the IDs as value labels? Is there a way other than using label define and typing all the information into a do-file?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int district_id str26 district_name
      1 "Nicobar"                
      2 "North Middle Andaman"   
      3 "South Andaman"          
      4 "Anantapur"              
      5 "Chittoor"               
      6 "East Godavari"          
      7 "Guntur"                 
      8 "Kadapa"                 
      9 "Krishna"                
     10 "Kurnool"                
     11 "Nellore"                
     12 "Prakasam"               
     13 "Srikakulam"             
     14 "Visakhapatnam"          
     15 "Vizianagaram"           
     16 "West Godavari"          
     17 "Anjaw"                  
     18 "Central Siang"          
     19 "Changlang"              
     20 "Dibang Valley"          
     21 "East Kameng"            
     22 "East Siang"             
     23 "Kamle"                  
     24 "Kra Daadi"              
     25 "Kurung Kumey"           
     26 "Lepa Rada"              
     27 "Lohit"                  
     28 "Longding"               
     29 "Lower Dibang Valley"    
     30 "Lower Siang"            
     31 "Lower Subansiri"        
     32 "Namsai"                 
     33 "Pakke Kessang"          
     34 "Papum Pare"             
     35 "Shi Yomi"               
     36 "Tawang"                 
     37 "Tirap"                  
     38 "Upper Siang"            
     39 "Upper Subansiri"        
     40 "West Kameng"            
     41 "West Siang"             
     42 "Baksa"                  
     43 "Barpeta"                
     44 "Biswanath"              
     45 "Bongaigaon"             
     46 "Cachar"                 
     47 "Charaideo"              
     48 "Chirang"                
     49 "Darrang"                
     50 "Dhemaji"                
     51 "Dhubri"                 
     52 "Dibrugarh"              
     53 "Dima Hasao"             
     54 "Goalpara"               
     55 "Golaghat"               
     56 "Hailakandi"             
     57 "Hojai"                  
     58 "Jorhat"                 
     59 "Kamrup"                 
     60 "Kamrup Metropolitan"    
     61 "Karbi Anglong"          
     62 "Karimganj"              
     63 "Kokrajhar"              
     64 "Lakhimpur"              
     65 "Majuli"                 
     66 "Morigaon"               
     67 "Nagaon"                 
     68 "Nalbari"                
     69 "Sivasagar"              
     70 "Sonitpur"               
     71 "South Salmara-Mankachar"
     72 "Tinsukia"               
     73 "Udalguri"               
     74 "West Karbi Anglong"     
     75 "Araria"                 
     76 "Arwal"                  
     77 "Aurangabad"             
     78 "Banka"                  
     79 "Begusarai"              
     80 "Bhagalpur"              
     81 "Bhojpur"                
     82 "Buxar"                  
     83 "Darbhanga"              
     84 "East Champaran"         
     85 "Gaya"                   
     86 "Gopalganj"              
     87 "Jamui"                  
     88 "Jehanabad"              
     89 "Kaimur"                 
     90 "Katihar"                
     91 "Khagaria"               
     92 "Kishanganj"             
     93 "Lakhisarai"             
     94 "Madhepura"              
     95 "Madhubani"              
     96 "Munger"                 
     97 "Muzaffarpur"            
     98 "Nalanda"                
     99 "Nawada"                 
    100 "Patna"                  
    end
    Thanks, would really appreciate help on this.

  • #2
    See this very recent thread https://www.statalist.org/forums/for...t-value-labels with exactly the same problem. One trick is to convert your .csv file to a do-file defining the value labels. Another trick is to read the labels in as a separate dataset and then merge.
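
A minimal sketch of the first trick, assuming the lookup table has the variables district_id and district_name shown in #1. The label name districtlbl and the file name district_labels.do are made up for illustration, and the short input blocks stand in for the real files.

```stata
// Stand-in for the lookup file; with a real file this would be
// something like: import delimited using "districts.csv", clear
clear
input int district_id str26 district_name
1 "Nicobar"
2 "North Middle Andaman"
3 "South Andaman"
end

// Define one label entry per row of the lookup table
forvalues i = 1/`=_N' {
    label define districtlbl `=district_id[`i']' `"`=district_name[`i']'"', add
}

// Write the label definitions to a do-file (hypothetical file name)
label save districtlbl using "district_labels.do", replace

// Stand-in for the survey data, loaded later
clear
input long UID int district
331785 2
331798 3
end

// Recreate the value label from the saved do-file and attach it
do "district_labels.do"
label values district districtlbl
list
```

The saved do-file only has to be created once; after that, any dataset that uses the same district IDs can run it and attach the label.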

    • #3
      If you want a value label, you will have to define that value label; there are many ways of automating this. One way is to merge the two files, using district_id as the identifier. Then use labmask (part of labutil on SSC; originally appeared in Cox 2008) to use the (string) values of district_name as a value label for district_id (or a copy of it). Alternatively, you could use elabel (SSC) for the second step.
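
The merge step itself might look like the sketch below. It assumes the survey data call the identifier district while the lookup file calls it district_id, as in #1, so the lookup variable is renamed first; the short input blocks stand in for the real files, and the UIDs are fictional.

```stata
// Stand-in for the lookup .csv (normally read with import delimited)
clear
input int district_id str26 district_name
 1 "Nicobar"
42 "Baksa"
73 "Udalguri"
end

// Match the identifier name used in the survey data
rename district_id district
tempfile lookup
save `lookup'

// Stand-in for the survey data
clear
input long UID int district
123456  1
789012 42
345678 42
901234 73
end

// Many respondents per district, one name per district;
// keep all respondents, drop lookup rows with no respondents
merge m:1 district using `lookup', keep(master match)
tabulate _merge
drop _merge
list
```

Any respondent whose district is missing from the lookup file shows up in the tabulation of _merge as master only, which is worth checking before creating the value label.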

      Here is a fictional example based on the original example:

      Code:
      // example data; after merge
      clear
      input long UID int district_id str26 district_name
      123456  1 "Nicobar"
      789012 42 "Baksa"
      345678 42 "Baksa"
      901234 73 "Udalguri"
      end
      
      // look at data
      list
      
      // make a copy of district_id
      generate district = district_id
      
      // create and assign value label (labmask)
      *ssc install labutil
      labmask district , values(district_name)
      
      // create and assign value label (elabel)
      *ssc install elabel
      *elabel define district:district = levels(district_id district_name)
      
      // look at results
      list
      describe
      label list
      Best
      Daniel


      Cox, N. J. 2008. Speaking Stata: Between tables and graphs. The Stata Journal, 8(2): 269--289.
