Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Long to wide, converting values to label for new variable

    Hi all,

    I've got a raw dataset in the following format:

    indicatorcode indicatordescription regionid regionname bbnum unit value year
    94408 SCBs credit: Agriculture 1000 Vellore 1004269 Rs. million 1889.3 2000
    94408 SCBs credit: Agriculture 1000 Vellore 1004269 Rs. million 1925.7 2001
    94408 SCBs credit: Agriculture 1000 Vellore 1004269 Rs. million 1901.5 2002
    94408 SCBs credit: Agriculture 1000 Vellore 1004269 Rs. million 2253.7 2003
    94408 SCBs credit: Agriculture 254 Faizabad 1002577 Rs. million 9107.3 2016
    94408 SCBs credit: Agriculture 254 Faizabad 1002577 Rs. million 10302.2 2017
    94408 SCBs credit: Agriculture 254 Faizabad 1002577 Rs. million 12951.6 2018
    94408 SCBs credit: Agriculture 256 Farrukhabad 1002580 Rs. million 1070.6 2000
    where the value of indicatordescription is actually the variable (there are 50 of them in the full dataset). Ignore bbnum for now (don't know what that is). But, essentially, I'd like to get this into the following format:

    region_id region_name SCB_cr year
    1000 Vellore 1889.3 2000
    1000 Vellore 1925.7 2001
    1000 Vellore 1901.5 2002
    1000 Vellore 2253.7 2003
    254 Faizabad 9107.3 2016
    254 Faizabad 10302.2 2017
    254 Faizabad 12951.6 2018
    256 Farrukhabad 1070.6 2000
    where, if possible, the label for SCB_cr is the previous value of indicator description (i.e. "SCBs credit: Agriculture"). Although this is not super crucial, as I will probably have to edit the labels anyway.

    Would appreciate any help!

  • #2
    I see no long to wide here.

    You can't associate value labels with non-integers. Your existing variable indicatordescription looks fine. Why you do think value labels will help?

    Otherwise you are just ignoring (possibly dropping) certain variables and end with the same number of observations. If you want new variable names, reach for rename.

    Last edited by Nick Cox; 11 Mar 2019, 09:14.

    Comment


    • #3
      Hi Nick, thanks for the response. I think I may have excerpted the table misleadingly and did not clarify properly. Perhaps this may be a more representative sample of the dataset in its raw form:

      indicatorcode indicatordescription regionid regionname bbnum unit value year
      94408 SCBs credit: Agriculture 1000 Vellore 1004269 Rs. million 1889.3 2000
      94408 SCBs credit: Agriculture 1000 Vellore 1004269 Rs. million 1889.3 2001
      94408 SCBs credit: Agriculture 254 Faizabad 1002577 Rs. million 1925.7 2000
      94408 SCBs credit: Agriculture 254 Faizabad 1002577 Rs. million 1925.7 2000
      213716 Area of land holdings 1000 Vellore 1004269 thousands of hectares 19.5 2000
      213716 Area of land holdings 1000 Vellore 1004269 thousands of hectares 20.8 2001
      213716 Area of land holdings 254 Faizabad 1004269 thousands of hectares 16.2 2000
      213716 Area of land holdings 254 Faizabad 1004269 thousands of hectares 17.6 2001
      216127 Number of land holdings 1000 Vellore 1004269 numbers 495 2000
      216127 Number of land holdings 1000 Vellore 1004269 numbers 480 2001
      216127 Number of land holdings 254 Faizabad 1002577 numbers 169 2000
      216127 Number of land holdings 256 Faizabad 1004269 numbers 174 2001

      Whereas I need this to be:

      region_id region_name SCB_cr land_area land_num year
      1000 Vellore 1889.3 19.5 495 2000
      1000 Vellore 1951.6 20.8 480 2001
      254 Faizabad 1925.7 16.2 169 2000
      254 Faizabad 2014.3 17.6 174 2001

      Comment


      • #4
        That is clear. Thanks!

        I note various problems in your example. One observation is duplicated. In the last observation 256 is the id for Faizabad, and in others it is 254.

        This may help otherwise:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long indicatorcode str24 indicatordescription int regionid str8 regionname long bbnum str21 unit float value int year
         94408 "SCBs credit: Agriculture" 1000 "Vellore"  1004269 "Rs. million"           1889.3 2000
         94408 "SCBs credit: Agriculture" 1000 "Vellore"  1004269 "Rs. million"           1889.3 2001
         94408 "SCBs credit: Agriculture"  254 "Faizabad" 1002577 "Rs. million"           1925.7 2000
         94408 "SCBs credit: Agriculture"  254 "Faizabad" 1002577 "Rs. million"           1925.7 2000
        213716 "Area of land holdings"    1000 "Vellore"  1004269 "thousands of hectares"   19.5 2000
        213716 "Area of land holdings"    1000 "Vellore"  1004269 "thousands of hectares"   20.8 2001
        213716 "Area of land holdings"     254 "Faizabad" 1004269 "thousands of hectares"   16.2 2000
        213716 "Area of land holdings"     254 "Faizabad" 1004269 "thousands of hectares"   17.6 2001
        216127 "Number of land holdings"  1000 "Vellore"  1004269 "numbers"                  495 2000
        216127 "Number of land holdings"  1000 "Vellore"  1004269 "numbers"                  480 2001
        216127 "Number of land holdings"   254 "Faizabad" 1002577 "numbers"                  169 2000
        216127 "Number of land holdings"   256 "Faizabad" 1004269 "numbers"                  174 2001
        end
        
        gen which = "_" + strtoname(indicatordescription)
        duplicates drop
        drop indicator* bbnum unit
        reshape wide value, j(which) i(region* year) string
        rename (value_*) *
        
        list 
        
            +-------------------------------------------------------------+
             | regionid   region~e   year   Area_o~s   Number~s   SCBs_c~e |
             |-------------------------------------------------------------|
          1. |      254   Faizabad   2000       16.2        169     1925.7 |
          2. |      254   Faizabad   2001       17.6          .          . |
          3. |      256   Faizabad   2001          .        174          . |
          4. |     1000    Vellore   2000       19.5        495     1889.3 |
          5. |     1000    Vellore   2001       20.8        480     1889.3 |
             +-------------------------------------------------------------+
        
        .

        Comment


        • #5
          Thanks very much, Nick, despite my sloppiness. This worked perfectly in the full dataset after some initial pre-cleaning.

          Comment

          Working...
          X