Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • reshape or transpose?

    Dear All, I was asked this question here (https://bbs.pinggu.org/forum.php?mod...a=#pid74912031). The data set is
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long idstd float(lat lon) str13 longname
    518601  -1.95972 30.12857 "56"           
    518601  -1.95972 30.12857 "RN3"          
    518601  -1.95972 30.12857 "Nyarugunga"   
    518602 -1.936392 30.09101 "23"           
    518602 -1.936392 30.09101 "KG 594 Street"
    518602 -1.936392 30.09101 "Kacyiru"      
    end
    The desired output is that, for each "idstd", all the information is restructured in the same row (with elements in "type" as the variable names), i.e.,
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long idstd float(lat lon) str13(street_number route political)
    518601  -1.95972 30.12857 "56" "RN3"           "Nyarugunga"
    518602 -1.936392 30.09101 "23" "KG 594 Street" "Kacyiru"   
    end
    Any suggestions are highly appreciated.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Thanks for providing the dataex for your input and output. I think you might have missed the variable "type" in your data example, so I added it in following your cue from your desired output. Once you have that, you're only a step away.

    Code:
    clear
    input long idstd float(lat lon) str13(longname type)
    518601  -1.95972 30.12857 "56"            "street_number"
    518601  -1.95972 30.12857 "RN3"           "route"        
    518601  -1.95972 30.12857 "Nyarugunga"    "political"    
    518602 -1.936392 30.09101 "23"            "street_number"
    518602 -1.936392 30.09101 "KG 594 Street" "route"        
    518602 -1.936392 30.09101 "Kacyiru"       "political"    
    end
    
    reshape wide longname, i(idstd) j(type) string
    rename (longname*) (*)
    order idstd lat lon, first
    Res.:

    Code:
         +----------------------------------------------------------------------------+
         |  idstd         lat        lon    political           route   street_number |
         |----------------------------------------------------------------------------|
      1. | 518601    -1.95972   30.12857   Nyarugunga             RN3              56 |
      2. | 518602   -1.936392   30.09101      Kacyiru   KG 594 Street              23 |
         +----------------------------------------------------------------------------+

    Comment


    • #3
      Dear Leonardo, My bad (forgot the type variable, and thanks very much. Could you have a look at the similar data set
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long idstd float(lat lon) str106 longname str27 type
      518601  -1.95972 30.12857 "56"                     "street_number"              
      518601  -1.95972 30.12857 "RN3"                    "route"                      
      518601  -1.95972 30.12857 "Nyarugunga"             "political"                  
      518601  -1.95972 30.12857 "Kigali"                 "locality"                   
      518601  -1.95972 30.12857 "Kicukiro"               "administrative_area_level_2"
      518601  -1.95972 30.12857 "Kigali City"            "administrative_area_level_1"
      518601  -1.95972 30.12857 "Rwanda"                 "country"                    
      518602 -1.936392 30.09101 "23"                     "street_number"              
      518602 -1.936392 30.09101 "KG 594 Street"          "route"                      
      518602 -1.936392 30.09101 "Kacyiru"                "political"                  
      518602 -1.936392 30.09101 "Kigali"                 "locality"                   
      518602 -1.936392 30.09101 "Gasabo"                 "administrative_area_level_2"
      518602 -1.936392 30.09101 "Kigali City"            "administrative_area_level_1"
      518602 -1.936392 30.09101 "Rwanda"                 "country"                    
      518603  -1.93923 30.04487 "Nyabugogo Taxi Park"    "bus_station"                
      518603  -1.93923 30.04487 "Boulevard de Nyabugoro" "route"                      
      518603  -1.93923 30.04487 "Muhima"                 "political"                  
      518603  -1.93923 30.04487 "Kigali"                 "locality"                   
      518603  -1.93923 30.04487 "Nyarugenge"             "administrative_area_level_2"
      518603  -1.93923 30.04487 "Kigali City"            "administrative_area_level_1"
      518603  -1.93923 30.04487 "Rwanda"                 "country"                    
      end
      Using your suggestion is fine with the shortened data posted above, but incurred an error message like
      Code:
      . reshape wide longname, i(idstd) j(type) string
      (note: j = administrative_area_level_1 administrative_area_level_2 bus_station country locality political route street_num
      > ber)
      longnameadministrative_area_level_1 invalid variable name
      r(198);
      How can we deal with this problem?
      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)

      Comment


      • #4
        It looks like it's a problem with too long variable names after transformation. It would be most straight forward to make sure the j() variable and the type variable don't sum to more than 32 characters. Depending on the real data, it might be ok to rename just the few of the longest -type- values.

        Comment


        • #5
          Dear Leonardo, Thanks. I will try to shorten the name to see if it works.
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)

          Comment


          • #6
            This got me thinking about using a more general approach. Suppose that we have a dataset similar to yours:
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte obs_id str36 name_key str12 value
            1 "Administrative District Capital City" "Toronto"    
            1 "Street address"                       "123 Fake St."
            1 "Region"                               "ON"          
            2 "Administrative District Capital City" "Vancouver"  
            2 "Street address"                       "789 Real St."
            2 "Region"                               "BC"          
            end
            The -name_key- here happens to have a human-readable label but is not suitable as a legal, Stata variable name (because it contains spaces, and some values are too long). The approach is then to gather all the key names in their own dataset and process them into a an integer representation and a suitable version that can be a variable name. The key id is then used for the -reshape-, after which I use the defined labels to create appropriate labels and variable names.

            Code:
            /* Create a map of key number, a suitable variable name and the (original) key name. */
            frame put name_key, into(Keys)
            frame change Keys
            bysort name_key : keep if _n==1
            gen int key_id = _n
            
            gen vkey = strtoname(name_key)
            
            forval i = 1/`=_N' {
                label define key_names `=key_id[`i']' `"`=name_key[`i']'"', modify
              label define key_vars `=key_id[`i']' `"`=vkey[`i']'"', modify
            }
            list
            lab list
            
            tempfile keylbls
            label save key_names key_vars using `keylbls', replace
            
            /* prepare data for reshape */
            frame change default
            frlink m:1 name_key, frame(Keys) gen(lkey)
            frget key_id, from(lkey)
            drop lkey
            drop name_key
            
            reshape wide value, i(obs_id) j(key_id)
            qui do `keylbls'
            
            /* add labels and rename variables */
            foreach v of varlist value* {
                local keyid = subinstr("`v'", "value", "", .)
              local vkey = "`: label key_vars `keyid''"  
              rename (`v') (`vkey')
              local keylbl = "`: label key_names `keyid''"  
              label var `vkey' `"`keylbl'"'
            }
            
            desc, full
            list, abbrev(32)
            Result:

            Code:
            . desc, full
            
            Contains data
              obs:             2                          
             vars:             4                          
            --------------------------------------------------------------------------------------------
                          storage   display    value
            variable name   type    format     label      variable label
            --------------------------------------------------------------------------------------------
            obs_id          byte    %8.0g                
            Administrative_District_Capital_
                            str12   %12s                  Administrative District Capital City
            Region          str12   %12s                  Region
            Street_address  str12   %12s                  Street address
            --------------------------------------------------------------------------------------------
            Sorted by: obs_id
                 Note: Dataset has changed since last saved.
            
            . list, abbrev(32)
            
                 +---------------------------------------------------------------------+
                 | obs_id   Administrative_District_Capital_   Region   Street_address |
                 |---------------------------------------------------------------------|
              1. |      1                            Toronto       ON     123 Fake St. |
              2. |      2                          Vancouver       BC     789 Real St. |
                 +---------------------------------------------------------------------+
            Last edited by Leonardo Guizzetti; 28 Feb 2021, 11:57.

            Comment


            • #7
              Dear Leonardo, Many thanks for this more general method.
              Ho-Chuan (River) Huang
              Stata 19.0, MP(4)

              Comment


              • #8
                You're welcome. I hope it helps.

                Comment

                Working...
                X