Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping from multi dimensional long to regular long

    I have a multidimensional panel structure - panel units are states, time variable is year, and there's another variable called valuename which has a corresponding metric value.
    Dataex:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str32 valuename double metric int year str20 state_name
    "Number of volunteers"        971.2646819 2004 "Alabama"
    "Number of volunteers"        1113.977571 2005 "Alabama"
    "Number of volunteers"        885.8076744 2006 "Alabama"
    "Number of volunteers"        842.0982383 2007 "Alabama"
    "Number of volunteers"        891.8058441 2008 "Alabama"
    "Number of volunteers"         927.275827 2009 "Alabama"
    "Number of volunteers"        974.2450833 2010 "Alabama"
    "Number of volunteers"        910.4783982 2011 "Alabama"
    "Number of volunteers"        900.4874115 2012 "Alabama"
    "Number of volunteers"        921.3599062 2013 "Alabama"
    "Number of volunteers"        905.0239872 2014 "Alabama"
    "Number of volunteers"         833.897401 2015 "Alabama"
    "Volunteer Hours in millions" 110.3327496 2004 "Alabama"
    "Volunteer Hours in millions" 174.2784784 2005 "Alabama"
    "Volunteer Hours in millions" 117.3132626 2006 "Alabama"
    "Volunteer Hours in millions"  93.3599999 2007 "Alabama"
    "Volunteer Hours in millions" 122.7068316 2008 "Alabama"
    "Volunteer Hours in millions" 91.95548496 2009 "Alabama"
    "Volunteer Hours in millions" 106.8859298 2010 "Alabama"
    "Volunteer Hours in millions" 100.8100441 2011 "Alabama"
    "Volunteer Hours in millions" 102.9366705 2012 "Alabama"
    "Volunteer Hours in millions" 116.4178635 2013 "Alabama"
    "Volunteer Hours in millions" 125.9858016 2014 "Alabama"
    "Volunteer Hours in millions" 98.70755004 2015 "Alabama"
    end
    I want each observation in the data to be:
    state - year - number of volunteers - volunteer hours, as such:
    Code:
    . list
    
         +---------------------------------------+
         | state_~e   year         NV         NH |
         |---------------------------------------|
      1. | Alabama    2004   971.2647   110.3327 |
      2. | Alabama    2005   1113.978   174.2785 |
      3. | Alabama    2006   885.8077   117.3133 |
      4. | Alabama    2007   842.0983      93.36 |
      5. | Alabama    2008   891.8058   122.7068 |
         |---------------------------------------|
      6. | Alabama    2009   927.2758   91.95548 |
      7. | Alabama    2010   974.2451   106.8859 |
      8. | Alabama    2011   910.4784     100.81 |
      9. | Alabama    2012   900.4874   102.9367 |
     10. | Alabama    2013   921.3599   116.4179 |
         |---------------------------------------|
     11. | Alabama    2014    905.024   125.9858 |
     12. | Alabama    2015   833.8974   98.70755 |
         +---------------------------------------+
    I accomplished this by manually creating and subscripting NH and NV. I have many more possible value for valuename, so this is not a realistic option, and I'm looking for a more elegant solution.

  • #2
    Code:
    generate byte indicator=1 if valuename=="Number of volunteers"
    replace indicator=2 if valuename=="Volunteer Hours in millions"
    drop valuename
    
    reshape wide metric, i(year state_name) j(indicator)
    rename metric1 NV
    rename metric2 NH
    
    label variable NV "Number of volunteers"
    label variable NH "Volunteer Hours in millions"
    
    list, clean
    results in
    Code:
           year   state_~e          NV          NH  
      1.   2004    Alabama   971.26468   110.33275  
      2.   2005    Alabama   1113.9776   174.27848  
      3.   2006    Alabama   885.80767   117.31326  
      4.   2007    Alabama   842.09824       93.36  
      5.   2008    Alabama   891.80584   122.70683  
      6.   2009    Alabama   927.27583   91.955485  
      7.   2010    Alabama   974.24508   106.88593  
      8.   2011    Alabama    910.4784   100.81004  
      9.   2012    Alabama   900.48741   102.93667  
     10.   2013    Alabama   921.35991   116.41786  
     11.   2014    Alabama   905.02399    125.9858  
     12.   2015    Alabama    833.8974    98.70755

    Comment


    • #3
      Well, I will illustrate it for just NH and NV, but it's going to take some work to generalize it to more possibilities for valuename.

      Code:
      replace valuename = "NV" if valuename == "Number of volunteers"
      replace valuename = "NH" if valuename == "Volunteer Hours in millions"
      
      reshape wide metric, i(state_name year) j(valuename) string
      rename metric* *
      So, you can see that you will need to create a long series of commands to replace valuename by whatever you want the corresponding new variable name to be. Since you apparently have something fairly non-systematic in mind for this (and it's hard to see how you could do anything systematic with the values of valuename--so that's not a criticism) I suggest doing it when your fingers are particularly limber.

      Added: Crossed with #2. Essentially these are the same solution.

      Comment


      • #4
        #3 (Clyde's code) is a more elegant solution, shorter and easier to maintain. (I keep on forgetting that reshape supports strings in the j() option)

        Ariel may also look at the stack command. Essentially you are doing the opposite of what stack is doing, unstacking. So there can easily be a standard or user-written alternative for generic case of many indicators that Clyde has mentioned.

        Ariel, if your data is coming from the Corporation for National & Community Service, see if you can make their API export the data in a different way closer to your desired layout.

        Best, Sergiy

        Comment


        • #5
          Here is another approach which is entirely automated:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str32 valuename double metric int year str20 state_name
          "Number of volunteers"        971.2646819 2004 "Alabama"
          "Number of volunteers"        1113.977571 2005 "Alabama"
          "Number of volunteers"        885.8076744 2006 "Alabama"
          "Number of volunteers"        842.0982383 2007 "Alabama"
          "Number of volunteers"        891.8058441 2008 "Alabama"
          "Number of volunteers"         927.275827 2009 "Alabama"
          "Number of volunteers"        974.2450833 2010 "Alabama"
          "Number of volunteers"        910.4783982 2011 "Alabama"
          "Number of volunteers"        900.4874115 2012 "Alabama"
          "Number of volunteers"        921.3599062 2013 "Alabama"
          "Number of volunteers"        905.0239872 2014 "Alabama"
          "Number of volunteers"         833.897401 2015 "Alabama"
          "Volunteer Hours in millions" 110.3327496 2004 "Alabama"
          "Volunteer Hours in millions" 174.2784784 2005 "Alabama"
          "Volunteer Hours in millions" 117.3132626 2006 "Alabama"
          "Volunteer Hours in millions"  93.3599999 2007 "Alabama"
          "Volunteer Hours in millions" 122.7068316 2008 "Alabama"
          "Volunteer Hours in millions" 91.95548496 2009 "Alabama"
          "Volunteer Hours in millions" 106.8859298 2010 "Alabama"
          "Volunteer Hours in millions" 100.8100441 2011 "Alabama"
          "Volunteer Hours in millions" 102.9366705 2012 "Alabama"
          "Volunteer Hours in millions" 116.4178635 2013 "Alabama"
          "Volunteer Hours in millions" 125.9858016 2014 "Alabama"
          "Volunteer Hours in millions" 98.70755004 2015 "Alabama"
          end
          
          encode valuename, gen(which)
          su which, meanonly
          local max `r(max)'
          forval j = 1/`max' {
               local label`j' : label (which) `j'
          }
          drop valuename
          reshape wide metric, i(state_name year) j(which)
          forval j = 1/`max' {
                label var metric`j' `"`label`j''"'
          }
              
          describe
          list
          
          . describe
          
          Contains data
            obs:            12                          
           vars:             4                          
           size:           456                          
          -------------------------------------------------------------------------------------------
                        storage   display    value
          variable name   type    format     label      variable label
          -------------------------------------------------------------------------------------------
          year            int     %8.0g                
          state_name      str20   %20s                  
          metric1         double  %10.0g                Number of volunteers
          metric2         double  %10.0g                Volunteer Hours in millions
          -------------------------------------------------------------------------------------------
          Sorted by: state_name  year
          
          . list
          
               +-----------------------------------------+
               | year   state_~e     metric1     metric2 |
               |-----------------------------------------|
            1. | 2004    Alabama   971.26468   110.33275 |
            2. | 2005    Alabama   1113.9776   174.27848 |
            3. | 2006    Alabama   885.80767   117.31326 |
            4. | 2007    Alabama   842.09824       93.36 |
            5. | 2008    Alabama   891.80584   122.70683 |
               |-----------------------------------------|
            6. | 2009    Alabama   927.27583   91.955485 |
            7. | 2010    Alabama   974.24508   106.88593 |
            8. | 2011    Alabama    910.4784   100.81004 |
            9. | 2012    Alabama   900.48741   102.93667 |
           10. | 2013    Alabama   921.35991   116.41786 |
               |-----------------------------------------|
           11. | 2014    Alabama   905.02399    125.9858 |
           12. | 2015    Alabama    833.8974    98.70755 |
               +-----------------------------------------+
          If you don't like the new variable names, you can loop over them like this:

          Code:
          forval j = 1/`max' {
              rename metric`j' `=strtoname("`: variable label metric`j''")'
          }
          Last edited by Nick Cox; 26 Jan 2018, 18:07.

          Comment


          • #6
            Great suggestions everybody, didn't think it was this easy. the "Data is already in long" error by reshape threw me off
            I especially like Nick's solution as it easily accommodates many levels for valuename, and from now on I'll remember that reshape can take multiple variables for i !

            Comment

            Working...
            X