Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Need help reshaping data / getting it into a different format, and I haven't been successful using the reshape command.*


    ​​​​​Hey folks, I need some help on how to manipulate/reshape a dataset, and I'm not sure how to do this using the standard reshape command. Any help would be appreciated.

    I have data on natural hazard events for, in this example code, 5 sites. Each observation is a site-hazard combination, so a site can occupy multiple lines if it experienced multiple hazards. I also have data on the number of days each site experienced each hazard, as well as the number of 1-, 2- and 3- day gaps between hazards at each site. The data presently looks like this:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str15 site str12 type byte(total_days sum_day1 sum_day2 sum_day3)
    "Jefferson Park"  "Closure"      12  0  0  0
    "Washington Park" "Bac Advisory"  4  5  4  4
    "Washington Park" "HAB Advisory" 25  9  9  7
    "Madison Park"    "Bac Advisory"  7 11  4  0
    "Eisenhower Park" "Closure"      15  0  0  0
    "Eisenhower Park" "Bac Advisory"  5  3  3  3
    "Eisenhower Park" "HAB Advisory" 19  7  6  5
    "Adams Park"      "Closure"      24  0  0  0
    "Adams Park"      "HAB Advisory"  8 14 12 11
    end
    Notice that I don't have data on the sum_day1 through sun_day3 variables for closures.
    I need to get the data into a format in which each observation is a site (instead of a site-hazard combination), and for each site the variables include the number of hazard days for each hazard, as well as the number of "gap" days for the non-closure hazards.
    Here is an example of the format I need, hich corresponds to the above code:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str15 site byte(closuredays habdays hab_day1 hab_day2 hab_day3 bacdays bac_day1 bac_day2 bac_day3)
    "Jefferson Park"  12  0  0  0  0 0  0 0 0
    "Washington Park"  0 25  9  9  7 4  5 4 4
    "Madison Park"     0  0  0  0  0 7 11 4 0
    "Eisenhower Park" 15 19  7  6  5 5  3 3 3
    "Adams Park"      24  8 14 12 11 0  0 0 0
    end
    Please let me know if you are able to figure out how to do this. Thank you!



  • #2
    This is a bit more complicated than just a -reshape-. The variable names and the content of variable type do not lend themselves to -reshape-ing. On top of that, observations with type == "Closure" require completely different handling from the others.

    Code:
    replace type = lower(type)
    replace type = substr(type, 1, 3) if type != "closure"
    rename total_days _days
    rename sum_* _*
    
    preserve
    keep if type == "closure"
    keep site _days
    collapse (sum) _days, by(site)
    rename _days closure_days
    tempfile holding
    save `holding'
    restore
    
    drop if type == "closure"
    
    reshape wide @_days @_day1 @_day2 @_day3, i(site) j(type) string
    merge 1:1 site using `holding', nogenerate
    mvencode *day*, mv(0) override
    Note: For type Bac Advisory and HAB Advisory, we can get the variable names you want in the result by just using the first three letters, switched to lower case, which is what I have done. But if the full data set has other values of type for which this first three letters in lower case rule doesn't work, then you will have to write some more complicated code to replace the values of variable type with whatever prefix you want for their variables in the result.

    All of that said, I have to ask whether you have a good reason for doing this. Apart from the fact that it is complicated, and undoing it would be similarly complicated, keep in mind that there are not many things in data management and analysis that are best done, or even doable at all, with the wide layout you are creating. Nearly all Stata commands are optimized for long data layouts. I fear that you will regret doing this, that the wide layout will just get in the way of whatever you need to do with the data from this point on. So think carefully about what you will be doing with this data and make sure you will definitely be doing some of the few things in Stata that work best (or at all) in wide layout.

    Comment

    Working...
    X