Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Transposing and Combining

    Hello,

    I am updating a .dta file that looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str14 statename int(et_gits1999 et_gits2000 et_gits2001)
    "Alabama"      .   .   .
    "Alaska"     185 185 185
    "Arizona"    185 185 185
    "Arkansas"     .   .   .
    "California"   .   .   .
    end
    I am updating pre 1999 and post 2010, and currently have a .csv in state, year, v1 formula. It looks like this when imported:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str110 state str85 year str4 et_gits
    "Alabama" "1997" "n.a."
    "Alabama" "1998" "n.a."
    "Alabama" "1999" "n.a."
    "Alabama" "2000" "n.a."
    "Alaska"  "1997" "185"
    "Alaska"  "1998" "185"
    "Alaska"  "1999" "185"
    "Alaska"  "2000" "185"
    end

    How can I use stata to transpose/reformat the .csv I have so it can be added to the file I am updating? In other words, I need years to become the variables and have just one instance of each state.

    Thank you!

    Note: I realize I haven't changed "n.a." to . yet.
    Last edited by Louisa Abel; 14 Jun 2016, 11:46.

  • #2
    This is a task for -reshape-, one of Stata's most useful commands. Time spend reading the -help reshape- file and the corresponding section in the manual will be amply repaid. In addition, to make these data sets compatible, you have to convert the string representations of some of your variables to numeric and harmonize variable names. So:

    Code:
    // UPDATE FILE
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str110 state str85 year str4 et_gits
    "Alabama" "1997" "n.a."
    "Alabama" "1998" "n.a."
    "Alabama" "1999" "n.a."
    "Alabama" "2000" "n.a."
    "Alaska"  "1997" "185" 
    "Alaska"  "1998" "185" 
    "Alaska"  "1999" "185" 
    "Alaska"  "2000" "185" 
    end
    tempfile updates
    // MAKE year & et_gits NUMERIC TO MATCH THE OTHER DATA FILE
    replace et_gits = "" if et_gits == "n.a."
    destring et_gits, replace
    destring year, replace
    reshape wide et_gits, i(state) j(year)
    quietly compress // SHRINK DOWN LENGTHY FIELD FOR state
    save `updates'
    
    // ORIGINAL FILE
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str14 statename int(et_gits1999 et_gits2000 et_gits2001)
    "Alabama"      .   .   .
    "Alaska"     185 185 185
    "Arizona"    185 185 185
    "Arkansas"     .   .   .
    "California"   .   .   .
    end
    
    // PUT THEM TOGETHER
    rename statename state // TO MATCH UPDATE DATA
    merge 1:1 state using `updates', update
    This gets you what you asked for. But a couple of things to think about:

    1. Almost anything you are going to eventually do with this data will be more easily done in long layout than wide. So consider following this up with -reshape long et_gits, i(state) j(year)-.

    2. The years in the update file overlap the years in the original data set. In your example, the overlaps agree on the value of et_gits. But in your real data they may not. The present code keeps the values that appear in the original data, only replacing an originally missing value with a non-missing value from the update data set. But if you want to replace originally non-missing values with new conflicting values in the update data set, you have to add the -replace- option to the end of the -merge- command.

    -merge- is another really useful data management command in Stata and is well worth learning about, too.

    And thank you very much for using -dataex- in your very first post!

    Comment


    • #3
      Wow! I cannot thank you enough - you helped beyond what I'd hoped for, the work should proceed much more easily. The code worked beautifully and you made it easy to understand. Your other suggestions make sense as well.

      Comment

      Working...
      X