Transposing and Combining

Louisa Abel

Join Date: Jun 2016

Posts: 11
#1

Transposing and Combining

14 Jun 2016, 11:32

Hello,

I am updating a .dta file that looks like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str14 statename int(et_gits1999 et_gits2000 et_gits2001) "Alabama" . . . "Alaska" 185 185 185 "Arizona" 185 185 185 "Arkansas" . . . "California" . . . end

I am updating pre 1999 and post 2010, and currently have a .csv in state, year, v1 formula. It looks like this when imported:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str110 state str85 year str4 et_gits "Alabama" "1997" "n.a." "Alabama" "1998" "n.a." "Alabama" "1999" "n.a." "Alabama" "2000" "n.a." "Alaska" "1997" "185" "Alaska" "1998" "185" "Alaska" "1999" "185" "Alaska" "2000" "185" end

How can I use stata to transpose/reformat the .csv I have so it can be added to the file I am updating? In other words, I need years to become the variables and have just one instance of each state.

Thank you!

Note: I realize I haven't changed "n.a." to . yet.

Last edited by Louisa Abel; 14 Jun 2016, 11:46.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 28608
#2

14 Jun 2016, 12:02

This is a task for -reshape-, one of Stata's most useful commands. Time spend reading the -help reshape- file and the corresponding section in the manual will be amply repaid. In addition, to make these data sets compatible, you have to convert the string representations of some of your variables to numeric and harmonize variable names. So:

Code:

// UPDATE FILE * Example generated by -dataex-. To install: ssc install dataex clear input str110 state str85 year str4 et_gits "Alabama" "1997" "n.a." "Alabama" "1998" "n.a." "Alabama" "1999" "n.a." "Alabama" "2000" "n.a." "Alaska" "1997" "185" "Alaska" "1998" "185" "Alaska" "1999" "185" "Alaska" "2000" "185" end tempfile updates // MAKE year & et_gits NUMERIC TO MATCH THE OTHER DATA FILE replace et_gits = "" if et_gits == "n.a." destring et_gits, replace destring year, replace reshape wide et_gits, i(state) j(year) quietly compress // SHRINK DOWN LENGTHY FIELD FOR state save `updates' // ORIGINAL FILE * Example generated by -dataex-. To install: ssc install dataex clear input str14 statename int(et_gits1999 et_gits2000 et_gits2001) "Alabama" . . . "Alaska" 185 185 185 "Arizona" 185 185 185 "Arkansas" . . . "California" . . . end // PUT THEM TOGETHER rename statename state // TO MATCH UPDATE DATA merge 1:1 state using `updates', update

This gets you what you asked for. But a couple of things to think about:

1. Almost anything you are going to eventually do with this data will be more easily done in long layout than wide. So consider following this up with -reshape long et_gits, i(state) j(year)-.

2. The years in the update file overlap the years in the original data set. In your example, the overlaps agree on the value of et_gits. But in your real data they may not. The present code keeps the values that appear in the original data, only replacing an originally missing value with a non-missing value from the update data set. But if you want to replace originally non-missing values with new conflicting values in the update data set, you have to add the -replace- option to the end of the -merge- command.

-merge- is another really useful data management command in Stata and is well worth learning about, too.

And thank you very much for using -dataex- in your very first post!
Comment
Louisa Abel

Join Date: Jun 2016

Posts: 11
#3

14 Jun 2016, 12:44

Wow! I cannot thank you enough - you helped beyond what I'd hoped for, the work should proceed much more easily. The code worked beautifully and you made it easy to understand. Your other suggestions make sense as well.
Comment

Announcement

Transposing and Combining

Comment

Comment