Extract Rownames in a Given Order

Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#1

Extract Rownames in a Given Order

03 Jul 2022, 13:16

I wanna replicate the results of this paper. Consider their dataset

Code:

copy "http://qed.econ.queensu.ca/jae/datasets/ke001/kh-data.zip" "kh-data.zip", replace qui unzipfile kh-data.zip, replace erase kh-data.zip import delimited "kh-data.txt", clear

As we can see, the esteemed authors for some unknown reason left the dataset wide and didn't bother to give the columns meaningful names. The names of the provinces themselves, the units of interest, are in this file

Code:

import delimited "http://qed.econ.queensu.ca/jae/datasets/ke001/readme.kh.txt", rowrange(45:75) clear delim(")")

Here's what I want:

I want to rename each of the columns from the first dataset (v2-v32) to something like province_Beijing .... and so on and so forth.

However, I do not want to have to do

Code:

rename (v1 v2) (province_Beijing province_Tianjin) //.... and so on.

I want to automate the renaming, such that the variables from the first dataset are renamed with the values (in order) of the second dataset so that I may reshape it and get down to business.

How might I do this? My initial thought was levelsof, but that orders it alphabetically, which would mess up the names. Would there be a better, more efficient way of renaming the variables from my first dataset so that I can reshape it?
Tags: None
Mike Lacy

Join Date: Apr 2014

Posts: 2449
#2

03 Jul 2022, 13:53

An alternative to -levelsof- here is a loop over observations, which while not something one reaches for first, can actually be useful. Here's an illustration with some simulated data:

Code:

clear input str6 province "gamma" "alpha" "delta" "beta" end // Get a list of values in observation order. local plist "" forval i = 1/`=_N' { local next = province[`i'] local plist "`plist' `next'" } di "`plist'" // clear // As though from spreadsheet columns. input int A B C D 1 8 7 9 4 3 5 8 5 2 7 8 end // local i 1 foreach col of varlist A-D { local next: word `i' of `plist' rename `col' province_`next' local ++i }

Many people would integrate what I did with "next" into one of the other lines of code, but I like giving it its own line. And, I suppose there might be some way to use the -rename- group syntax, but the loop here is easy and transparent.
Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2172

03 Jul 2022, 14:18

I actually may have figured an even better solution that doesn't involve any ninjutsu with the loops. Here's what I did, following the first dataset above. Note that I use greshape from ssc

Code:

cls

import delimited "kh-data.txt", clear

rename v1 time

greshape long v, i(time) j(id)

rename v gdp

sort id time

replace id = id-1

Okay so now I can many-to-one-merge on the numerical IDs these using the variable names from the second dataset no problem. However, there is still one problem.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str6 time long id float gdp
"2012q1" 1  3416.429
"2012q2" 1  4113.423
"2012q3" 1 3854.3616
"2012q4" 1  4610.921
"2013q1" 1  3589.759
"2013q2" 1 4357.5444
"2013q3" 1  4018.098
"2013q4" 1  5187.173
"2014q1" 1 3772.0745
"2014q2" 1  4569.022
"2014q3" 1 4259.0146
"2014q4" 1  5598.663
"2015q1" 1  4030.519
"2015q2" 1 4862.4087
"2015q3" 1 4507.6787
"2015q4" 1  5874.013
"2016q1" 1  4541.294
"2016q2" 1  4944.575
"2016q3" 1  4887.746
"2016q4" 1  6817.533
"2017q1" 1  4949.091
"2017q2" 1  5161.233
"2017q3" 1  5780.082
"2017q4" 1  6799.653
"2018q1" 1  5796.737
"2018q2" 1  6372.873
"2018q3" 1   6580.56
"2018q4" 1  7417.869
"2019q1" 1  6126.183
"2019q2" 1  6725.564
"2019q3" 1  6894.786
"2019q4" 1   7576.19
"2020q1" 1  5659.532
"2020q2" 1   6664.27
"2020q3" 1  7253.663
"2020q4" 1  7879.889
"2012q1" 2 2324.9495
"2012q2" 2   2936.11
"2012q3" 2 2957.5896
"2012q4" 2  3337.592
"2013q1" 2 2560.0974
"2013q2" 2   3191.65
"2013q3" 2  3140.239
"2013q4" 2  3659.858
"2014q1" 2  2722.121
"2014q2" 2  3410.433
"2014q3" 2  3334.466
"2014q4" 2 3963.7314
"2015q1" 2  3129.262
"2015q2" 2  3497.508
"2015q3" 2 3698.0986
"2015q4" 2  3561.492
"2016q1" 2 3340.1096
"2016q2" 2 3685.8804
"2016q3" 2  3940.827
"2016q4" 2   3740.65
"2017q1" 2 3768.7224
"2017q2" 2  3799.166
"2017q3" 2  3255.705
"2017q4" 2  4122.099
"2018q1" 2 2390.5432
"2018q2" 2 2659.2346
"2018q3" 2  2645.334
"2018q4" 2  2863.757
"2019q1" 2 2413.8425
"2019q2" 2 2665.4275
"2019q3" 2  2747.271
"2019q4" 2  3021.268
"2020q1" 2 2149.9924
"2020q2" 2 2591.1194
"2020q3" 2 2842.3694
"2020q4" 2  3050.968
"2012q1" 3 4389.8115
"2012q2" 3  6539.175
"2012q3" 3  6588.034
"2012q4" 3  6260.407
"2013q1" 3  4594.247
"2013q2" 3  6825.585
"2013q3" 3  6774.175
"2013q4" 3  6525.343
"2014q1" 3 4621.0327
"2014q2" 3  6985.266
"2014q3" 3  6895.662
"2014q4" 3  6642.124
"2015q1" 3  5108.638
"2015q2" 3  6572.169
"2015q3" 3  6311.231
"2015q4" 3  7259.247
"2016q1" 3  5404.794
"2016q2" 3  6664.523
"2016q3" 3  6698.429
"2016q4" 3  7997.273
"2017q1" 3  6182.479
"2017q2" 3  7311.767
"2017q3" 3  7536.577
"2017q4" 3   6869.12
"2018q1" 3  5715.938
"2018q2" 3   6640.27
"2018q3" 3  6689.953
"2018q4" 3  6967.902
end

My dates are now strings. They're correctly formatted for some reason, but Stata won't recognize them. How would I destring these such that Stata knows these are actual quarterly dates?

NEVERMIND, here's the code for the full example

Code:

cls

import delimited "kh-data.txt", clear

rename v1 date

qui greshape long v, i(date) j(id)

rename v gdp

sort id date

replace id = id-1

gen date2 = quarterly(date, "YQ")

drop date

format %tq date2

rename date2 date

xtset id date, q

Last edited by Jared Greathouse; 03 Jul 2022, 14:24.

Announcement

Extract Rownames in a Given Order

Comment

Comment