Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extract Rownames in a Given Order

    I wanna replicate the results of this paper. Consider their dataset
    Code:
    copy "http://qed.econ.queensu.ca/jae/datasets/ke001/kh-data.zip" "kh-data.zip", replace
    
    qui unzipfile kh-data.zip, replace
    
    erase kh-data.zip
    
    import delimited "kh-data.txt", clear
    As we can see, the esteemed authors for some unknown reason left the dataset wide and didn't bother to give the columns meaningful names. The names of the provinces themselves, the units of interest, are in this file
    Code:
    import delimited "http://qed.econ.queensu.ca/jae/datasets/ke001/readme.kh.txt", rowrange(45:75) clear delim(")")
    Here's what I want:

    I want to rename each of the columns from the first dataset (v2-v32) to something like province_Beijing .... and so on and so forth.

    However, I do not want to have to do

    Code:
    rename (v1 v2) (province_Beijing province_Tianjin) //.... and so on.
    I want to automate the renaming, such that the variables from the first dataset are renamed with the values (in order) of the second dataset so that I may reshape it and get down to business.

    How might I do this? My initial thought was levelsof, but that orders it alphabetically, which would mess up the names. Would there be a better, more efficient way of renaming the variables from my first dataset so that I can reshape it?

  • #2
    An alternative to -levelsof- here is a loop over observations, which while not something one reaches for first, can actually be useful. Here's an illustration with some simulated data:
    Code:
    clear
    input str6 province
    "gamma"
    "alpha"
    "delta"
    "beta"
    end
    // Get a list of values in observation order.
    local plist ""
    forval i = 1/`=_N' {
       local next = province[`i']
       local plist "`plist' `next'"
    }
    di "`plist'"
    //
    clear
    // As though from spreadsheet columns.
    input int A B C D
    1 8 7 9
    4 3 5 8
    5 2 7 8
    end
    //
    local i 1
    foreach col of varlist A-D {
       local next: word `i' of `plist'
       rename `col' province_`next'
       local ++i
    }
    Many people would integrate what I did with "next" into one of the other lines of code, but I like giving it its own line. And, I suppose there might be some way to use the -rename- group syntax, but the loop here is easy and transparent.

    Comment


    • #3
      I actually may have figured an even better solution that doesn't involve any ninjutsu with the loops. Here's what I did, following the first dataset above. Note that I use greshape from ssc
      Code:
      cls
      
      import delimited "kh-data.txt", clear
      
      rename v1 time
      
      greshape long v, i(time) j(id)
      
      rename v gdp
      
      sort id time
      
      replace id = id-1
      Okay so now I can many-to-one-merge on the numerical IDs these using the variable names from the second dataset no problem. However, there is still one problem.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str6 time long id float gdp
      "2012q1" 1  3416.429
      "2012q2" 1  4113.423
      "2012q3" 1 3854.3616
      "2012q4" 1  4610.921
      "2013q1" 1  3589.759
      "2013q2" 1 4357.5444
      "2013q3" 1  4018.098
      "2013q4" 1  5187.173
      "2014q1" 1 3772.0745
      "2014q2" 1  4569.022
      "2014q3" 1 4259.0146
      "2014q4" 1  5598.663
      "2015q1" 1  4030.519
      "2015q2" 1 4862.4087
      "2015q3" 1 4507.6787
      "2015q4" 1  5874.013
      "2016q1" 1  4541.294
      "2016q2" 1  4944.575
      "2016q3" 1  4887.746
      "2016q4" 1  6817.533
      "2017q1" 1  4949.091
      "2017q2" 1  5161.233
      "2017q3" 1  5780.082
      "2017q4" 1  6799.653
      "2018q1" 1  5796.737
      "2018q2" 1  6372.873
      "2018q3" 1   6580.56
      "2018q4" 1  7417.869
      "2019q1" 1  6126.183
      "2019q2" 1  6725.564
      "2019q3" 1  6894.786
      "2019q4" 1   7576.19
      "2020q1" 1  5659.532
      "2020q2" 1   6664.27
      "2020q3" 1  7253.663
      "2020q4" 1  7879.889
      "2012q1" 2 2324.9495
      "2012q2" 2   2936.11
      "2012q3" 2 2957.5896
      "2012q4" 2  3337.592
      "2013q1" 2 2560.0974
      "2013q2" 2   3191.65
      "2013q3" 2  3140.239
      "2013q4" 2  3659.858
      "2014q1" 2  2722.121
      "2014q2" 2  3410.433
      "2014q3" 2  3334.466
      "2014q4" 2 3963.7314
      "2015q1" 2  3129.262
      "2015q2" 2  3497.508
      "2015q3" 2 3698.0986
      "2015q4" 2  3561.492
      "2016q1" 2 3340.1096
      "2016q2" 2 3685.8804
      "2016q3" 2  3940.827
      "2016q4" 2   3740.65
      "2017q1" 2 3768.7224
      "2017q2" 2  3799.166
      "2017q3" 2  3255.705
      "2017q4" 2  4122.099
      "2018q1" 2 2390.5432
      "2018q2" 2 2659.2346
      "2018q3" 2  2645.334
      "2018q4" 2  2863.757
      "2019q1" 2 2413.8425
      "2019q2" 2 2665.4275
      "2019q3" 2  2747.271
      "2019q4" 2  3021.268
      "2020q1" 2 2149.9924
      "2020q2" 2 2591.1194
      "2020q3" 2 2842.3694
      "2020q4" 2  3050.968
      "2012q1" 3 4389.8115
      "2012q2" 3  6539.175
      "2012q3" 3  6588.034
      "2012q4" 3  6260.407
      "2013q1" 3  4594.247
      "2013q2" 3  6825.585
      "2013q3" 3  6774.175
      "2013q4" 3  6525.343
      "2014q1" 3 4621.0327
      "2014q2" 3  6985.266
      "2014q3" 3  6895.662
      "2014q4" 3  6642.124
      "2015q1" 3  5108.638
      "2015q2" 3  6572.169
      "2015q3" 3  6311.231
      "2015q4" 3  7259.247
      "2016q1" 3  5404.794
      "2016q2" 3  6664.523
      "2016q3" 3  6698.429
      "2016q4" 3  7997.273
      "2017q1" 3  6182.479
      "2017q2" 3  7311.767
      "2017q3" 3  7536.577
      "2017q4" 3   6869.12
      "2018q1" 3  5715.938
      "2018q2" 3   6640.27
      "2018q3" 3  6689.953
      "2018q4" 3  6967.902
      end
      My dates are now strings. They're correctly formatted for some reason, but Stata won't recognize them. How would I destring these such that Stata knows these are actual quarterly dates?



      NEVERMIND, here's the code for the full example
      Code:
      cls
      
      import delimited "kh-data.txt", clear
      
      rename v1 date
      
      qui greshape long v, i(date) j(id)
      
      rename v gdp
      
      sort id date
      
      replace id = id-1
      
      gen date2 = quarterly(date, "YQ")
      
      drop date
      
      format %tq date2
      
      rename date2 date
      
      xtset id date, q
      Last edited by Jared Greathouse; 03 Jul 2022, 14:24.

      Comment

      Working...
      X