Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • reshape / transpose large dataset for merging

    I am constructing a large timeseries dataset, with cases arranged around WDI_code (World Development Index) and year (variable names bold).


    WDI_code year Variable a...
    AFG 1975 n
    AFG 1976 n




    [apologies for not using 'dataex' to share, I got an error message 'input statement exceeds linesize limit. Try specifying fewer variables
    r(1000);
    ' when I tried]

    There are currently around 200 variables in Dataset A, and I'm looking to add in around 100 more from another source (Dataset B). My problem comes from the arrangement of Dataset B:

    WDI_code Indicator 1975 1976...
    AFG Variable x n n
    AFG Variable y n n
    ALG Variable x n n
    ALG... Variable y n n
    I've tried (and failed) to do a 'normal' transposition in excel, and in stata 'xpose, clear' is getting error message 'no room to add more variables
    Up to 5,000 variables are currently allowed, although you could reset the maximum using set maxvar; see help memory.
    r(900);
    '.

    I think that I need to do something through reshape to change Dataset B, so that in an 'add variables' merger with Dataset A I can match according to key variables WDI_code and year. But I don't know what, and can't find sufficient guidance...

    Any pointers would be hugely appreciated!

  • #2
    Hello and welcome!

    Data 1 is easy to replicate but the nature of the variables in data 2 would really affect how the codes would look like. I understand that -dataex- was giving you a hard time about row and column limit, would you be will to try again? It's possible to lower the amount of data extracted, for example, if we just want to extract wdi_code, year, and var1, first 20 cases, we can use:
    Code:
    dataex wid_code year var1 in 1/20
    Same goes for the data 2, the way how the indicator variable was formatted and how the year column was named is important. Make sure the sample data has at least two or more countries, indicators, and 2 more more year variables. To learn what other options -dataex- provides, use:
    Code:
    help dataex

    Comment


    • #3
      Thanks for this, Ken.

      I tried to dataex the first 236 cases of Dataset 2 (the set that needs a reshape), so that you would have the full 118 indicators for two countries (AFG, ALB). However, the 'count(#)' extension of dataex (as suggested from the help command) wasn't being recognised (I went, unsuccessfully, with 'dataex WDI_code IndicatorName IndicatorCode _v1 _v2 _v47 _v48 in 1/236 count(236)').

      So, below I've pasted three dataex's for Dataset 2, which together hopefully give you the info you require. The original file is excel - the 1960, 1961... 2017 labels have been adjusted to _v1 etc. in stata. Huge amounts of missing data, I've given _v1 _v2 (1960 and 1961) that are mostly or entirely empty, and _v47 _v48 (which I guess are 2007, 2008) that are more populated.

      I've also pasted the dataex of the first 100 cases of Dataset 1 below that.

      Apologies for the volume of stuff I'm throwing at you! Hope it's enough/the right stuff...

      Liam
      *********

      dataex WDI_code IndicatorName IndicatorCode _v1 _v2 _v47 _v48 in 1/2

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str3 WDI_code str110 IndicatorName str14 IndicatorCode double(_v1 _v2 _v47 _v48)
      "AFG" "5-bank asset concentration"                            "GFDD.OI.06" . . 100 100
      "AFG" "Account at a formal financial institution (% age 15+)" "GFDD.AI.05" . .   .   .
      end

      dataex WDI_code IndicatorName IndicatorCode _v1 _v2 _v47 _v48 in 119/120

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str3 WDI_code str110 IndicatorName str14 IndicatorCode double(_v1 _v2 _v47 _v48)
      "ALB" "Account at a formal financial institution (% age 15+)" "GFDD.AI.05" . . . .
      "ALB" "Account used for business purposes (% age 15+)"        "GFDD.AI.08" . . . .
      end
      dataex WDI_code IndicatorName IndicatorCode _v1 _v2 _v47 _v48 in 1/100

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str3 WDI_code str110 IndicatorName str14 IndicatorCode double(_v1 _v2 _v47 _v48)
      "AFG" "5-bank asset concentration"                                                                                     "GFDD.OI.06"             .         .       100       100
      "AFG" "Account at a formal financial institution (% age 15+)"                                                          "GFDD.AI.05"             .         .         .         .
      "AFG" "Account used for business purposes (% age 15+)"                                                                 "GFDD.AI.08"             .         .         .         .
      "AFG" "Account used to receive government payments (% age 15+)"                                                        "GFDD.AI.09"             .         .         .         .
      "AFG" "Account used to receive remittances (% age 15+)"                                                                "GFDD.AI.10"             .         .         .         .
      "AFG" "Account used to receive wages (% age 15+)"                                                                      "GFDD.AI.11"             .         .         .         .
      "AFG" "ATMs per 100,000 adults"                                                                                        "GFDD.AI.25"             .         .   .115723   .205091
      "AFG" "Bank accounts per 1,000 adults"                                                                                 "GFDD.AI.01"             .         .         .         .
      "AFG" "Bank branches per 100,000 adults"                                                                               "GFDD.AI.02"             .         .   .918548   1.23054
      "AFG" "Bank capital to total assets (%)"                                                                               "GFDD.SI.03"             .         .         .         .
      "AFG" "Bank concentration (%)"                                                                                         "GFDD.OI.01"             .         .   97.3211   89.1844
      "AFG" "Bank cost to income ratio (%)"                                                                                  "GFDD.EI.07"             .         .   61.6096   94.5866
      "AFG" "Bank credit to bank deposits (%)"                                                                               "GFDD.SI.04"        615.47   498.675   52.9056   56.8219
      "AFG" "Bank deposits to GDP (%)"                                                                                       "GFDD.OI.02"       .315881    .38455   8.99636   9.92974
      "AFG" "Bank lending-deposit spread"                                                                                    "GFDD.EI.02"             .         .         .         .
      "AFG" "Bank net interest margin (%)"                                                                                   "GFDD.EI.01"             .         .   9.86467   8.21465
      "AFG" "Bank noninterest income to total income (%)"                                                                    "GFDD.EI.03"             .         .   39.0535   20.9122
      "AFG" "Bank non-performing loans to gross loans (%)"                                                                   "GFDD.SI.02"             .         .         .         .
      "AFG" "Bank overhead costs to total assets (%)"                                                                        "GFDD.EI.04"             .         .   7.55613   7.62319
      "AFG" "Bank regulatory capital to risk-weighted assets (%)"                                                            "GFDD.SI.05"             .         .         .         .
      "AFG" "Bank return on assets (%, after tax)"                                                                           "GFDD.EI.05"             .         .   1.64735  -.468667
      "AFG" "Bank return on assets (%, before tax)"                                                                          "GFDD.EI.09"             .         .   2.47054  -.546617
      "AFG" "Bank return on equity (%, after tax)"                                                                           "GFDD.EI.06"             .         .   5.61671  -1.94585
      "AFG" "Bank return on equity (%, before tax)"                                                                          "GFDD.EI.10"             .         .   8.42339  -2.26949
      "AFG" "Bank Z-score"                                                                                                   "GFDD.SI.01"             .         .   32.4048   20.6923
      "AFG" "Banking crisis dummy (1=banking crisis, 0=none)"                                                                "GFDD.OI.19"             .         .         0         0
      "AFG" "Boone indicator"                                                                                                "GFDD.OI.05"             .         .  -.010206   .026995
      "AFG" "Central bank assets to GDP (%)"                                                                                 "GFDD.DI.06"       5.59686   6.18166         .         .
      "AFG" "Checks used to make payments (% age 15+)"                                                                       "GFDD.AI.19"             .         .         .         .
      "AFG" "Consolidated foreign claims of BIS reporting banks to GDP (%)"                                                  "GFDD.OI.14"             .         .         .         .
      "AFG" "Consumer price index (2010=100, average)"                                                                       "GFDD.OE.02"             .         .   76.4387    83.074
      "AFG" "Consumer price index (2010=100, December)"                                                                      "GFDD.OE.01"             .         .    78.278   90.8742
      "AFG" "Corporate bond average maturity (years)"                                                                        "GFDD.DM.15"             .         .         .         .
      "AFG" "Corporate bond issuance volume to GDP (%)"                                                                      "GFDD.DM.13"             .         .         .         .
      "AFG" "Credit card (% age 15+)"                                                                                        "GFDD.AI.20"             .         .         .         .
      "AFG" "Credit to government and state-owned enterprises to GDP (%)"                                                    "GFDD.EI.08"             .         .         .         .
      "AFG" "Debit card (% age 15+)"                                                                                         "GFDD.AI.21"             .         .         .         .
      "AFG" "Deposit money bank assets to deposit money bank assets and central bank assets (%)"                             "GFDD.DI.04"       25.7811   23.6768         .         .
      "AFG" "Deposit money banks'' assets to GDP (%)"                                                                        "GFDD.DI.02"       1.94415   1.91766   4.97169   6.02222
      "AFG" "Depositing/withdrawing at least once in a typical month (% age 15+)"                                            "GFDD.AI.26"             .         .         .         .
      "AFG" "Domestic credit to private sector (% of GDP)"                                                                   "GFDD.DI.14"       9.27273   8.75304   4.84286   6.83998
      "AFG" "Electronic payments used to make payments (% age 15+)"                                                          "GFDD.AI.22"             .         .         .         .
      "AFG" "External loans and deposits of reporting banks vis-à-vis all sectors (% of domestic bank deposits)"            "GFDD.OI.12"             .         .   163.594   165.093
      "AFG" "External loans and deposits of reporting banks vis-à-vis the banking sector (% of domestic bank deposits)"     "GFDD.OI.10"             .         .   145.098   144.534
      "AFG" "External loans and deposits of reporting banks vis-à-vis the nonbanking sectors (% of domestic bank deposits)" "GFDD.OI.11"             .         .    18.496   20.5591
      "AFG" "Financial system deposits to GDP (%)"                                                                           "GFDD.DI.08"       .315881    .38455   8.99636   9.92974
      "AFG" "Firms identifying access to finance as a major constraint (%)"                                                  "GFDD.AI.36"             .         .         .         .
      "AFG" "Firms not needing a loan (%)"                                                                                   "GFDD.AI.32"             .         .         .         .
      "AFG" "Firms using banks to finance investments (%)"                                                                   "GFDD.AI.28"             .         .         .         .
      "AFG" "Firms using banks to finance working capital (%)"                                                               "GFDD.AI.29"             .         .         .         .
      "AFG" "Firms whose recent loan application was rejected (%)"                                                           "GFDD.AI.33"             .         .         .         .
      "AFG" "Firms with a bank loan or line of credit (%)"                                                                   "GFDD.AI.03"             .         .         .         .
      "AFG" "Firms with a checking or savings account (%)"                                                                   "GFDD.AI.27"             .         .         .         .
      "AFG" "Foreign bank assets among total bank assets (%)"                                                                "GFDD.OI.16"             .         .         .         .
      "AFG" "Foreign banks among total banks (%)"                                                                            "GFDD.OI.15"             .         .         .         .
      "AFG" "GDP at market prices (constant 2005 US$)"                                                                       "NY.GDP.MKTP.KD"         .         .         .         .
      "AFG" "GDP at market prices (current US$)"                                                                             "NY.GDP.MKTP.CD" 5.400e+08 5.500e+08 7.000e+09 9.700e+09
      "AFG" "GDP per capita (constant 2005 US$)"                                                                             "NY.GDP.PCAP.KD"         .         .   365.284   405.549
      "AFG" "GDP per capita (current US$)"                                                                                   "NY.GDP.PCAP.CD"         .         .         .         .
      "AFG" "Global leasing volume to GDP (%)"                                                                               "GFDD.OI.17"             .         .         .         .
      "AFG" "Gross National Income (current US$)"                                                                            "NY.GNP.MKTP.CD" 5.500e+08 5.600e+08 7.100e+09 9.900e+09
      "AFG" "Gross portfolio debt assets to GDP (%)"                                                                         "GFDD.DM.11"             .         .         .         .
      "AFG" "Gross portfolio debt liabilities to GDP (%)"                                                                    "GFDD.DM.10"             .         .         .         .
      "AFG" "Gross portfolio equity assets to GDP (%)"                                                                       "GFDD.DM.09"             .         .         .         .
      "AFG" "Gross portfolio equity liabilities to GDP (%)"                                                                  "GFDD.DM.08"             .         .         .         .
      "AFG" "H-statistic"                                                                                                    "GFDD.OI.03"             .         .         .         .
      "AFG" "Insurance company assets to GDP (%)"                                                                            "GFDD.DI.11"             .         .         .         .
      "AFG" "International debt issues to GDP (%)"                                                                           "GFDD.DM.07"             .         .         .         .
      "AFG" "Investments financed by banks (%)"                                                                              "GFDD.AI.34"             .         .         .         .
      "AFG" "Investments financed by equity or stock sales (%)"                                                              "GFDD.AM.04"             .         .         .         .
      "AFG" "Lerner index"                                                                                                   "GFDD.OI.04"             .         .   .008098   .168216
      "AFG" "Life insurance premium volume to GDP (%)"                                                                       "GFDD.DI.09"             .         .         .         .
      "AFG" "Liquid assets to deposits and short term funding (%)"                                                           "GFDD.SI.06"             .         .   73.1833   42.1558
      "AFG" "Liquid liabilities in millions USD (2000 constant)"                                                             "GFDD.OI.07"       1414.42   1549.59   1770.21   2488.87
      "AFG" "Liquid liabilities to GDP (%)"                                                                                  "GFDD.DI.05"       5.68063    5.8107   22.9425   22.7642
      "AFG" "Loan from a financial institution in the past year (% age 15+)"                                                 "GFDD.AI.07"             .         .         .         .
      "AFG" "Loan from a private lender in the past year (% age 15+)"                                                        "GFDD.AI.15"             .         .         .         .
      "AFG" "Loan from an employer in the past year (% age 15+)"                                                             "GFDD.AI.16"             .         .         .         .
      "AFG" "Loan from family or friends in the past year (% age 15+)"                                                       "GFDD.AI.18"             .         .         .         .
      "AFG" "Loan in the past year (% age 15+)"                                                                              "GFDD.AI.14"             .         .         .         .
      "AFG" "Loan through store credit in the past year (% age 15+)"                                                         "GFDD.AI.17"             .         .         .         .
      "AFG" "Loans from nonresident banks (amounts outstanding) to GDP (%)"                                                  "GFDD.OI.09"             .         .   .566765   1.09713
      "AFG" "Loans from nonresident banks (net) to GDP (%)"                                                                  "GFDD.OI.08"             .         .         .         .
      "AFG" "Loans requiring collateral (%)"                                                                                 "GFDD.AI.30"             .         .         .         .
      "AFG" "Market capitalization excluding top 10 companies to total market capitalization (%)"                            "GFDD.AM.02"             .         .         .         .
      "AFG" "Mobile phone used to pay bills (% age 15+)"                                                                     "GFDD.AI.23"             .         .         .         .
      "AFG" "Mobile phone used to send money (% age 15+)"                                                                    "GFDD.AI.24"             .         .         .         .
      "AFG" "Mutual fund assets to GDP (%)"                                                                                  "GFDD.DI.07"             .         .         .         .
      "AFG" "Nonbank financial institutions’ assets to GDP (%)"                                                            "GFDD.DI.03"             .         .         .         .
      "AFG" "Nonfinancial corporate bonds to total bonds and notes outstanding (%)"                                          "GFDD.AM.03"             .         .         .         .
      "AFG" "Non-life insurance premium volume to GDP (%)"                                                                   "GFDD.DI.10"             .         .         .         .
      "AFG" "Number of listed companies per 1,000,000 people"                                                                "GFDD.OM.01"             .         .         .         .
      "AFG" "Outstanding domestic private debt securities to GDP (%)"                                                        "GFDD.DM.03"             .         .         .         .
      "AFG" "Outstanding domestic public debt securities to GDP (%)"                                                         "GFDD.DM.04"             .         .         .         .
      "AFG" "Outstanding international private debt securities to GDP (%)"                                                   "GFDD.DM.05"             .         .         .         .
      "AFG" "Outstanding international public debt securities to GDP (%)"                                                    "GFDD.DM.06"             .         .         .         .
      "AFG" "Pension fund assets to GDP (%)"                                                                                 "GFDD.DI.13"             .         .         .         .
      "AFG" "Population, total"                                                                                              "SP.POP.TOTL"      9000000   9200000  26000000  27000000
      "AFG" "Private credit by deposit money banks and other financial institutions to GDP (%)"                              "GFDD.DI.12"       1.94415   1.91766   4.75958   5.64227
      "AFG" "Private credit by deposit money banks to GDP (%)"                                                               "GFDD.DI.01"       1.94415   1.91766   4.75958   5.64227
      end


      And here is the dataex for Dataset 1, the master that I want to merge DS2 with on key variables WDI_code and year.

      dataex WDI_code year system yrsoffc in 1/100

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str3 WDI_code int(year system yrsoffc)
      "AFG" 1975    0    2
      "AGO" 1975 -999 -999
      "ALB" 1975    1   30
      "ARE" 1975    0    4
      "ARG" 1975    0    1
      "ARM" 1975 -999 -999
      "AUS" 1975    2    3
      "AUT" 1975    2    5
      "AZE" 1975 -999 -999
      "BDI" 1975    0    9
      "BEL" 1975    2    1
      "BEN" 1975    0    3
      "BFA" 1975    0    1
      "BGD" 1975    2    3
      "BGR" 1975    1   21
      "BHR" 1975    0    4
      "BHS" 1975    2    7
      "BIH" 1975 -999 -999
      "BLR" 1975 -999 -999
      "BLZ" 1975 -999 -999
      "BOL" 1975    0    4
      "BRA" 1975    1    1
      "BRB" 1975    2   14
      "BRN" 1975    0    4
      "BTN" 1975    0    5
      "BWA" 1975    2   10
      "CAF" 1975    0   10
      "CAN" 1975    2    7
      "CHE" 1975    2    1
      "CHL" 1975    0    2
      "CHN" 1975    1   26
      "CIV" 1975    0   17
      "CMR" 1975    0   17
      "COG" 1975    0    6
      "COL" 1975    0    1
      "COM" 1975 -999 -999
      "CPV" 1975 -999 -999
      "CRI" 1975    0    1
      "CSK" 1975    1    6
      "CUB" 1975    1    0
      "CYP" 1975    0   15
      "DDR" 1975    0    2
      "DEU" 1975    2    1
      "DJI" 1975 -999 -999
      "DNK" 1975    2    2
      "DOM" 1975    0    9
      "DZA" 1975    0   10
      "ECU" 1975    0    3
      "EGY" 1975    1    5
      "ERI" 1975 -999 -999
      "ESP" 1975    0    1
      "EST" 1975 -999 -999
      "ETH" 1975    0    1
      "FIN" 1975    2    .
      "FJI" 1975    2    8
      "FRA" 1975    2    1
      "GAB" 1975    0    8
      "GBR" 1975    2    1
      "GEO" 1975 -999 -999
      "GHA" 1975    0    3
      "GIN" 1975    0   17
      "GMB" 1975    1   12
      "GNB" 1975    0    1
      "GNQ" 1975    0    7
      "GRC" 1975    1    1
      "GRD" 1975    2    1
      "GTM" 1975    0    1
      "GUY" 1975    2    9
      "HND" 1975    0    3
      "HRV" 1975 -999 -999
      "HTI" 1975    0    4
      "HUN" 1975    1   20
      "IDN" 1975    1    8
      "IND" 1975    2    9
      "IRL" 1975    2    2
      "IRN" 1975    0   34
      "IRQ" 1975    0    7
      "ISL" 1975    2    1
      "ISR" 1975    2    1
      "ITA" 1975    2    1
      "JAM" 1975    2    3
      "JOR" 1975    0   22
      "JPN" 1975    2    1
      "KAZ" 1975 -999 -999
      "KEN" 1975    0   12
      "KGZ" 1975 -999 -999
      "KHM" 1975    1    5
      "KOR" 1975    1   14
      "KWT" 1975    0   10
      "LAO" 1975    2   13
      "LBN" 1975    0    5
      "LBR" 1975    0    4
      "LBY" 1975    0    6
      "LCA" 1975 -999 -999
      "LKA" 1975    2    5
      "LSO" 1975    0   10
      "LTU" 1975 -999 -999
      "LUX" 1975    2    1
      "LVA" 1975 -999 -999
      "MAR" 1975    0   14
      end
      label values system labels0
      label def labels0 0 "Presidential", modify
      label def labels0 1 "Assembly-Elected President", modify
      label def labels0 2 "Parliamentary", modify

      Comment


      • #4
        Thanks, try this on your data 2. I had to use the indicator code as the variable name, but you should be able to add those longer descriptions back as variable labels:

        Code:
        gen ind_code = subinstr(IndicatorCode, ".", "_", .)
        drop IndicatorCode IndicatorName
        reshape long _v, i(WDI_code ind_code) j(year)
        reshape wide _v, i(WDI_code year) j(ind_code, string)
        rename _v* *
        This should create a data 2 that has unique country/year, and each indicator its own variable. Then, with data 1 open, you should be able to use -merge 1:1 WDI_code year using data2- to merge them. But remember to fix the year in data 2 first. Right now they are 1 through 48... which may need to be converted.

        Comment


        • #5
          Wow - that seems to have worked! Huge thanks for sharing your expertise on this, Ken. I'll hopefully merge tomorrow.

          Thanks again,

          Liam

          Comment

          Working...
          X