Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping data from wide to long.

    Dear Statalist,

    I'm trying to reshape a dataset provided by the World Economic Forum, from wide to long. However, I'm having some difficulty.

    Currently the variables in my dataset look as follows.
    Code:
    year variable calbania calgeria cangola cargentina
    2015-2016 1.01 Property rights, 1-7 (best)Value 3.0484 3.74812  2.87602
    2015-2016 1.01 Property rights, 1-7 (best)Rank 130 106  134
    2015-2016 1.02 Intellectual property protection, 1-7 (best)Value 3.2145 3.27317  2.98785
    2015-2016 1.02 Intellectual property protection, 1-7 (best)Rank 110 105  125
    2015-2016 1. Property rightsValue 3.10376 3.5898  2.9133
    2015-2016 1. Property rightsRank 127 106  133
    2015-2016 1.03 Diversion of public funds, 1-7 (best)Value 3.21978 3.01587  1.88643
    2015-2016 1.03 Diversion of public funds, 1-7 (best)Rank 75 86  136
    2015-2016 1.04 Public trust in politicians, 1-7 (best)Value 2.56159 2.90653  1.61751
    2015-2016 1.04 Public trust in politicians, 1-7 (best)Rank 92 75  137
    2015-2016 1.05 Irregular payments and bribes, 1-7 (best)Value 3.73597 3.09924  2.89357
    2015-2016 1.05 Irregular payments and bribes, 1-7 (best)Rank 79 110  120
    2015-2016 2. Ethics and corruptionValue 3.17245 3.00721  2.13251

    and so on. I would like to reshape it into panel data so it would look like the following,

    year country variable1 variable2 variable3

    A guide is provided in the Sata journal (http://www.stata-journal.com/sjpdf.h...iclenum=dm0031) however my data is wide by county not year.

    So when I try the following I get the following error.

    Code:
     reshape long c, i(year variable)
    variable _j contains all missing values
    r(498);
    Does anyone have any tips for to overcome this?

  • #2
    I am confused by the data sample you presented here, as I don't see how your list of 6 variable names fits with the data you display, In particular, I don't see how you are parsing each line to get just 6 items. For example, I see 3 numeric values, but 4 variables of the form c*. If you would post your data sample in some more friendly fashion, say using -dataex- per item 12.2 of the FAQ, that would be helpful.
    One guess I have is that you will need to use the string option on reshape, since your stubs are not differentiated by a numeric suffix.

    Comment


    • #3
      Your first step should be something like:
      Code:
      reshape long c, i(variable) j(ctry) string
      Before you do that you should get a unique variable identifier. These are now repeated.
      After that, reshape wide.

      Would have been easier to provide proper code if you had posted data examples using dataex (http://www.statalist.org/forums/help#stata), something that you could have been expected to know after 80 posts on this list.

      Comment


      • #4
        Apologies Mike and Jorrit. I tried to edit my post to include the dataex, but it appears you can not edit after someone has commented below?

        Anyway, here is a sample of the data, and thank you both for your input.

        [CODE]
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str9 year str83 variable float(calbania calgeria)
        "2015-2016" "           1.01 Property rights, 1-7 (best)Value"                                      3.048397   3.748118
        "2015-2016" "           1.01 Property rights, 1-7 (best)Rank"                                            130        106
        "2015-2016" "           1.02 Intellectual property protection, 1-7 (best)Value"                    3.2145004  3.2731705
        "2015-2016" "           1.02 Intellectual property protection, 1-7 (best)Rank"                           110        105
        "2015-2016" "           1. Property rightsValue"                                                    3.103765   3.589802
        "2015-2016" "           1. Property rightsRank"                                                          127        106
        "2015-2016" "           1.03 Diversion of public funds, 1-7 (best)Value"                            3.219779  3.0158715
        "2015-2016" "           1.03 Diversion of public funds, 1-7 (best)Rank"                                   75         86
        "2015-2016" "           1.04 Public trust in politicians, 1-7 (best)Value"                          2.561594    2.90653
        "2015-2016" "           1.04 Public trust in politicians, 1-7 (best)Rank"                                 92         75
        "2015-2016" "           1.05 Irregular payments and bribes, 1-7 (best)Value"                        3.735972  3.0992396
        "2015-2016" "           1.05 Irregular payments and bribes, 1-7 (best)Rank"                               79        110
        "2015-2016" "           2. Ethics and corruptionValue"                                             3.1724484  3.0072136
        "2015-2016" "           2. Ethics and corruptionRank"                                                     81         90
        "2015-2016" "           1.06 Judicial independence, 1-7 (best)Value"                               2.6354206   3.337148
        "2015-2016" "           1.06 Judicial independence, 1-7 (best)Rank"                                      120         95
        "2015-2016" "           1.07 Favoritism in decisions of government officials, 1-7 (best)Value"      3.363081   3.074755
        "2015-2016" "           1.07 Favoritism in decisions of government officials, 1-7 (best)Rank"             51         72
        "2015-2016" "           3. Undue influenceValue"                                                    2.999251   3.205951
        "2015-2016" "           3. Undue influenceRank"                                                           95         88
        "2015-2016" "           1.08 Wastefulness of government spending, 1-7 (best)Value"                 3.2821956  3.0641334
        "2015-2016" "           1.08 Wastefulness of government spending, 1-7 (best)Rank"                         64         76
        "2015-2016" "           1.09 Burden of government regulation, 1-7 (best)Value"                      4.520034  3.3042295
        "2015-2016" "           1.09 Burden of government regulation, 1-7 (best)Rank"                              9         83
        "2015-2016" "           1.10 Efficiency of legal framework in settling disputes, 1-7 (best)Value"   2.808616   3.378237
        "2015-2016" "           1.10 Efficiency of legal framework in settling disputes, 1-7 (best)Rank"         121         85
        "2015-2016" "           1.11 Efficiency of legal framework in challenging regs., 1-7 (best)Value"  2.9270916   3.266026
        "2015-2016" "           1.11 Efficiency of legal framework in challenging regs., 1-7 (best)Rank"         108         85
        "2015-2016" "           1.12 Transparency of government policymaking, 1-7 (best)Value"              4.298673  3.3605504
        "2015-2016" "           1.12 Transparency of government policymaking, 1-7 (best)Rank"                     53        122
        "2015-2016" "           4. Government efficiencyValue"                                              3.567322   3.274635
        "2015-2016" "           4. Government efficiencyRank"                                                     68         89
        "2015-2016" "           1.13 Business costs of terrorism, 1-7 (best)Value"                          5.667157  4.2200046
        "2015-2016" "           1.13 Business costs of terrorism, 1-7 (best)Rank"                                 48        122
        "2015-2016" "           1.14 Business costs of crime and violence, 1-7 (best)Value"                 4.753675  4.2600746
        "2015-2016" "           1.14 Business costs of crime and violence, 1-7 (best)Rank"                        63         83
        "2015-2016" "           1.15 Organized crime, 1-7 (best)Value"                                     4.3888936   4.463378
        "2015-2016" "           1.15 Organized crime, 1-7 (best)Rank"                                             94         89

        Comment


        • #5
          Alright.
          For the reshape long part, code as suggested does the trick:
          Code:
          reshape long c, i(variable) j(ctry) string
          You would then want to reshape wide, doing
          Code:
          reshape wide c, i(ctry) j(variable) string
          Stata complains at this point saying "variable variable has leading or trailing blanks".
          What would be best at this point, is that you come up with a more sensible naming convention for you variables.
          You dont want to be typing something like "1.07 Favoritism in decisions of government officials, 1-7 (best)Rank" when you do your regressions.
          Youll have to come up with a better way to rename these.

          Comment


          • #6
            Sean #4 The limit to editing is 1 hour after initial posting. That's why you couldn't edit.

            In any case, once someone has commented on a major point, it's usually better to re-post. Otherwise the thread is difficult to read through.

            Comment


            • #7
              Hi Jorrit,

              Thank you for your helpful comments, but perhaps I could enquire further about two things, as I'm slightly confused?

              Firstly, you say -
              Before you do that you should get a unique variable identifier.
              [CODE] dataex year variable calbania calgeria

              ----------------------- copy starting from the next line -----------------------
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str9 year str22 variable float(calbania calgeria)
              "2006-2007" "1.01Rank"                      112         71
              "2006-2007" "1.01Value"               3.1320925   4.111848
              "2006-2007" "1.02Rank"                      117         86
              "2006-2007" "1.02Value"               2.0472245  2.9194174
              "2006-2007" "1.03Rank"                      109         75
              "2006-2007" "1.03Value"               2.3704517   3.144576
              "2006-2007" "1.04Rank"                      104         51
              "2006-2007" "1.04Value"               1.6739587   2.679776
              "2006-2007" "1.06Rank"                       96         72
              "2006-2007" "1.06Value"                2.585554   3.333062
              "2006-2007" "1.07Rank"                      113         28
              "2006-2007" "1.07Value"               2.1653826   3.769033
              "2006-2007" "1.08Rank"                       92         41
              "2006-2007" "1.08Value"               2.6463504   3.500622
              "2006-2007" "1.09Rank"                       95         87
              "2006-2007" "1.09Value"                2.591196  2.6803365
              "2006-2007" "1.12Rank"                      115         53
              "2006-2007" "1.12Value"                 2.78481  4.2352943
              "2006-2007" "1.13Rank"                       98        114
              "2006-2007" "1.13Value"                4.393044   3.628695
              "2006-2007" "1.14Rank"                       81         69
              "2006-2007" "1.14Value"                3.651295   3.950249
              "2006-2007" "1.15Rank"                      105         65
              "2006-2007" "1.15Value"                3.435655  4.6576004
              "2006-2007" "1.16Rank"                       82         38
              "2006-2007" "1.16Value"                3.528822   4.893181
              "2006-2007" "1.17Rank"                       95         72
              "2006-2007" "1.17Value"                3.640357   3.890075
              "2006-2007" "1.18Rank"                       96        110
              "2006-2007" "1.18Value"                3.739976   3.470503
              "2006-2007" "1.19Rank"                       78        116
              "2006-2007" "1.19Value"               4.3073025   3.720161
              "2006-2007" "1.21Rank"                      110         41
              "2006-2007" "1.21Value"                     2.7        5.3
              "2006-2007" "1.2Rank"                       104         46
              "2006-2007" "1.2Value"                  3.40839  4.6313415
              "2006-2007" "10.01Rank"                      86         48
              "2006-2007" "10.01Value"               3.322037  4.3134747
              When you say this do mean that for instance "2006-2007" "1.01 Rank" "2007-2008" "1.01 Rank".... would all be one with others as 2 and so on?

              Code:
              reshape long c, i(variable) j(ctry) string
              What does
              Code:
               ctry
              stand for?

              Was this generated through the encode command?

              Comment


              • #8
                To reshape wide (after the first long reshape) try:

                Code:
                *clean up variable
                    replace variable=trim(variable)  
                  
                *first reshape long
                    reshape long c, i(variable year) j(ctry) string     
                
                *now extract the type of variable (rank or value) in a new variable "type"
                    gen type=""
                    replace type="value" if strmatch(variable, "*Value*")
                    replace type="rank" if strmatch(variable, "*Rank*")
                
                *get rid of type from variable so it constant within country/year
                    replace variable=subinstr(variable, "Rank", "", .)
                    replace variable=subinstr(variable, "Value", "", .)
                
                *now reshape wide
                    reshape wide c, i(variable ctry year) j(type) string
                I added year in the reshape command because I assume you will have multiple year (it won't hurt if you don't).
                Stata/MP 14.1 (64-bit x86-64)
                Revision 19 May 2016
                Win 8.1

                Comment


                • #9
                  The comment on unique variable identifier was because I thought only the codes 1.01, 1.02 etc were the variable names, which are repeated twice in the example in #1. This is oen of the reasosn to use dataex: it is very difficult to read the data otherwise.
                  ctry was short for country.

                  Other than that, I believe Carole provides a helpful suggestion for cleaning up your variable names a lot.

                  Comment


                  • #10
                    Carole,

                    Thank you for your helpful code, as it appears to have gotten me 90% of the way I wish to go. When I run the commands in #8 on the data example provided in #7 I am left with the following.

                    Code:
                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str22 variable str9 year str7 ctry float(crank cvalue)
                    "1.01"  "2006-2007" "albania" 112 3.1320925
                    "1.01"  "2006-2007" "algeria"  71  4.111848
                    "1.02"  "2006-2007" "albania" 117 2.0472245
                    "1.02"  "2006-2007" "algeria"  86 2.9194174
                    "1.03"  "2006-2007" "albania" 109 2.3704517
                    "1.03"  "2006-2007" "algeria"  75  3.144576
                    "1.04"  "2006-2007" "albania" 104 1.6739587
                    "1.04"  "2006-2007" "algeria"  51  2.679776
                    "1.06"  "2006-2007" "albania"  96  2.585554
                    "1.06"  "2006-2007" "algeria"  72  3.333062
                    "1.07"  "2006-2007" "albania" 113 2.1653826
                    "1.07"  "2006-2007" "algeria"  28  3.769033
                    "1.08"  "2006-2007" "albania"  92 2.6463504
                    "1.08"  "2006-2007" "algeria"  41  3.500622
                    "1.09"  "2006-2007" "albania"  95  2.591196
                    "1.09"  "2006-2007" "algeria"  87 2.6803365
                    "1.12"  "2006-2007" "albania" 115   2.78481
                    "1.12"  "2006-2007" "algeria"  53 4.2352943
                    "1.13"  "2006-2007" "albania"  98  4.393044
                    "1.13"  "2006-2007" "algeria" 114  3.628695
                    "1.14"  "2006-2007" "albania"  81  3.651295
                    "1.14"  "2006-2007" "algeria"  69  3.950249
                    "1.15"  "2006-2007" "albania" 105  3.435655
                    "1.15"  "2006-2007" "algeria"  65 4.6576004
                    "1.16"  "2006-2007" "albania"  82  3.528822
                    "1.16"  "2006-2007" "algeria"  38  4.893181
                    "1.17"  "2006-2007" "albania"  95  3.640357
                    "1.17"  "2006-2007" "algeria"  72  3.890075
                    "1.18"  "2006-2007" "albania"  96  3.739976
                    "1.18"  "2006-2007" "algeria" 110  3.470503
                    "1.19"  "2006-2007" "albania"  78 4.3073025
                    "1.19"  "2006-2007" "algeria" 116  3.720161
                    "1.2"   "2006-2007" "albania" 104   3.40839
                    "1.2"   "2006-2007" "algeria"  46 4.6313415
                    "1.21"  "2006-2007" "albania" 110       2.7
                    "1.21"  "2006-2007" "algeria"  41       5.3
                    "10.01" "2006-2007" "albania"  86  3.322037
                    "10.01" "2006-2007" "algeria"  48 4.3134747
                    end
                    ------------------ copy up to and including the previous line ------------------
                    While the countries are now in long format, what I wish to have is the various variables in wide format. i.e. 1.01rank as a variable 1.01value as a variable and so on.

                    Comment


                    • #11
                      The decimal points need to be replaced before the values of 'variable' can be used as varibale names
                      Consider something like:

                      Code:
                      gen varcode = subinstr(variable, ".","_",.)
                      drop variable
                      reshape wide crank cvalue, i(ctry year) j(varcode) string

                      Comment


                      • #12
                        Another way to do it is

                        Code:
                         
                        replace variable = strtoname(variable) 
                        reshape wide crank cvalue, i(ctry year) j(variable) string
                        gen start = real(substr(year, 1, 4))
                        I added a command to get a numeric year variable, which is needed in Stata for many panel data purposes.

                        Comment


                        • #13
                          Jorrit and Carole thank you for your helpful comments. I eventually managed to get the data in the format I needed.

                          For reference here is the code I used in the end.

                          Code:
                          *first reshape long
                          reshape long c, i(variable year) j(ctry) string
                          gen varcode = subinstr(variable, ".","_",.)
                          drop variable
                          reshape wide c, i(ctry year) j(varcode) string
                          [CODE]. dataex year ctry c10_01Rank c10_01Value

                          ----------------------- copy starting from the next line -----------------------
                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input str9 year str31 ctry float(c10_01Rank c10_01Value)
                          "2006-2007" "CôtedIvoire"                      .         .
                          "2007-2008" "CôtedIvoire"                      .         .
                          "2008-2009" "CôtedIvoire"                     95 2.6971736
                          "2009-2010" "CôtedIvoire"                     94 2.7600024
                          "2010-2011" "CôtedIvoire"                     94  2.827854
                          "2011-2012" "CôtedIvoire"                     96 2.7897184
                          "2012-2013" "CôtedIvoire"                     94  2.807671
                          "2013-2014" "CôtedIvoire"                     97 2.8651316
                          "2014-2015" "CôtedIvoire"                     94  2.937273
                          "2015-2016" "CôtedIvoire"                     83  3.204947
                          "2006-2007" "advancedeconomies"                 . 4.4947357
                          "2007-2008" "advancedeconomies"                 .  4.261576
                          "2008-2009" "advancedeconomies"                 .   4.29011
                          "2009-2010" "advancedeconomies"                 . 4.2960324
                          "2010-2011" "advancedeconomies"                 .  4.280821
                          "2011-2012" "advancedeconomies"                 .  4.259641
                          "2012-2013" "advancedeconomies"                 .  4.283554
                          "2013-2014" "advancedeconomies"                 .  4.272146
                          "2014-2015" "advancedeconomies"                 . 4.2799287
                          "2015-2016" "advancedeconomies"                 . 4.3158064
                          "2006-2007" "albania"                          86  3.322037
                          "2007-2008" "albania"                         100    2.5798
                          "2008-2009" "albania"                          99   2.59233
                          "2009-2010" "albania"                          98 2.6576295
                          "2010-2011" "albania"                          99  2.735817
                          "2011-2012" "albania"                          99  2.701206
                          "2012-2013" "albania"                          97   2.74602
                          "2013-2014" "albania"                         105 2.7177415
                          "2014-2015" "albania"                         103  2.730009
                          "2015-2016" "albania"                         102 2.7337904
                          "2006-2007" "algeria"                          48 4.3134747
                          "2007-2008" "algeria"                          44  4.063035
                          "2008-2009" "algeria"                          52  3.901169
                          "2009-2010" "algeria"                          52  3.959521
                          "2010-2011" "algeria"                          51 4.0081244
                          "2011-2012" "algeria"                          45  4.186123
                          "2012-2013" "algeria"                          47  4.164145
                          "2013-2014" "algeria"                          49 4.1321626
                          --more--

                          Comment

                          Working...
                          X