Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping Global Competitiveness Index data

    I can not reshape the Global Competitiveness Data. It's an excel file which has year and variable in rows and country names in columns. The file can be downloaded here https://docs.google.com/viewer?url=h...2006-2015.xlsx . I want to convert the data to panel. Please help.

  • #2
    Please post an informative extract from your dataset using dataex (in SSC) -- see the forum FAQ for the reasons why you are recommended to this in order to get answers.
    In addition, there seem 2 potential issues , and I recommend that you distinguish them when you re-post. (1) How to read your spreadsheet into Stata (have you looked at import excel?); (2) how to convert the data from whatever format it is in into a panel format -- by which I suppose you mean a long format organisation which could be xtset country year (or something like that.

    Comment


    • #3
      I imported the excel file into Stata by (import excel "C:\Users\Dang Quang Vinh\Dropbox (Personal)\DATA\GCI_Dataset_2006-2015.xlsx", sheet("Sheet4") firstrow)
      . I want to convert it to panel format with xtset
      This is what I have with dataex:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str9 year str78 Series str39(Albania Algeria Angola)
      "2015" "           1.01 Property rights, 1-7 (best)"                  "3.04839706754351"  "3.748118037386879" ""
      "2015" "           1.02 Intellectual property protection, 1-7 (best)" "3.214500489434996" "3.273170439196374" ""
      "2015" "           1. Property rights"                                "3.103764874840672" "3.589802171323377" ""
      "2015" "           1.03 Diversion of public funds, 1-7 (best)"        "3.21977904293087"  "3.015871495657017" ""
      "2015" "           1.04 Public trust in politicians, 1-7 (best)"      "2.561593987724998" "2.906529824844914" ""
      end
      To clarify, the current data format is : year, series (labels should be variables), country1, country2, ...
      and I want to have country_id, year, vars.
      Thanks

      Comment


      • #4
        There are two issues here. One is that you can't get from what you have to what you want with a single -reshape-. You need to first -reshape long- and then -reshape wide- with a different -j()- option.

        But it is also complicated because the variable Series, that you want to turn into variable names, contains strings that are not admissible Stata variable names. A Stata variable name can only contain letters, numbers, and underscore (_) characters. And it cannot exceed 32 characters. These strings fail on both counts. There is a function -strtoname()- that converts a string to the nearest equivalent Stata variable name. The results are not all that pretty, but they are serviceable. You might, however, want to rename these variables to something more convenient after you're done. There is one more complication here: -strtoname()- creates candidate variable names that can be as long as 32 characters (and in your case some of them are). When -reshape- tries to prefix them, you are now at 33 characters, which is not allowed. So after -strtoname()- you have to cut out a character. The code below removes the last character.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str9 year str78 Series str39(Albania Algeria Angola)
        "2015" "           1.01 Property rights, 1-7 (best)"                  "3.04839706754351"  "3.748118037386879" ""
        "2015" "           1.02 Intellectual property protection, 1-7 (best)" "3.214500489434996" "3.273170439196374" ""
        "2015" "           1. Property rights"                                "3.103764874840672" "3.589802171323377" ""
        "2015" "           1.03 Diversion of public funds, 1-7 (best)"        "3.21977904293087"  "3.015871495657017" ""
        "2015" "           1.04 Public trust in politicians, 1-7 (best)"      "2.561593987724998" "2.906529824844914" ""
        end
        
        rename (Albania Algeria Angola) _=
        reshape long _, i(year Series) j(country) string
        replace Series = trim(itrim(Series))
        gen series = strtoname(Series, 0)
        replace series = substr(series, 1, strlen(series)-1)
        drop Series
        reshape wide _, i(country year) j(series) string
        Notes:

        1. Your variables have been read in as strings, but I think you would want year, and the various series to be numeric for most purposes. So you can use -destring-.

        2. This code gives you the arrangement you asked for. It may or may not be the most suitable arrangement for what you want to do. For most purposes, Stata handles long layout data better than wide. So you might want to consider stopping after the -reshape long- command above and keeping your data in that arrangement. It will probably make your analyses easier than what you have asked for.

        Comment


        • #5
          Thank you Clyde. Your code did the job well. It takes a bit of time to find the variables I need but I am happy with it. I am happy to share the panel data with anybody how needs it.

          Comment


          • #6
            Hi, I'm having trouble reshaping the Global Competitiveness Data. Could you please share the data with me? I'd be much appreciated!

            Comment

            Working...
            X