Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • importing data into stata

    Hello,

    I have a problem importing a certain excel file into stata. It is a study of countries whereby every country is assessed according to different parameters (variables) for over a period of 2008-2014. The countries are assess each year according to the same criteria. The year is on the top of the excel sheet and the variables are one row below. How can i import this file into stata so the program recognizes the variables (which are the same each year) and also the years?

    Thank you very much in advance for your help!

    Best regards,
    Martin
    Last edited by Martin; 24 Oct 2014, 11:24. Reason: I have uploaded the excel sheet now, i hope this makes my problem a bit more clear.

  • #2
    Please show us exactly what this looks like. Use CODE delimiters and copy and paste the top few lines of the file in between. If the previous sentence makes no sense, you should read the FAQ as requested. In fact, please read it any way.

    EDIT: Images attached as photos are unreadable on small monitors or by many people. Please note the suggestions above.
    Last edited by Nick Cox; 24 Oct 2014, 11:18.

    Comment


    • #3


      here is the excel file. I hope this makes my problem a bit more clear.
      Attached Files

      Comment


      • #4
        The sheet covers several years. It needs restructuring to conform Stata's rectangular structure with unique variable names that match Stata's syntax rules in row 1. In Excel I would:

        1. copy-and paste so that you have one sheet for each year. Let the sheet name be the year. It is probably easiest to make a number of copies of the Indicator Scores sheet and next delete the columns not belonging to that year. For each year, keep the country code and name columns.

        2. For each sheet, delete row 6, and next row 1-4 (in that order)

        3. Look carefully. Does it now match Stata's requirement? If yes -> 4.

        4. In Stata, use the -import excel- command for each sheet. Generate the variable -year- (value 2008 for year 2008).

        5. -save- the Stata file with, for example, the name 2008.dta for year 2008.

        6. Use the -append- command to combine the files.

        Comment


        • #5
          Alternatively,
          1. Delete the first four rows. Un-merge what is now cell A1, and fix A1 and A2 to have variable names, presumably ISO and Country.
          2. take what are now rows 1 and 2, and copy/transpose into a new sheet.
          3. Use that to add a column for year.
          4. Bring that into a text editor to create variable names including year by converting table to text and/or search-and-replace.
          5. Paste that (transposed) back into your substantive data as row 1.
          6. Delete row 2 (the descriptions).
          7. Now you can import with proper, unique variable names.
          8. As a bonus extra-credit activity, you might use the text editor to generate nice variable label commands at this point.
          If you do want your data long, you are now in position to simply use reshape.

          Comment


          • #6
            Ah, just for the heck of it, to make sure I got the steps right, I was doing it anyway. So see attached.
            Attached Files

            Comment


            • #7
              Martin --

              As an FYI, it is customary to use first and last names on the list, so we know who we are working with on an issue (see the FAQs item 6). Also it is nice to know if advice brought you closure on a problem. Maybe I went beyond advice...

              Comment


              • #8
                Hello. I can confirm that the following steps make the document work.
                It was also nice that the missing observations were already marked as "." in the excel file. How would you do that in stata if it hadn't already been done?

                Also, I noted from the new file that there is a crucial difference between the two files - in the first file, each variable had the same name independently of the year, so that you could have them already "summed" and you could create new variables by specifying year==x. In the new file, they are already divided per year making it so that you have to merge them every time. Am I correct? How would you work with that and how could it be changed so that year is already included in the variables?

                Sorry if my questions weren't exactly clear, I'm still a newbie with the program. Thank you in advance.

                Alberto

                Comment


                • #9
                  Thank you very much for your help, I am very greatful! It is working now I much appreciate your quick advice!

                  Comment


                  • #10
                    Special thanks to ben earnhart It was very kind to post the files.

                    Comment


                    • #11
                      I mentioned -reshape- . See this for an example of the reshape command in action. Unfortunately, I used "_" when naming things, which looks nicer (in my opinion) in wide form, but is just dangling out there in long. So check out the -rename- command, not too bad to get rid of the "_"

                      Code:
                      reshape long OVERALL_ DIST_ POLI_ HOMI_ JAIL_ WMID_ INCO_ DEMO_ CRIM_ INST_ REHR_ ///
                      WEIM_ GTDX_ NDIC_ MEXP_ ARMY_ UNFU_ HEWE_ WEEX_ REFU_ RELA_ CONF_ NDEC_, i(ISO) j(year)

                      Comment

                      Working...
                      X