Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • data preparation and merging data

    Hello everyone,

    thank you in advance for your help! Is there any easy or simple way Is there any way I can quickly prepare these two data sets and merge them into one in stata?


  • #2
    Your attachments did not come through--or at least I can't access them on my setup. Even if I could, why make it hard for forum participants to see what you are talking about? The best way to show us some data is to -list- representative observations and paste Stata's output into a code block. That way there are no additional clicks required, nothing to download (with associated risks) and the information will always be legibile (unlike screen shots which almost never are.)

    Comment


    • #3
      sorry i have some trouble with uploading the files.
      here are the links to the data sets:

      "women in development" - data set: http://wdi.worldbank.org/table/1.5
      "growth of output" - data set: http://wdi.worldbank.org/table/4.1

      Comment


      • #4
        Martin: You seem to have missed Clyde's point completely. He didn't ask for links.

        The best way to show us some data is to list representative observations and paste Stata's output into a CODE block.
        Please see also http://www.stata-journal.com/sjpdf.h...iclenum=dm0031 and the FAQ Advice, especially Section 6.

        Comment


        • #5
          ooh... you picked a kinda nasty one. Even if you download as excel, you have two hurdles: their "Excel" is actually still .html. Excel can fix it up, but Stata's import function is totally confused. Also, the first few lines are merged columns, headings broken out by year or gender. three approaches, none nice.
          1. Copy the data portions into Stata directly via copy-and-paste, rename columns. Destring as needed.
          2. Read into Excel to create true excel spreadsheet, chop out the first few lines, and rename columns. Save, then import excel. Destring as needed.
          3. Read into Excel to create true excel spreadsheet, save, them import into Stata. Then drop the first few lines, and rename columns and such to arrive at a usable dataset. Destring as needed.
          Nasty, but doable.

          Comment


          • #6
            He can't even get it into Stata. I suppose he could copy-and-paste from the website... it has merged header lines, and even once you get it into Excel w/out merged header lines, you still need to rename and *then* it has footer lines. Really nasty for a beginner. I'm at least uploading Excel files for you -- it barfs on .dta files for me.

            Martin -- assuming the possible approaches I describe make sense, then it becomes as simple as:

            Code:
            . cd c:\data\merging
            c:\data\merging
            
            . 
            . import excel "C:\data\merging\growth1.xlsx", sheet("4.1_Growth_of_output") firstrow clear
            
            . 
            . save growth1, replace
            file growth1.dta saved
            
            . 
            . import excel "C:\data\merging\women1.xlsx", sheet("1.5_Women_in_development") firstrow clear
            
            . 
            . save women1, replace
            file women1.dta saved
            
            . 
            . merge 1:1 country using growth1
            
                Result                           # of obs.
                -----------------------------------------
                not matched                             0
                matched                               228  (_merge==3)
                -----------------------------------------
            
            . 
            . save womeneconomy, replace
            (note: file womeneconomy.dta not found)
            file womeneconomy.dta saved
            Attached Files

            Comment


            • #7
              ps. This is a chunk of what he's dealing with, and their export excel function is partially broken. Scroll around a bit to get the headers and such. Nasty, nasty formatting for Stata import. Wierd! Upon pasting, the headers went away. Huh. Nod that I advocate copy-and-paste, but if it works a whole lot better than any other approach, maybe forgivable.
              Afghanistan 49.3 59 62 .. .. .. .. .. .. 4 28
              Albania 49.9 74 80 .. .. 29 18
              Algeria 49.4 69 73 .. .. 2.1 2.2 .. .. 2 32
              American Samoa .. .. .. .. .. .. .. .. .. .. .. ..
              Andorra .. .. .. .. .. .. .. .. .. .. .. 50
              Angola 50.4 50 53 .. .. .. .. .. 15 34
              Antigua and Barbuda 52.2 73 78 .. .. .. .. .. .. 0 11
              Argentina 51.1 72 80 .. .. 44 .. 6
              Armenia 48.6 71 78 .. .. 36 11
              Aruba 52.4 73 78 .. .. .. .. .. .. ..
              Australia .. 0.2 0.3 6 26
              Austria 51.2 78 84 .. .. 48 12 33
              Azerbaijan 68 74 .. .. 43 .. .. .. .. 16
              Bahamas, The 51.1 72 78 .. 0.2 0.4 .. 4 13
              Bahrain 37.8 76 77 .. 21 .. .. .. 10
              Bangladesh 49.4 70 71 .. .. .. .. 10 20
              Barbados 50.1 73 78 .. .. .. .. .. 4 17
              Belarus 53.6 67 78 .. .. .. .. .. .. 27
              Belgium 50.9 78 83 .. .. 48 9 38
              Last edited by ben earnhart; 23 Nov 2014, 12:11.

              Comment


              • #8
                Thank you so much for your help and all the effort!

                Comment


                • #9
                  Hello -- regarding the same dataset, how does one change the ".." missing observations into "."? Stata doesn't seem to work either with "rename" or "mvdecode/encode" commands as it doesn't read the ".." observations. Thank you!

                  Comment


                  • #10
                    Several possible ways.

                    First if while you have it in excel you do a search-and-replace all for ".." to "".

                    Second, during the destring process, use the "force" option. This will convert any string to missing, not just ".." so be careful!

                    Third, prior to running destring, run "replace varname="" if varname=".."

                    Comment


                    • #11
                      Destringing with force command worked. Thank you! (I also tried prior to running destring the replace command -- it didn't work. Really strange)

                      Comment

                      Working...
                      X