Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Difficulty with importing from excel data sheet

    hello Stata users,

    I have a problem with importing excel files and merging data. and was hoping if anyone could help me.

    The excel datasheet (attached) with 5 blocks of data (coloured columns) labelled as

    1)row_variable, surgery_1_count, surgery_1_%
    2)row_variable, surgery_2_count, surgery_2_%
    3)row_variable, surgery_3_count, surgery_3_%
    4)row_variable, surgery_4_count, surgery_4_%
    5)row_variable, surgery_5_count, surgery_5_%

    I need to merge them as a stata data set with the column varaibles as follows

    row_variable, surgery_1_count,surgery_2_count,surgery_3_count,su rgery_4_count and surgery_5_count.


    I have a problem and was hoping to see if any stata experts could help me.

    The row variables do not have same number of subdivisions

    Eg:
    1)Race in Surgery_1 has 7 subdivisions but Race in Surgery_2 has 6 subdivisions
    2)ASOURCE in surgery_1 has 5 subdivisions but has 4 subdivisions in surgery_3


    I have included an excel sheet and a PDF of the same excel sheet(for those who dont use excel) for clarification purpose.


    Thank you

    Anwar
    Attached Files

  • #2
    This is going to be really tricky to import and then work with.
    I think my strategy would probably be to import each of the blocks of data separately and then merge them together, using row_variable as the id. This assumes that the category labels that appear in the row_variable column are always consistent across blocks. If they aren't you'll likely have to do some additional cleaning.

    Here's a rough example for the first two blocks
    Code:
    import excel using "path/filename", cellrange(A1:C40) firstrow  clear
    save "path/firstblock", replace
    
    import excel using "path/filename", cellrange(D1:F40) firstrow clear
    merge 1:1 row_variable using "path/firstblock"
    This won't work as currently written because you're going to need to do some cleaning to deal with duplicate entries in row_variable. You have two sources of duplicates: blank rows, which you can safely drop and duplicate label names, which you can't. You'll need to either change the text for the labels "missing" and "total" before importing to make them unique. Or you can do it after you import. If you're really only doing this for 5 blocks of data, that's not going to be a problem. If this is really a much larger task than what you've shown, things will be more complicated.

    This is going to result in a dataset with a really weird file structure, though. Currently you have what I would think of as the variables as rows (observations) and the observations as columns. Depending on your goals with the final data you may want to look at the help for the command -xpose- to get to a more useable structure.

    Comment


    • #3
      It's a messy spreadsheet. Before importing the data to Stata, I would attempt to transpose the data in Excel to get rid of the blank lines and all that mess.
      Attached Files
      Kind regards,
      Konrad
      Version: Stata/IC 13.1

      Comment


      • #4
        Also you may also want to have a look at this tool:
        http://vis.stanford.edu/papers/wrangler
        Kind regards,
        Konrad
        Version: Stata/IC 13.1

        Comment


        • #5
          Hello Konrad and Sarah ,
          Thank you very much for the replies. I didnt think ahead of time about data analysis and thereby resulted in such a poorly formatted /structured excel file!. It is a good learning lesson.
          I have been trying struggling for the past 2 days and was able to get it in a shape where I can work with stata .
          Transpose was an excellent idea and helped me a great deal.
          I am trying to understand wrangler ....
          Thank you again for your help .
          Sincerely,
          Anwar

          Comment

          Working...
          X