Difficulty with importing from excel data sheet

anwar dudekula

Join Date: Apr 2014

Posts: 61
#1

Difficulty with importing from excel data sheet

30 Jul 2014, 15:16

hello Stata users,

I have a problem with importing excel files and merging data. and was hoping if anyone could help me.

The excel datasheet (attached) with 5 blocks of data (coloured columns) labelled as

1)row_variable, surgery_1_count, surgery_1_%
2)row_variable, surgery_2_count, surgery_2_%
3)row_variable, surgery_3_count, surgery_3_%
4)row_variable, surgery_4_count, surgery_4_%
5)row_variable, surgery_5_count, surgery_5_%

I need to merge them as a stata data set with the column varaibles as follows

row_variable, surgery_1_count,surgery_2_count,surgery_3_count,su rgery_4_count and surgery_5_count.

I have a problem and was hoping to see if any stata experts could help me.

The row variables do not have same number of subdivisions

Eg:
1)Race in Surgery_1 has 7 subdivisions but Race in Surgery_2 has 6 subdivisions
2)ASOURCE in surgery_1 has 5 subdivisions but has 4 subdivisions in surgery_3

I have included an excel sheet and a PDF of the same excel sheet(for those who dont use excel) for clarification purpose.

Thank you

Anwar
Attached Files

Stata_question_example_file.pdf (64.4 KB, 1 view)

Stata_question_example_file.xlsx (12.6 KB, 1 view)
Tags: None
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#2

30 Jul 2014, 15:58

This is going to be really tricky to import and then work with.
I think my strategy would probably be to import each of the blocks of data separately and then merge them together, using row_variable as the id. This assumes that the category labels that appear in the row_variable column are always consistent across blocks. If they aren't you'll likely have to do some additional cleaning.

Here's a rough example for the first two blocks

Code:

import excel using "path/filename", cellrange(A1:C40) firstrow clear save "path/firstblock", replace import excel using "path/filename", cellrange(D1:F40) firstrow clear merge 1:1 row_variable using "path/firstblock"

This won't work as currently written because you're going to need to do some cleaning to deal with duplicate entries in row_variable. You have two sources of duplicates: blank rows, which you can safely drop and duplicate label names, which you can't. You'll need to either change the text for the labels "missing" and "total" before importing to make them unique. Or you can do it after you import. If you're really only doing this for 5 blocks of data, that's not going to be a problem. If this is really a much larger task than what you've shown, things will be more complicated.

This is going to result in a dataset with a really weird file structure, though. Currently you have what I would think of as the variables as rows (observations) and the observations as columns. Depending on your goals with the final data you may want to look at the help for the command -xpose- to get to a more useable structure.
Comment
Konrad Zdeb

Join Date: Apr 2014

Posts: 496
#3

31 Jul 2014, 05:18

It's a messy spreadsheet. Before importing the data to Stata, I would attempt to transpose the data in Excel to get rid of the blank lines and all that mess.
Attached Files

t.xlsx (28.4 KB, 1 view)

Kind regards,
Konrad
Version: Stata/IC 13.1
Comment
Konrad Zdeb

Join Date: Apr 2014

Posts: 496
#4

31 Jul 2014, 06:41

Also you may also want to have a look at this tool:
http://vis.stanford.edu/papers/wrangler

Kind regards,
Konrad
Version: Stata/IC 13.1
Comment
anwar dudekula

Join Date: Apr 2014

Posts: 61
#5

03 Aug 2014, 13:31

Hello Konrad and Sarah ,
Thank you very much for the replies. I didnt think ahead of time about data analysis and thereby resulted in such a poorly formatted /structured excel file!. It is a good learning lesson.
I have been trying struggling for the past 2 days and was able to get it in a shape where I can work with stata .
Transpose was an excellent idea and helped me a great deal.
I am trying to understand wrangler ....
Thank you again for your help .
Sincerely,
Anwar
Comment

Announcement

Difficulty with importing from excel data sheet

Comment

Comment

Comment

Comment