Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to Import Excel Data When Variable Names Span Multiple Rows

    Hello Statalist members,

    I’m trying to import an Excel dataset into Stata, but my variable names span multiple rows instead of being in a single row. I want to keep the information for each disease. My dataset looks like below.
    Click image for larger version

Name:	2025-02-17 214526.png
Views:	1
Size:	88.4 KB
ID:	1772930

    My questions are:
    • How can I combine multiple rows into a single row of variable names in Stata?
    • How can I skip unnecessary rows and import the dataset correctly?
    What’s the best way to handle this situation? Thank you for your help!


  • #2
    When I face this situation, I copy the sheet and then create my own variable names that Stata can accurately read. You can then label the variables to track back.

    There may be better ways.

    af_1415_register chd_1415_register, that sort of thing.
    Last edited by George Ford; 17 Feb 2025, 15:07.

    Comment


    • #3
      I usually read in the entire dataset as strings. Some of the header information may be close to legal and appropriate variable names; some of the rest may be more like variable label content. Once you've used or ignored the first few rows, the last step is to destring what needs to be destrung.

      Comment


      • #4
        I usually do it the way Nick Cox described in #3. But in this case, it seems that columns A-K are naturally string variables and have their variable names in just one row (row 3), whereas everything from L onward is numeric. So I would probably do it in steps.

        First, import columns A-K starting in cell A3 going to the last row in column K in the -cellrange()- option and also specifying the -firstrow- option. Then generate a variable marking the observation number: -gen obs_no = _n-. Save that data set.

        Next, import columns L through the last column starting in cell L1. Create useful variable names from the information in rows 1-3 and -rename- the variables accordingly. Then drop rows 1-3. Again -gen obs_no = _n- Now -destring _all, replace- to get the numerical variables as numerical data storage types. Merge that with the previously saved data set, and save the merged result.



        Comment

        Working...
        X