  • Reshape wide error

    I'm using Stata 15.1 on a Windows 10 machine.

    I'm struggling to get my head around -reshape- (again).

    I have data set up in long format as in the example below but want to reshape it wide so that I have male1, male2, male3 etc with one line per 'org_code'

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 org_code byte age long number_of_patients str9 ons_ccg_code str4 sex
    "A81001" 0 21 "E38000075" "MALE" 
    "A81001" 1 22 "E38000075" "MALE" 
    "A81001" 2 19 "E38000075" "MALE" 
    "A81001" 3 24 "E38000075" "MALE" 
    "A81001" 4 29 "E38000075" "MALE" 
    "A81001" 5  8 "E38000075" "MALE" 
    "A81001" 6 25 "E38000075" "MALE" 
    "A81001" 7 26 "E38000075" "MALE" 
    "A81001" 8 24 "E38000075" "MALE" 
    "A81001" 9 19 "E38000075" "MALE"
    end
    I have tried

    Code:
    reshape wide number, i(org_code) j(age)
    Which nearly gets me there, but I need 'MALE' instead of number so I try

    Code:
    reshape wide sex, i(org_code) j(age)
    But I get this error message:
    variable number_of_patients not constant within org_code
    Your data are currently long. You are performing a reshape wide. You typed something
    like

    . reshape wide a b, i(org_code) j(age)

    There are variables other than a, b, org_code, age in your data. They must be constant
    within org_code because that is the only way they can fit into wide data without loss of
    information.

    The variable or variables listed above are not constant within org_code. Perhaps the
    values are in error. Type reshape error for a list of the problem observations.

    Either that, or the values vary because they should vary, in which case you must either
    add the variables to the list of xij variables to be reshaped, or drop them.
    r(9);


    Any help greatly appreciated

  • #2
    I'm struggling to get my head around what you want the result to look like. What values are supposed to be in the variables male1 male2 etc. that you want? The numbers that are currently in the variable called number_of_patients? And are there also observations in your data set with sex = "FEMALE"?

    I'm making a guess here as to what you want. I've expanded your example data to include some observations with sex = "FEMALE". If this is not what you had in mind, post back with an example of a hand-worked data set that looks like the results you want to get.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 org_code byte age long number_of_patients str9 ons_ccg_code str6 sex
    "A81001" 0 21 "E38000075" "MALE"  
    "A81001" 1 22 "E38000075" "MALE"  
    "A81001" 2 19 "E38000075" "MALE"  
    "A81001" 3 24 "E38000075" "MALE"  
    "A81001" 4 29 "E38000075" "MALE"  
    "A81001" 5  8 "E38000075" "MALE"  
    "A81001" 6 25 "E38000075" "MALE"  
    "A81001" 7 26 "E38000075" "MALE"  
    "A81001" 8 24 "E38000075" "MALE"  
    "A81001" 9 19 "E38000075" "MALE"  
    "A81001" 0 14 "E38000075" "FEMALE"
    "A81001" 1 31 "E38000075" "FEMALE"
    "A81001" 2 12 "E38000075" "FEMALE"
    "A81001" 3  6 "E38000075" "FEMALE"
    "A81001" 4 15 "E38000075" "FEMALE"
    "A81001" 5  7 "E38000075" "FEMALE"
    "A81001" 6 18 "E38000075" "FEMALE"
    "A81001" 7 22 "E38000075" "FEMALE"
    "A81001" 8 19 "E38000075" "FEMALE"
    "A81001" 9 24 "E38000075" "FEMALE"
    end
    
    gen stratum = lower(sex) + string(age)
    drop sex age
    rename number_of_patients _
    reshape wide _, i(org_code) j(stratum) string
    rename _* *
    Added: Of course, I will caution you to be careful what you wish for. Most Stata data management and analysis is easier, and sometimes only possible, in long layout. So before you do this, make sure you have a really good reason to go wide here: it will be a waste of effort if your next steps begin with -reshape long-.
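
    To illustrate the point about long layout, here is a minimal sketch of a typical next step done without reshaping. It assumes the long example data above are in memory, before the -drop- and -reshape wide- lines are run; the variable name total_patients is made up for illustration.

    ```stata
    * Assumes the long example data (one row per org_code, sex and age) are in memory.
    * total_patients is a hypothetical name chosen for this sketch.

    * Total number of patients per org_code and sex, stored on every row of the group
    bysort org_code sex: egen total_patients = total(number_of_patients)

    * The same totals as a quick summary table by sex
    tabstat number_of_patients, by(sex) statistics(sum)
    ```

    In wide layout the same totals would need something like a row-wise sum across male0 through male9, which is usually clumsier to write and maintain.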



    • #3
      You need to list, as a stub in the -reshape wide- command, every variable that is not constant within your i() variable.

      I am also not clear on what you want to get, or on what your long structure is.

      Is it identified by

      org_code
      age

      that is, is org_code the i() variable and age the j() variable?
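
      On the first point, a minimal sketch of what listing all the non-constant variables looks like. It assumes the males-only example data from #1 are in memory and that org_code and age together identify each observation; that would no longer hold once rows for a second sex are added.

      ```stata
      * Assumes the males-only example data from #1 are in memory.
      * Confirm that org_code and age jointly identify the observations;
      * -isid- stops with an error if they do not.
      isid org_code age

      * number_of_patients and sex are both named as stubs.
      * ons_ccg_code is constant within org_code in the example,
      * so it is carried along without being listed.
      reshape wide number_of_patients sex, i(org_code) j(age)
      ```

      This produces number_of_patients0 to number_of_patients9 and sex0 to sex9 on a single row per org_code. If -reshape- still complains that a variable is not constant within org_code, typing -reshape error- lists the problem observations, as the error message in #1 says.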



      • #4
        EDITED TO ADD: This crossed with #2 and #3 and I share their concerns.

        So, I will show you this with the caveat that for most analysis keeping data in the LONG format is usually better.

        Code:
        bysort org_code: gen obs_no = _n  // This is so you can have sex1, sex2, sex3, etc  (or male1, male2, male3, if you would prefer)
        reshape wide sex age number_of_patients , i(org_code) j(obs_no)
        
        * To allow the -list- command to display more easily, I renamed number_of_patients to np
        rename number_of_patients1 np1
        rename number_of_patients2 np2
        rename number_of_patients3 np3
        rename number_of_patients4 np4
        
        . list org_code- sex4, noobs
        
          +------------------------------------------------------------------------------------------+
          | org_code   age1   np1   sex1   age2   np2   sex2   age3   np3   sex3   age4   np4   sex4 |
          |------------------------------------------------------------------------------------------|
          |   A81001      0    21   MALE      1    22   MALE      2    19   MALE      3    24   MALE |
          +------------------------------------------------------------------------------------------+



        • #5
          Clyde Schechter Thanks for your suggestion, and apologies for the vagueness of my post! What you have provided seems to be just what I need. I won't be reshaping long after this!
