Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Variable suffix for long-to-wide reshaped data

    Hello!

    I am working with data on women's contraceptive histories. In long form, cases are episodes of contraceptive use or non-use. In wide form, cases are women. I have reshaped my data from long to wide. No problems with the transformation, per se. However, many of the variable names (which will become the stubs) already end in a numeral, and my j variable is numeric, ranging from 1-10. I would like to avoid having variable "ev902", for example, from becoming ev9021, ev9022, ev9023,...ev90210 in wide form. I would much prefer to have the suffix to my variables be _j, rather than just the j variable (i.e. producing ev902_1, ev902_2, ev902_3, ... ev902_10 in the above example).

    Is there a way to specify that in the reshape command? Does the key to doing this lie in making my j variable a string variable? If so, how should I approach that?

    My current code is as follows:
    Code:
    reshape wide ev004 ev9* cmcclock eventorder endmo startmo startstate, i(survey v001 v002 v003 caseid) j(evid)
    evid ranges in values from 1 to about 10.
    ev9* is a list of variable stubs that go from ev900-ev917. There are 3 or 4 instances where there is ev902 and ev902a.

    It also may be helpful (or not) to know that:
    • I am working with standard recode files in which the variables named ev004-ev917 follow a standard and documented naming convention, with which I and my co-researchers are quite familiar. I am, therefore, reluctant to rename the variables ending in numerals. (The variables with more descriptive variable names are ones we've created specifically for this analysis.)
    • The above code is contained within a loop that is performing this reshaping on a set of 12 datasets. The number of women ranges from about 6,000-21,000 and the values on evid range from 1-5 to 1-10 in these datasets.
    Thank you for any guidance!
    Kerry MacQuarrie

  • #2
    You will find the tool you need in the rename group command; see help rename group for details on this impressively powerful command. In the following example I add an underscore to the ends of the variables names so that when they become stubs, the original variable name remains obvious.
    Code:
    . describe, simple
    x91  x92
    
    . rename (x9*) =_
    
    . describe, simple
    x91_  x92_

    Comment


    • #3
      Contrary to what the help file for reshape seems to suggest in the basic syntax section:

      reshape wide stubnames, i(varlist) [options]
      going from long to wide requires variable names (in fairness, this is pointed out later in the notes). So there is no problem reshaping variables ev902 and ev902a.

      In terms of the suffix you would like to use, it sometimes helps to think outside the box. The j variable does not have to be numeric. In the example below, I generate a string version that starts with an underscore and uses a format with leading zeros:

      Code:
      clear
      input id evid ev902
      1 1 11
      1 2 12
      1 15 15
      end
      
      gen ev902a = ev902 + 10
      
      gen jevid = "_" + string(evid,"%02.0f")
      drop evid
      
      reshape wide ev902 ev902a, i(id) j(jevid) string
      
      list, abbrev(10)
      and the results
      Code:
      . list, abbrev(10)
      
           +-------------------------------------------------------------------------+
           | id   ev902_01   ev902a_01   ev902_02   ev902a_02   ev902_15   ev902a_15 |
           |-------------------------------------------------------------------------|
        1. |  1         11          21         12          22         15          25 |
           +-------------------------------------------------------------------------+

      Comment


      • #4
        Of course, thank you!

        I was so focused on working within the reshape command to produce variables in the form I wanted that I didn't even consider the other tools already at my disposal. Easy peasy! I adapted my code as follows:

        Code:
        ********************
        ***RESHAPE: long to wide EV file
        ********************
        
        *Drop unnecessary vars
        drop v000 v005 v007 v008 v011 v017 v018 v019 v101 v102 v106
        
        **Add suffix "_" to var stubs
        rename (ev* cmcclock numevents endmo startmo1 startmo obsmo obsmoend obsdur mfporno start startstate) =_
        rename (evid_) (evid)
        
        **Reshape
        reshape wide ev004_ ev9* cmcclock_ eventorder_ evinwindow_ numevents_ endmo_ startmo1_ startmo_ obsmo_ obsmoend_ obsdur_ mfporno_ start_ startstate_, i(survey v001 v002 v003 caseid) j(evid)
        And it works beautifully. This is what I get, just as I wanted:
        Data long -> wide
        Number of obs. 37750 -> 21762
        Number of variables 38 -> 325
        j variable (10 values) ev004 -> (dropped)
        xij variables:
        ev900_ -> ev900_1 ev900_2 ... ev900_10
        ev901_ -> ev901_1 ev901_2 ... ev901_10
        ev901a_ -> ev901a_1 ev901a_2 ... ev901a_10
        ev902_ -> ev902_1 ev902_2 ... ev902_10
        ev903_ -> ev903_1 ev903_2 ... ev903_10
        ev904_ -> ev904_1 ev904_2 ... ev904_10
        ev905_ -> ev905_1 ev905_2 ... ev905_10
        etc...
        Lovely! Thank you, William, for helping me to see this from a new perspective.

        Cheers,
        Kerry

        Comment


        • #5
          Thanks, Robert. I'm sure your strategy would work equally well. That's along the lines of what I was thinking in terms of converting the j variable to a string variable. I saw your post after William's and went with that strategy.

          Comment

          Working...
          X