  • "Decode" Command on Numeric Variables - Grave Issues

    Hello,

    I am appending several rounds/datasets from different Demographic and Health Surveys. The variables are identical across samples, but the value labels are likely to differ: one dataset may have 10 regions, while another may have two different ones. To append these and synchronize the value labels, I was advised to convert the value-labeled variables to string and then encode them back to numeric after the append.

    But I am running into issues with some variables that have labels attached to only some of their values. Here is the label list for one such variable:

    Code:
    v852a -- how long ago first had sex with most recent partner
    
             101 days: 1
             199 days: number missing
             201 weeks: 1
             299 weeks: number missing
             301 months: 1
             399 months: number missing
             401 years: 1
             499 years: number missing

    This variable is actually an integer, with values ranging from 102 through 198 for days, 202 through 298 for weeks, etc.

    Given that this variable appears in different datasets with different value labels, I decided to change it to string, based on advice given here last month. I decode it with the following command:

    Code:
    decode v852a, gen(v852a_string)
    The decoded variable, however, dropped all the unlabeled numeric values (i.e., 102 through 198, 202 through 298, etc.), leaving me with only partial data, as shown in this frequency table.



    Code:
    v852a_string -- how long ago first had sex with most recent partner
    ---------------------------------------------------------------
                      |      Freq.    Percent      Valid       Cum.
    ------------------+--------------------------------------------
    Valid   days: 1   |        211       1.79      18.74      18.74
            months: 1 |        224       1.90      19.89      38.63
            weeks: 1  |         77       0.65       6.84      45.47
            years: 1  |        614       5.21      54.53     100.00
            Total     |       1126       9.56     100.00           
    Missing           |      10658      90.44                      
    Total             |      11784     100.00                      
    ---------------------------------------------------------------


    How can I resolve this issue so that the variable can be decoded but still retain the numeric values, which can then be encoded later?

    thanks, Yawo

  • #2
    Here's something that might work; I haven't tried it, though: Note that -append- has a "nolabel" option. My thought would be to decide which dataset has the "best" set of labels. Start with a blank data set. Append all of the datasets except that one using the "nolabel" option. Then, append that "best" labeled dataset last, but without the "nolabel" option. Modify the labels in the final data set to fix any omitted labels.
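
    A minimal, untested sketch of that sequence follows. The file names dhs_round1, dhs_round2, and dhs_best, the label name region, and the code 11 are all hypothetical placeholders, and it assumes a Stata version in which -append- accepts several filenames and can be run on an empty dataset.

    Code:
    * Hypothetical file names; dhs_best is the round with the preferred labels
    clear
    * Append the other rounds without copying their value-label definitions
    append using dhs_round1 dhs_round2, nolabel
    * Append the preferred round last so that its label definitions are copied
    append using dhs_best
    * Inspect the result, then fill in anything omitted (placeholder label and code)
    label list
    label define region 11 "some omitted region", modify

    This only settles which label text is kept. It does not check that a given numeric code means the same thing in every round, so that would still need to be verified separately.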


    • #3
      I don't see any way to resolve this data coding scheme using -decode-. It's a pretty bizarre way to encode this kind of data. Even if you succeed in somehow working it out with -decode- and -encode-, I think it will make your life miserable when you try to work with it.

      I would take a different approach and break this up into two separate variables: a quantity and a time unit. The former should be a straightforward number, and the latter can be a string or a value-labeled number. (In the code below, I go the latter route.)

      Code:
      * Quantity is the last two digits; 99 means "number missing"
      gen v852a_quantity = mod(v852a, 100)
      replace v852a_quantity = . if v852a_quantity == 99

      * Time unit is the leading digit
      label define time_unit    1    "days"    ///
                                2    "weeks"   ///
                                3    "months"  ///
                                4    "years"
      gen byte v852a_unit = floor(v852a/100)
      label values v852a_unit time_unit
      Then I would get rid of v852a itself, and work with these new variables.

      You could, in the end, of course, reverse the steps in this code and recreate a v852a variable that looks like the one you started with. But I think it will cause you no end of difficulty to work with. For example, v852a indifferently represents 60 days (= 2 months) as 160 or 302. Worse, 90 days is a longer duration than that, but it would be coded as 190, which is bigger than the first (OK) but smaller than the second (OMG!). So as a numeric variable it is pretty useless.
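
      For completeness, reversing the steps might look something like the sketch below. It assumes v852a_quantity and v852a_unit were created exactly as in the code above and that the original v852a is still in memory for the check; v852a_rebuilt is just an illustrative name.

      Code:
      * Recombine unit and quantity into the original three-digit code
      gen int v852a_rebuilt = 100*v852a_unit + v852a_quantity
      * Restore the 99 "number missing" code where only the quantity is missing
      replace v852a_rebuilt = 100*v852a_unit + 99 if missing(v852a_quantity) & !missing(v852a_unit)
      * Check the round trip against the original variable before dropping it
      assert v852a_rebuilt == v852a if !missing(v852a)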

      Added: Crossed with #2. I didn't think of using the -nolabel- option. That might indeed work. But then, in the end, you'll be stuck with a v852a style variable that I think you will regret having to work with.
