
  • Gen command replace "0 real changes made" not working for numeric variable

    Hello,

    I am trying to generate a variable that categorizes departments into 3 groups.
    I have done this successfully before, but it doesn't seem to work for this particular variable. I searched the forum and the manual and did not find an answer.

    Code:
    gen group=.
    replace group = 1 if Dept_ID==4748 | Dept_ID==4834 | Dept_ID==4580 | Dept_ID==4472 | Dept_ID==4014


    Output:
    . gen group=.
    (3,111 missing values generated)


    . replace group = 1 if Dept_ID==4748 | Dept_ID==4834 | Dept_ID==4580 | Dept_ID==4472 | Dept_ID==4014
    (0 real changes made)


    I then tried the egen command, which didn't work either.

    Code:
    egen group1 = anycount(Dept_ID), values(4748 4834 4580 4472 4014)

    tab group1, m

    Output:

     Dept_ID == |
      4748 4834 |
      4580 4472 |
           4014 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |      3,111      100.00      100.00
    ------------+-----------------------------------
          Total |      3,111      100.00

    I then tried to see if there was something different about this variable.
    It is not a string variable (although I did try treating it as one as well). It is a numeric long variable and, strangely, codebook says the range is 1-17. This might mean that my Dept_ID numbers are labels (?) rather than the actual values.
    Could this be the problem? I cannot figure out how to look at the underlying value...

    codebook Dept_ID

    --------------------------------------------------------------------------------
    Dept_ID                                                              (unlabeled)
    --------------------------------------------------------------------------------

                      Type: Numeric (long)
                     Label: Dept_ID

                     Range: [1,17]                        Units: 1
             Unique values: 17                        Missing .: 58/3,111

                  Examples: 4     4683
                            6     4688
                            10    4807
                            14    4937

    describe Dept_ID

    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    --------------------------------------------------------------------------------
    Dept_ID         long    %12.0g     Dept_ID



    It is my first time posting and I did my best to follow all instructions, please forgive me if I missed something.
    Also note that this .dta file is downloaded from R and I can't help but feel like this might be contributing to these issues.

    Thank you for any help you can provide!




  • #2
    The values of that variable run from 1 to 17. Text like 4748 and 4834 is a set of value labels attached to those integers. There aren’t any numeric values like 4834, which is why nothing changed. See

    Code:
    help label
    for more information.
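
    As a minimal sketch of how to look at the underlying values, here is a small made-up dataset (the variable dept_str and its five IDs are illustrative stand-ins, not the poster's data). It shows the stored integers, the label mapping, and a comparison written against the label text rather than the stored value.

```stata
* Toy data standing in for the real file (hypothetical values)
clear
input str4 dept_str
"4014"
"4472"
"4580"
"4748"
"4834"
end

* encode stores the integers 1-5 and attaches the strings as value labels
encode dept_str, generate(Dept_ID)

* Show the stored integers instead of the labels
tabulate Dept_ID, nolabel

* Show the mapping from integer to label
label list Dept_ID

* No stored value equals 4748, so this counts 0 observations
count if Dept_ID == 4748

* Comparing through the value label finds the one matching observation
count if Dept_ID == "4748":Dept_ID
```

    The "text":labelname form looks up the integer that carries that label, so it can be used in an if condition without knowing the stored value.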

    • #3
      To add to #2, at some point earlier you or someone else applied encode to a string variable. Your string variable had (mostly) values like

      Code:
      "4748" "4834" "4580" "4472" "4014"
      but there were also 58 missing values. The effect of encode -- with nothing else said -- is to map the distinct values, in alphanumeric order, to the integers 1, 2, 3, and so on, and to assign the string values as value labels.

      Consider this silly example where matters are more obvious.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str6 fruit
      "orange"
      "banana"
      "apple" 
      "apple" 
      "banana"
      end
      
      . encode fruit, gen(encoded)
      
      . tab encoded
      
          encoded |      Freq.     Percent        Cum.
      ------------+-----------------------------------
            apple |          2       40.00       40.00
           banana |          2       40.00       80.00
           orange |          1       20.00      100.00
      ------------+-----------------------------------
            Total |          5      100.00
      
      . tab encoded, nolabel
      
          encoded |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |          2       40.00       40.00
                2 |          2       40.00       80.00
                3 |          1       20.00      100.00
      ------------+-----------------------------------
            Total |          5      100.00
      
      . label li encoded
      encoded:
                 1 apple
                 2 banana
                 3 orange
      Why your data were previously string I cannot say.
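
      If the goal is a variable holding the department numbers themselves, one possible route is to decode the labels back to a string and then convert that string to numbers. The sketch below again uses made-up data; dept_str, dept_txt and dept_num are hypothetical names, and it assumes every label consists of digits only.

```stata
* Toy data standing in for the real file (hypothetical values)
clear
input str4 dept_str
"4014"
"4472"
"4683"
"4748"
""
end

* Empty strings become numeric missing; the rest become 1-4 with labels
encode dept_str, generate(Dept_ID)

* Turn the value labels back into a string variable
decode Dept_ID, generate(dept_txt)

* Convert the digit strings to real numbers
destring dept_txt, generate(dept_num)

* The original condition now works on the numeric IDs
generate group = 1 if inlist(dept_num, 4748, 4834, 4580, 4472, 4014)

tabulate group, missing
```

      With these toy values, three observations fall in group 1 and the other two (4683 and the missing ID) stay missing, which is where further replace statements for groups 2 and 3 would apply.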
