  • Standardizing var gives multiple 0's and 1's instead of a continuous var

    Hi everyone,

    I am cleaning survey data from the Asianbarometer and want to standardize some variables to range from 0 to 1.
    This is the code I usually use, shown here for the variable execapproval (executive approval), which ranges from 1 to 4. The variable educ ranges from 1 to 9, and I want it to range from 0 to 1 as well.

    sum execapproval
    replace execapproval = (execapproval - `r(min)') / (`r(max)'-`r(min)')
    Here is the data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(educ execapproval)
    7 2
    8 3
    5 3
    8 2
    7 3
    7 3
    5 3
    3 1
    9 .
    8 1
    7 3
    6 2
    6 3
    5 2
    6 2
    7 1
    4 2
    7 2
    8 3
    5 3
    7 3
    1 .
    1 3
    3 4
    1 .
    8 3
    end
    label values educ labels10
    label def labels10 1 "No formal", modify
    label def labels10 3 "Complete primary /elementary", modify
    label def labels10 4 "Incomplete secondary/high", modify
    label def labels10 5 "Complete secondary/high", modify
    label def labels10 6 "Incomplete secondary/high school", modify
    label def labels10 7 "Complete secondary/high school", modify
    label def labels10 8 "Some university/college-level, with diploma", modify
    label def labels10 9 "With University/College degree", modify
    label values execapproval labels56
    label def labels56 1 "Not at all satisfied", modify
    label def labels56 2 "Not very satisfied", modify
    label def labels56 3 "Fairly satisfied", modify
    label def labels56 4 "Very satisfied", modify

    For some reason the code works for neither educ nor execapproval. Something seems to be up with the labels, but I cannot figure out what it is. It works if I create a new variable, for example:

    gen educn = educ
    sum educn
    replace educn = (educn - `r(min)') / (`r(max)'-`r(min)')
    label var educn "Education - normalized"
    But not for educ.

    Can someone help?

    Thank you so much!!

    Best,
    Hannah

  • #2
    You start out with your variables having value labels attached to them. When you rescale them, most of the resulting values are not integers. And the ones that are, namely 0 and 1, do not correspond correctly to the labels. (0 isn't part of either label. And 1 now represents 4 or 9 in the original variable and is labeled accordingly--which is wrong, of course.) The problem goes away when you remove the value label.


    Code:
    sum execapproval
    replace execapproval = (execapproval - `r(min)') / (`r(max)'-`r(min)')
    list, clean
    
    label values execapproval    // REMOVE THE VALUE LABEL
    list, clean
    The problem you are having is merely an illusion. Internally, Stata has the correctly calculated numbers--but because you haven't removed the value labels, it is displaying them in a misleading way. Even if you left the labels on, which, to be clear, I don't recommend, all calculations will be done correctly with the new values--the labels just mislead your eyes into seeing something different from what Stata is doing.
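
    One way to confirm this is to look at the stored numbers directly instead of the labelled display. The following is only a minimal sketch on a made-up five-row dataset (not the Asianbarometer data). It reuses the labels56 label name from the example above and relies on the nolabel option of -list- and -tabulate-.

    ```stata
    * Toy data (hypothetical values), labelled like execapproval in the question
    clear
    input byte execapproval
    2
    3
    1
    4
    .
    end
    label define labels56 1 "Not at all satisfied" 2 "Not very satisfied" ///
        3 "Fairly satisfied" 4 "Very satisfied"
    label values execapproval labels56

    * Rescale to 0-1; -replace- promotes the byte variable to float as needed
    summarize execapproval
    replace execapproval = (execapproval - r(min)) / (r(max) - r(min))

    * Inspect the stored values, ignoring any attached value label
    list execapproval, nolabel
    tabulate execapproval, nolabel missing

    * Detach the value label so the display no longer misleads
    label values execapproval
    ```

    If the nolabel listing shows fractional values such as .3333333, the rescaling worked and only the labelled display was misleading.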


    • #3
      Hi Clyde,
      thank you so much for your quick help on this matter!
      I have tried to run your code, but unfortunately I still only get multiple 0's and 1's.
      Do I need to modify the code somehow? I also tried running "label val varname" before running sum execapproval...

      Thank you!
      Kind regards,
      Hannah


      • #4
        I can't reproduce the problem reported. Here is a minimal test.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte educ
        1
        5
        9
        end
        
        su educ
        gen educn = (educ - r(min)) / (r(max) - r(min))
        label var educn "Education - normalized"
        
        list
        Values are returned as 0, 0.5, 1, as they should be.


        The last few lines of code are just a slightly shorter variant of yours. Your code gives the same result.
        Last edited by Nick Cox; 28 Mar 2024, 06:02.


        • #5
          I wonder if the display format is somehow messed up. Try running your code again, getting the 0's and 1's. Then run -format execapproval educ %3.2f- and -list- or -browse- the data. See if that shows the numbers correctly.
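
          A minimal sketch of that check, using a made-up four-row dataset rather than the real survey data. The loop and the %4.2f width are illustrative choices here, not part of the original code; -describe- is included only to show each variable's storage type and display format before it is changed.

          ```stata
          * Toy data (hypothetical values) with the same ranges as in the question
          clear
          input byte(educ execapproval)
          7 2
          3 1
          9 4
          1 3
          end

          * Rescale each variable to 0-1 using its own min and max
          foreach v of varlist educ execapproval {
              summarize `v', meanonly
              replace `v' = (`v' - r(min)) / (r(max) - r(min))
          }

          * Check storage type and display format, then set a two-decimal format
          describe educ execapproval
          format educ execapproval %4.2f
          list, nolabel
          ```

          If the values still list as only 0 and 1 after this, the display format is probably not the cause.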
