  • Help looping with levelsof and egen...

    I need to create a binary indicator (weightmedian_binary) based on whether a variable (weight) is below or above its median (weightmedian_priceq5) within each quintile of another variable (price_q5). Because this is hard to describe, I have tried to illustrate it with an example using the auto dataset. The issue is that I need to generate several variables like this, not just for weight, each based on whether the variable is below or above its own median within each price quintile. I am trying to write a do-file, but I am having difficulty because I do not know how to reference the median values from -levelsof-, how to make the loop iterate correctly over the quintile values, etc. Any help would be greatly appreciated. Thanks in advance!

    Code:
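    sysuse auto, clear
    * xtile() is an egen function from the community-contributed egenmore package (ssc install egenmore)
    egen price_q5 = xtile(price), n(5)
    egen weightmedian_priceq5 = median(weight), by(price_q5)
    * -levelsof- lists the distinct medians sorted ascending, not in quintile order
    levelsof weightmedian_priceq5
    gen weightmedian_binary = .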
    replace weightmedian_binary = 0 if price_q5==1 & weight<2640
    replace weightmedian_binary = 1 if price_q5==1 & weight>=2640
    replace weightmedian_binary = 0 if price_q5==2 & weight<2650
    replace weightmedian_binary = 1 if price_q5==2 & weight>=2650
    replace weightmedian_binary = 0 if price_q5==3 & weight<2670
    replace weightmedian_binary = 1 if price_q5==3 & weight>=2670
    replace weightmedian_binary = 0 if price_q5==4 & weight<3280
    replace weightmedian_binary = 1 if price_q5==4 & weight>=3280
    replace weightmedian_binary = 0 if price_q5==5 & weight<3890
    replace weightmedian_binary = 1 if price_q5==5 & weight>=3890
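A loop over the quintile codes, as the question attempts, can be sketched as follows: it iterates over the values of price_q5 and reads each quintile's median from r(p50) after -summarize, detail-, so no numbers are copied from -levelsof- output by hand. This is a minimal sketch under stated assumptions: the official -xtile- command stands in for egenmore's egen xtile() so it runs without add-ons (tie handling may not be identical), and the local names quintiles and med are only illustrative.

```stata
sysuse auto, clear
* official -xtile- command, swapped in so the sketch needs no add-on packages
xtile price_q5 = price, nquantiles(5)

gen byte weightmedian_binary = .
* collect the quintile codes present in the data (1 to 5 here)
quietly levelsof price_q5, local(quintiles)
foreach q of local quintiles {
    * median of weight within this quintile, returned in r(p50)
    quietly summarize weight if price_q5 == `q', detail
    local med = r(p50)
    * 1 if at or above the quintile median, 0 if below; missing weight stays missing
    replace weightmedian_binary = (weight >= `med') if price_q5 == `q' & !missing(weight)
}
tabulate price_q5 weightmedian_binary, missing
```

The loop works, but computing the medians with -egen, by()- avoids looping over quintiles altogether.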

  • #2
    To my understanding, what you want can be achieved with
    Code:
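    egen price_q5 = xtile(price), n(5)
    egen weightmedian_priceq5 = median(weight), by(price_q5)
    * 1 if weight is at or above its quintile median, 0 if below; missing if weight is missing
    gen weightmedian_binary = (weight >= weightmedian_priceq5) if !missing(weight, weightmedian_priceq5)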
    I don't understand why you thought all the -replace- statements were needed. To see why this works, try browsing the relevant variables after each of the preceding statements.
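Since the question asks for several variables, the same approach can be wrapped in a -foreach- loop. This is a minimal sketch, assuming the variable list (weight length mpg) is only an example, that new names follow the pattern used in the question, and that the official -xtile- command stands in for egen's xtile(). It codes 1 for values at or above the quintile median, matching the -replace- statements in the question, and leaves the indicator missing when the variable is missing (Stata treats missing as larger than any number, so an unguarded comparison would code it 1).

```stata
sysuse auto, clear
* official -xtile- command; egen's xtile() requires egenmore from SSC
xtile price_q5 = price, nquantiles(5)

* example variable list; replace with your own variables
foreach v of varlist weight length mpg {
    * median of this variable within each price quintile
    egen `v'median_priceq5 = median(`v'), by(price_q5)
    * 1 = at or above the quintile median, 0 = below, missing if either value is missing
    gen byte `v'median_binary = (`v' >= `v'median_priceq5) if !missing(`v', `v'median_priceq5)
}
summarize *median_binary
```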

    All this being said, I find it hard to see how what you are doing would be an effective way to analyze your data. My guess is that what you ultimately want to know could be better addressed by a more conventional analysis. Among other things, I don't know why you would want to throw away information by making a continuous variable binary.



    • #3
      Thank you. Your solution is much more elegant and simpler, and, importantly, it also highlights an issue I had not noticed: -levelsof- displays values sorted from smallest to largest, but in this case that order does not match the order of the quintile-specific medians (the 3rd and 4th were switched). I wrote the code in long form while troubleshooting because my data, unlike auto, have missing observations, and it was helpful to check results and run crosstabs manually after each step. I do recognize that it is not efficient, though. The binary variables are only used for descriptive statistics (I gave a very simplified example; my data and purposes would not produce the symmetric results one would expect from the example above), but I appreciate your warning about the loss of information and your helpful answer and code overall.
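On the -levelsof- ordering problem and the step-by-step crosstab checks: a minimal sketch that prints each quintile's median next to its quintile code, so a median cannot be matched to the wrong quintile, and then cross-tabulates the indicator with missing values shown. As above, official -xtile- replaces egen's xtile() as an assumption, and -tabstat- is just one convenient way of producing such a table.

```stata
sysuse auto, clear
xtile price_q5 = price, nquantiles(5)
egen weightmedian_priceq5 = median(weight), by(price_q5)
gen byte weightmedian_binary = (weight >= weightmedian_priceq5) if !missing(weight, weightmedian_priceq5)

* one row per quintile: its code, the median of weight, and the number of observations
tabstat weight, by(price_q5) statistics(median n)

* check the indicator within each quintile, keeping missing values visible
tabulate price_q5 weightmedian_binary, missing
```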
      Last edited by Monica Aswani; 23 Mar 2019, 14:37.
