
  • Using gen and replace to create variable for changed values

    Hello everyone,

    I am totally stuck and hope someone can help.

    I am doing a study on vegetable intake and the risk of colorectal cancer. I am specifically interested in how a change in intake affects the risk of cancer, so I made a variable for the change in intake.
    When I ran my regression model (stcox) and then margins to predict scenarios, it looked as though the higher the vegetable intake, the higher the risk of cancer. Something must be wrong in the coding of the change variable.
    My intention was to create:
    0 = no change in intake (change between -75 and +75)
    1 = decrease in intake
    2 = increase in intake

    gen diff = (f_nsv-b_nsv)
    gen change_in_intake=.
    replace change_in_intake=0 if(diff <=-40 & diff<=40)
    replace change_in_intake=1 if(diff > -40)
    replace change_in_intake=2 if(diff > 40)
    label define changelab45 0 "No change in intake" 1"Decrease in intake,-50gr or lower" 2" Increase in intake, +50gr or higher"
    label values change_in_intake changelab45
    codebook change_in_intake


    f_nsv is vegetable intake at follow-up and b_nsv is intake at baseline.

    Hope someone can offer advice

    // Anne

    P.S. dataex is not an option.


  • #2
    Your code does not achieve your stated intention. On top of that, your stated intention is probably a bad idea as well. The construct of change in intake is inherently a continuous measure, and in the absence of good reason to believe that your cancer outcome is qualitatively different when the change crosses a magic boundary at -75 or +75, making a category variable out of it just discards information and introduces noise. For example, you are saying that a person with a change of -74 is radically different from a person with a change of -76 but is indistinguishable from a person with a change of +74. Sometimes categorizations like this can be useful for creating bar graphs to simplify a descriptive presentation, but such variables usually work out badly when used as regression predictors. You are better off using the change itself as a predictor in your model. If non-linearity is a concern, then an appropriate transformation is the better solution.
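
    As a rough sketch of what using the change itself as a predictor could look like, the block below runs stcox on simulated data. Everything in it is an assumption made for illustration: the simulated variables time and event, the rescaled variable diff50 (change per 50 g), and the quadratic term, which is only one possible way to allow for non-linearity and not necessarily the appropriate transformation for the real data.

    ```stata
    * Illustrative only: simulated data, not the study data
    clear
    set seed 12345
    set obs 500

    gen diff   = rnormal(0, 100)           // simulated change in intake
    gen diff50 = diff / 50                 // hypothetical rescaling: change per 50 g
    gen time   = -10 * ln(runiform())      // simulated follow-up time
    gen byte event = runiform() < 0.3      // simulated event indicator

    stset time, failure(event)

    * Change in intake entered as a continuous predictor
    stcox c.diff50

    * One possible way to allow for non-linearity: add a quadratic term
    stcox c.diff50##c.diff50
    ```

    Because the outcome here is simulated independently of diff, the estimates themselves carry no meaning; the point is only the model specification.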

    Even if you stick with a categorization of this variable, you will find it easier to work with one that at least respects the ordinal properties of change. Why assign the middle category the 0 value? Better to have 0 for decrease, 1 for no change, and 2 for increase. So if you must use a category, I would recommend
    Code:
    label define changelab45 0 "Decrease" 1 "No Change" 2 "Increase"
    gen change_in_intake:changelab45 = 0 if diff < -75
    replace change_in_intake = 1 if inrange(diff, -75, 75)
    replace change_in_intake = 2 if diff > 75 & !missing(diff)
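
    To see where the code in #1 goes wrong, it may help to run both versions on a handful of made-up values. The sketch below uses a toy diff variable; the names orig and fixed are hypothetical and exist only for this comparison.

    ```stata
    * Toy values of diff, including one missing value
    clear
    input diff
    -100
    -50
    0
    50
    100
    .
    end

    * Logic as posted in #1
    gen orig = .
    replace orig = 0 if (diff <= -40 & diff <= 40)
    replace orig = 1 if (diff > -40)
    replace orig = 2 if (diff > 40)

    * Logic as suggested above, with cutoffs at -75 and +75
    gen fixed = 0 if diff < -75
    replace fixed = 1 if inrange(diff, -75, 75)
    replace fixed = 2 if diff > 75 & !missing(diff)

    list diff orig fixed, clean noobs
    ```

    In the posted version, the first condition reduces to diff <= -40, so decreases are coded 0 ("No change"), values above -40 are coded 1 ("Decrease"), and the missing value ends up coded 2 because Stata treats missing as larger than any number. In the suggested version, the missing value stays missing.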
    That said, even with a corrected change_in_intake variable, there may also be problems with the way you set up your analysis of cancer risk. If just fixing this variable doesn't solve your problem, post back showing code and output for the analysis.


    • #3
      Anne,

      If you think there's something wrong with the coding of the variable, you should probably verify that by doing some data checking. Maybe something like...
      Code:
      list f_nsv b_nsv diff change_in_intake
      You could scan the output and see if your code is doing what you think it is doing.
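
      With a large dataset, scanning a full listing can be tedious, so a per-category summary may be quicker. The sketch below is on simulated data, not your data; the cutoffs of -75 and +75 and the five missing values are assumptions chosen for illustration.

      ```stata
      * Illustrative only: simulated diff with a few missing values
      clear
      set seed 2718
      set obs 200
      gen diff = rnormal(0, 100)
      replace diff = . in 1/5

      gen change_in_intake = irecode(diff, -75, 75)

      * Minimum, maximum, and count of diff within each category
      tabstat diff, by(change_in_intake) statistics(min max n) missing

      * Stops with an error if a missing diff was assigned to a category
      assert missing(change_in_intake) if missing(diff)
      ```

      If the minimum and maximum of diff in each category fall on the expected side of the cutoffs, the coding is doing what you intend.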

      Part of the problem may be that your code uses +/- 40 as the cutpoint while your description says +/- 75.

      I worked up a little example of some code you could use with the auto dataset.
      Code:
      sysuse auto, clear
      
      gen x = uniform()
      replace x = -1*x in 1/30
      
      gen price1 = price - mpg*trunk*x
      order price1, after(price)
      
      gen diff = price - price1
      order diff, after(price1)
      
      gen change = irecode(diff,-75,75)
      tab diff change, m
      At that point, you could recode the change variable to match what you want.
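
      A minimal sketch of that recoding step, on a few made-up values of diff, might look like the following. The target coding (0 = no change, 1 = decrease, 2 = increase) is taken from post #1; the value labels are shortened for illustration.

      ```stata
      * Toy values of diff, including the two cutoffs
      clear
      input diff
      -100
      -75
      0
      75
      100
      end

      * irecode() returns 0 if diff <= -75, 1 if -75 < diff <= 75, 2 if diff > 75
      gen change = irecode(diff, -75, 75)

      * Map to the coding in post #1 and attach value labels
      recode change (1 = 0 "No change") (0 = 1 "Decrease") (2 = 2 "Increase"), generate(change_in_intake)

      list, clean noobs
      ```

      Note that irecode() places a diff of exactly -75 in the lowest group, whereas inrange(diff, -75, 75) would count it as no change, so the two approaches differ at that boundary.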

      hth,
      Lance


      • #4
        Thank you both for the responses! I will go ahead and try your tips.
