
  • Creating Binary variable

    Hi Family,
    I have a household data set containing the nutritional outcomes of at least two children in each household.
    I am seeking to create an additional variable named "divergent_nut_outcome" with a binary outcome, where 1 signifies 'Divergent' (i.e., if children within the same household exhibit differing nutritional outcomes), and 0 indicates 'Not Divergent' (all children within the household have the same nutritional status, either undernutrition or normal).

    These are the commands I tried, but I got confused along the way.

    egen underNutriton_count = total( nutritional_status == 2), by( nutritional_status )
    gen divergent_status = "Divergent" if underNutriton_count > 0 & underNutriton_count < _N

    I kindly request your assistance in resolving this.

    Below is an example of the data set for your perusal.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 uid byte age_months long(gender nutritional_status)
    "119" 52 2 1
    "119" 21 1 2
    "119" 5 1 1
    "131" 40 1 1
    "131" 5 2 2
    "311" 54 2 1
    "311" 11 2 2
    "419" 39 1 1
    "419" 59 2 1
    "419" 19 1 1
    "525" 33 2 2
    "525" 21 2 1
    "526" 57 1 1
    "526" 8 1 1
    "633" 39 1 1
    "633" 34 1 1
    "7 2" 27 2 2
    "7 2" 7 2 2
    end
    label values gender gender
    label def gender 1 "female", modify
    label def gender 2 "male", modify
    label values nutritional_status nutritional_status
    label def nutritional_status 1 "Normal", modify
    label def nutritional_status 2 "Undernutrition", modify

    Thank you for your continued support.
    Amidu
    Last edited by Shamsudini Amidu; 13 Oct 2023, 10:07.

  • #2
    If all children within a household have the same outcome, there is no variation in the status codes, so the household-level SD will be 0. One way to solve this is therefore with egen and its sd() function:

    Code:
    egen divergent_nut_outcome = sd(nutritional_status), by(uid)
    replace divergent_nut_outcome = 1 if divergent_nut_outcome > 0 & !missing(divergent_nut_outcome)
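
    As a cross-check on the SD approach, the same indicator can also be built from counts, which is close to what the attempt in the first post was aiming for. There, by(nutritional_status) grouped by status rather than by household, and _N referred to the number of observations in the whole data set rather than the household size. The following is only a minimal sketch, assuming nutritional_status is never missing and is coded 1/2 as in the example data; the names n_under, hh_size and divergent_chk are made up for illustration.

    ```stata
    * Example data from the first post (dataex output, value labels omitted)
    clear
    input str3 uid byte age_months long(gender nutritional_status)
    "119" 52 2 1
    "119" 21 1 2
    "119" 5 1 1
    "131" 40 1 1
    "131" 5 2 2
    "311" 54 2 1
    "311" 11 2 2
    "419" 39 1 1
    "419" 59 2 1
    "419" 19 1 1
    "525" 33 2 2
    "525" 21 2 1
    "526" 57 1 1
    "526" 8 1 1
    "633" 39 1 1
    "633" 34 1 1
    "7 2" 27 2 2
    "7 2" 7 2 2
    end

    * Undernourished children and total children per household (note by(uid))
    egen n_under = total(nutritional_status == 2), by(uid)
    egen hh_size = count(nutritional_status), by(uid)

    * Divergent when some, but not all, children in the household are undernourished
    gen byte divergent_chk = n_under > 0 & n_under < hh_size

    * Rebuild the SD-based variable and confirm that both versions agree
    egen divergent_nut_outcome = sd(nutritional_status), by(uid)
    replace divergent_nut_outcome = 1 if divergent_nut_outcome > 0 & !missing(divergent_nut_outcome)
    assert divergent_chk == divergent_nut_outcome
    ```

    If the final assert runs without an error message, the two constructions give the same 0/1 values on these data.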



    • #3
      Hi Ken,
      I am most grateful for your swift response.
      The variable was created successfully. I am now running a logistic regression to investigate whether the gender of the child is associated with divergent nutritional outcomes.

      I used the command below:
      logistic divergent_nut_outcome i.gender


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str3 uid byte age_months long(gender nutritional_status) float divergent_nut_outcome
      "119" 52 2 1 1
      "119" 21 1 2 1
      "119" 5 1 1 1
      "131" 40 1 1 1
      "131" 5 2 2 1
      "311" 54 2 1 1
      "311" 11 2 2 1
      "419" 39 1 1 0
      "419" 59 2 1 0
      "419" 19 1 1 0
      "525" 33 2 2 1
      "525" 21 2 1 1
      "526" 57 1 1 0
      "526" 8 1 1 0
      "633" 39 1 1 0
      "633" 34 1 1 0
      "7 2" 27 2 2 0
      "7 2" 7 2 2 0
      end
      label values gender gender
      label def gender 1 "female", modify
      label def gender 2 "male", modify
      label values nutritional_status nutritional_status
      label def nutritional_status 1 "Normal", modify
      label def nutritional_status 2 "Undernutrition", modify


      Am I correct?

      Alternatively, I also thought of collapsing divergent_nut_outcome by household before running the logistic regression.
      However, collapsing the data to a single observation per household could introduce bias, and I would risk losing individual-level information such as sex and age.

      What is the best way to go?

      Thank you.
      I do appreciate your time and support.
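
      To make the two options concrete, they could be sketched roughly as below. This is an illustration under assumptions, not a recommendation: it assumes that clustering the standard errors on household is an acceptable way to handle children who share the same outcome, and it uses egen's tag() to mark one row per household instead of collapse, so no child-level variables are dropped. The names hh_id and hh_tag are made up for illustration.

      ```stata
      * Example data from this post (dataex output, value labels omitted)
      clear
      input str3 uid byte age_months long(gender nutritional_status) float divergent_nut_outcome
      "119" 52 2 1 1
      "119" 21 1 2 1
      "119" 5 1 1 1
      "131" 40 1 1 1
      "131" 5 2 2 1
      "311" 54 2 1 1
      "311" 11 2 2 1
      "419" 39 1 1 0
      "419" 59 2 1 0
      "419" 19 1 1 0
      "525" 33 2 2 1
      "525" 21 2 1 1
      "526" 57 1 1 0
      "526" 8 1 1 0
      "633" 39 1 1 0
      "633" 34 1 1 0
      "7 2" 27 2 2 0
      "7 2" 7 2 2 0
      end

      * Numeric household identifier built from the string uid
      egen long hh_id = group(uid)

      * Option 1: child-level model with standard errors clustered on household
      logistic divergent_nut_outcome i.gender, vce(cluster hh_id)

      * Option 2: flag one observation per household rather than collapsing,
      * so gender and age_months remain available in the data
      egen byte hh_tag = tag(uid)
      tabulate divergent_nut_outcome if hh_tag
      ```

      Because the outcome is identical for every child in a household while gender varies within it, the interpretation of the gender coefficient in the first option still needs care.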

