
  • Creating a new variable from the same data in two different ways gives different results

    I want to create a variable that indicates whether a woman has some idea about family planning. For that I have combined 4 variables. The new variable is 1 if any of the 4 variables is 1, and 0 otherwise.
    There are no missing values in any of the 4 variables.
    I tried creating it in two ways:

    The first way is to create a variable with all missing values and then replace them with 0 and 1 depending on the conditions.
    The second way is to create a variable with all values 0 and then replace it with 1 wherever any of the 4 variables is 1.
    The results from both methods should be the same, but they are not. Why?
    I have included the results below.


    . gen fam_pl = .
    (259627 missing values generated)

    .
    . replace fam_pl = 1 if v384a == 1 | v384b ==1| v384c ==1| s616d ==1
    (169700 real changes made)

    . replace fam_pl = 0 if v384a == 0 | v384b ==0| v384c ==0| s616d ==0
    (238870 real changes made)

    . tabulate fam_pl

         fam_pl |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |    238,870       92.01       92.01
              1 |     20,757        7.99      100.00
    ------------+-----------------------------------
          Total |    259,627      100.00

    . gen famp = 0

    .
    . replace famp = 1 if v384a == 1 | v384b ==1| v384c ==1| s616d ==1
    (169700 real changes made)

    . tabulate famp

           famp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |     89,927       34.64       34.64
              1 |    169,700       65.36      100.00
    ------------+-----------------------------------
          Total |    259,627      100.00

  • #2
    These two are not at all equivalent. In the first one, you start by setting fam_pl = 1 if any of your four variables (v384a, v384b, v384c, s616d) is 1. But the next command then changes those 1's to 0 if any of the four variables is 0. So the net result is that fam_pl = 1 only if all four variables are 1. Your output shows this: the second replace overwrote 148,943 of the 169,700 ones, leaving 20,757.

    In the second approach you are setting famp to 1 if any of the four conditions is 1, and 0 otherwise.

    From your description in words of what you want, the second approach is correct and the first is wrong.

    By the way, you can simplify this to:

    Code:
    egen byte famp = rowmax(v384a v384b v384c s616d)
    or to
    Code:
    gen famp = inlist(1, v384a, v384b, v384c, s616d)
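
    If you would rather keep the structure of the first approach, the following is a minimal sketch of one way to repair it, assuming the four variables are coded only 0/1 with no missing values (as stated in #1). The key change is that 0 is assigned only when all four variables are 0, so the second replace uses & instead of |. The four-row toy dataset is made up purely for illustration.

    Code:
    * Toy data (hypothetical values, for illustration only)
    clear
    input v384a v384b v384c s616d
    0 0 0 0
    1 0 0 0
    0 1 1 0
    1 1 1 1
    end

    * First approach, repaired: start from missing, then fill in 1 and 0
    gen byte fam_pl = .
    replace fam_pl = 1 if v384a == 1 | v384b == 1 | v384c == 1 | s616d == 1
    * 0 only when ALL four are 0 (note & rather than |)
    replace fam_pl = 0 if v384a == 0 & v384b == 0 & v384c == 0 & s616d == 0

    * Second approach, in the one-line form
    gen byte famp = inlist(1, v384a, v384b, v384c, s616d)

    * Stops with an error if the two variables ever disagree
    assert fam_pl == famp

    Starting from missing does have one advantage: if any observation matched neither condition, it would stay missing and show up in a tabulate with the missing option instead of being silently counted as 0.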

  • #3
    Thanks a lot for the simpler code. Now I understand why my two versions don't give the same result.
