  • Differences in numbers after duplicating a column

    Hello,
    I am facing problems with a quite simple operation in Stata. For each individual, I have two columns: one for the mothers and one for the fathers. Each column contains the unique identifier of their first child. Of course, if my individual is a mother, I have a dot (missing value) in the father column, and if the individual is a father, I have a dot in the mother column. So it looks like this:

    ID    SEX  id_child_m  id_child_f
    Paul  M    .           John
    Mary  F    John        .

    What I would like to do is create a column id_child that takes the value of id_child_m or id_child_f, depending on the sex of the individual, so that it looks like this:

    ID    SEX  id_child
    Paul  M    John
    Mary  F    John

    However, it doesn't work. My id_child_m and id_child_f are not "John" but 8-digit identifiers, and when I use the following piece of code, the identifier changes slightly. For example, 11111112 becomes 11111113. I have absolutely no idea why. I made sure the formats of the columns are the same, but that does not seem to be the cause of the problem.
    Could anyone please help me? (Summing the two columns or using egen's rowtotal() leads to the same result.)
    Thanks a lot in advance,
    Quentin

    Code:
    forvalues v = 1 (1) 24 {
        format id_child_m`v' id_child_f`v' %9.0g
        recast float id_child_m`v' id_child_f`v', force

        gen id_children`v' = id_child_m`v'
        replace id_children`v' = id_child_f`v' if missing(id_children`v')
    }

  • #2
    You actually went out of your way to create a problem by using -recast float id_child_m`v' id_child_f`v', force-. Your identifiers have too many digits to be stored in a float: a float holds every integer exactly only up to 16,777,216 (2^24), so in practice it is safe for at most 7 digits. When you recast them to -float-, you lose precision. Omit that line, and then do -gen long id_children`v' = id_child_m`v'-.
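    To see the precision limit directly, here is a small sketch using Stata's float() function, which rounds a number to float precision. The test values are hypothetical, chosen at and just above 2^24, plus one large 8-digit number; run it from a do-file.

```stata
* A float stores every integer exactly only up to 2^24 = 16,777,216
display %12.0f float(16777216)   // shows 16777216: still exact
display %12.0f float(16777217)   // shows 16777216: rounded, last digit lost
display %12.0f float(99999999)   // shows 100000000: rounded to a multiple of 8
```
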

    Added: Using a long storage type will be good for up to 9 digits (or 10-digit numbers up to 2,147,483,620, the largest value a long can hold). The IDs you show are 8 digits, so this will work fine for them. But if some of your IDs are larger than that, then you need to go to -double- to have enough room.
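    A minimal sketch of the loop with the fix applied, run on hypothetical toy data with a single pair of ID variables (the variable names follow the question; the two observations and their values are made up). It assumes the source variables are still stored as long or double: if they had already been stored as float, the lost digits cannot be recovered from the data in memory.

```stata
* Hypothetical toy data: one pair of ID variables, stored as long
clear
input str4 ID str1 SEX long(id_child_m1 id_child_f1)
"Paul" "M"        . 16777217
"Mary" "F" 16777217        .
end

* One iteration here; the real data would use forvalues v = 1/24
forvalues v = 1/1 {
    * long holds 8-digit integers exactly (no -recast float- step)
    gen long id_children`v' = id_child_m`v'
    * Take the father's column wherever the mother's column is missing
    replace id_children`v' = id_child_f`v' if missing(id_children`v')
}

list ID SEX id_children1
```

    If any IDs exceed the range of a long, the same loop works with -gen double- instead of -gen long-.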

    Also added: I notice you used the -force- option in your -recast-. I imagine you did that because without it, Stata gave you an error message saying that it couldn't do that without losing information. You should have heeded Stata's warning instead of trying to suppress it. Error messages are your friends: you should embrace them happily when you get them. They are Stata's way of warning you that you are doing something unsafe with your data. In fact, as a general rule, you should never use a -force- option unless you are 100% certain that you understand why Stata refuses to do what you ask without it, and you are 100% certain that the abnormalities or distortions it will create are acceptable in your situation.
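    Rather than reaching for -force-, one way to see what Stata is protecting you from is to count how many values a float cannot hold exactly before recasting. A sketch on hypothetical toy data (the values are made up):

```stata
* Hypothetical toy data: one ID variable stored as long
clear
input long id_child_m1
16777216
16777217
.
end

* Count nonmissing values that would change if stored as float
count if id_child_m1 != float(id_child_m1) & !missing(id_child_m1)
* Any count above 0 means -recast float- would alter the data
display r(N) " value(s) would change under recast float"
```
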
    Last edited by Clyde Schechter; 14 Oct 2022, 09:36.



    • #3
      Dear Clyde,
      This works perfectly, thanks a lot for your explanation.
      I tried -force- out of desperation, knowing that it was risky. Thank you for your comments and tips!
      Quentin

