  • How to subtract one variable from another within each row?

    In my dataset, s47a is the total number of children ever born and s47 is the number of children who have died. I want to compute the number of surviving children as s47a - s47 and attach that number to the household (hhid). However, there are many missing values, as the picture shows, so I tried egen:
    Code:
    egen ns= s47a-s47
    but that is wrong. I am looking for something like rowtotal(), but there is no equivalent for subtraction. What should I do?
    Code:
    tabulate s47a
    
           S47A |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         51        0.46        0.46
              1 |      5,160       46.28       46.74
              2 |      4,049       36.31       83.05
              3 |      1,392       12.48       95.53
              4 |        355        3.18       98.72
              5 |        106        0.95       99.67
              6 |         26        0.23       99.90
              7 |          8        0.07       99.97
              8 |          1        0.01       99.98
              9 |          2        0.02      100.00
    ------------+-----------------------------------
          Total |     11,150      100.00
    
    . tabulate s47
    
            S47 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        314       26.17       26.17
              1 |        648       54.00       80.17
              2 |        172       14.33       94.50
              3 |         42        3.50       98.00
              4 |         15        1.25       99.25
              5 |          8        0.67       99.92
              9 |          1        0.08      100.00
    ------------+-----------------------------------
          Total |      1,200      100.00
    [Attached image: 3.png]

  • #2
    You will need to use - generate - instead of - egen - to create the variable.

    Code:
     . gen ns_match= (s47a-s47)/hhid
    With regard to missing values, I cannot see how else this could be handled, since a difference between two numbers, one of which is missing, cannot be computed.
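
    As a small illustration of this point, the sketch below uses four made-up rows (not the survey data) to show that -generate- returns missing whenever either operand is missing, whereas egen's rowtotal() counts a missing value as zero. The variable names ns and total are illustrative only.

    ```stata
    clear
    * Four hypothetical rows covering every missing/nonmissing combination
    input s47a s47
    3 1
    2 .
    . 1
    . .
    end

    * Plain arithmetic: the result is missing if either operand is missing
    generate ns = s47a - s47

    * rowtotal() treats missing values as 0, so it never propagates them
    egen total = rowtotal(s47a s47)

    list
    ```

    Only the first row gets a nonmissing ns, while total is nonmissing in every row. This is why there is no direct "row difference" counterpart: the missing-as-zero rule has to be written out explicitly.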

    Hope that helps.
    Last edited by Marcos Almeida; 17 Mar 2017, 04:41.
    Best regards,

    Marcos


    • #3
      Originally posted by Marcos Almeida:
      You will need to use - generate - instead of - egen - to create the variable.

      Code:
      . gen ns_match= (s47a-s47)/hhid
      With regard to missing values, I cannot see how else this could be handled, since a difference between two numbers, one of which is missing, cannot be computed.

      Hope that helps.
      Thanks for the reply, but I don't understand why you divide by hhid: hhid is just the identifier of a specific family. Is there a good way to handle the missing values? I'm worried that a single missing value will make the whole result missing, which would distort the real number of siblings. This is a huge survey, and many respondents did not answer the question about the number of dead children even though they do have surviving children. So if s47 is missing but s47a is not, will the result of s47a-s47 be treated as a missing value?
      gen ns=s47a-s47
      (22,886 missing values generated)

      Code:
      . tabulate ns
      
               ns |      Freq.     Percent        Cum.
      ------------+-----------------------------------
               -7 |          1        0.12        0.12
               -1 |          2        0.25        0.37
                0 |        191       23.82       24.19
                1 |        214       26.68       50.87
                2 |        226       28.18       79.05
                3 |        125       15.59       94.64
                4 |         36        4.49       99.13
                5 |          6        0.75       99.88
                6 |          1        0.12      100.00
      ------------+-----------------------------------
            Total |        802      100.00
      
      . tabulate s47a
      
             S47A |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         51        0.46        0.46
                1 |      5,160       46.28       46.74
                2 |      4,049       36.31       83.05
                3 |      1,392       12.48       95.53
                4 |        355        3.18       98.72
                5 |        106        0.95       99.67
                6 |         26        0.23       99.90
                7 |          8        0.07       99.97
                8 |          1        0.01       99.98
                9 |          2        0.02      100.00
      ------------+-----------------------------------
            Total |     11,150      100.00
      
      . tabulate s47
      
              S47 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |        314       26.17       26.17
                1 |        648       54.00       80.17
                2 |        172       14.33       94.50
                3 |         42        3.50       98.00
                4 |         15        1.25       99.25
                5 |          8        0.67       99.92
                9 |          1        0.08      100.00
      ------------+-----------------------------------
            Total |      1,200      100.00
      
      .
      Last edited by Spacey Shi; 17 Mar 2017, 06:00.


      • #4
        hhid is the identifier of a specific family
        That being so, you may use:

        Code:
        . by hhid, sort:
        will the result of s47a-s47 be treated as a missing value?
        Yes, provided one of them is missing.
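
        The by hhid, sort: prefix needs a command after the colon. As a minimal sketch, assuming ns already holds the per-row difference and that a household may span several rows, one possibility is to sum ns within each household. The variable name ns_hh is hypothetical.

        ```stata
        * Assumption: ns was created earlier, e.g. generate ns = s47a - s47
        * Sum ns over all rows that share the same hhid.
        * total() ignores missing values; the missing option makes the result
        * missing (rather than 0) for households where every ns is missing.
        by hhid, sort: egen ns_hh = total(ns), missing
        ```

        If each hhid appears only once in the data, ns is already a household-level value and no by-group step is needed.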
        Best regards,

        Marcos


        • #5
          Marcos is absolutely correct that s47a - s47 will necessarily be missing if either of those variables has a missing value. But since you originally wanted to use -egen- for this, I'm going to guess that what you want is to calculate s47a - s47 when both are nonmissing, and to treat s47 as if it were zero when it is missing. If that is the case, the following code will do it:

          Code:
          gen diff = s47a - s47
          replace diff = s47a if missing(s47)


          • #6
            Or

            Code:
            gen diff = s47a - cond(missing(s47), 0, s47)

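            Applied to the original question (surviving children = s47a minus s47), a hedged sketch might look like the following. It assumes that a missing s47 means no deaths were reported, which is a substantive assumption about the survey rather than something the data confirm. Because the tabulation of ns in #3 contains a few negative values, the sketch also lists rows where recorded deaths exceed recorded births. The variable name ns_filled is hypothetical.

            ```stata
            * Surviving children = children born (s47a) - children who died (s47).
            * Assumption: a missing s47 is treated as zero deaths.
            generate ns_filled = s47a - cond(missing(s47), 0, s47)

            * ns_filled is still missing wherever s47a itself is missing.
            count if missing(ns_filled)

            * Inspect rows where recorded deaths exceed recorded births.
            * The !missing() guard matters: Stata sorts missing above every number,
            * so s47 > s47a alone would also be true when s47 is missing.
            list hhid s47a s47 if s47 > s47a & !missing(s47a, s47)
            ```
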