Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • generating sum variable by item

    Hello every one, I use stata 10 for my work. I am working with unit level data where in one of the data file the household expenditure is given item wise (1:food, 2: health, 3: education, etc...7: sum of 2-6 and 8:sum of 1-6). Data is in long form. Now my interest is to validate the item 7 and 8 i.e. whether the sum is correct or not. I am trying to generate new variable for item 7 and 8 using command
    Code:
    bys hhid: egen item8=sum( exp) if sno_item>1 & sno_item<8
    With this command I am getting missing value in item 8 cell, whereas I want value there also so that I can easily validate the data. Below is the output

    Code:
       +-----------------------------------------+
         |  hhid   sno_item   exp   item8 |
         |-----------------------------------------|
      1. |  41000011          1       2000       . |
      2. |  41000011          2       3500    3680 |
      3. |  41000011          6        130    3680 |
      4. |  41000011          7         50    3680 |
      5. |  41000011          8       3680       . |
         |-----------------------------------------|
      6. |  41000011          9       5680       . |
      7. | 410000110          1       2600       . |
      8. | 410000110          6        300     300 |
      9. | 410000110          8        300       . |
     10. | 410000110          9       2900       . |
         +-----------------------------------------+
    Please suggest me correction in my command.


    Thanks
    Prakash

  • #2
    Prakas:
    the problem seems to rest on the fact that you have missing values somewhere in your items.
    As a possible work-around you may want to consider:
    Code:
     egen item7=rowtotat(item2 item3 item4 item5 item6)// as -rowtotal- treats missing values as zeros  egen item8=rowtotal(item1 item2 item3 item4 item5 item6)
    and then:
    Code:
     collapse (sum) item7 item8, by(hhid)
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      The problem is not missing values in the data. It is the way the if command works.
      The if option included in your code tell Stata it should perform preceding formula only when conditions apply, and return a missing value when they do not.
      I am not aware of a conditional sum function, but this might help you narrow your search. Simplest, perhaps low tech option I can come up with is to copy the entire column, with values you want to ignore excluded, and then run the sum over the new column, as:

      Code:
        
      gen helper = exp if sno_item>1 & sno_item<8
      bys hhid: egen item8=sum(helper)

      Comment


      • #4
        There is no need for a specific conditional sum function, as the egen function total() will feed on an expression, which can be complicated than a variable name. The expression must ensure that observations irrelevant to the calculation are ignored, which can be done in this case by setting them to zero or to missing; neither affects the total, but missing is better for the case in which all observations are missing.

        Note that I use the name total() here, explicit since Stata 9. The name sum() still works for an egen function, but is likely to lead to confusion with the cumulative sum function sum().

        For more, see
        http://www.stata-journal.com/article.html?article=dm0055

        Code:
        . clear
        
        . input long hhid sno_item exp item8
        
                     hhid   sno_item        exp      item8
          1. 41000011  1 2000 .
          2. 41000011  2 3500  3680
          3. 41000011  6  130  3680
          4. 41000011  7 50  3680
          5. 41000011  8 3680 .
          6. 41000011  9 5680 .
          7. 410000110  1 2600 .
          8. 410000110  6  300 300
          9. 410000110  8  300   .
         10. 410000110  9 2900 .
         11. end
        
        . bys hhid: egen wanted = total(exp / (sno_item>1 & sno_item<8))
        
        . bys hhid: egen wanted_2  = total(cond(sno_item>1 & sno_item<8, exp, .))
        
        . list , sepby(hhid)
        
             +---------------------------------------------------------+
             |      hhid   sno_item    exp   item8   wanted   wanted_2 |
             |---------------------------------------------------------|
          1. |  41000011          1   2000       .     3680       3680 |
          2. |  41000011          2   3500    3680     3680       3680 |
          3. |  41000011          6    130    3680     3680       3680 |
          4. |  41000011          7     50    3680     3680       3680 |
          5. |  41000011          8   3680       .     3680       3680 |
          6. |  41000011          9   5680       .     3680       3680 |
             |---------------------------------------------------------|
          7. | 410000110          1   2600       .      300        300 |
          8. | 410000110          6    300     300      300        300 |
          9. | 410000110          8    300       .      300        300 |
         10. | 410000110          9   2900       .      300        300 |
             +---------------------------------------------------------+

        Comment

        Working...
        X