Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Why is the sum of missing values turned to zero? How should I fix this, please?

    I am summing the values of each observation of the weight variable by ID and year. If the values of the "weight" variables are missing, I want the "total weight" value to be missing. Therefore, in my command below I added an option “missing”.




    bysort ID year: egen Total weight = total (Weight), missing




    However, the value of the Total weight is zero, if I add two or more missing observations. For example in the case of Id “1” year “2002”, the total weight is 0, but I want it to be missing because the two values of weight are both missing.




    However, if the group has only one observation and that observation is missing then the total weight is missing, as in the case of 2003 and 2004. Those two cases are ok and good.




    However, as I stated above if the group has two or more observations and those observations are missing then total weight values are 0 ( e.g total weight values for 2002, and total weight value for 2005). Please let me know your suggestion on how to fix issues of the zero total weights shown in 2002 and 2005 when they were supposed to be missing.
    ID Year Weight Total weight
    1 2001 126422 475262
    1 2001 18394 475262
    1 2001 330446 475262
    1 2002 . 0
    1 2002 . 0
    1 2003 . .
    1 2004 . .
    1 2005 . 0
    1 2005 . 0

    Thank you very much and I look forward to your tips,

  • #2
    I cannot replicate your problem:

    Code:
    . * Example generated by -dataex-. For more info, type help dataex
    . clear
    
    . input byte id int year long weight
    
               id      year        weight
      1. 1 2001 126422
      2. 1 2001  18394
      3. 1 2001 330446
      4. 1 2002      .
      5. 1 2002      .
      6. 1 2003      .
      7. 1 2004      .
      8. 1 2005      .
      9. 1 2005      .
     10. end
    
    .
    . by id year, sort: egen total_weight = total(weight), miss
    (6 missing values generated)
    
    .
    . list, noobs clean
    
        id   year   weight   total_~t  
         1   2001   126422     475262  
         1   2001    18394     475262  
         1   2001   330446     475262  
         1   2002        .          .  
         1   2002        .          .  
         1   2003        .          .  
         1   2004        .          .  
         1   2005        .          .  
         1   2005        .          .
    Are you perhaps using an old version of Stata? Or is your version not updated?

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Thank you very much Clyde Schechter!! Thank you also for the tip about the "-dataex-". Below I show the issue again using "-dataex-".
      When I keep only a few observations and calculate the total, I do not get a 0 value for the sum of the missing observations.
      However, when I calculate without decreasing the number of observations in my dataset, the problem persists. The total number of observations in my dataset is 10,797. I have two years 2011 and 2015. In addition to the year and the ID, I have one more variable called "Documentnumber" which I use as a grouping variable in my command below.
      Below I presented my issue. First, using the "-dataex-" I list those variables that have 0 total weight values when the values of weight are missing. Then, I run again the command for calculating total weight to generate a variable called “total_weight _again”
      Then, I list those variables that have issues ( that is zero values for the “Total_weight_again” variable when the “weight” variable in the group is missing.

      I am using STATASE 17 (64-bit)


      dataex ID year Documentnumber weight if Total_weight==0 & weight==.

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long ID int year double(Documentnumber weight)
      1998004150 2011 810000402073 .
      1998004150 2011 810000402073 .
      1998004150 2011 810000402088 .
      1998004150 2011 810000402112 .
      1998004150 2011 810000402112 .
      1998004150 2015 810000402947 .
      2002019581 2011 810000402066 .
      2002019581 2011 810000402066 .
      2002019581 2011 810000402087 .
      2004028746 2011 810000402057 .
      2004028746 2011 810000402061 .
      2004028746 2011 810000402061 .
      2004028746 2011 810000402068 .
      2004028746 2011 810000402084 .
      end
      ------------------ copy up to and including the previous line ------------------

      Listed 14 out of 10797 observations

      . by ID year Documentnumber, sort: egen Total_weight_again = total(weight), miss
      (28 missing values generated)

      . list if Total_weight_again ==0 & weight==., noobs clean

      Documentnu~r ID year weight Total_~t Total_~n
      810000402073 1998004150 2011 . 0 0
      810000402073 1998004150 2011 . 0 0
      810000402088 1998004150 2011 . 0 0
      810000402112 1998004150 2011 . 0 0
      810000402112 1998004150 2011 . 0 0
      810000402947 1998004150 2015 . 0 0
      810000402066 2002019581 2011 . 0 0
      810000402066 2002019581 2011 . 0 0
      810000402087 2002019581 2011 . 0 0
      810000402057 2004028746 2011 . 0 0
      810000402061 2004028746 2011 . 0 0
      810000402061 2004028746 2011 . 0 0
      810000402068 2004028746 2011 . 0 0
      810000402084 2004028746 2011 . 0 0

      .






      Comment


      • #4
        Hi again, I am sorry, but now the issue is solved. When I look into the dataset it is for the sum of missing and zero values. When the "weight" variable has 0 values for some observations and missing values for others. I am sorry for the issue. Now it is solved. Thank you for the help.

        Comment

        Working...
        X