Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • rowtotal with missing values

    Hi,

    I need to create a variable that is the sum of other values in a row. I normally use newvar=egen(rowtotal var 1 var2....., etc.) for this. However I need newvar to be missing if any one of var 1, var 2, etc. are missing, and rowtotal treats missing as 0. Is there an easy way to solve this problem?

    Thanks,
    Alyssa

  • #2
    Alyssa: Try (1) egen rowmiss and then (2) condition your egen rowtotal on the result of the rowmiss calculation being zero.

    E.g.

    egen rm=rowmiss(v1-vn)
    egen rt=rowtotal(v1-vn) if rm==0

    Comment


    • #3
      Apart from the excellent advice in #2, and just for the sake of providing a toy example (with the variables var1, var2 and var3), you may try a quite down-to-earth command in order to get exactly what you wish.

      Code:
      gen newvar = var1 + var2 + var3
      By the way, precisely to avoid the "problem" of getting missing values (as long as there is at least a variable with missing data in the same row), I suspect, the command - egen - with the option "rowtotal" (hence ignoring missing data) was aimed at.
      Last edited by Marcos Almeida; 06 Feb 2017, 08:25.
      Best regards,

      Marcos

      Comment


      • #4
        Hi all,

        I'm having a problem with generating a total score from 19 variables. According to the scoring sheet for the scale, my max score should be 27 points. However, even though the variables I've created (really complicated creation compared to possible answers on original items) have the appropriate max values that match the ones on the scoring sheet, egen produces a max score of 26 while gen newvar (as described above by Marcos) produces 25.

        What could be the reasons for this, other than no one in the sample scores a 27? And why would egen differ from gen? Here are the two sets of code for egen and gen:


        cap drop aicastot
        gen aicastot= ///
        aicashrs + ///
        aicasfreq + ///
        aicasdur + ///
        aicaspreocc + ///
        aicaslocplaymore + ///
        aicaswith + ///
        aicastolerance + ///
        aicascraving + ///
        aicasresist + ///
        aicasescape + ///
        aicaslim + ///
        aicasforgot + ///
        aicastoolong + ///
        aicasprwkschl + aicasprobfam + aicasprmoney + ///
        aicasnegleis + aicasnegfriend + aicasprobhlt
        codebook aicastot
        summ aicastot


        cap drop aicasscale
        egen aicasscale=rowtotal( ///
        aicashrs ///
        aicasfreq ///
        aicasdur ///
        aicaspreocc ///
        aicaslocplaymore ///
        aicaswith ///
        aicastolerance ///
        aicascraving ///
        aicasresist ///
        aicasescape ///
        aicaslim ///
        aicasforgot ///
        aicastoolong ///
        aicasprwkschl aicasprobfam aicasprmoney ///
        aicasnegleis aicasnegfriend aicasprobhlt), missing
        codebook aicasscale
        summ aicasscale

        I'm worried that I somehow created the variable incorrectly because it's so complicated. For example, the most complicated question allots 2 points to this combination (reproduced from the scoring sheet for the scale):
        12. How often have you tried to give up or, respectively, to limit your
        online behaviour? (must =4 or 3)

        (Item 12 has to be in taken in combination with Item 12.1)
        AND

        12.1 … … if you have previously tried to change your online
        behaviour: were you successful?(must = „no“)
        Thank you all for any suggestions!

        Michelle

        Comment


        • #5
          I can't see what the problem is. Perhaps try

          Code:
          edit aic* if aicasscale != aicastot

          Comment


          • #6
            Hi

            This is maybe a simple solution, but when writing an Introduction to Stata course a couple of years ago, I included the egen command just to highlight the availability of a more advanced generate command, and also how you can use the hyphen to save time on writing code (and potential pitfalls). This was the command I used:

            egen anxiety_total= rowmiss(anxiety_1 - anxiety_10)

            I also then mentioned about the issue with missing data being counted as 0, so if there was missing data to use:

            egen anxiety_total= rowmiss(anxiety_1 - anxiety_10), missing

            This was in Stata version 14.

            However, we now run the course in version 15, and rightly so, one of the students has pointed out that including the option of missing seems to have no impact at all, still counting missing values as zero in the calculation rather than replacing the new values as missing if a missing value exists.

            I've looked through the help file, but it all seems the same content, but doesn't seem to work. Anyone know if this option has been removed in the new version?


            For example, the following scenario in Stata 15 provides new variables that are the same despite the missing data:


            sysuse auto

            * missing values in rep78

            egen new= rowtotal(rep78- trunk)

            egen new2= rowtotal(rep78- trunk), missing


            Thanks,
            Dan

            Comment


            • #7
              Code:
              viewsource _growtotal.ado 
              shows that the code has been unchanged since 2008. The problem here is different: a misunderstanding of what this option does. The help says

              If missing is specified and all values in varlist are missing ]for an
              observation, newvar is set to missing.
              The key here is that all values must be missing to return a result of missing. If there is even one value in an observation that is non-missing then "all" is not true. The option does not ensure that missing is returned if any value is missing.

              This missing option was added because of community reaction: some users objected to Stata's rules for adding values. Forgetting egen for a moment:Stata's logic is that (say) the sum of a missing and a non-missing value is always regarded as the non-missing value -- as the missing value is ignored. So, the sum of missings alone is implicitly zero. Some users didn't want to see zero as an explicit result whenever all the values are missing.

              In your auto example, missings are present only in rep78 and so specifying the missing option, as you say, makes no difference.

              If what you want is to see missings as a total whenever any value is missing, then you can write your own egen function -- that's serious; the user community has published many more egen functions than are included in official Stata -- or you need a three-step from official Stata:

              Code:
              sysuse auto, clear
              
              * missing values in rep78
              egen new= rowtotal(rep78- trunk)
              egen prob = rowmiss(rep78- trunk)
              replace new = . if prob

              Comment


              • #8
                Thank you, Nick, for illuminating this piece of Stata logic in your post.

                I have also something to add regarding the dangers of rowtotal. On a couple of occasions this got me unprepared while using egen, rowtotal:

                The syntax v1-vN is extremely dangerous, and should be avoided whenever possible. This syntax depends on the ordering of your variables in the dataset, and this ordering you can not always guess correctly.

                Here are two examples, in the first the syntax does what I expect it to do, in the second the syntax creates a mess.

                Code:
                . clear
                
                . set obs 1
                number of observations (_N) was 0, now 1
                
                . gen a1 = 1
                
                . gen a2 =1
                
                . gen b1 = 1000
                
                . egen total = rowtotal(a1-a2)
                
                . summ total
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                       total |          1           2           .          2          2
                
                .
                Here everything went fine, rowtotal did what we wanted it to do. And now things go wrong:

                Code:
                . clear
                
                . set obs 1
                number of observations (_N) was 0, now 1
                
                . gen a1 = 1
                
                . gen b1 = 1000
                
                . gen a2 = 1
                
                . egen total = rowtotal(a1-a2)
                
                . summ total
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                       total |          1        1002           .       1002       1002
                
                .
                So when using the syntax v1-vN, it is very hard to guess what Stata understands by it, unless you actually look at your data table, and see that the ordering of your variables is what you think it is. (This very often is not possible if you are running some complicated code that changes the data.)

                Comment


                • #9
                  Thank you for your reply Nick.

                  Just shows that I am still learning with Stata, even when I thought I got it (but didn't read the text properly).

                  In my previous workings I must have used observations with all missing values when it worked. Writing my own function sounds pretty intriguing, but as you say, serious.

                  Joro- you are absolutely right about the hyphen usage. I like this example because it shows the students while it is a shortcut (that they are likely to see in the code of others), there are dangers if the variables are not in the order you expect.

                  Thank you both,
                  Dan

                  Comment

                  Working...
                  X