  • Creating a conditional sum of variables

    I have a database with 2 kinds of variables:

    PDT1_OMS, PDT2_OMS... (1 to 5): instrumentation (you can have up to 5, and there are 5 different instrumentations)
    PDT1_DIA, PDT2_DIA... (1 to 5): number of days you have used this instrumentation

    I would like to create a new variable in which I sum all PDT*_DIA, but only for a specific instrumentation.

    I know how to do it by creating some other variables, but I wonder if it is possible to do it directly:

    egen float d_su = rowtotal(PDT1_DIA if PDT1_OMS=="57.94", PDT2_DIA if PDT2_OMS=="57.94", ...)

    This program does not work, but is it possible to do something like that?

    Thank you in advance

  • #2
    Welcome to Statalist!

    I think what you want is to use the cond function where you have been trying to put if clauses.
    Code:
    egen float d_su = rowtotal(cond(PDT1_OMS=="57.94",PDT1_DIA,0), cond(PDT2_OMS=="57.94",PDT2_DIA,0),...)
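
    For what it is worth, a minimal sketch of where cond() does fit: generate accepts any expression, unlike egen's rowtotal(), so the conditional sum can be written with cond() directly. The toy data are hypothetical, only two slots are shown, and the !missing() condition is an added assumption so that a missing day count contributes 0 instead of making the whole sum missing.

    ```stata
    clear
    * hypothetical toy data: two instrumentation slots per observation
    input str6 PDT1_OMS PDT1_DIA str6 PDT2_OMS PDT2_DIA
    "57.94" 10 "57.94" 5
    "57.94" 3 "12.00" 8
    "01.23" 7 "57.94" .
    end
    * cond(test, a, b) returns a where test is true and b otherwise
    generate d_su = cond(PDT1_OMS == "57.94" & !missing(PDT1_DIA), PDT1_DIA, 0) ///
        + cond(PDT2_OMS == "57.94" & !missing(PDT2_DIA), PDT2_DIA, 0)
    list
    ```

    With these data d_su should be 15, 3 and 0. Each extra slot needs another cond() term, which is why a loop scales better.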

    • #3
      rowtotal() expects a varlist, so that shouldn't work. A further problem is testing for equality with a decimal value, here 57.94.

      This example is salutary:

      Code:
       
      . set obs 1
      obs was 0, now 1
      
      . gen foo = 57.94
      
      . count if foo == 57.94
          0
      
      . count if foo == float(57.94)
          1

      Type "search precision" in Stata to find out why. So I suggest

      Code:
      gen d_su = 0 
      
      forval j = 1/2 { 
             replace d_su = d_su + PDT`j'_DIA if PDT`j'_OMS == float(57.94) 
      }
      Here 2 stands for "any value of 2": change it to the number of PDT*_OMS variables you actually have (up to 5).
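
      A small sketch extending the example above: generate stores float by default, whereas the literal 57.94 in an expression is held as a double, so the two differ in the last bits. Declaring the storage type explicitly (foo and bar are just illustrative names) shows both ways of making the comparison agree.

      ```stata
      clear
      set obs 1
      * store 57.94 at float precision (the usual default for generate)
      generate float foo = 57.94
      * store 57.94 at double precision, matching the literal used in expressions
      generate double bar = 57.94
      * float value against double literal: no match
      count if foo == 57.94
      * both sides rounded to float: match
      count if foo == float(57.94)
      * both sides double: match
      count if bar == 57.94
      ```

      The three counts should be 0, 1 and 1.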

      • #4
        Nick Cox is of course correct about the rowtotal() function. I was too focused on promoting the cond() function for the conditional sum to realize it was inappropriate in this situation. On the other hand, I think there's a reasonable possibility that the OMS variables are not "numbers" but rather "numeric identifiers", like the ICD9 codes that turn up on Statalist occasionally. So I am inclined to accept the implication that they are stored as string variables to be compared, as they were, to string values. But that's a guess. It's clear that the statement of the problem misses a fair amount of the information necessary to fully understand it.

        Jose Valencia will perhaps be able to sort out what he needs from our responses. If not, Jose, please follow up here, first having reviewed the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. It would be particularly helpful to post a small hand-made example, perhaps with fewer variables as well as fewer observations, showing the data before the process and how you expect it to look after the process. Or show us what you did, creating additional variables, that worked, again including some sample data.

        Nick's answer and mine show how difficult it is to fully understand a problem statement in the absence of attention to the guidance in the FAQ.

        • #5
          As William rightly points out, if the variables concerned really are string, then using double quotes is quite correct.

          Code:
          gen d_su = 0  
          forval j = 1/2 {        
               replace d_su = d_su + PDT`j'_DIA if PDT`j'_OMS == "57.94"  
          }
          My own blindness here stems from rarely using such string codes, but that's not an excuse.
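
          A sketch of the string version run on hypothetical toy data with all five slots, as in the question. One assumption is added: the !missing() condition, because d_su + . is missing in Stata, so a matching slot with a missing day count would otherwise wipe out the running total.

          ```stata
          clear
          * hypothetical toy data: five instrumentation slots per observation
          input str6 PDT1_OMS PDT1_DIA str6 PDT2_OMS PDT2_DIA str6 PDT3_OMS PDT3_DIA str6 PDT4_OMS PDT4_DIA str6 PDT5_OMS PDT5_DIA
          "57.94" 2 "11.11" 4 "57.94" 6 "" . "" .
          "22.22" 1 "57.94" . "57.94" 9 "" . "" .
          end
          generate d_su = 0
          forvalues j = 1/5 {
              * add the days only where this slot holds the code of interest
              * and its day count is not missing
              replace d_su = d_su + PDT`j'_DIA if PDT`j'_OMS == "57.94" & !missing(PDT`j'_DIA)
          }
          list d_su
          ```

          Here d_su should be 8 and 9; without the !missing() condition the second observation would end up missing.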

          • #6
            Thank you all for your quick and gentle responses.

            Certainly, PDT*_OMS are string variables (they belong to a software database that I cannot modify). Until your responses I did it like this (it works, but it is laborious, since this is not the only variable I have to create):

            gen d_su1= PDT1_DIA if PDT1_OMS=="57.94"
            gen d_su2= PDT2_DIA if PDT2_OMS=="57.94"
            gen d_su3= PDT3_DIA if PDT3_OMS=="57.94"
            gen d_su4= PDT4_DIA if PDT4_OMS=="57.94"
            gen d_su5= PDT5_DIA if PDT5_OMS=="57.94"
            egen float d_su = rowtotal(d_su1 d_su2 d_su3 d_su4 d_su5)

            I do not know the forval command well, but it seems very useful; I will try this other way.

            Have a nice Sunday evening! Jose
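
            For comparison, a sketch of the same approach as the five gen lines above, written with forvalues so that only the loop bound changes with the number of slots. The toy data are hypothetical; the variable names follow the question.

            ```stata
            clear
            * hypothetical toy data: five instrumentation slots per observation
            input str6 PDT1_OMS PDT1_DIA str6 PDT2_OMS PDT2_DIA str6 PDT3_OMS PDT3_DIA str6 PDT4_OMS PDT4_DIA str6 PDT5_OMS PDT5_DIA
            "57.94" 2 "11.11" 4 "57.94" 6 "" . "" .
            "22.22" 1 "57.94" 9 "" . "" . "" .
            end
            * one helper variable per slot, missing where the code does not match
            forvalues j = 1/5 {
                generate d_su`j' = PDT`j'_DIA if PDT`j'_OMS == "57.94"
            }
            * rowtotal() treats the missing helpers as 0
            egen float d_su = rowtotal(d_su1-d_su5)
            drop d_su1-d_su5
            list d_su
            ```

            Here d_su should be 8 and 9. The range d_su1-d_su5 relies on the five helper variables sitting next to each other in the dataset, which the loop guarantees.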

            • #7
              Your code should work because rowtotal() ignores missings by default.
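
              A quick check of that claim on hypothetical data (a and b stand in for d_su1, d_su2, ...): plain addition propagates missing values, rowtotal() treats them as 0, and its missing option returns missing only when every value in the row is missing.

              ```stata
              clear
              * hypothetical toy data
              input a b
              1 2
              . 2
              . .
              end
              * plain addition: missing if any term is missing
              generate s = a + b
              * rowtotal(): missing values count as 0
              egen t = rowtotal(a b)
              * missing option: all-missing rows stay missing
              egen t_m = rowtotal(a b), missing
              list
              ```

              s should be 3, missing, missing; t should be 3, 2, 0; t_m should be 3, 2, missing.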

              • #8
                You are right, Nick; the only problem is the huge quantity of code that is necessary (with my syntax).
                Thank you for your responses
                Last edited by Jose Valencia; 14 Feb 2016, 15:56. Reason: Ambiguous post (before)

                • #9
                  What is the huge quantity of code? Nick's solution in #3 is just four lines, one of which is just a closing curly brace! How much shorter do you want it to be?

                  • #10
                    Clyde, I was talking about my code (not Nick's, obviously!)

                    As I said, I did not know the 'forval' command before, so I had to see how it works. I have just understood it and done it and... IT IS AMAZING!!

                    Nick, thank you very, very much; you have saved me an enormous amount of time and opened up another dimension of Stata for me.
                    Last edited by Jose Valencia; 14 Feb 2016, 15:45.
