Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Need some help dealing with duplicates

    Hi all, I need some help with dealing with duplicates in my data.

    I have something like this:
    var1 var2 var3 var4
    a x Red 1
    a x Green 2
    b y Red 3
    b y Green 4

    This is a simplified version of my data, most of them only have 1 unique var1 and var2, which I intend to use for merging data. But from some variables, there are different types of Var3, which separates them.
    What I want to achieve is to add up var4 if var1 and var2 are the same, and drop every var3 != Red.

    So the effect I want is:
    var1 var2 var3 var4
    a x Red 3
    b y Red 7

    I got a feeling that I should use for loop but not sure where to start.

    Can anyone help please?
    Thank you sooooo much!!!

  • #2
    Maybe something like the following. It has some extra data-integrity checks, but when you're dealing with a dataset with known duplicates, they're probably worth it for the reassurance.

    .ÿ
    .ÿversionÿ17.0

    .ÿ
    .ÿclearÿ*

    .ÿ
    .ÿquietlyÿinputÿstr1(var1ÿvar2)ÿstr5ÿvar3ÿbyteÿvar4

    .ÿ
    .ÿ
    .ÿbysortÿvar1ÿvar2:ÿgenerateÿbyteÿlastÿ=ÿvar3ÿ==ÿ"Red"

    .ÿisidÿvar1ÿvar2ÿlast,ÿsort
    (dataÿnowÿsortedÿbyÿvar1ÿvar2ÿlast)

    .ÿ
    .ÿquietlyÿbyÿvar1ÿvar2:ÿreplaceÿvar4ÿ=ÿsum(var4)

    .ÿ
    .ÿquietlyÿkeepÿifÿlast

    .ÿ
    .ÿlistÿvar?,ÿnoobs

    ÿÿ+---------------------------+
    ÿÿ|ÿvar1ÿÿÿvar2ÿÿÿvar3ÿÿÿvar4ÿ|
    ÿÿ|---------------------------|
    ÿÿ|ÿÿÿÿaÿÿÿÿÿÿxÿÿÿÿRedÿÿÿÿÿÿ3ÿ|
    ÿÿ|ÿÿÿÿbÿÿÿÿÿÿyÿÿÿÿRedÿÿÿÿÿÿ7ÿ|
    ÿÿ+---------------------------+

    .ÿ
    .ÿexit

    endÿofÿdo-file


    .

    Comment


    • #3
      Sorry, one of the sorts above is pointless. (Its presence there resulted from habit of a variation on it used in another context.)
      Code:
      /* bysort var1 var2: */ generate byte last = var3 == "Red"
      isid var1 var2 last, sort
      
      quietly by var1 var2: replace var4 = sum(var4)
      
      quietly keep if last

      Comment


      • #4
        Hi Joseph,

        Thank you so much! That totally works!

        Just one more quick question. What if I have more than one column of var4? per se another var5 which I need to do the exact same thing.

        I tried to do this: quietly by var1 var2: replace var4=sum(var4) replace var5=sum(var5), but stata doesn't seem to allow me to do that.

        Sorry I'm still kinda new to stata and confused with different sorts of syntax.

        Thanks again!

        Comment


        • #5
          Originally posted by Yifan Lu View Post
          . . .I have more than one column of var4? per se another var5 which I need to do the exact same thing.

          I tried to do this: quietly by var1 var2: replace var4=sum(var4) replace var5=sum(var5), but stata doesn't seem to allow me to do that.
          by as a prefix (colon syntax) can take only a single command. So, you need to sum different variables on separate lines. Something like the following:
          Code:
          quietly by var1 var2: replace var4=sum(var4)
          quietly by var1 var2: replace var5=sum(var5)
          By the way, I'm glad that the code I showed worked for you, but it would fail (flag an error) if you had more than two colors. For something like
          Code:
          input str1(var1 var2) str5 var3 byte var4
          a     x     Red       1
          a     x     Green     2
          a     x     Blue      5
          b     y     Red       3
          b     y     Green     4
          end
          you would need to put the isid check earlier in the program flow.
          Code:
          isid var1 var2 var3
          
          generate byte last = var3 == "Red"
          
          quietly bysort var1 var2 (last): replace var4 = sum(var4)
          quietly keep if last

          Comment


          • #6
            OMG thank you soooo much!!!

            You literally just helped me graduate.

            Everything runs smoothly.

            Thanks Again!!!

            Comment

            Working...
            X