  • Maybe the dumbest question ever

    Hello, I'm new to Stata. I have hunted around and can't seem to find a simple answer to what I'm sure is a common issue: duplicate rows. Let's say I want to calculate the mean of a dataset, and my data look like this:
    Var_Manager   Var_total_employees   Var_region_served
    Bob           30                    North
    Bob           30                    East
    Mary          40                    South
    Jonas         60                    North
    Jonas         60                    South
    Jonas         60                    West
    I'm just using this as an example to illustrate. I want to calculate the mean employees for each manager, which is [30+40+60]/3. But I can't figure out a way for Stata to understand that Bob is unique, Mary is unique, Jonas is unique, and not six different managers.
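    To make the difference concrete, here is the worked arithmetic for the example data. The mean the question asks for counts each of the three distinct managers once. A plain mean over all six rows counts Bob's 30 twice and Jonas's 60 three times, which pulls the result upward:

```latex
\bar{x}_{\text{distinct managers}} = \frac{30 + 40 + 60}{3} = \frac{130}{3} \approx 43.33
\qquad
\bar{x}_{\text{all rows}} = \frac{30 + 30 + 40 + 60 + 60 + 60}{6} = \frac{280}{6} \approx 46.67
```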

    Any help on which way I should be looking would be much appreciated.

  • #2
    Code:
    collapse (mean) Var_total_employees, by(Var_Manager)
    seems to be what you want.


    • #3
      Originally posted by Matt Salmon
      I want to calculate the mean employees for each manager, which is [30+40+60]/3.
      What Clyde shows will give you "the mean employees for each manager", but based on your example calculation, the following seems to be what you want.
      Code:
      bysort Var_Manager: assert Var_total_employees == Var_total_employees[1]
      quietly by Var_Manager: keep if _n == 1
      
      summarize Var_total_employees
      Alternatively:
      Code:
      duplicates drop Var_Manager Var_total_employees, force
      isid Var_Manager
      
      summarize Var_total_employees
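      (Some safety checks are included: assert and isid each stop with an error if any manager has more than one value of Var_total_employees.)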


      • #4
        I agree that the original calculation makes #3 the solution rather than #2. However, #3 is destructive: it drops observations from the dataset in memory. If that is a problem, here is a non-destructive alternative:

        Code:
        * tag() sets tagged = 1 in one observation per distinct Var_Manager and 0 elsewhere
        egen byte tagged = tag(Var_Manager)
        summarize Var_total_employees if tagged


        • #5
          Wow, thank you, Stata community! This is terrific.


          • #6
            It's not (the dumbest question ever).

            I recommend the word distinct where you say unique. You won't be misunderstood when you say unique, but distinct is still the better word. More in Section 2 of https://www.stata-journal.com/articl...article=dm0042
