Maybe the dumbest question ever

Matt Salmon

#1


16 Jan 2023, 21:35

Hello, I'm new to Stata. I have hunted around and can't seem to find a simple answer to what I'm sure is a common issue. The problem is duplicate rows. Let's say I want to calculate the mean of a dataset and my data is as follows:
Var_Manager   Var_total_employees   Var_region_served
Bob           30                    North
Bob           30                    East
Mary          40                    South
Jonas         60                    North
Jonas         60                    South
Jonas         60                    West

I'm just using this as an example to illustrate. I want to calculate the mean employees for each manager, which is [30+40+60]/3. But I can't figure out a way for Stata to understand that Bob is unique, Mary is unique, Jonas is unique, and not six different managers.
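Using the numbers in the table, averaging over the three distinct managers gives a different figure from averaging over all six rows, which is why the duplicates matter here:

```latex
\bar{x}_{\text{managers}} = \frac{30 + 40 + 60}{3} = \frac{130}{3} \approx 43.33,
\qquad
\bar{x}_{\text{rows}} = \frac{30 + 30 + 40 + 60 + 60 + 60}{6} = \frac{280}{6} \approx 46.67
```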

Any help on which way I should be looking would be much appreciated.
Tags: categorical, data
Clyde Schechter

#2

16 Jan 2023, 22:23

Code:

collapse (mean) Var_total_employees, by(Var_Manager)

seems to be what you want.
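For anyone who wants to try this on the example, here is a sketch that types in the data from #1 and then follows the collapse with summarize. The variable names Var_Manager, Var_total_employees and Var_region_served are assumed from the replies in this thread, and the str8 widths are arbitrary. Wrapping the collapse in preserve/restore keeps the six original rows. Because every manager has a single employee count in this example, the mean of the per-manager means is the (30 + 40 + 60)/3 figure asked for in #1.

```stata
* Sketch only: reproduces the example data from #1.
* Variable names are assumed from the replies; str8 widths are arbitrary.
clear
input str8 Var_Manager Var_total_employees str8 Var_region_served
Bob   30 North
Bob   30 East
Mary  40 South
Jonas 60 North
Jonas 60 South
Jonas 60 West
end

preserve                        // keep a copy of the six-row data
collapse (mean) Var_total_employees, by(Var_Manager)
list                            // one row per manager: Bob 30, Jonas 60, Mary 40
summarize Var_total_employees   // mean of the three per-manager means
scalar manager_mean = r(mean)   // (30 + 40 + 60)/3
restore                         // the original six rows are back in memory
display manager_mean
```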
Joseph Coveney

#3

16 Jan 2023, 23:10

Originally posted by Matt Salmon

I want to calculate the mean employees for each manager, which is [30+40+60]/3.

What Clyde shows will give you "the mean employees for each manager", but based upon your example calculation the following seem to be what you want.

Code:

bysort Var_Manager: assert Var_total_employees == Var_total_employees[1]  // check: one employee count per manager
quietly by Var_Manager: keep if _n == 1  // keep the first row for each manager
summarize Var_total_employees

Alternatively:

Code:

duplicates drop Var_Manager Var_total_employees, force  // one row per manager-count pair
isid Var_Manager  // check: each manager now appears only once
summarize Var_total_employees

(Some safety checks are added in.)
Hemanshu Kumar

#4

16 Jan 2023, 23:15

I agree that the original calculation makes #3 the solution rather than #2. However, #3 is destructive of the original dataset. If that is a problem, then here is an alternative:

Code:

egen byte tagged = tag(Var_Manager)  // 1 in one row per manager, 0 elsewhere
summarize Var_total_employees if tagged
Matt Salmon

#5

17 Jan 2023, 01:50

Wow thank you Stata community! This is terrific.
Nick Cox

#6

17 Jan 2023, 04:22

It's not (the dumbest question ever).

I recommend the word distinct where you say unique. You won't be misunderstood when you say unique, but distinct is still the better word. More at Section 2 of https://www.stata-journal.com/articl...article=dm0042