How to check whether an small id (eg: employee_id) appears in two or more different big ids (eg: office_id)?

Daniel Ram

Join Date: Sep 2019

Posts: 29
#1

How to check whether an small id (eg: employee_id) appears in two or more different big ids (eg: office_id)?

05 Jul 2022, 14:58

I have a dataset of offices and employee sales within that office. An employee shouldn't appear in 2 offices. However, an employee id can appear multiple times within an office.

I want to check whether any employee_id appears in more than one office_id. (EG: whether employee_id 123 appears in BOTH office_id ABC and office_id DEF).

The following example code creates three ids: An office_id, an employee id that only appears in one office (employee_id_oneoffice), and an employee id that appears in two offices (employee_id_repeats).
What is an assert statement (or other test) I could run to make sure that employees only appear in one office? (i.e that employee_id_oneoffice would pass and employee_id_repeats would fail)

Code:

set seed 339910 clear set obs 100 gen office_id = round(_n, 20) gen employee_id_oneoffice = office_id + floor(runiform(1,4)) gen employee_id_repeats = round(_n, 5)

Please let me know if I can clarify.

Thanks,
Daniel
Tags: None
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

05 Jul 2022, 15:11

Okay so I am a little confused here. You have many employees in offices. They can appear many times in one office, but never in other offices (e.g., a unique ID for office and employee)
I tried

Code:

duplicates tag office_id emp* , generate(dup)

But I suspect this isn't quite what you want. Why do we have variables for employees for one office and employee id repeats? Why shouldn't every employee have a unique ID? That is, consider

Code:

cls u "http://fmwww.bc.edu/repec/bocode/g/growth.dta", clear xtset region time, q

Here, each region appears once and only once, and each region (or employee) has a unique ID. So my real question is, does this really reflect your underlying data structure? Or was this just an example?
1 like
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10481
#3

05 Jul 2022, 17:33

Originally posted by Daniel Ram View Post

I want to check whether any employee_id appears in more than one office_id. (EG: whether employee_id 123 appears in BOTH office_id ABC and office_id DEF).

This is an FAQ, see https://www.stata.com/support/faqs/d...ions-in-group/. The idea is to sort on a variable within groups and then check whether the first and last values differ. This leads to the following assert commands (thanks for the data example):

Code:

set seed 339910 clear set obs 100 gen office_id = round(_n, 20) gen employee_id_oneoffice = office_id + floor(runiform(1,4)) gen employee_id_repeats = round(_n, 5) bys employee_id_oneoffice (office_id): assert office_id[1]==office_id[_N] bys employee_id_repeats (office_id): assert office_id[1]==office_id[_N]

Res.:

Code:

. . bys employee_id_oneoffice (office_id): assert office_id[1]==office_id[_N] . . bys employee_id_repeats (office_id): assert office_id[1]==office_id[_N] 5 contradictions in 21 by-groups assertion is false r(9);
1 like
Comment
Daniel Ram

Join Date: Sep 2019

Posts: 29
#4

07 Jul 2022, 13:45

Originally posted by Andrew Musau View Post

This is an FAQ, see https://www.stata.com/support/faqs/d...ions-in-group/. The idea is to sort on a variable within groups and then check whether the first and last values differ. This leads to the following assert commands (thanks for the data example):

Thank you, this helped a lot!
Comment

Announcement

How to check whether an small id (eg: employee_id) appears in two or more different big ids (eg: office_id)?

Comment

Comment

Comment