Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to check whether an small id (eg: employee_id) appears in two or more different big ids (eg: office_id)?

    I have a dataset of offices and employee sales within that office. An employee shouldn't appear in 2 offices. However, an employee id can appear multiple times within an office.

    I want to check whether any employee_id appears in more than one office_id. (EG: whether employee_id 123 appears in BOTH office_id ABC and office_id DEF).

    The following example code creates three ids: An office_id, an employee id that only appears in one office (employee_id_oneoffice), and an employee id that appears in two offices (employee_id_repeats).
    What is an assert statement (or other test) I could run to make sure that employees only appear in one office? (i.e that employee_id_oneoffice would pass and employee_id_repeats would fail)

    Code:
    set seed  339910
    clear
    set obs 100 
    gen office_id = round(_n, 20)
    gen employee_id_oneoffice = office_id + floor(runiform(1,4))
    gen employee_id_repeats = round(_n, 5)
    Please let me know if I can clarify.

    Thanks,
    Daniel

  • #2
    Okay so I am a little confused here. You have many employees in offices. They can appear many times in one office, but never in other offices (e.g., a unique ID for office and employee)
    I tried
    Code:
    duplicates tag office_id emp* , generate(dup)
    But I suspect this isn't quite what you want. Why do we have variables for employees for one office and employee id repeats? Why shouldn't every employee have a unique ID? That is, consider
    Code:
    cls
    
    u "http://fmwww.bc.edu/repec/bocode/g/growth.dta", clear
    
    xtset region time, q
    Here, each region appears once and only once, and each region (or employee) has a unique ID. So my real question is, does this really reflect your underlying data structure? Or was this just an example?

    Comment


    • #3
      Originally posted by Daniel Ram View Post

      I want to check whether any employee_id appears in more than one office_id. (EG: whether employee_id 123 appears in BOTH office_id ABC and office_id DEF).
      This is an FAQ, see https://www.stata.com/support/faqs/d...ions-in-group/. The idea is to sort on a variable within groups and then check whether the first and last values differ. This leads to the following assert commands (thanks for the data example):

      Code:
      set seed  339910
      clear
      set obs 100 
      gen office_id = round(_n, 20)
      gen employee_id_oneoffice = office_id + floor(runiform(1,4))
      gen employee_id_repeats = round(_n, 5)
      
      bys employee_id_oneoffice (office_id): assert office_id[1]==office_id[_N]
      bys employee_id_repeats (office_id): assert office_id[1]==office_id[_N]
      Res.:

      Code:
      . 
      . bys employee_id_oneoffice (office_id): assert office_id[1]==office_id[_N]
      
      . 
      . bys employee_id_repeats (office_id): assert office_id[1]==office_id[_N]
      5 contradictions in 21 by-groups
      assertion is false
      r(9);

      Comment


      • #4
        Originally posted by Andrew Musau View Post

        This is an FAQ, see https://www.stata.com/support/faqs/d...ions-in-group/. The idea is to sort on a variable within groups and then check whether the first and last values differ. This leads to the following assert commands (thanks for the data example):
        Thank you, this helped a lot!

        Comment

        Working...
        X