  • How to identify industry specialists in a huge sample

    I would like to flag an employee (identified by employee ID) as an industry specialist (dummy equal to 1) if the firm where s/he is currently employed shares its two-digit industry code with at least one firm in his/her earlier employment history. A worker in 2004, for example, would be an industry specialist if s/he had worked, before 2004 (and of course not after), in a firm with the same industry code. In the following table, the employee identified as 12342 is therefore an industry specialist in 2002 (but not in 1999):
    companyId      Year  employeeID  IndustryCode
    GB12345678910  2002  12342       20
    GB00000000000  2003  45674567    30
    GB23452345234  2004  12342       40
    GB52349088454  2005  45674567    20
    GB5345893748   2006  6546456     10
    GB34534534500  1999  12342       20
    GB0977709709   2000  1234454     30
    GB09304570394  2001  6546456     10
    Bear in mind that an employee might not be an industry specialist in a particular year but might be so in the following years. I have a large number of observations and I am wondering if there are any commands to create such dummy variables in Stata.
    Last edited by Mo Hos; 11 Nov 2019, 14:01.

  • #2
    Hello Mo Hos,

    Perhaps you might try the following code (it is not efficient, but should yield the result you are after):

    sort employeeid year
    gen indspec = 0
    replace indspec = 1 if industrycode[_n-1]==industrycode[_n] & employeeid[_n-1]==employeeid[_n]


    Using the above with your sample of data returned the following:
    companyid      year  employeeid  industrycode  indspec
    GB34534534500  1999  12342       20            0
    GB12345678910  2002  12342       20            1
    GB23452345234  2004  12342       40            0
    GB0977709709   2000  1.20E+06    30            0
    GB09304570394  2001  6.50E+06    10            0
    GB5345893748   2006  6.50E+06    10            1
    GB00000000000  2003  4.60E+07    30            0
    GB52349088454  2005  4.60E+07    20            0
    Sincerely,


    Scott


    • #3
      Hi Scott,
      Thank you for your suggestion.

      There is a small problem. If an employee worked for the same company for more than one year, s/he should not be identified as an industry specialist unless s/he has worked for another company that has the same industry code in the past. Therefore, I think we should take into account the company ID. Can you add the company ID to the code?

      Additionally, suppose employee No. 12342 worked at another company in 2005 and 2006, in an industry coded 20. I would like to recognise such an employee as an industry specialist as well. Would your code flag him/her as an industry specialist (equal to 1) in 2005 and 2006?


      • #4
        I think this should do what you're after:
        Code:
        bysort employeeID IndustryCode (Year): gen tag = companyId != companyId[_n-1] & _n > 1
        bysort employeeID companyId: egen wanted = max(tag)
        Last edited by Wouter Wakker; 12 Nov 2019, 02:44.


        • #5
          Originally posted by Wouter Wakker
          I think this should do what you're after:
          Code:
          bysort employeeID IndustryCode (Year): gen tag = companyId != companyId[_n-1] & _n > 1
          bysort employeeID companyId: egen wanted = max(tag)
          Thanks. Can you explain this in words please?


          • #6
            bysort groups the observations by IndustryCode within each employeeID and sorts each group by Year. Then, every time the companyId differs from the companyId of the previous observation in that group (referenced as [_n-1]), the observation is tagged; the _n > 1 condition keeps the first observation of each group untagged. If someone works for a company for more than one year, the later years at that company are not tagged, which is why the second line is needed: it carries the tag over to the other years in which the employee kept working for the same company.
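
            A small worked example may make this concrete. The sketch below uses made-up data (one hypothetical employee and firms "A", "B" and "C"; none of these values come from the thread's dataset) with the same variable names as the code in #4:

            ```stata
            clear
            input long employeeID str2 companyId int Year byte IndustryCode
            1 "A" 2000 20
            1 "A" 2001 20
            1 "B" 2002 20
            1 "B" 2003 20
            1 "C" 2004 40
            end

            * Tag a change of firm within the same employee and industry
            bysort employeeID IndustryCode (Year): gen tag = companyId != companyId[_n-1] & _n > 1

            * Spread the tag to every year the employee spent at the tagged firm
            bysort employeeID companyId: egen wanted = max(tag)

            sort employeeID Year
            list, noobs
            ```

            Only the two firm "B" rows should end up with wanted equal to 1: the 2002 row is tagged because the firm changes within industry 20, and the second line spreads that tag to 2003. The two years at firm "A" and the move to industry 40 stay at 0.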


            • #7
              Originally posted by Wouter Wakker
              I think this should do what you're after:
              Code:
              bysort employeeID IndustryCode (Year): gen tag = companyId != companyId[_n-1] & _n > 1
              bysort employeeID companyId: egen wanted = max(tag)
              The first line of the code produced the following error:

              Code:
              Weights not allowed


              • #8
                I can't see the data that you're working on or the exact code that you used. You may have included a space between companyID and [_n-1] when adapting the code.

                It would be best to give a reproducible example which includes a data example (make sure to use dataex), your code and the output. See also the FAQ for advice on how best to pose questions.
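
                To illustrate the spacing issue mentioned above: when [_n-1] is preceded by a space, Stata appears to read the brackets as a weight specification, which generate does not accept. A minimal sketch on made-up data (the variable names bad and good are only for illustration):

                ```stata
                clear
                input str1 companyId
                "A"
                "B"
                "B"
                end

                * With a space before [_n-1], the brackets are parsed as weights and the command fails
                capture noisily gen bad = companyId != companyId [_n-1]
                display "return code: " _rc

                * Without the space, [_n-1] is a subscript referring to the previous observation
                gen good = companyId != companyId[_n-1]
                list, noobs
                ```

                capture noisily lets the error message print without stopping the run, so the two spellings can be compared side by side.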


                • #9
                  Building off of Wouter Wakker's idea and code:

                  Note, I created some toy data so that I could have more examples and more variation:

                  Code:
                  * Generated with: dataex emp_id company_id year ind_code
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input byte emp_id str6 company_id int year byte ind_code
                  1 "firm1"  1997 10
                  1 "firm3"  2001 10
                  1 "firm2"  2004 30
                  1 "firm4"  2008 10
                  2 "firm7"  2003 40
                  2 "firm6"  2007 20
                  2 "firm5"  2009 20
                  3 "firm11" 2001 50
                  3 "firm10" 2002 50
                  3 "firm9"  2006 20
                  3 "firm8"  2007 40
                  4 "firm13" 1999 30
                  4 "firm12" 2001 30
                  5 "firm14" 1997 20
                  5 "firm15" 1999 40
                  5 "firm18" 2001 20
                  5 "firm17" 2006 20
                  5 "firm16" 2007 40
                  6 "firm19" 2008 60
                  end
                  
                  . list, sepby(emp_id) noobs
                  
                    +-------------------------------------+
                    | emp_id   compan~d   year   ind_code |
                    |-------------------------------------|
                    |      1      firm1   1997         10 |
                    |      1      firm3   2001         10 |
                    |      1      firm2   2004         30 |
                    |      1      firm4   2008         10 |
                    |-------------------------------------|
                    |      2      firm7   2003         40 |
                    |      2      firm6   2007         20 |
                    |      2      firm5   2009         20 |
                    |-------------------------------------|
                    |      3     firm11   2001         50 |
                    |      3     firm10   2002         50 |
                    |      3      firm9   2006         20 |
                    |      3      firm8   2007         40 |
                    |-------------------------------------|
                    |      4     firm13   1999         30 |
                    |      4     firm12   2001         30 |
                    |-------------------------------------|
                    |      5     firm14   1997         20 |
                    |      5     firm15   1999         40 |
                    |      5     firm18   2001         20 |
                    |      5     firm17   2006         20 |
                    |      5     firm16   2007         40 |
                    |-------------------------------------|
                    |      6     firm19   2008         60 |
                    +-------------------------------------+
                  
                  // Sort by industry and THEN year, so that spells in the same industry are next to each other
                  // even if the person had other jobs in between them
                  bysort emp_id ind_code (year): gen exp = _n  // exp is short for experience: the number of times the person has worked in that industry so far
                  gen specialist = 0
                  replace specialist = 1 if exp > 1 & exp <.  // could do as 1 line: gen specialist = (exp > 1 & exp<.)
                  
                  // NOTE: The data are still sorted by industry and then year, which makes it easier to see any prior instances in the same industry
                  list, sepby(emp_id ind_code ) noobs
                  
                    +------------------------------------------------------+
                    | emp_id   compan~d   year   ind_code   exp   specia~t |
                    |------------------------------------------------------|
                    |      1      firm1   1997         10     1          0 |
                    |      1      firm3   2001         10     2          1 |
                    |      1      firm4   2008         10     3          1 |
                    |------------------------------------------------------|
                    |      1      firm2   2004         30     1          0 |
                    |------------------------------------------------------|
                    |      2      firm6   2007         20     1          0 |
                    |      2      firm5   2009         20     2          1 |
                    |------------------------------------------------------|
                    |      2      firm7   2003         40     1          0 |
                    |------------------------------------------------------|
                    |      3      firm9   2006         20     1          0 |
                    |------------------------------------------------------|
                    |      3      firm8   2007         40     1          0 |
                    |------------------------------------------------------|
                    |      3     firm11   2001         50     1          0 |
                    |      3     firm10   2002         50     2          1 |
                    |------------------------------------------------------|
                    |      4     firm13   1999         30     1          0 |
                    |      4     firm12   2001         30     2          1 |
                    |------------------------------------------------------|
                    |      5     firm14   1997         20     1          0 |
                    |      5     firm18   2001         20     2          1 |
                    |      5     firm17   2006         20     3          1 |
                    |------------------------------------------------------|
                    |      5     firm15   1999         40     1          0 |
                    |      5     firm16   2007         40     2          1 |
                    |------------------------------------------------------|
                    |      6     firm19   2008         60     1          0 |
                    +------------------------------------------------------+
                  
                  
                  sort emp_id year  // sorting back so in chronological order
                  . list, sepby(emp_id) abbrev(12) noobs
                  
                    +----------------------------------------------------------+
                    | emp_id   company_id   year   ind_code   exp   specialist |
                    |----------------------------------------------------------|
                    |      1        firm1   1997         10     1            0 |
                    |      1        firm3   2001         10     2            1 |
                    |      1        firm2   2004         30     1            0 |
                    |      1        firm4   2008         10     3            1 |
                    |----------------------------------------------------------|
                    |      2        firm7   2003         40     1            0 |
                    |      2        firm6   2007         20     1            0 |
                    |      2        firm5   2009         20     2            1 |
                    |----------------------------------------------------------|
                    |      3       firm11   2001         50     1            0 |
                    |      3       firm10   2002         50     2            1 |
                    |      3        firm9   2006         20     1            0 |
                    |      3        firm8   2007         40     1            0 |
                    |----------------------------------------------------------|
                    |      4       firm13   1999         30     1            0 |
                    |      4       firm12   2001         30     2            1 |
                    |----------------------------------------------------------|
                    |      5       firm14   1997         20     1            0 |
                    |      5       firm15   1999         40     1            0 |
                    |      5       firm18   2001         20     2            1 |
                    |      5       firm17   2006         20     3            1 |
                    |      5       firm16   2007         40     2            1 |
                    |----------------------------------------------------------|
                    |      6       firm19   2008         60     1            0 |
                    +----------------------------------------------------------+
                  Last edited by David Benson; 12 Nov 2019, 12:04.


                  • #10
                    Originally posted by Wouter Wakker
                    I can't see the data that you're working on or the exact code that you used. You may have included a space between companyID and [_n-1] when adapting the code.

                    It would be best to give a reproducible example which includes a data example (make sure to use dataex), your code and the output. See also the FAQ for advice on how best to pose questions.
                    You are right. There was a space between companyID and [_n-1]. Sorry for the stupid mistake.

                    Thank you for your help!


                    • #11


                      Thank you @David Benson.
                      If an employee worked for the same company for more than one year, s/he should not be identified as an industry specialist unless s/he has worked for another company that has the same industry code in the past. Your code considered an employee who worked for the same company for more than one year as a specialist. Can you add something to the code to resolve such an issue?


                      • #12
                        I just realized that instead of
                        Code:
                        bysort employeeID IndustryCode (Year): gen tag = companyId != companyId[_n-1] & _n > 1
                        bysort employeeID companyId: egen wanted = max(tag)
                        it would be better to use
                        Code:
                        bysort employeeID IndustryCode (Year): gen tag = companyId != companyId[_n-1] & _n > 1
                        bysort employeeID IndustryCode (Year): gen wanted = sum(tag) >= 1
                        It is theoretically possible that employees, after becoming a specialist, move back to the firm where they started. The first code will wrongly flag them as a specialist in their first spell at that first firm. The second code only carries the tag forward to future years.
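
                        A minimal sketch of that case, on made-up data (a hypothetical employee who works at firm "A", moves to firm "B" in the same industry, then returns to "A"). The names wanted_max and wanted_sum are illustrative, chosen only so both versions can sit side by side:

                        ```stata
                        clear
                        input long employeeID str2 companyId int Year byte IndustryCode
                        1 "A" 2000 20
                        1 "B" 2001 20
                        1 "A" 2002 20
                        end

                        * Tag a change of firm within the same employee and industry
                        bysort employeeID IndustryCode (Year): gen tag = companyId != companyId[_n-1] & _n > 1

                        * First version: spreads the tag over all years at the same firm, including earlier ones
                        bysort employeeID companyId: egen wanted_max = max(tag)

                        * Second version: a running sum only carries the tag forward in time
                        bysort employeeID IndustryCode (Year): gen wanted_sum = sum(tag) >= 1

                        sort employeeID Year
                        list, noobs
                        ```

                        For the year 2000 row, wanted_max comes out as 1 even though the employee had no prior experience in the industry at that point, while wanted_sum stays at 0 and switches to 1 from 2001 onwards.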


                        • #13
                          @Wouter Wakker after using your first suggested code:

                          Code:
                            
                          bysort employeeID IndustryCode (Year): gen tag = companyId != companyId[_n-1] & _n > 1
                          bysort employeeID companyId: egen wanted = max(tag)
                          The results were as follows:

                          Code:
                           
                          companyid year employeeid industrycode tag wanted
                          GB0293475073 2005 12342 20 1 1
                          GB0980232133 2006 12342 20 1 1
                          GB12345678910 2002 12342 20 1 1
                          GB23452345234 2004 12342 40 0 0
                          GB34534534500 1999 12342 20 0 0
                          GB76858567845 2008 12342 40 1 1
                          GB0977709709 2000 1.20E+06 30 0 0
                          GB09304570394 2001 6.50E+06 10 0 0
                          GB5345893748 2006 6.50E+06 10 1 1
                          GB9084326092 2002 6.50E+06 10 1 1
                          GB00000000000 2003 4.60E+07 30 0 0
                          GB52349088454 2006 4.60E+07 20 0 0
                          GB52349088454 2005 4.60E+07 20 0 0
                          GB98012354698 2008 4.60E+07 20 0 1
                          GB98012354698 2007 4.60E+07 20 1 1
                          GB0997123156 2002 4.60E+08 60 1 1
                          GB5723409122 2001 4.60E+08 60 0 0
                          GB8763453423 2000 4.60E+08 50 0 0
                          GB8763453423 2003 4.60E+08 50 0 0
                          As you can see, after becoming a specialist, employee 4.60E+08 moved back to the firm where s/he started and was tagged as a non-specialist when s/he came back. No problems with that.

                          Your new code

                          Code:
                          bysort employeeID IndustryCode (Year): gen tag = companyId != companyId[_n-1] & _n > 1
                          bysort employeeID IndustryCode (Year): gen wanted = sum(tag) >= 1
                          generated the exact same results but in a different order:

                          Code:
                           
                          companyid year employeeid industrycode tag wanted
                          GB34534534500 1999 12342 20 0 0
                          GB12345678910 2002 12342 20 1 1
                          GB0293475073 2005 12342 20 1 1
                          GB0980232133 2006 12342 20 1 1
                          GB23452345234 2004 12342 40 0 0
                          GB76858567845 2008 12342 40 1 1
                          GB0977709709 2000 1.20E+06 30 0 0
                          GB09304570394 2001 6.50E+06 10 0 0
                          GB9084326092 2002 6.50E+06 10 1 1
                          GB5345893748 2006 6.50E+06 10 1 1
                          GB52349088454 2005 4.60E+07 20 0 0
                          GB52349088454 2006 4.60E+07 20 0 0
                          GB98012354698 2007 4.60E+07 20 1 1
                          GB98012354698 2008 4.60E+07 20 0 1
                          GB00000000000 2003 4.60E+07 30 0 0
                          GB8763453423 2000 4.60E+08 50 0 0
                          GB8763453423 2003 4.60E+08 50 0 0
                          GB5723409122 2001 4.60E+08 60 0 0
                          GB0997123156 2002 4.60E+08 60 1 1
                          Last edited by Mo Hos; 13 Nov 2019, 03:13.


                          • #14
                            The results are the same because employee 4.60E+08 is not an illustration of the case I was talking about, as the firms in between are not in the same industry and therefore the employee didn't become a specialist at that point (for industry 50). I still believe the second code should be used.


                            • #15
                              First, thank you for discussing this matter, as it may uncover inadequacies.

                              Second, I do not quite understand your point, but there may be a misunderstanding about who should be considered an industry specialist. The main idea is whether the director has ever worked before in a firm, other than the current one, that has the same industry code. If yes, then s/he is an industry specialist. Therefore, once the director becomes an industry specialist, s/he will be regarded as one in the following years as long as s/he keeps working in the same industry, even if s/he comes back to the firm where s/he started (which should be in the same industry, though). If you mean something different, please let me know.
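
                              One possible way to write this definition down in Stata is to compare each observation's firm with the first firm the person worked for in that industry, and keep a running count of the differences. The sketch below uses the sample data from #1. The variable name specialist is illustrative, and the code assumes at most one observation per employee and year, since ties in Year would be ordered arbitrarily. It is intended to give the same result as the sum(tag) version in #12.

                              ```stata
                              clear
                              input str13 companyId int Year long employeeID byte IndustryCode
                              "GB12345678910" 2002 12342    20
                              "GB00000000000" 2003 45674567 30
                              "GB23452345234" 2004 12342    40
                              "GB52349088454" 2005 45674567 20
                              "GB5345893748"  2006 6546456  10
                              "GB34534534500" 1999 12342    20
                              "GB0977709709"  2000 1234454  30
                              "GB09304570394" 2001 6546456  10
                              end

                              * Within employee and industry, in year order: 1 once any firm so far differs from the first firm
                              bysort employeeID IndustryCode (Year): gen byte specialist = sum(companyId != companyId[1]) >= 1

                              sort employeeID Year
                              list, sepby(employeeID) noobs
                              ```

                              On this sample, employee 12342 is flagged in 2002 but not in 1999 or 2004, which matches the example described in #1. Because the count never decreases, a later return to the first firm in the same industry keeps the flag at 1.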
