  • Finding the number of firms for which a given worker has worked - Matched employer-employee dataset

    Dear all,

    I am working with a matched employer-employee dataset from Brazil in which each observation is a worker-firm pair in a given period, and I would like to know how many different firms a given worker has worked for.

    In particular, the dataset looks like this:

    year week id_worker id_firm
    2017 1 17 25
    2017 2 17 41
    2017 3 17 19
    2017 3 17 25
    2017 4 17 53
    2017 5 17 19

    I would like to create a variable like 'number_of_firms_week':

    year week id_worker id_firm number_of_firms_week
    2017 1 17 25 4
    2017 2 17 41 4
    2017 3 17 19 4
    2017 3 17 25 4
    2017 4 17 53 4
    2017 5 17 19 4
    where 'number_of_firms_week' is the variable for the number of different firms for which a given worker has worked (considering all periods).

    Can you help me to find a solution for that?

    Thank you very much!


    Below I provide the code for importing the example dataset into Stata:

    clear
    input year week id_worker id_firm
    2017 1 17 25
    2017 2 17 41
    2017 3 17 19
    2017 3 17 25
    2017 4 17 53
    2017 5 17 19
    end

    Note: I tried to use 'dataex', but in this case I found it easier to provide the importing code.

  • #2
    Code:
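    * Flag the first observation of each worker-firm pair (1 if first, 0 otherwise)
    by id_worker id_firm, sort: gen number_of_firms_week = _n == 1
    * Running sum of the flags within each worker
    by id_worker: replace number_of_firms_week = sum(number_of_firms_week)
    * Copy the final running total (the count of distinct firms) to every observation
    by id_worker: replace number_of_firms_week = number_of_firms_week[_N]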
    The name of this new variable, number_of_firms_week, bothers me. It makes me think that it should be the number of distinct firms the worker was at in a single week--which is not the case. Did you perhaps not explain the problem fully?
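
    To make the distinction concrete, here is a minimal sketch of the other reading, the number of distinct firms per worker within a single week. It was not requested in #1; the variable names first and firms_this_week are hypothetical, and the data are the example from #1.

    ```stata
    clear
    input year week id_worker id_firm
    2017 1 17 25
    2017 2 17 41
    2017 3 17 19
    2017 3 17 25
    2017 4 17 53
    2017 5 17 19
    end

    * Flag the first observation of each worker-week-firm combination
    by id_worker year week id_firm, sort: gen byte first = _n == 1
    * Add up the flags within each worker-week (names here are illustrative)
    by id_worker year week: egen firms_this_week = total(first)

    list, sepby(week)
    ```

    In the example data this per-week count is 2 in week 3, where worker 17 appears with two firms, and 1 in every other week, whereas the all-period count is 4 on every observation.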

    It may have been simpler for you to create this code for importing the example data, but once you start using string variables, or date variables, or value labeled integer variables, it will be much easier to use -dataex-. In fact, nothing could be simpler. Load the data into memory and just type
    Code:
    dataex
    in the command line. -dataex- will then make an example data set out of the first 100 observations in your data (which is usually good enough for the purpose). Then copy the -dataex- output from the Results window into the forum here and you are done.

    If you want to show specific observations, rather than the first 100, you can use -if- and -in- with the -dataex- command just as you would with almost all Stata commands.
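
    As a rough illustration of those qualifiers, using the example data from #1 and assuming a Stata installation that includes -dataex- (it ships with recent versions of Stata and is otherwise available from SSC):

    ```stata
    clear
    input year week id_worker id_firm
    2017 1 17 25
    2017 2 17 41
    2017 3 17 19
    2017 3 17 25
    2017 4 17 53
    2017 5 17 19
    end

    * Only the first three observations, all variables
    dataex in 1/3

    * Only selected variables, restricted to one worker
    dataex id_worker id_firm if id_worker == 17
    ```

    Each call writes a ready-to-paste example to the Results window for the chosen variables and observations.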

    Look, what you did here is fine--it looks almost exactly like -dataex- output and it works for those who are trying to recreate your data example in Stata. But eventually you will have some data for which this is just tiresome to do, and -dataex- is just so easy to use.


    • #3
      @Clyde gives excellent advice. It seems to me that the variable wanted is the number of distinct firms for each worker, which is

      Code:
      egen tag = tag(id_worker id_firm)
      egen wanted = total(tag), by(id_worker)
      as discussed in https://www.stata-journal.com/articl...article=dm0042

      @Clyde's code is exactly equivalent and asks Stata to do less....
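
      One way to check that equivalence on the example data from #1 is to run both approaches side by side and compare. This is only a sketch; the variable names n_firms_by and n_firms_egen are hypothetical, chosen so the two results can coexist.

      ```stata
      clear
      input year week id_worker id_firm
      2017 1 17 25
      2017 2 17 41
      2017 3 17 19
      2017 3 17 25
      2017 4 17 53
      2017 5 17 19
      end

      * Approach from #2: running sum of first-occurrence flags
      by id_worker id_firm, sort: gen n_firms_by = _n == 1
      by id_worker: replace n_firms_by = sum(n_firms_by)
      by id_worker: replace n_firms_by = n_firms_by[_N]

      * Approach from #3: tag one observation per worker-firm pair, then total
      egen tag = tag(id_worker id_firm)
      egen n_firms_egen = total(tag), by(id_worker)

      * Both variables should agree on every observation
      assert n_firms_by == n_firms_egen
      ```

      For worker 17 both variables equal 4, matching the result requested in #1 (firms 19, 25, 41 and 53).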


      • #4
        Thank you very much @Clyde


        • #5
          Thank you very much, Clyde Schechter and Nick Cox! (Please disregard the previous post!)

          Yes, Clyde, I agree with you about the name of the variable!

          I normally do not use 'dataex' because both the names of the string variables and their values are in Portuguese, so it would be more difficult to convey the idea using the original format.
          Last edited by Otavio Conceicao; 29 Oct 2020, 09:21.
