Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding the number of different pairs from different variables that satisfies some criteria - panel data

    Dear all,

    I am working with a matched employer-employee dataset from Brazil in which each observation represents a pair worker-firm for some period and I would like to know for how many different firms from the same sector a given worker has worked.

    In particular, the dataset is like

    year week id_worker id_firm firm_sector
    2017 1 17 25 Services
    2017 2 17 41 Manufacturing
    2017 3 17 19 Manufacturing
    2017 3 17 25 Services
    2017 4 17 53 Services
    2017 5 17 19 Manufacturing

    I would like to have other three new variables such as

    year week id_worker id_firm firm_sector number_of_firms_services number_of_firms_manufacturing number_of_firms_commerce
    2017 1 17 25 Services 2 1 1
    2017 2 17 41 Manufacturing 2 1 1
    2017 3 17 19 Commerce 2 1 1
    2017 3 17 25 Services 2 1 1
    2017 4 17 53 Services 2 1 1
    2017 5 17 19 Commerce 2 1 1

    where 'number_of_firms_sector' counts the number of different firms for which the worker has worked in a given 'sector', where sector \in {Manufacturing, Services, Commerce}.

    Can you help to find a solution for that?

    Thank you!



    Code for importing:

    clear
    input year week id_worker id_firm str15 firm_sector
    2017 1 17 25 "Services"
    2017 2 17 41 "Manufacturing"
    2017 3 17 19 "Commerce"
    2017 3 17 25 "Services"
    2017 4 17 53 "Services"
    2017 5 17 19 "Commerce"
    end

  • #2
    Code:
    clear
    input year week id_worker id_firm str15 firm_sector
    2017 1 17 25 "Services"
    2017 2 17 41 "Manufacturing"
    2017 3 17 19 "Commerce"
    2017 3 17 25 "Services"
    2017 4 17 53 "Services"
    2017 5 17 19 "Commerce"
    end
    
    by id_worker firm_sector id_firm, sort: gen number_of_firms_ = (_n == 1)
    by id_worker firm_sector: replace number_of_firms_ = sum(number_of_firms_)
    by id_worker firm_sector: replace number_of_firms_ = number_of_firms_[_N]
    
    gen long obs_no = _n
    reshape wide number_of_firms_, i(obs_no) j(firm_sector) string
    drop obs_no
    Notes: You might want to pick a shorter stub than "number of firms." At 15 characters, plus an additional _ to separate it from the sector, if you have any long sector names, you will exceed Stata's 32 character limit on variable names, and the code will break. Also, this code will only work if all the sector names are suitable for inclusion in a variable name: that is, no embedded blanks, no special characters, only letters, digits, and underscore. If you have sector names that violate those constraints, you will want to perhaps apply the -strtoname()- function to them first.

    The final layout created by this code is pretty awkward. On the one hand, the number of firms per sector is set out side-by-side in separate variables. Yet you still have multiple observations per worker showing the individual details of each job. That is likely to prove confusing or create problems as you move ahead in this project. Probably the most sensible thing to do is to stop before the -gen long obs_no = _n- command and keep the data in fully long layout. That works best for nearly everything in Stata.

    Comment


    • #3
      This is I think a variant on your previous thread.

      Code:
      egen tag = tag(id_worker id_firm firm_sector)
      
      egen wanted  = total(tag), by(id_worker firm_sector)

      Comment


      • #4
        Thank you very much Clyde Schechter and Nick Cox !!!

        Comment

        Working...
        X