  • Creating an indicator variable in an employer-employee dataset with repeated observations

    I have an employer-employee panel dataset at quarterly frequency. The data include some irregularities that I want to document in order to describe labor market dynamics in that sector. Some workers work for one firm for some days and then move to another firm within the same quarter. I want to create indicator variables that record how many times a particular worker has changed job/firm (i) within the same period t and (ii) from t-1 to t, and (iii) for how many quarters she stayed unemployed (missing mday in my data). I worked out the third one in a roundabout way but could not manage the first two.
    In the data, mday is the number of working days, wid is the worker identifier, firmid is the firm identifier, and id is the panel identifier for each (worker, firm) pair.
    My data looks like the following:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float date byte(wid firmid) long id double mday 
    2015q1 1 40 12 .
    2015q1 1 10 1 20
    2015q1 1 50 15 .
    2015q1 1 20 4 8
    2015q1 1 30 8 .
    2015q2 1 10 1 30
    2015q2 1 30 8 .
    2015q2 1 20 4 .
    2015q2 1 40 12 .
    2015q2 1 50 15 .
    2015q3 1 20 4 .
    2015q3 1 50 15 .
    2015q3 1 30 8 .
    2015q3 1 10 1 30
    2015q3 1 40 12 .
    2015q4 1 30 8 10
    2015q4 1 20 4 20
    2015q4 1 50 15 .
    2015q4 1 10 1 .
    2015q4 1 40 12 .
    2016q1 1 40 12 .
    2016q1 1 30 8 .
    2016q1 1 50 15 .
    2016q1 1 20 4 .
    2016q1 1 10 1 .
    2016q2 1 40 12 .
    2016q2 1 20 4 .
    2016q2 1 50 15 .
    2016q2 1 10 1 .
    2016q2 1 30 8 .
    2016q3 1 50 15 .
    2016q3 1 20 4 .
    2016q3 1 40 12 .
    2016q3 1 10 1 .
    2016q3 1 30 8 .
    2016q4 1 10 1 1
    2016q4 1 40 12 .
    2016q4 1 30 8 .
    2016q4 1 50 15 10
    2016q4 1 20 4 .
    2017q1 1 40 12 .
    2017q1 1 20 4 .
    2017q1 1 50 15 .
    2017q1 1 10 1 30
    2017q1 1 30 8 .
    2017q2 1 10 1 15
    2017q2 1 30 8 .
    2017q2 1 50 15 .
    2017q2 1 20 4 13
    2017q2 1 40 12 4
    2017q3 1 50 15 .
    2017q3 1 30 8 .
    2017q3 1 20 4 15
    2017q3 1 10 1 15
    2017q3 1 40 12 .
    2017q4 1 30 8 .
    2017q4 1 40 12 .
    2017q4 1 50 15 .
    2017q4 1 20 4 20
    2017q4 1 10 1 11
    2015q1 2 30 9 31
    2015q1 2 40 13 .
    2015q2 2 30 9 31
    2015q2 2 40 13 .
    2015q3 2 30 9 31
    2015q3 2 40 13 .
    2015q4 2 40 13 .
    2015q4 2 30 9 31
    2016q1 2 40 13 .
    2016q1 2 30 9 31
    2016q2 2 30 9 31
    2016q2 2 40 13 .
    2016q3 2 40 13 .
    2016q3 2 30 9 .
    2016q4 2 40 13 31
    2016q4 2 30 9 .
    2017q1 2 40 13 31
    2017q1 2 30 9 .
    2017q2 2 30 9 .
    2017q2 2 40 13 31
    2017q3 2 40 13 31
    2017q3 2 30 9 .
    2017q4 2 30 9 .
    2017q4 2 40 13 31
    2015q1 3 40 14 .
    2015q1 3 60 17 .
    2015q1 3 10 2 25
    2015q1 3 50 16 .
    2015q1 3 30 10 5
    2015q1 3 20 5 .
    2015q2 3 50 16 .
    2015q2 3 60 17 .
    2015q2 3 20 5 .
    2015q2 3 10 2 30
    2015q2 3 30 10 .
    2015q2 3 40 14 .
    2015q3 3 30 10 .
    2015q3 3 60 17 .
    2015q3 3 50 16 .
    2015q3 3 20 5 .
    end
    format %tq date
    Worker 1 worked for firm 30 and firm 20 in 2015q4, both of which differ from the firm she worked for in 2015q3 (firm 10). So my indicator variable should be 1 for both observations in 2015q4, but I have not managed this so far.
    I tried the following to tag the periods t in which a worker started to work for a different firm than in t-1.

    Code:
    bysort wid date: gen timechange = 1 if date != date[_n-1] & mday != .
    replace timechange = 1 if date == date[_n-1] & mday != .
    // I intended timechange to mark the quarters in which the worker is actively working
    bysort wid date: gen firmchange = 1 if firmid != firmid[_n-1] & timechange == 1

    But this does not solve my problem. For example, this code gives firmchange=1 for wid=1 and date=2017q2, but since the worker worked for firm 10 in both 2017q1 and 2017q2, that should not be tagged as a firm change in 2017q2.

    To be more precise, the data for worker 1 look like the following:
    date   wid firmid mday
    2015q1 1   20     8
    2015q1 1   10     20
    2015q2 1   10     30
    2015q3 1   10     30
    2015q4 1   30     10
    2015q4 1   20     20
    2016q1 1
    2016q2 1
    2016q3 1
    2016q4 1   50     10
    2016q4 1   10     1
    2017q1 1   10     30
    2017q2 1   10     15
    2017q2 1   20     13
    2017q2 1   40     4
    2017q3 1   10     15
    2017q3 1   20     15
    2017q4 1   10     11
    2017q4 1   20     20
    Any help is appreciated, thanks in advance.

  • #2
    Your data example isn't quite there. input won't accept 2015q1 in a float. Hence I can't see that dataex was used to produce the data example directly.

    This will work for data input.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 date byte(wid firmid) long id double mday 
    2015q1 1 40 12  .
    2015q1 1 10  1 20
    2015q1 1 50 15  .
    2015q1 1 20  4  8
    2015q1 1 30  8  .
    2015q2 1 10  1 30
    2015q2 1 30  8  .
    2015q2 1 20  4  .
    2015q2 1 40 12  .
    2015q2 1 50 15  .
    2015q3 1 20  4  .
    2015q3 1 50 15  .
    2015q3 1 30  8  .
    2015q3 1 10  1 30
    2015q3 1 40 12  .
    2015q4 1 30  8 10
    2015q4 1 20  4 20
    2015q4 1 50 15  .
    2015q4 1 10  1  .
    2015q4 1 40 12  .
    2016q1 1 40 12  .
    2016q1 1 30  8  .
    2016q1 1 50 15  .
    2016q1 1 20  4  .
    2016q1 1 10  1  .
    2016q2 1 40 12  .
    2016q2 1 20  4  .
    2016q2 1 50 15  .
    2016q2 1 10  1  .
    2016q2 1 30  8  .
    2016q3 1 50 15  .
    2016q3 1 20  4  .
    2016q3 1 40 12  .
    2016q3 1 10  1  .
    2016q3 1 30  8  .
    2016q4 1 10  1  1
    2016q4 1 40 12  .
    2016q4 1 30  8  .
    2016q4 1 50 15 10
    2016q4 1 20  4  .
    2017q1 1 40 12  .
    2017q1 1 20  4  .
    2017q1 1 50 15  .
    2017q1 1 10  1 30
    2017q1 1 30  8  .
    2017q2 1 10  1 15
    2017q2 1 30  8  .
    2017q2 1 50 15  .
    2017q2 1 20  4 13
    2017q2 1 40 12  4
    2017q3 1 50 15  .
    2017q3 1 30  8  .
    2017q3 1 20  4 15
    2017q3 1 10  1 15
    2017q3 1 40 12  .
    2017q4 1 30  8  .
    2017q4 1 40 12  .
    2017q4 1 50 15  .
    2017q4 1 20  4 20
    2017q4 1 10  1 11
    2015q1 2 30  9 31
    2015q1 2 40 13  .
    2015q2 2 30  9 31
    2015q2 2 40 13  .
    2015q3 2 30  9 31
    2015q3 2 40 13  .
    2015q4 2 40 13  .
    2015q4 2 30  9 31
    2016q1 2 40 13  .
    2016q1 2 30  9 31
    2016q2 2 30  9 31
    2016q2 2 40 13  .
    2016q3 2 40 13  .
    2016q3 2 30  9  .
    2016q4 2 40 13 31
    2016q4 2 30  9  .
    2017q1 2 40 13 31
    2017q1 2 30  9  .
    2017q2 2 30  9  .
    2017q2 2 40 13 31
    2017q3 2 40 13 31
    2017q3 2 30  9  .
    2017q4 2 30  9  .
    2017q4 2 40 13 31
    2015q1 3 40 14  .
    2015q1 3 60 17  .
    2015q1 3 10  2 25
    2015q1 3 50 16  .
    2015q1 3 30 10  5
    2015q1 3 20  5  .
    2015q2 3 50 16  .
    2015q2 3 60 17  .
    2015q2 3 20  5  .
    2015q2 3 10  2 30
    2015q2 3 30 10  .
    2015q2 3 40 14  .
    2015q3 3 30 10  .
    2015q3 3 60 17  .
    2015q3 3 50 16  .
    2015q3 3 20  5  .
    end
    
    gen work = quarterly(date, "YQ")
    drop date 
    rename work date 
    format %tq date

    • #3
      Sorry, I had not noticed the problem with the date variable.
      In the original data I had year and month, and I converted them to quarters (month 3 of 2015 to 2015q1, month 6 to 2015q2, and so on). I constructed the date variable as follows:
      Code:
      gen date = qofd(dofm(ym(year, month)))
      format %tq date
      Did I do something wrong there? When I use the following line, I get a type mismatch error:
      gen work = quarterly(date, "YQ")

      • #4
        That line fails because quarterly() expects a string argument, whereas your date is already numeric and already a quarterly date. The small point is that your dataex example will not work as posted, and I made a change so that it would work for others. The change is not for you as you already have the data.
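
        The distinction can be seen in a minimal sketch. The two-row dataset and the names sdate, qdate1 and qdate2 are made up for illustration; only the functions come from the posts above.

        ```stata
        clear
        input str6 sdate int year byte month
        "2015q1" 2015 3
        "2015q2" 2015 6
        end

        * String input: quarterly() parses it into a numeric quarterly date
        gen qdate1 = quarterly(sdate, "YQ")

        * Numeric year and month: build the quarterly date directly
        gen qdate2 = qofd(dofm(ym(year, month)))

        format %tq qdate1 qdate2

        * Both routes give the same quarterly dates
        assert qdate1 == qdate2

        * quarterly(qdate2, "YQ") would stop with a type mismatch,
        * because qdate2 is numeric, not a string
        ```

        So a date built with qofd(dofm(ym(year, month))) needs no further conversion.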

        • #5
          Thanks for the clarification and the quick response. I would be glad to hear your opinion on constructing the indicator variables.

          • #6
            I didn't answer any of those because I wasn't confident I understood what you want, and that's still true. No one else has responded.

            On (i) if the idea is whether a worker was employed by two or more different firms in a given quarter, then a (0, 1) indicator is given by

            Code:
            bysort wid date (firmid) : gen diff = firmid[_N] != firmid[1]
            as explained at https://www.stata.com/support/faqs/d...ions-in-group/

            But you also talk about counting changes, which could mean

            1. number of times a worker changed jobs within the same quarter, but how are we supposed to know about time order within a quarter, or whether jobs were held simultaneously or sequentially

            2. number of different firms worked for in a given quarter.
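
            Interpretation 2 could be sketched as follows. This is only a sketch under one assumption taken from post #1: a row counts as employment only when mday is not missing. The toy data and the names first, nfirms and multi are made up for illustration.

            ```stata
            clear
            input str6 sdate byte(wid firmid) double mday
            "2015q3" 1 10 30
            "2015q3" 1 20  .
            "2015q4" 1 10  .
            "2015q4" 1 20 20
            "2015q4" 1 30 10
            "2016q1" 1 10  .
            end
            gen date = quarterly(sdate, "YQ")
            format %tq date

            * Tag one row per (worker, quarter, firm) among rows with working days;
            * rows excluded by the if condition get 0, not missing
            egen byte first = tag(wid date firmid) if !missing(mday)

            * Number of different firms per worker and quarter
            egen nfirms = total(first), by(wid date)

            * (0, 1) indicator: worked for two or more firms in the quarter
            gen byte multi = nfirms >= 2

            list wid date firmid mday nfirms multi, sepby(wid date)
            ```

            The count is 0 in quarters in which the worker has no working days at all, so the same variable also marks quarters without employment.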

            On (ii) sorry, but I don't follow what you want either. It could be (e.g.) whether there is any firm worked for in the present quarter that was not worked for in the previous quarter, or the number of such firms, or ...
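
            One of those readings, the number of firms worked for in the present quarter but not in the previous quarter, could be sketched like this. It assumes one row per (worker, firm, quarter) and that missing mday means not employed at that firm; the toy data and the names employed, newfirm, nnew and anynew are made up for illustration.

            ```stata
            clear
            input str6 sdate byte(wid firmid) double mday
            "2015q3" 1 10 30
            "2015q3" 1 20  .
            "2015q3" 1 30  .
            "2015q4" 1 10  .
            "2015q4" 1 20 20
            "2015q4" 1 30 10
            "2016q1" 1 10  5
            "2016q1" 1 20 20
            "2016q1" 1 30  .
            end
            gen date = quarterly(sdate, "YQ")
            format %tq date

            gen byte employed = !missing(mday)

            * Within each worker-firm pair in time order: employed now, but not
            * employed at this firm in the immediately preceding quarter
            bysort wid firmid (date): gen byte newfirm = employed & !(date[_n-1] == date - 1 & employed[_n-1] == 1)

            * The previous quarter is unobserved in each worker's first quarter
            bysort wid (date): replace newfirm = . if date == date[1]

            * Number of such firms, and a (0, 1) indicator for any of them
            egen nnew = total(newfirm), by(wid date) missing
            gen byte anynew = nnew > 0 if !missing(nnew)
            ```

            Under this definition a return to a firm after a gap counts as new, which may or may not be what is wanted.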

            As flagged at https://www.stata-journal.com/articl...article=dm0099 indicator variables often mean narrowly binary variables with values 0 or 1 or missing (otherwise known as dummy, Boolean, dichotomous, logical, quantal, and perhaps by other terms too) -- and often mean just data generally.

            • #7
              Dear Nick,
              For the first point (i): "1. number of times a worker changed jobs within the same quarter, but how are we supposed to know about time order within a quarter, or whether jobs were held simultaneously or sequentially". If a worker works for more than one firm in the same period, I want to flag that as a job change. What I ultimately want is the number of different firms a worker works for in a given quarter. I thought I should first create an indicator variable and then sum it to get the number of different firms a worker works for in the same period.

              On the second comment, you are right that my thinking is not yet clear. Maybe I should assign one firm to each worker (according to some rule) and then determine the firm change between t and t+1. Otherwise, in cases where a worker works for firms (A, B, C) at time t and only for firm (A) at time t+1, I am not sure whether this counts as a job change. I will think about it, thanks a lot.
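
              One possible version of that idea, as a sketch only: the rule here is an assumption (the main firm is the one with the most working days in the quarter, ties broken arbitrarily by the larger firmid), and the toy data and the names mainfirm and change are made up for illustration. It drops observations, so it should be run on a copy of the data.

              ```stata
              clear
              input str6 sdate byte(wid firmid) double mday
              "2015q3" 1 10 30
              "2015q3" 1 20  .
              "2015q4" 1 10  .
              "2015q4" 1 20 20
              "2015q4" 1 30 10
              "2016q1" 1 20 25
              "2016q1" 1 30  .
              end
              gen date = quarterly(sdate, "YQ")
              format %tq date

              * Keep rows with working days, then the firm with most days per worker-quarter
              drop if missing(mday)
              bysort wid date (mday firmid): keep if _n == _N
              rename firmid mainfirm

              * Compare with the main firm in the previous quarter;
              * change is missing when the worker has no main firm in t-1
              xtset wid date
              gen byte change = mainfirm != L.mainfirm if !missing(L.mainfirm)

              list wid date mainfirm mday change
              ```

              With this rule the (A, B, C) to (A) case is a change only if A was not the firm with the most days at time t.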
