Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • count over previous observations that meet a condition

    Dear all,
    I am quite new to Stata and still struggling with some commands.

    My dataset contains information on workers and their workplace: most workers are observed in the years 1990-1997. For each observation, I have a worker_id and a firm_id: some of the workers changed their workplace over the years. I have a dummy variable identifying those workers and the id of the firm they previously worked in. I am trying to build a measure of their network, as given by the number of coworkers in their previous workplace.

    Is there a way to count the observations whose firm_id is identical to the previous_firm_id that the job-switching observations have and whose year is antecedent to the one of the switch? It seems to me that some kind of loop must be created, but I haven't been able to find a way yet.

    Thank you, any help would be much appreciated.

    Gioia

  • #2
    Welcome to the Stata Forum / Statalist.

    I kindly recommend to follow the FAQ advice, I mean, sharing data under CODE delimiters or by installing the SSC dataex.

    Also, you could present "the wishful" variable in the example.

    You don't need to share the whole dataset, Just an abridged version or a toy example will do fine.

    Best regards,

    Marcos

    Comment


    • #3
      Here is a simplified example of what I have in mind.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(worker_id firm_id) int anno byte(d previous_firm network)
      1 1 1990 0 . .
      1 1 1991 0 . .
      1 1 1992 0 . .
      2 1 1990 0 . .
      2 1 1991 0 . .
      2 2 1992 1 1 2
      3 2 1990 0 . .
      3 2 1991 0 . .
      3 2 1992 0 . .
      4 1 1990 0 . .
      4 1 1991 0 . .
      4 1 1992 0 . .
      end
      Network is what I would like to obtain, i.e. the count of the workers (in this example, worker_id 1 and worker_id 4) which up to the year of the switch (i.e. up to 1991) worked in the previous_firm .

      Hope it is clearer in this way!

      Best,
      Gioia

      Comment


      • #4
        This may not be as complicated as you think. However, for the sake of consistency, note the following:

        1) The total number of workers varies by year. Therefore, you want a count for the year that the worker changed firms.
        2) Since you want the count at the point where the worker changed firms, you can take advantage of the fact that the observation
        immediately preceding that defines the firm and year of interest.

        Code:
        *\\ first generate total no. of workers in a given firm and year.
        sort worker_id (anno)
        egen nworkers = count(worker_id), by(firm_id anno)
        
        
        *\\ count no. of coworkers (= total workers - self)
        gen network2 =  cond(d==1, nworkers[_n-1]-1, .)
        Beware of duplicate observations. You should eliminate these before running the code.
        Last edited by Andrew Musau; 18 May 2017, 15:23.

        Comment


        • #5
          This was very helpful, thank you for your answer!

          Comment


          • #6
            Hello again, this is an excerpt of my data after cleaning and keeping only the observations which have the outcome I am interested in:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input long worker_id int year long firm_id byte dummy long previous_firm byte network
            3673681 2009 7187167 0       . .
             617972 2006 7187167 0       . .
            8098419 2010 7187167 0       . .
            5159819 2006 7187167 0       . .
            7314747 2010    2376 0       . .
            5258171 2008 7187167 0       . .
            8337358 2009 7187167 0       . .
            7774480 2011 7187167 0       . .
             878789 2011    7187 0       . .
            3898310 2012 7187167 0       . .
              67635 2007 7187167 0       . .
            8976214 2012 7187167 0       . .
             772794 2011 7187167 0       . .
            6767379 2006 7187167 0       . .
            4447136 2012 7187167 0       . .
             476385 2009 7187167 0       . .
            3225566 2008 7187167 0       . .
            7901186 2006 7187167 0       . .
            2059679 2008 7187167 0       . .
            5467850 2009    2301 1 7187167 8
            end
            "Network" is still the variable I want to obtain, i.e. the count of the coworkers before the switch (which is again indexed by the dummy). The code suggested before does not work anymore because now it is not obvious that the previous observations belong to the same firm (sometimes I have only one observation per firm, and the workers who changed workplace are rare in the dataset).

            I guess that my question is, more in general, how to count observations conditional of the values of variables of other observations (previous_firm and year), which are "tagged" by the dummy.

            I would be very grateful if someone could help!

            Best

            Gioia

            Comment


            • #7
              I think that you need to calculate the number of workers in the previous firm by reducing the data to one observation per firm and year and then use a cumulative sum to add up the workers for prior years. Once you have this information, you can merge it with the original data.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long worker_id int year long firm_id byte dummy long previous_firm byte network
              3673681 2009 7187167 0       . .
               617972 2006 7187167 0       . .
              8098419 2010 7187167 0       . .
              5159819 2006 7187167 0       . .
              7314747 2010    2376 0       . .
              5258171 2008 7187167 0       . .
              8337358 2009 7187167 0       . .
              7774480 2011 7187167 0       . .
               878789 2011    7187 0       . .
              3898310 2012 7187167 0       . .
                67635 2007 7187167 0       . .
              8976214 2012 7187167 0       . .
               772794 2011 7187167 0       . .
              6767379 2006 7187167 0       . .
              4447136 2012 7187167 0       . .
               476385 2009 7187167 0       . .
              3225566 2008 7187167 0       . .
              7901186 2006 7187167 0       . .
              2059679 2008 7187167 0       . .
              5467850 2009    2301 1 7187167 8
              end
              gen order = _n
              save "statalist_ex.dta", replace
              
              * create a count of number of workers per year per firm
              collapse (count) n=worker_id, by(firm_id year)
              
              * make it a cumulative count of prior years
              bysort firm_id : gen cumsum = sum(n[_n-1])
              drop n
              
              * merge the count of prior years with original data
              rename firm_id previous_firm
              merge 1:m previous_firm year using "statalist_ex.dta", ///    
                  keep(match using) nogen
              
              sort order

              Comment

              Working...
              X