  • Counting event occurrences

    Hello everybody,
    I would like to ask for clarification. I am working with a dataset that reports hirings for different firms.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(id_firm id_worker) float date str3 timest float cat
     2 4757957 22340 "211" 1
     6 2531010 21273 "181" 2
     6 2525847 22705 "221" 2
     7 3127188 21556 "191" 2
     7 5432840 22406 "212" 2
     8 2767717 21322 "182" 1
     8 3324773 21366 "183" 1
     8 3324773 21435 "183" 1
     8 2767717 21366 "183" 1
     8 2767717 21538 "184" 1
     8 1809629 21538 "184" 1
     8 3324773 21550 "191" 1
     8 3324773 21722 "192" 2
    10 2588153 21948 "201" 3
    11 1809597 21871 "194" 2
    11 1809597 22636 "214" 2
    13 4679017 22228 "204" 2
    13 4679017 22384 "212" 2
    13 4679017 22557 "214" 2
    17 4819689 21216 "181" 2
    17 3121329 22074 "202" 2
    17 6482945 22221 "204" 2
    17 1744725 22494 "213" 2
    18 3208794 21400 "183" 1
    18 3208794 21484 "184" 2
    18 2074777 21946 "201" 2
    20 4612010 21400 "183" 2
    21 2011931 22228 "204" 1
    22 2235257 22469 "213" 1
    23 2890692 21192 "181" 3
    23 1782179 21426 "183" 2
    23 4173506 21556 "191" 3
    23 4448762 22382 "212" 2
    23 6648116 22551 "213" 2
    23 4448762 22634 "214" 2
    23 6648116 22657 "221" 2
    23 6655422 22726 "221" 3
    23 6073019 22673 "221" 2
    24 1744329 21801 "193" 4
    26 1724201 21360 "182" 1
    26 5432881 21550 "191" 4
    26 4276312 21570 "191" 1
    26 1724201 21550 "191" 1
    26 1356830 21885 "194" 1
    30 5182870 21472 "184" 2
    32 5971076 22221 "204" 1
    32 5971076 22247 "204" 1
    33 5858242 21753 "193" 1
    34 2095856 22515 "213" 2
    36 2021812 22649 "221" 1
    36 4106555 22709 "221" 1
    36 2976335 22653 "221" 1
    38 3244418 22537 "213" 2
    39 6056335 21353 "182" 1
    39 3091619 21449 "183" 2
    39 4266132 21458 "184" 1
    39 2453190 21498 "184" 4
    39 1461105 21514 "184" 4
    39 2257055 21586 "191" 2
    39 1457304 21605 "191" 1
    39 5124893 21593 "191" 2
    39 2760694 21556 "191" 1
    39 2984909 21570 "191" 1
    39 2301709 21605 "191" 4
    39 6364497 21654 "192" 4
    39 2473546 21651 "192" 2
    39 3435774 21671 "192" 1
    39 4851860 21683 "192" 2
    39 5654859 21647 "192" 1
    39 2540078 21794 "193" 4
    39 4861016 21731 "193" 2
    39 2063773 21854 "194" 4
    39 5786679 21888 "194" 1
    39 1600459 21892 "194" 1
    39 5120255 21894 "194" 2
    39 3435774 21892 "194" 1
    39 3347656 21843 "194" 2
    39 4009911 21976 "201" 2
    39 4562915 21927 "201" 4
    39 6384250 21969 "201" 1
    39 5123489 21948 "201" 4
    39 4225350 21948 "201" 1
    39 3297393 21921 "201" 1
    39 6317057 22039 "202" 4
    39 2567088 22171 "203" 1
    39 5662283 22123 "203" 2
    39 2499806 22207 "204" 4
    39 6412690 22232 "204" 2
    39 5178517 22242 "204" 4
    39 6138546 22284 "211" 2
    39 5120255 22340 "211" 2
    39 6056335 22298 "211" 2
    39 5654859 22340 "211" 1
    39 1950936 22340 "211" 1
    39 3800120 22340 "211" 1
    39 2298615 22333 "211" 1
    39 3394461 22337 "211" 1
    39 3091619 22347 "211" 3
    39 1445438 22340 "211" 1
    39 5457269 22344 "211" 1
    end
    format %td date

    Each row identifies a working relationship between a firm (id_firm) and a worker (id_worker) that starts on "date". "st_time" identifies the quarter (211 = "first quarter of 2021") and is based on date. "cat" identifies the type of worker.
    I would like to know how many workers of each cat are hired by each firm in every quarter, hence the total number of hirings for each cat for every firm\time combination.
    I am planning to use the contract command:

    contract id_impresa whf_cat st_time, freq(hire)
    sort id_impresa st_time
    egen tot_hire=sum( hire), by( id_impresa st_time)

    gen q1= hire if cat==1
    gen q2= hire if cat==2
    gen q3= hire if cat==3
    gen q4= hire if cat==4

    foreach var of varlist q1 q2 q3 q4 {
    recode `var' .=0
    egen `var'_hire=max( `var' ), by( id_impresa st_time)
    gen share_hir_`var'= (`var'_hire / tot_hire)
    }

    quietly bys id_impresa st_time : gen dup = cond(_N==1,0,_n)
    drop if dup>1

    Is this the correct way to obtain what I am interested in?
    Thank you!

  • #2
    The code you show refers to variable names that do not appear in your example data. While I think I see the correspondence between the two sets of variable names, rather than trying to understand your code, I have written a few lines that will give you what you are asking for, using your -dataex- example data.

    Code:
    //  VERIFY THERE ARE NO DUPLICATE OBSERVATIONS
    duplicates report
    assert r(N) == r(unique_value)
    
    //  "I would like to know how many workers of each cat is hired by each firm in every quarter"
    collapse (count) hires = id_worker, by(cat id_firm timest)
    
    //  "hence the total number of hirings for each cat for every firm\time combination"
    by cat: egen hires_this_cat = total(hires)
    
    //  AND IT APPEARS YOU ALSO WANT THE SHARE OF HIRES IN EACH CATEGORY
    summ hires, meanonly
    gen share_of_hires_this_cat = hires_this_cat/r(sum)
    By the way, though it does not impact these calculations, you will probably be better off working with a Stata internal format quarterly date variable rather than your string variable timest. Getting that is very easy:
    Code:
    drop timest
    gen timest = qofd(date)
    format timest %tq
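
The share computed above is each category's share of all hires in the data. If what is wanted is instead the share within each firm-quarter (which is what tot_hire in #1 appears to be aiming at), a minimal sketch could look like the following. It uses a few rows from the -dataex- example; the names tot_hires and share_cat are illustrative choices, not names from the thread.

```stata
clear
input long(id_firm id_worker) float date str3 timest float cat
 8 3324773 21366 "183" 1
 8 3324773 21435 "183" 1
 8 2767717 21366 "183" 1
26 5432881 21550 "191" 4
26 4276312 21570 "191" 1
26 1724201 21550 "191" 1
end

// hires per firm, quarter and category
collapse (count) hires = id_worker, by(id_firm timest cat)

// total hires per firm-quarter (tot_hires is an assumed name),
// and each category's share of that total (share_cat is an assumed name)
bysort id_firm timest: egen tot_hires = total(hires)
gen share_cat = hires / tot_hires

list, sepby(id_firm timest) noobs
```

In this subset, firm 26 in quarter 191 has two hires of cat 1 and one of cat 4, so the shares are 2/3 and 1/3.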


    • #3
      A quick question: you do have multiple observations of the same worker being hired by the same firm in the same quarter:

      Code:
      duplicates tag id_firm id_worker timest, gen(dup)
      . li if dup, noobs
        +-----------------------------------------------------+
        | id_firm   id_wor~r        date   cat   timest   dup |
        |-----------------------------------------------------|
        |       8    3324773   01jul2018     1   2018q3     1 |
        |       8    3324773   08sep2018     1   2018q3     1 |
        |      32    5971076   02nov2020     1   2020q4     1 |
        |      32    5971076   28nov2020     1   2020q4     1 |
        +-----------------------------------------------------+
      Do you want to count such hires separately (in which case the code in #2 should be fine), or treat every person hired however many times in the same quarter, as one hire?

      If the latter, you might want to first do something like:

      Code:
      bysort id_firm id_worker timest: keep if _n == 1
      and then run the code in #2 on this subset of the data.
      Last edited by Hemanshu Kumar; 17 Jan 2023, 09:38.


      • #4
        "The code you show refers to variable names that do not appear in your example data"

        I translated the names of a few variables for the dataex example but not for the code; my fault.

        In #2 you suggest using -collapse (count)- instead of -contract-. Apart from the fact that -collapse- allows more options (which can be useful for retaining more information), is there any other practical reason?

        "Do you want to count such hires separately (in which case the code in #2 should be fine), or treat every person hired however many times in the same quarter, as one hire?"

        I would like to count each hire separately, since I am also interested in the use of short contracts in order to determine the hiring dynamics.

        "you will probably be better off working with a Stata internal format quarterly date variable rather than your string variable"

        I created this odd time variable because I was working with different time spans; for quarters and semesters I am going to adopt the SIF, which is clearer.

        Thank you all


        • #5
          "In #2 you suggest using -collapse (count)- instead of -contract-. Apart from the fact that -collapse- allows more options (which can be useful for retaining more information), is there any other practical reason?"
          No, no other reason. -contract- will also do the job here.
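
For reference, a minimal sketch of the -contract- route, using the variable names from the -dataex- example and a few of its rows. The name hires is just an illustrative choice.

```stata
clear
input long(id_firm id_worker) float date str3 timest float cat
 8 3324773 21366 "183" 1
 8 3324773 21435 "183" 1
 8 2767717 21366 "183" 1
26 5432881 21550 "191" 4
26 4276312 21570 "191" 1
26 1724201 21550 "191" 1
end

// one observation per firm-quarter-category combination,
// with the number of observations (hires) stored in hires (assumed name)
contract id_firm timest cat, freq(hires)

list, sepby(id_firm timest) noobs
```

Note that -contract- counts observations, whereas -collapse (count)- counts nonmissing values of id_worker, so the two agree as long as id_worker is never missing.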


          • #6
            Thank you again for the suggestions.
