Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How many establishments does each firm have?

    Hi Hi,

    I have dataset:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(NPC_FIC ESTAB_ID) long Sede
    500000001 -8146170000 2
    500000001 -8146170000 2
    500000001 -8146170000 2
    500000002      794390 2
    500000002      794390 2
    500000002      794390 2
    500000002      794390 2
    500000002      794390 2
    500000033      725800 2
    500000033      725800 2
    500000050           5 2
    500000050           5 2
    500000083           8 2
    500000104      721748 2
    500000104      721748 2
    500000119      766713 2
    500000119      766713 2
    500000119      766713 2
    500000119      766713 2
    500000119      766713 2
    500000119      766713 2
    500000119      766713 2
    500000119      766713 2
    500000121      649554 2
    500000121      649554 2
    500000156      932332 2
    500000156      932332 2
    500000157          18 2
    500000157          18 2
    500000157          18 2
    500000165      671147 2
    500000165      671147 2
    500000165      671147 2
    500000180          21 2
    500000198      732861 2
    500000198      732861 2
    500000198      732861 2
    500000201      736025 2
    500000204          25 2
    500000204          25 2
    500000212      722107 2
    500000215      798536 2
    500000215      798536 2
    500000215      798536 2
    500000215      798536 2
    500000240          33 2
    500000240          33 2
    500000276          38 2
    500000279      785658 2
    500000284          42 2
    500000284          42 2
    500000284          42 2
    500000284          42 2
    500000305      646194 2
    500000324      882387 1
    500000334      737994 2
    500000341          52 2
    500000341          52 2
    500000346      929162 2
    500000346      929162 2
    500000346      929162 2
    500000346      929162 2
    500000346      929162 2
    500000376          55 1
    500000376          56 1
    500000376          55 1
    500000376          56 1
    500000376          54 2
    500000376          56 1
    500000376          54 2
    500000376          55 1
    500000376          55 1
    500000395          57 2
    500000395          57 2
    500000395          57 2
    500000395          57 2
    500000395          57 2
    500000395          57 2
    500000465      761201 2
    500000465      761201 2
    500000465      761201 2
    500000470      895709 2
    500000470      895709 2
    500000478      653463 2
    500000525          65 2
    500000543      846232 2
    500000543      846232 2
    500000565      817918 2
    500000583      636304 1
    500000583      636303 2
    500000600          71 1
    500000600          71 1
    500000633          75 2
    500000633          75 2
    500000675      439022 2
    500000709      624582 2
    500000748          79 2
    500000757      796083 2
    500000761      779588 1
    500000761      779588 1
    end
    label values Sede Sede
    label def Sede 1 "N", modify
    label def Sede 2 "S", modify
    I need to determine how many establishments, each firm has. (Number of establishments).
    Firms' identifiers: NPC_FIC
    Establishment ID: ESTAB_ID
    If the firm is Headquarter, Sede= 2 (N)
    If the firm is not Headquarter, Sede= 1 (S)
    Number of observations: 3,432,821

    Appreciate any help.
    Paris

  • #2
    Code:
    by NPC_FIC (ESTAB_ID), sort: gen n_establishments = sum(ESTAB_ID != ESTAB_ID[_n-1])
    by NPC_FIC: replace n_establishments = n_establishments[_N]

    Comment


    • #3
      Thank you so much for the codes.


      tab n_establishments

      n_establish |
      ments | Freq. Percent Cum.
      ------------+-----------------------------------
      0 | 910,273 32.96 32.96
      1 | 1,850,359 67.01 99.97
      2 | 756 0.03 100.00
      ------------+-----------------------------------
      Total | 2,761,388 100.00

      How is possible a firm with 0 establishment? Each firm has at least one establishment.

      Comment


      • #4
        I would check the code you actually used, as the code of Clyde Schechter looks solid to me and consistent with other recipes as reviewed in dm0042 from Stata Journal.

        Comment


        • #5
          Using your example data
          Code:
          . by NPC_FIC (ESTAB_ID), sort: gen n_establishments = sum(ESTAB_ID != ESTAB_ID[_n-1])
          
          . by NPC_FIC: replace n_establishments = n_establishments[_N]
          (7 real changes made)
          
          . tab n_establishments
          
          n_establish |
                ments |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    1 |         89       89.00       89.00
                    2 |          2        2.00       91.00
                    3 |          9        9.00      100.00
          ------------+-----------------------------------
                Total |        100      100.00
          But if perhaps you have in your data observations for which the ESTAB_ID is a missing value, the code in post #2 needs to be changed, but how it is changed depends on how you want to treat observations with missing ESTAB_ID.

          In particular, the code in post #2 will create n_establishments=0 for any NPC_FIC for which all values of ESTAB_ID are missing.

          Comment


          • #6
            William Lisowski is correct about missing values.

            Comment


            • #7
              I got confounded a bit. The number of establishments must be greater than the number of firms. The number of firms might be around 240000.
              with the new data (I had to omit missing data):
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float n_establishments double(NPC_FIC ESTAB_ID)
              1 500000001 -8146170000
              1 500000001 -8146170000
              1 500000001 -8146170000
              1 500000002      794390
              1 500000002      794390
              1 500000002      794390
              1 500000002      794390
              1 500000002      794390
              1 500000033      725800
              1 500000033      725800
              1 500000050           5
              1 500000050           5
              1 500000083           8
              1 500000104      721748
              1 500000104      721748
              1 500000119      766713
              1 500000119      766713
              1 500000119      766713
              1 500000119      766713
              1 500000119      766713
              1 500000119      766713
              1 500000119      766713
              1 500000119      766713
              1 500000121      649554
              1 500000121      649554
              1 500000156      932332
              1 500000156      932332
              1 500000157          18
              1 500000157          18
              1 500000157          18
              1 500000165      671147
              1 500000165      671147
              1 500000165      671147
              1 500000180          21
              1 500000198      732861
              1 500000198      732861
              1 500000198      732861
              1 500000201      736025
              1 500000204          25
              1 500000204          25
              1 500000212      722107
              1 500000215      798536
              1 500000215      798536
              1 500000215      798536
              1 500000215      798536
              1 500000240          33
              1 500000240          33
              1 500000276          38
              1 500000279      785658
              1 500000284          42
              1 500000284          42
              1 500000284          42
              1 500000284          42
              1 500000305      646194
              1 500000324      882387
              1 500000334      737994
              1 500000341          52
              1 500000341          52
              1 500000346      929162
              1 500000346      929162
              1 500000346      929162
              1 500000346      929162
              1 500000346      929162
              3 500000376          54
              3 500000376          54
              3 500000376          55
              3 500000376          55
              3 500000376          55
              3 500000376          55
              3 500000376          56
              3 500000376          56
              3 500000376          56
              1 500000395          57
              1 500000395          57
              1 500000395          57
              1 500000395          57
              1 500000395          57
              1 500000395          57
              1 500000465      761201
              1 500000465      761201
              1 500000465      761201
              1 500000470      895709
              1 500000470      895709
              1 500000478      653463
              1 500000525          65
              1 500000543      846232
              1 500000543      846232
              1 500000565      817918
              2 500000583      636303
              2 500000583      636304
              1 500000600          71
              1 500000600          71
              1 500000633          75
              1 500000633          75
              1 500000675      439022
              1 500000709      624582
              1 500000748          79
              1 500000757      796083
              1 500000761      779588
              1 500000761      779588
              end




              I need to show that among X number of the firms Y number of establishments exist. So how the numbers X & Y can be computed?
              Attached Files

              Comment


              • #8
                Perhaps the distinct command is what you want, the example below applies it to the data in post #7. After installing it, read the output of help distinct for instructions on its use.
                Code:
                . * one time only
                . ssc install distinct
                checking distinct consistency and verifying not already installed...
                installing into /Users/lisowskiw/Library/Application Support/Stata/ado/plus/...
                installation complete.
                
                . 
                . distinct NPC_FIC ESTAB_ID
                
                          |        Observations
                          |      total   distinct
                ----------+----------------------
                  NPC_FIC |        100         42
                 ESTAB_ID |        100         45
                
                .

                Comment


                • #9
                  Thank you William for getting back to me. Actually, I am not seeking distinct.

                  Comment


                  • #10
                    I need to show that among X number of the firms Y number of establishments exist. So how the numbers X & Y can be computed?
                    It's not entirely clear what you mean by that. But perhaps it's this:

                    Code:
                    by NPC_FIC (ESTAB_ID), sort: gen n_establishments = sum(ESTAB_ID != ESTAB_ID[_n-1])
                    by NPC_FIC: replace n_establishments = n_establishments[_N]
                    egen firm_flag = tag(NPC_FIC)
                    tab n_establishments if firm_flag

                    Comment


                    • #11
                      Actually, I am not seeking distinct.
                      Do you understand that your response is of no help to me in figuring out why you think you are not seeking distinct, why the results I show are incorrect, or what results you expect? How am I supposed to respond to that?

                      I need to show that among X number of the firms Y number of establishments exist.
                      Post #8 tells us that in your 100 observations of sample data among 42 firms 45 establishments exist.

                      Comment


                      • #12
                        Originally posted by William Lisowski View Post

                        Do you understand that your response is of no help to me in figuring out why you think you are not seeking distinct, why the results I show are incorrect, or what results you expect? How am I supposed to respond to that?

                        Because:

                        Code:
                        .  * one time only
                        
                        . 
                        . . ssc install distinct
                        checking distinct consistency and verifying not already installed...
                        installing into C:\Users\35193\ado\plus\...
                        installation complete.
                        
                        . distinct NPC_FIC ESTAB_ID
                        
                                  |        Observations
                                  |      total   distinct
                        ----------+----------------------
                          NPC_FIC |    2214192     249530
                         ESTAB_ID |    2214192     291638
                        end

                        The total number of firms is equal to 249,530

                        Post #8 tells us that in your 100 observations of sample data among 42 firms 45 establishments exist.
                        Yes, that's exactly what I need. So what's the prescription for some million observations?

                        I used #2 code by adding one line, surely there are more clever ways.

                        Code:
                        . by NPC_FIC (ESTAB_ID), sort: gen n_establishments = sum(ESTAB_ID != ESTAB_ID[_n-1])
                        
                        . by NPC_FIC: replace n_establishments = n_establishments[_N]
                        (687,082 real changes made)
                        
                        . bysort NPC_FIC n_establishments: keep if _n == _N 
                        (1,964,662 observations deleted)
                        
                        . sum n_establishments
                        
                            Variable |        Obs        Mean    Std. dev.       Min        Max
                        -------------+---------------------------------------------------------
                        n_establis~s |    249,530    1.168749    2.790426          1        992
                        end


                        Comment


                        • #13
                          Code:
                          . distinct NPC_FIC ESTAB_ID
                          
                                    |        Observations
                                    |      total   distinct
                          ----------+----------------------
                            NPC_FIC |    2214192     249530
                           ESTAB_ID |    2214192     291638
                          tells us that in your 2,214,192 observations of data from 249,530 firms, 291,638 establishments exist.

                          The code in post #12 confirms that you have 1,964,662 (dropped observations) + 249,530 (kept observations) = 2,214,192 observations and that the number of firms is 249,530.

                          I still do not understand why you say you are not seeking distinct, which seems to give the answers you desire in the column headed "distinct".
                          Last edited by William Lisowski; 02 Jan 2023, 14:40.

                          Comment


                          • #14

                            You are right. Let me thank you again. The only issue is that Distinct does not show Min/Max/ Std dev.
                            I really appreciate your help William, Clyde, and Nick.

                            Comment

                            Working...
                            X