Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating average number of market overlap with other firms

    Code:
    clear
    input float firm_id str6 market
    1 "a"
    1 "b"
    1 "c"
    1 "d"
    2 "b"
    2 "c"
    2 "f"
    3 "a"
    3 "b"
    4 "c"
    end
    Using the example dataset above, I am trying to count the number of overlapping markets that each firm has. In the above example dataset, a firm with firm_id "1" is operating in 4 markets: "a", "b", "c", and "d". A firm with firm_id "2" is operating in 3 markets: "b", "c", and "f". Between these two firms, "1" and "2", they overlap in 2 markets, "b" and "c".

    What I would like to accomplish ultimately is generating a column variable that takes the average number of overlapping markets that each firm has with other firms in the dataset. Let's call this column avg_n_overlap. Using firm "1" as an example, firm "1" has 2 overlapping markets with firm "2", 2 overlapping markets with firm "3", and 1 overlapping market with firm "4". Therefore, after taking the average of these numbers, firm "1" will have the value 1.6666 (=(2+2+1)/3).

    After calculating this avg_n_overlap, the final dataset that I am trying to create would look like below:

    Code:
    clear
    input float firm_id str6 market float avg_n_overlap
    1 "a" 1.6666666
    1 "b" 1.6666666
    1 "c" 1.6666666
    1 "d" 1.6666666
    2 "b" 1.3333334
    2 "c" 1.3333334
    2 "f" 1.3333334
    3 "a"         1
    3 "b"         1
    4 "c"  .6666667
    end
    I hope my question makes sense. I would like to write a macro code that accomplishes the task explained above, but I am a bit stuck at this point, and I would really appreciate any help!

  • #2
    should avg_n_overlap = 1.5 for firm 3 (2 overlapping markets with firm 1 and 1 overlapping market with firm 2 = (2 + 1) / 2 = 1.5)?
    Code:
    tempfile x
    save `x'
    gen n=1
    reshape wide n, i(market) j(firm_id)
    joinby market using `x'
    collapse (sum) n*, by(firm_id)
    reshape long n, i(firm_id)
    collapse (sum) a = n (count) b = n if n & firm_id != _j, by(firm_id)
    gen avg_n_overlap = a / b
    merge 1:m firm_id using `x'
    Code:
           firm_id   market   avg_n_overlap  
      1.         1        a        1.666667  
      2.         1        b        1.666667  
      3.         1        c        1.666667  
      4.         1        d        1.666667  
      5.         2        b        1.333333  
      6.         2        c        1.333333  
      7.         2        f        1.333333  
      8.         3        a             1.5  
      9.         3        b             1.5  
     10.         4        c               1
    Last edited by Øyvind Snilsberg; 21 Nov 2022, 02:27.

    Comment


    • #3
      This code assumes that -- as in your data example -- values of market are words in Stata's sense, meaning that they do not contain spaces. If not, then use strtoname() first, or just replace spaces by underscores or other punctuation.

      The fractions 1/3 and 2/3 here round following the usual rules, not as in #1.

      Code:
      clear
      input float firm_id str6 market
      1 "a"
      1 "b"
      1 "c"
      1 "d"
      2 "b"
      2 "c"
      2 "f"
      3 "a"
      3 "b"
      4 "c"
      end
      
      bysort firm_id : gen profile = market[1] if _n == 1
      by firm_id : replace profile = profile[_n-1] + " " + market if _n > 1
      by firm_id : replace profile = profile[_N]
      
      save maindata
      
      keep firm_id profile
      by firm_id : keep if _n == 1
      
      local max = _N
      local maxM1 = _N - 1
      gen overlap = 0
      forval i = 1/`maxM1' {
          local I = `i' + 1
          forval j = `I'/`max' {
              local A = profile[`i']
              local B = profile[`j']
              local inter : list A & B
              local count : list sizeof inter
              replace overlap = overlap + `count' in `i'
              replace overlap = overlap + `count' in `j'
          }
      }
      
      replace overlap = overlap / `maxM1'
      
      merge 1:m firm_id using maindata
      
      drop _merge
      
      sort firm_id market
      
      list, sepby(firm_id)
          +---------------------------------------+
           | firm_id   profile    overlap   market |
           |---------------------------------------|
        1. |       1   a b c d   1.666667        a |
        2. |       1   a b c d   1.666667        b |
        3. |       1   a b c d   1.666667        c |
        4. |       1   a b c d   1.666667        d |
           |---------------------------------------|
        5. |       2     b c f   1.333333        b |
        6. |       2     b c f   1.333333        c |
        7. |       2     b c f   1.333333        f |
           |---------------------------------------|
        8. |       3       a b          1        a |
        9. |       3       a b          1        b |
           |---------------------------------------|
       10. |       4         c   .6666667        c |
           +---------------------------------------+
      See also https://journals.sagepub.com/doi/epu...36867X20909698 for concatenating string values over observations.

      EDIT: #2 got there first. My concern in writing mine was to avoid an explosion in file sizes that isn't really needed.

      Also, does the real problem also involve data in time?
      Last edited by Nick Cox; 21 Nov 2022, 02:43.

      Comment


      • #4
        Thank you both for the answers! I will go through line by line to study what I should have done. (@Øyvind Snilsberg, yes you are right, I made an error in the example calculation.)

        @Nick Cox: Thank you. "market" variable does actually contain spaces, and I will be sure to deal with this before using your code. The actual problem also does involve data in time. In other words, I want to calculate this overlap measure for each firm-year. Should I do something different in this case?

        Comment


        • #5
          Surely you need to change the code if you want to monitor changes, as the code shown in #2 and #3 knows and says nothing about time.

          This code includes instances of 0 overlap in the calculations.


          Code:
          clear
          input float year firm_id  str6 market
          2020 1 "a"
          2020 1 "b"
          2020 1 "c"
          2020 1 "d"
          2020 2 "b"
          2020 2 "c"
          2020 2 "f"
          2020 3 "a"
          2020 3 "b"
          2020 4 "c"
          2021 1 "a"
          2021 1 "b"
          2021 1 "c"
          2021 1 "d"
          2021 2 "b"
          2021 2 "c"
          2021 2 "f"
          2021 3 "a"
          2021 3 "b"
          2021 4 "c"
          2021 4 "d" 
          end
          
          bysort year firm_id (market): gen profile = market[1] if _n == 1
          by year firm_id: replace profile = profile[_n-1] + " " + market if _n > 1
          by year firm_id: replace profile = profile[_N]
          
          save maindata, replace 
          
          keep year firm_id profile
          by year firm_id: keep if _n == 1
          gen long obsno = _n 
          gen overlap = 0 
          
          egen y = group(year) 
          su y, meanonly 
          
          
          forval y = 1/`r(max)' { 
          
          su obsno if y == `y', meanonly 
          local maxM1 = r(N) - 1 
          local first = r(min) 
          local last = r(max) 
          local lastM1 = r(max) - 1 
          
          forval i = `first'/`lastM1' {
              local I = `i' + 1
              forval j = `I'/`last' {
                  local A = profile[`i']
                  local B = profile[`j']
                  local inter : list A & B
                  local count : list sizeof inter
                  replace overlap = overlap + `count' in `i'
                  replace overlap = overlap + `count' in `j'
              }
          }
          
          replace overlap = overlap / `maxM1' in `first'/`last' 
          
          } 
          
          merge 1:m firm_id year using maindata
          
          drop _merge
          
          sort year firm_id market
          
          list, sepby(year firm_id)
          
               +----------------------------------------------------------+
               | year   firm_id   profile   obsno    overlap   y   market |
               |----------------------------------------------------------|
            1. | 2020         1   a b c d       1   1.666667   1        a |
            2. | 2020         1   a b c d       1   1.666667   1        b |
            3. | 2020         1   a b c d       1   1.666667   1        c |
            4. | 2020         1   a b c d       1   1.666667   1        d |
               |----------------------------------------------------------|
            5. | 2020         2     b c f       2   1.333333   1        b |
            6. | 2020         2     b c f       2   1.333333   1        c |
            7. | 2020         2     b c f       2   1.333333   1        f |
               |----------------------------------------------------------|
            8. | 2020         3       a b       3          1   1        a |
            9. | 2020         3       a b       3          1   1        b |
               |----------------------------------------------------------|
           10. | 2020         4         c       4   .6666667   1        c |
               |----------------------------------------------------------|
           11. | 2021         1   a b c d       5          2   2        a |
           12. | 2021         1   a b c d       5          2   2        b |
           13. | 2021         1   a b c d       5          2   2        c |
           14. | 2021         1   a b c d       5          2   2        d |
               |----------------------------------------------------------|
           15. | 2021         2     b c f       6   1.333333   2        b |
           16. | 2021         2     b c f       6   1.333333   2        c |
           17. | 2021         2     b c f       6   1.333333   2        f |
               |----------------------------------------------------------|
           18. | 2021         3       a b       7          1   2        a |
           19. | 2021         3       a b       7          1   2        b |
               |----------------------------------------------------------|
           20. | 2021         4       c d       8          1   2        c |
           21. | 2021         4       c d       8          1   2        d |
               +----------------------------------------------------------+

          Comment


          • #6
            Nick Cox Of course, I was thinking about creating a "firm_id_year" column before using the code that you provided before. I am assuming that this approach should work similarly?

            Thank you so much for this additional code that would take into account change over time!

            Comment


            • #7
              Indeed, a joint variable (you say "column") that was a composite of identifier and year could be used. But then you'd need to make sure that you were only comparing within the same year. The code in #5 is fiddly but feel free to rewrite it.

              Comment


              • #8
                Right, of course. Thank you for the insightful comment and all the help!

                Comment

                Working...
                X