Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • bysort year : egen x=count() / bysort year : egen x=mean()

    Hello Statalist colleagues,

    I hope you are all staying healthy.

    I had a question while struggling with the data to create three series called Depth1 and Depth2 and Breadth. I was thinking of making this manually as I got stuck in doing this, but wanted to reach out to statalist before that.
    I thought this could be somehow done with bysort egen count and this sort of codes. But failed.. It would be great if I could get some ideas from more proficient fellows.

    My dataset looks like the following below.

    What I want to create are:

    Depth1: In each set of year and panelkey, I'd like to show the count of the countrycode that appears the most. For example, in 2006 for panelkey 102634, 123 appears the most and it appears 3 times. So I want to assign 3 to all these observations.
    Depth2: In each set of year and panelkey, I'd like to show sum(count of different country code)/ number of different country code in a year and panelkey. For example, in 2009 for panel 5152235, 107 appears 7 times and 128 appears 2 times. And the number of different countrycode is 2. So I want to have (7+2)/2 in the Depth2 series for this observation.
    Breadth: In addition, it would be great if I could also know how I can generate a series for counting different number of country codes within a year and panelkey set. So for 2006 and panelkey 102634, there are two different countries 123 and 128. So I'd like to have 2 in this observation. This is also the denominator in Depth2 calculation.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    year panelkey countrycode Depth1 Depth2 Breadth
    2006  102634 123 3 (3+1)/2 2
    2006  102634 123 3 (3+1)/2 2
    2006  102634 123 3 (3+1)/2 2
    2006  102634 128 3 (3+1)/2 2
    2008  102634 123 2 (2+1)/2 2
    2008  102634 123 2 (2+1)/2 2
    2008  102634 128 2 (2+1)/2 2
    .
    .
    .
    2017  102634 123 1 (1+1)/2 2
    2017  102634 128 1 (1+1)/2 2
    2018  102634 123 1 (1+1)/2 2
    2018  102634 128 1 (1+1)/2 2
    2006 1122135 301 1 1/1 1
    2007 1122135 301 1 1/1 1
    2008 1122135 301 1 1/1 1
    2009 1122135 301 1 1/1 1
    .
    .
    
    2009 5152235 107 7 (7+2)/2 2
    2009 5152235 107 7 (7+2)/2 2
    2009 5152235 107 7 (7+2)/2 2
    2009 5152235 107 7 (7+2)/2 2
    2009 5152235 107 7 (7+2)/2 2
    2009 5152235 107 7 (7+2)/2 2
    2009 5152235 107 7 (7+2)/2 2
    2009 5152235 128 7 (7+2)/2 2
    2009 5152235 128 7 (7+2)/2 2
    .
    .
    .
    end
    Thank you so much for your information.

    Thanks!


    Last edited by Ed Suh; 03 Jun 2021, 22:18.

  • #2
    Code:
    bysort panelkey year countrycode: gen x = _N 
    by panelkey year countrycode: gen n = _n == 1
    by panelkey year: gen breadth = sum(n)
    by panelkey year: replace breadth = breadth[_N] 
    by panelkey year: egen depth1 = max(x)
    by panelkey year: egen depth2 = total(cond(n==1,x,.))
    replace depth2 = depth2 / breadth
    drop x n

    Comment


    • #3
      Hi Ali,

      thank you so much for your quick respsonse. I saw this reply just now. This works perfectly. I should upgrade my skills definitely. I didn't know there is such code as total(codn(n==1,x,.).

      I would figure out what that means and internalize the skills.

      Thank you again,
      Ed

      Comment


      • #4
        That code limits the range of values which the egen function should sum up to those in which variable n is equal to 1. If the condition n==1 is true for a given observation, the value of x in that observation is included in the total. If not, a missing value is substituted.

        Comment


        • #5

          Here are some comments, ideally to complement the very helpful answer from Ali Atia.

          Thanks for the data example, which was indeed produced by dataex -- until you messed it up by adding extra stuff!

          Anyone interested could start with the version below.

          Code:
          * Example generated by -dataex-. 
          clear
          input year panelkey countrycode 
          2006  102634 123 
          2006  102634 123 
          2006  102634 123 
          2006  102634 128 
          2008  102634 123 
          2008  102634 123 
          2008  102634 128 
          2017  102634 123 
          2017  102634 128 
          2018  102634 123 
          2018  102634 128 
          2006 1122135 301 
          2007 1122135 301 
          2008 1122135 301 
          2009 1122135 301 
          2009 5152235 107 
          2009 5152235 107 
          2009 5152235 107 
          2009 5152235 107 
          2009 5152235 107 
          2009 5152235 107 
          2009 5152235 107 
          2009 5152235 128 
          2009 5152235 128 
          end

          Of your questions,

          Depth1: In each set of year and panelkey, I'd like to show the count of the countrycode that appears the most. For example, in 2006 for panelkey 102634, 123 appears the most and it appears 3 times. So I want to assign 3 to all these observations.
          This is the frequency of the mode. In terms of frequency, it doesn't bite if there are ties for mode, but in terms of what the mode is, it does bite. egen does have a mode() function, but doing what you want directly is better, as you want its frequency. Here is my take.

          Code:
          bysort panelkey year countrycode : gen freq = _N 
          bysort panelkey year (freq) : gen mode_count = freq[_N]

          Depth2: In each set of year and panelkey, I'd like to show sum(count of different country code)/ number of different country code in a year and panelkey. For example, in 2009 for panel 5152235, 107 appears 7 times and 128 appears 2 times. And the number of different countrycode is 2. So I want to have (7+2)/2 in the Depth2 series for this observation.
          That looks like

          Code:
           
          egen tag = tag(panelkey year countrycode)
          by panelkey year: egen mean_freq = mean(cond(tag, freq, .))
          Breadth: In addition, it would be great if I could also know how I can generate a series for counting different number of country codes within a year and panelkey set. So for 2006 and panelkey 102634, there are two different countries 123 and 128. So I'd like to have 2 in this observation. This is also the denominator in Depth2 calculation.
          That is now directly available:

          Code:
           
          by panelkey year : egen ndistinct = total(tag)
          See

          https://www.stata-journal.com/articl...article=dm0055

          https://www.stata-journal.com/articl...article=dm0042


          for a round-up of some technique.

          Comment

          Working...
          X