
  • Average time between artists' album releases in panel data

    Dear Statalist.
    I have a panel data set containing artists, album releases by artists, and the year of each album release.
    I need to calculate a variable "wanted" that shows the average time between the artists' album releases in the data set.
    I am trying to get a sense of the average time an artist spends in between releases.
    Some artists only release a single album.

    The following toy data set shows the structure of the data:
    artist_id album_id year
    1 1 2000
    1 2 2003
    1 3 2007
    2 4 1996
    3 5 2005
    3 6 2010
    3 7 2011
    Any and all suggestions on this problem would be greatly appreciated.
    Thank you!

    Kind regards,
    Erik



  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(artist_id album_id) int year
    1 1 2000
    1 2 2003
    1 3 2007
    2 4 1996
    3 5 2005
    3 6 2010
    3 7 2011
    end
    
    bysort artist_id (year): gen diff = year - year[_n-1]
    egen mean_diff = mean(diff), by(artist_id)
    list, sepby(artist_id) noobs
    Code:
    . list, sepby(artist_id) noobs
    
      +----------------------------------------------+
      | artist~d   album_id   year   diff   mean_d~f |
      |----------------------------------------------|
      |        1          1   2000      .        3.5 |
      |        1          2   2003      3        3.5 |
      |        1          3   2007      4        3.5 |
      |----------------------------------------------|
      |        2          4   1996      .          . |
      |----------------------------------------------|
      |        3          5   2005      .          3 |
      |        3          6   2010      5          3 |
      |        3          7   2011      1          3 |
      +----------------------------------------------+



    • #3
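      Please present data examples using dataex, as the FAQ advises. The arithmetic mean has only one definition, so unless you have a different kind of average in mind, the mean gap is simply the time from the first to the last release divided by the number of gaps.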

      Code:
      bysort artist_id (year): gen avg_duration = (year[_N] - year[1]) / (_N - 1) if _N > 1
      Last edited by Andrew Musau; 01 Nov 2019, 04:12.
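
      A short derivation may help show why the one-line formula in #3 gives the same answer as averaging the year-to-year differences. The notation is introduced here for illustration and is not from the thread: an artist has n >= 2 releases with years sorted as y_1 <= ... <= y_n.

      ```latex
      \bar{d}
        = \frac{1}{n-1} \sum_{i=2}^{n} \left( y_i - y_{i-1} \right)
        = \frac{y_n - y_1}{n-1}
      % The sum telescopes: every intermediate year cancels.
      % Artist 1: (3 + 4)/2 = 3.5 = (2007 - 2000)/2
      % Artist 3: (5 + 1)/2 = 3   = (2011 - 2005)/2
      ```

      For an artist with a single release, n - 1 = 0 and the mean is undefined, which is why the code restricts to _N > 1 and leaves the result missing.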

      Comment


      • #4
        Please use dataex to show data examples (FAQ Advice #12).

        Does this help?

        Code:
        clear
        input artist_id album_id year
        1 1 2000
        1 2 2003
        1 3 2007
        2 4 1996
        3 5 2005
        3 6 2010
        end
        
        bysort artist_id (year): gen lag = year - year[_n-1]
        egen wanted = mean(lag), by(artist_id)
        egen nlags = count(lag), by(artist_id)
        label var nlags "# lags"
        label var wanted "mean lag (year)"
        tabdisp artist_id, c(nlags wanted)
        
        --------------------------------------------
        artist_id |          # lags  mean lag (year)
        ----------+---------------------------------
                1 |               2              3.5
                2 |               0                
                3 |               1                5
        --------------------------------------------
        Last edited by Nick Cox; 01 Nov 2019, 04:10.
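
        Before running any of these on a full dataset, it may be worth confirming that the two calculations in this thread agree. The following is only a sketch on the toy data from #1; the variable names gap and mean_gap are made up for this check, and the 1e-6 tolerance is an arbitrary choice to allow for float rounding.

        ```stata
        * Sketch: check that both approaches give the same result on the toy data
        clear
        input byte(artist_id album_id) int year
        1 1 2000
        1 2 2003
        1 3 2007
        2 4 1996
        3 5 2005
        3 6 2010
        3 7 2011
        end

        * Approach in #2 and #4: mean of the gaps between consecutive releases
        bysort artist_id (year): gen gap = year - year[_n-1]
        egen mean_gap = mean(gap), by(artist_id)

        * Approach in #3: first-to-last span divided by the number of gaps
        bysort artist_id (year): gen avg_duration = (year[_N] - year[1]) / (_N - 1) if _N > 1

        * Both should be missing for single-album artists ...
        assert missing(mean_gap) == missing(avg_duration)
        * ... and equal, up to rounding, for everyone else
        assert reldif(mean_gap, avg_duration) < 1e-6 if !missing(mean_gap)
        ```

        If both assert commands pass silently, the results match. Sorting on year within artist_id matters for both approaches, so the (year) in the bysort prefix should not be dropped.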



        • #5
          Dear Wouter, Andrew and Nick.

          Thank you so much!
          It is very useful to see three different solutions/approaches to this problem.
          I will try your suggestions on my full dataset and report back.

          I apologize for not using dataex in this post; I will do so from now on.

          Sincerely,
          Erik
