Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating a mean of previous values within a time period

    Hello, all!

    I have an unbalanced panel dataset from Internet Movie Database. I am trying to compute the mean gross of a filmmaker's movies over the three years prior to the release of a particular film. Here is a sample of the dataset:

    title talent fdate loggedadjgross
    A Fine Mess (1986) Adams, Tony (I) 08aug1986 15.32454586029
    That's Life! (1986) Adams, Tony (I) 10sep1986 14.93389987946
    Blind Date (1987) Adams, Tony (I) 24mar1987 17.1725769043
    Sunset (1988) Adams, Tony (I) 29apr1988 14.98368549347
    Skin Deep (1989) Adams, Tony (I) 03mar1989 16.37933731079
    Switch (1991) Adams, Tony (I) 10may1991 16.04848480225
    Son of the Pink Panther (1993) Adams, Tony (I) 27aug1993 14.15256595612
    Shrek (2001) Adamson, Andrew 22apr2001 18.6501750946
    Shrek 2 (2004) Adamson, Andrew 15may2004 19.07325172424
    The Chronicles of Narnia: The Lion the Witch and the Wardrobe (2005) Adamson, Andrew 07dec2005 18.62376976013
    Shrek the Third (2007) Adamson, Andrew 06may2007 18.66974639893
    Ballast (2008) Adamson, Andrew 19jan2008 10.32473182678
    The Chronicles of Narnia: Prince Caspian (2008) Adamson, Andrew 09may2008 17.80108451843
    Shrek Forever After (2010) Adamson, Andrew 21apr2010 18.32175827026
    The Chronicles of Narnia: The Voyage of the Dawn Treader (2010) Adamson, Andrew 02dec2010 17.49599838257
    Puss in Boots (2011) Adamson, Andrew 23oct2011 17.82677841187
    Cirque du Soleil: Worlds Away (2012) Adamson, Andrew 20oct2012 15.31925678253

    So, for example, for the observation for Adamson's Balast, I want to calculate the mean of his returns for Shrek the Third and Chronicles of Narnia, but not the projects before those or after Balast. I would like to define the three-year window based upon the actual three-years prior to the release of the film (in the example, January 18, 2005 to January 18, 2008).

    As a newcomer to working with Stata, the steps surrounding might be quite elementary. So, I apologize for the potential low degree of difficulty. Thanks in advance!

    Addendum: For people with shorter than three year careers, the resulting figure would cover all of their prior output. The prior three years is the limit for this window.
    Last edited by Pat Reilly; 16 Jul 2015, 18:08.

  • #2
    Hey Pat

    Sounds like you want to do something like:

    sort movieid firm
    bysort firm: gen previousyrs=(sales[_n-1] + sales[_n-2] + sales[_n-3])/3

    although that should do it, check two by hand to make sure

    Comment


    • #3
      Hey, Christos!

      Thanks for your suggestion! Though that is approach that I took for a filmmaker's prior three films, I have trouble doing something like this that is inclusive of their total work over a prior time period. The main problem is that there is incredible variance in how prolific people are. There are some people (particularly film producers) that will work on multiple films in a year and, at times, take a year off. However, for certain directors, they may do just one film within a three year window. Also, the individual "windows" within an individual's respective group of films are frequently not discrete; they can overlap. What I would like to do is to have an average that weighs all of the films a person did in the previous three years to a particular film equally.

      Apologies if my previous post was a bit unclear.
      Last edited by Pat Reilly; 16 Jul 2015, 18:48.

      Comment


      • #4
        Hi Pat & Christos,

        Yeah, the key is to get the time period in a way where stata can easily calculate what going back 3 years is in a loop. Since the units here are actor + film, 3 years could be no work for many actors and a bunch of roles for Samuel L. Jackson.

        1) I'd start by converting fdate from dmy format to a single number, so stata can calculate going 365*3 days back. I recommend creating a new variable, treating the first day of observations as 1 and counting from there. (I had to do something similar to create 15 minute time periods for months of data scraping.) Let's call it ndate.

        2) sort talent ndate

        3) gen filmdummy=1

        This will eventually be used in the denominator to get an average per film.

        4) create a forvalues loop. Ideally this will fill in every prior film by the talent within the right set of dates

        gen priortotgross=0

        forvalues l=1/8 {
        replace priortotgross=priortotgross+loggedadjgross[_n-`l'] if talent==talent[_n-`l'] & (ndate-365*3)<ndate[_n-`l']
        }

        If this loop works correctly, it will go back and look at the last 8 data points and add the gross together, if the talent is the same as the prior data point and the date is close enough. One potential problem is the (ndate-365*3) code. That's a guess. You may want to create a lagged variable ndate_l3=ndate-365*3 if that fails. (Create it as step 3b.)

        A good way to test this code is to look for someone who has multiple films but more than a 3 year gap. If the code works, their priortotgross should be zero. If it's a number greater than 0, the ndate code failed. Also, everyone's first film should automatically be set to zero.

        I assumed no one has more than 8 films in three years. If that is wrong, change the 8 in 1/8 to a more appropriate number. That's a lowercase l for the macro, as in lag.

        5) Obviously, the first loop is getting total gross, not an average. So we need another loop to count films.

        gen films_l3=0

        forvalues l=1/8 {
        replace films_l3=films_l3+filmdummy[_n-`l'] if talent==talent[_n-`l'] & (ndate-365*3)<ndate[_n-`l']
        }

        6) Get an average

        gen avegross_l3=priortotgross/films_l3

        Let me know if this works!

        Best,
        Noah

        Comment


        • #5
          Guh, the tabs in the "replace" lines didn't copy paste for clarity. Sigh.

          Comment


          • #6
            use the "code" blocks - read the FAQ to find out what I mean and how to do it

            Comment


            • #7
              Thanks, Noah! It worked like a champ!

              Comment


              • #8
                Hi Noah,

                As I was reading your post today I found it really helpful, thanks a lot! I do however have a kind of follow up question. Besides the average I want to know the median as well and to be honest, I have no idea how to do this. Could you or anyone else help me on this?

                Best,
                Max

                Comment


                • #9
                  To generalize Noah's approach, you can loop until no more observations fall within the time period. You can also create variables to hold the lagged values which make it possible to use egen row functions to calculate summary statistics.

                  Code:
                  * please use -dataex- (from SSC) to generate data examples that can be 
                  * worked with
                  clear
                  input str68 title str15 talent double loggedadjgross float fdate
                  "A Fine Mess (1986)" "Adams, Tony (I)" 15.324546 9716 
                  "That's Life! (1986)" "Adams, Tony (I)" 14.9339 9749 
                  "Blind Date (1987)" "Adams, Tony (I)" 17.172577 9944 
                  "Sunset (1988)" "Adams, Tony (I)" 14.983685 10346 
                  "Skin Deep (1989)" "Adams, Tony (I)" 16.379337 10654 
                  "Switch (1991)" "Adams, Tony (I)" 16.048485 11452 
                  "Son of the Pink Panther (1993)" "Adams, Tony (I)" 14.152566 12292 
                  "Shrek (2001)" "Adamson, Andrew" 18.650175 15087 
                  "Shrek 2 (2004)" "Adamson, Andrew" 19.073252 16206 
                  "The Chronicles of Narnia: The Lion the Witch and the Wardrobe (2005)" "Adamson, Andrew" 18.62377 16777 
                  "Shrek the Third (2007)" "Adamson, Andrew" 18.669746 17292 
                  "Ballast (2008)" "Adamson, Andrew" 10.324732 17550 
                  "The Chronicles of Narnia: Prince Caspian(2008)" "Adamson, Andrew" 17.801085 17661 
                  "Shrek Forever After (2010)" "Adamson, Andrew" 18.321758 18373 
                  "The Chronicles of Narnia: The Voyage of the Dawn Treader (2010)" "Adamson, Andrew" 17.495998 18598 
                  "Puss in Boots (2011)" "Adamson, Andrew" 17.826778 18923 
                  "Cirque du Soleil: Worlds Away (2012)" "Adamson, Andrew" 15.319257 19286 
                  end
                  format %td fdate
                  
                  * keep generating lagged version until no new values are found
                  * within the desired time window
                  sort talent fdate loggedadjgross title
                  local more 1
                  local i 0
                  local vtype : type loggedadjgross
                  local window = 365 * 3
                  while `more' {
                      by talent: gen `vtype' pv_`++i' = loggedadjgross[_n-`i'] ///
                          if (fdate - fdate[_n-`i']) < `window'
                      qui count if !mi(pv_`i')
                      local more = r(N)
                  }
                  
                  * use the egen row functions to generate statistics
                  egen nvals = rownonmiss(pv_*)
                  egen totgross = rowtotal(pv_*)
                  gen mgross1 = totgross / nvals
                  egen mgross2 = rowmean(pv_*)
                  egen medgross = rowmedian(pv_*)
                  
                  list talent-fdate m*, sepby(talent) noobs

                  Comment


                  • #10
                    Hi Robert,

                    Thanks a lot for your respons, it works perfectly for the problem described by Pat. I understand most of what you're doing but unfortunately my knowledge of Stata is not sufficient to understand all code you've written. And since my case is similar to Pat's but not exactly the same I'm running into to a small problem. What I want is a time window which is 3 years before the observation and 3 years afterwards. I've tried to add this after your date requirement:

                    Code:
                    | (DateAnn[_n+`i'] - DateAnn) < `window'
                    Normally I would use something like +f for a forward lag but this doesn't work and the above doesn't work either.

                    Further, I want the current observation to count as well. So instead or prior to the release date, I want to include the current release date as well. I've tried (it doesn't work):

                    Code:
                    by tsic: gen `vtype' pv_`++i' = FSPUebitdam[_n-`i'] + FSPUebitdam///
                    Since I don't fully understand the steps you are doing I don't know how to solve this. I hope someone can help me.

                    Best,

                    Max

                    Edit: I'm thinking about adding another requirement. So the assets of the previous observations have to be within a range(window) of (- or +)1.5x the assets of the current observation. How can I add another requirement?
                    Last edited by Max Kruitwagen; 26 Jul 2015, 04:15.

                    Comment


                    • #11
                      Since your problem is different from the one in this thread, why don't you start a new thread and include a data example (preferably created using dataex (from SSC) that is representative of your data. If would also be useful to know the size of your data, both in terms of the number of panels and number observations.

                      Comment


                      • #12
                        Thanks for your help Robert, I started a new threat.

                        Comment

                        Working...
                        X