Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating a dummy variable if revenue is within top 20% for a given year

    Hi,

    I have a dataset from the video game industry containing information about the revenue of video games, their rating and the release date (between 1995 and 2009). Now I want to generate a dummy variable named top_performer_rev that equals 1 if the game is within the top 20% for the game's release year based on its revenue and another variable, top_perfomer_qu that equals 1 if the game is within the top 20% measured by its rating, again for each release year.
    I am not sure how to do that.

    generate top_performer_rev = 0
    replace top_performer_rev = 1 if ...?

    I would really appreciate any help!

    Thanks in advance,
    Wani


  • #2

    Assuming no missing values, then

    Code:
    bysort year (revenue): gen byte wanted = _n > (0.8 * _N)
    indicates some technique.

    Comment


    • #3
      Thank you so much for the answer! I tried it with

      bysort release_date (revenue): gen byte top_performer_rev = _n > (0.8 * _N)
      replace top_performer_rev = . if missing(revenue)

      But the number of top performers is only 336 out of 9,015 (3%) so something doesn't seem quite right. Is it because the release_date is in the yearmonth format? How can I fix it?

      Comment


      • #4
        If revenue is ever missing, then the code needs to be different.


        Code:
        gen byte OK = !missing(revenue)
        bysort OK release_date (revenue) : gen wanted = _n > (0.8 * _N) if OK

        Comment


        • #5
          EDIT: This crossed with Nick's answer #4. This is in response to #3.
          EDIT #2: Per Nick's answer #6, I changed quantile to quantiles
          To extract the year from release_date, do the following:

          Code:
          gen year = year(release_date)
          After which, Nick's elegant code should generate what you want.

          Two other ways to calculate the top X per year, but both involve the user-generated functions quantiles (SSC) or egenmore (SSC) (written by the same @Nick Cox who answered in #2 & #4.)

          Code:
          * Using quantile
          ssc install quantiles   // per Nick's comment #6, changed quantile to quantiles
          bysort year: quantiles revenue, gen(rev_decile) n(10)
          gen byte top_performer_rev = 1 if inlist(rev_decile, 9, 10)
          
          * Using egenmore, xtile
          ssc install egenmore
          egen rev_decile = xtile(revenue), by(year) nq(10)  // NOTE: you could also replace nq(10) with p(10(10)90) 
          gen byte top_performer_rev = 1 if inlist(rev_decile, 9, 10)
          You can see examples using this code (or creating something similar) at:
          Last edited by David Benson; 30 Oct 2018, 13:01.

          Comment


          • #6
            Thanks for the mentions, but to clarify a minor slip and some minor ambiguity:

            1. There is no quantile on SSC. There is an official command quantile.

            2. There is a quantiles on SSC, but I didn't write it. (I would have recommended against a name so close to an official command name, as otherwise minor confusion is easy.)

            3. xtile() within egenmore was written by Uli Kohler.

            Comment


            • #7
              Thank you both, it works fine now!

              I do have a new question though: Now I want to count the number of top performer games in a certain genre that were released a year before in that genre (my goal is to find out if the number of top performer games influence the choice of genre by game developers for new games).
              I wanted to use a forvalues loop to generate a variable for each genre (there are 17). Does this look correct? or rather: What am I missing?


              Code:
              forvalues g = 1/17 {
              gen tp_rev_g`g'_lastyear =.
              forvalues year = 1995/2009 {
              
              quietly count if top_performer_rev==1 & release_date==`year'-1 & sgenre_id3 == `g'
              quietly replace tp_rev_g`g'_lastyear =r(N) if release_date==`year'
              
              }
              }

              Comment


              • #8
                You should not need 17 new variables here. The results can be stored in one variable. This may help using rangestat (SSC)

                Code:
                rangestat (sum)  top_performer_rev, int(release_date -1 -1) by(sgenre_id3)
                -- but absent any data example I am guessing here.

                Comment


                • #9

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str106 moby_title int release_date str21 platform float revenue long sgenre_id3 float top_performer_rev
                  "BreakThru!"                                               420 "gameboy"       135987  5 0
                  "Star Trek: Starfleet Academy - Starship Bridge Simulator" 420 "snes"               . 13 .
                  "Pinball Fantasies"                                        420 "snes"          278692  3 0
                  "Micro Machines"                                           420 "gameboy"       763203 10 0
                  "Ristar"                                                   420 "game-gear"    1057011 17 0
                  "X-Men 2: Clone Wars"                                      420 "genesis"     15655003  1 1
                  "Might and Magic III: Isles of Terra"                      420 "snes"               . 11 .
                  "Myst"                                                     420 "playstation"  2525509  2 1
                  "Super Star Wars: Return of the Jedi"                      420 "game-gear"          .  1 .
                  "The Raiden Project"                                       420 "playstation"  1811448  3 0
                  "Take Your Best Shot"                                      420 "pc"            111012  3 0
                  "Cyber Sled"                                               420 "playstation"  1370586 12 0
                  "Coach K College Basketball"                               420 "genesis"     11489733 13 1
                  "Super Star Wars: Return of the Jedi"                      420 "gameboy"            . 17 .
                  "Phantasy Star IV"                                         420 "genesis"      4023353 11 1
                  "Corpse Killer"                                            420 "3do"           310800 12 0
                  "Corpse Killer"                                            420 "3do"           310800 12 0
                  "BreakThru!"                                               420 "gameboy"       135987  5 0
                  "RapJam: Volume One"                                       420 "snes"               . 15 .
                  "Space Griffon VF-9"                                       420 "playstation"        . 12 .
                  "BreakThru!"                                               420 "gameboy"       135987  5 0
                  "Pac-in-Time"                                              420 "snes"          307712  1 0
                  "Kileak: The DNA Imperative"                               420 "playstation"  2939996 12 1
                  "Corpse Killer"                                            420 "3do"           310800 12 0
                  "Micro Machines"                                           420 "gameboy"       763203 10 0
                  "Bureau 13"                                                420 "pc"            179043 16 0
                  "BreakThru!"                                               420 "gameboy"       135987  5 0
                  "Micro Machines"                                           420 "gameboy"       763203 10 0
                  "SeaQuest DSV"                                             420 "snes"          222801  1 0
                  "Bishoujo Senshi Sailor Moon S"                            420 "game-gear"          .  1 .
                  "Bureau 13"                                                420 "pc"            179043 16 0
                  "SeaQuest DSV"                                             420 "snes"          222801  1 0
                  "NBA Live 95"                                              421 "pc"           2808764 15 1
                  "Toughman Contest"                                         421 "genesis"      1675839 15 0
                  "Rise of the Phoenix"                                      421 "snes"          224119 16 0
                  "Zorro"                                                    421 "pc"             71861 17 0
                  "Master of Orion"                                          421 "macintosh"     201077 16 0
                  "Lost Eden"                                                421 "pc"            299610 11 0
                  "Pinball Fantasies"                                        421 "gameboy"            .  3 .
                  "Desert Strike: Return to the Gulf"                        421 "gameboy"            .  1 .
                  "Metaltech: Battledrome"                                   421 "pc"            313720  1 0
                  "Rise of the Robots"                                       421 "3do"           484031  6 0
                  "Ecco Jr."                                                 421 "genesis"      1149578  1 0
                  "World Series Baseball '95"                                421 "genesis"     12996396 15 1
                  "Myst"                                                     421 "3do"          2121351  2 0
                  "Syndicate"                                                421 "jaguar"        338755 16 0
                  "Syndicate"                                                421 "jaguar"        338755 16 0
                  "Master of Orion"                                          421 "macintosh"     201077 16 0
                  "Lost Eden"                                                421 "pc"            299610 11 0
                  "Myst"                                                     421 "3do"          2121351  2 1
                  "True Lies"                                                421 "game-gear"     475025  1 0
                  "Rise of the Robots"                                       421 "3do"           484031  6 0
                  "Syndicate"                                                421 "jaguar"        338755 16 0
                  "NBA Live 95"                                              421 "pc"           2808764 15 1
                  "Stargate"                                                 421 "genesis"      1523040 17 0
                  "Jewels of the Oracle"                                     421 "pc"           1698187  5 0
                  "Pac-in-Time"                                              421 "gameboy"       398926  1 0
                  "Master of Orion"                                          421 "macintosh"     201077 16 0
                  "Myst"                                                     421 "3do"          2121351  2 1
                  "Front Mission"                                            421 "snes"               . 14 .
                  "Discworld"                                                421 "pc"            290801  2 0
                  "Syndicate"                                                421 "jaguar"        338755 16 0
                  "Myst"                                                     421 "3do"          2121351  2 1
                  "Jewels of the Oracle"                                     421 "pc"           1698187  5 0
                  "NBA Live 95"                                              421 "pc"           2808764 15 1
                  "Airlift Rescue"                                           421 "pc"                 .  1 .
                  "Kirby's Avalanche"                                        421 "snes"         5924800  5 1
                  "Lost Eden"                                                421 "pc"            299610  2 0
                  "Rise of the Robots"                                       421 "3do"           484031  6 0
                  "Theme Park"                                               421 "jaguar"        257998 16 0
                  "Metaltech: Battledrome"                                   421 "pc"            313720  1 0
                  "Immercenary"                                              421 "3do"           617916 12 0
                  "Beyond Oasis"                                             421 "genesis"      1852268 11 0
                  "Troy Aikman NFL Football"                                 421 "jaguar"             . 13 .
                  "Master of Orion"                                          421 "macintosh"     201077 16 0
                  "Bust-A-Move"                                              422 "snes"         1184174  5 0
                  "Magical Pop'n"                                            422 "snes"               .  1 .
                  "Super Wing Commander"                                     422 "macintosh"     205201 13 0
                  "Undercover Cops"                                          422 "snes"               .  1 .
                  "Super Wing Commander"                                     422 "macintosh"     205201 13 0
                  "Bust-A-Move"                                              422 "snes"         1184174  5 0
                  "ClockWerx"                                                422 "pc"             64345  5 0
                  "Another Bible"                                            422 "gameboy"            . 14 .
                  "Chrono Trigger"                                           422 "snes"        19190080 11 1
                  "Zhadnost: The People's Party"                             422 "3do"                .  5 .
                  "Super Bomberman: Panic Bomber W"                          422 "snes"               .  5 .
                  "Mario's Picross"                                          422 "gameboy"            .  5 .
                  "Surging Aura"                                             422 "genesis"            . 11 .
                  "Panzer Dragoon"                                           422 "sega-saturn"  5382781 11 0
                  "Battle Chess Collection"                                  422 "pc"            106144  5 0
                  "Chrono Trigger"                                           422 "snes"        19190080 11 1
                  "Kirby's Dream Land 2"                                     422 "gameboy"     13718104 17 1
                  "Descent"                                                  422 "pc"           5441489  7 0
                  "ClockWerx"                                                422 "pc"             64345  5 0
                  "Hover Strike"                                             422 "jaguar"        303432 13 0
                  "The Lion King"                                            423 "gameboy"      5640582 17 1
                  "Jumping Flash!"                                           423 "playstation"  1986898 12 0
                  "PGA European Tour"                                        423 "gameboy"            . 13 .
                  "Terminal Velocity"                                        423 "pc"            911818  7 0
                  "Daytona USA"                                              423 "sega-saturn"  9226708 10 1
                  end
                  format %tm release_date
                  label values sgenre_id3 sgenre_id3
                  label def sgenre_id3 1 "ACTION", modify
                  label def sgenre_id3 2 "ADVENTURE", modify
                  label def sgenre_id3 3 "ARCADE", modify
                  label def sgenre_id3 5 "FAMILY ENTERTAINMENT", modify
                  label def sgenre_id3 6 "FIGHTING", modify
                  label def sgenre_id3 7 "FLIGHT", modify
                  label def sgenre_id3 10 "RACING", modify
                  label def sgenre_id3 11 "ROLE-PLAYING", modify
                  label def sgenre_id3 12 "SHOOTER", modify
                  label def sgenre_id3 13 "SIMULATION", modify
                  label def sgenre_id3 14 "SIMULATOR", modify
                  label def sgenre_id3 15 "SPORT GAMES", modify
                  label def sgenre_id3 16 "STRATEGY", modify


                  I don't know if it's helpful but it looks something like this. Eventually I want to do a multinomial logit, using sgenre_id3 as the dependent variable and the others as independent ones. Do you think it works that way?
                  Thank you for all your help, I really appreciate it!

                  Comment


                  • #10
                    Wani,

                    A couple of asides regarding your research question and research design:

                    Once you create a year variable, you can collapse down your data to create the total revenue, number of competitors, etc in a genre by year. (You could save that as a standalone firm-genre-year dataset, or you could just create it as a total in this set. But presumably you would want a standalone standalone firm-genre-year dataset so that you could calculate how much that genre grew year-to-year (revenue growth %) and how many new games were added, etc) And then your DV could be a count variable for the number of new titles added in a particular year (in addition to your multinomial logit)

                    One can imagine (at least) two competing hypotheses here:
                    • H1: Everybody wants to jump into the hottest and fastest-growing market. (i.e. "Go where the money is" and "A rising tide raises all the boats...")
                    • H2: A competing hypothesis would be, if 1-2 games dominate a particular genre, I'd be better off elsewhere with my game (especially if I am a new firm). ("If Google is dominating search, do I really want to try and build a better search engine?") (Although, there were lots of search engines before Google, so that reasoning doesn't always hold).
                    A paper that does something similar, is Josh Lerner's, "Patenting in the Shadow of Competitors," The Journal of Law and Economics 38, no. 2 (Oct., 1995): 463-495. https://doi.org/10.1086/467339 .

                    In that paper he finds that new biotech firms tend to take steps to avoid patenting "in the shadow" or large pharma firms. Your context is going to be different, because if you infringe on a patent they have the right to sue you, whereas in video game unless you're infringing on the look and feel of the game too much, they can't sue you. (Consider Fortnite vs PUBG.)
                    • BTW, in which category would Battle Royale games like that show up? I'm not a gamer myself but my kids love Fortnite right now.

                    Comment


                    • #11
                      Thank you for your input! Those two hypotheses are actually exactly what I've been thinking of.

                      I tried to follow your advice:
                      Code:
                      *generates total revenue and number of new games released per genre per year
                      egen number = count(moby_title), by(year sgenre_id3)
                      collapse number revenue, by(year sgenre_id3)
                      
                      *calculates annual revenue growth rate
                      bysort sgenre_id3 (year): gen rev_growth = (revenue - revenue[_n-1])/(revenue[_n-1] * (year - year[_n-1]))
                      How do I then incorporate them into my mlogit, can I use these as additional independent variables? Is that how you meant it?

                      *Games like Fortnite would fall into the action genre, but that's because Ive generalized the genres to the 17 most common ones. More specifically, it would be the category "shooter" (though my dataset only includes observations up to 2009, so the newer games aren't in there).

                      Comment


                      • #12
                        Hi Wani -- yes I meant that you could use them as additional independent variables.

                        Also, I would check that your collapse command and revenue growth calculation gave you what you expected. Collapse calculates the average by default and I suspect you want the count of games and to sum revenue.

                        Code:
                         egen number = count(moby_title), by(year sgenre_id3)
                        collapse (count) number (sum) revenue, by(year sgenre_id3)
                        
                        *  Calculates annual revenue growth rate
                        ysort sgenre_id3 (year): gen rev_growth = ((revenue - revenue[_n-1]) / (revenue[_n-1])) - 1
                        // I wouldn't think you would need the (year - year[_n-1]) part.  Wouldn't it just==1?
                        Last edited by David Benson; 01 Nov 2018, 18:05.

                        Comment


                        • #13
                          Thank you for all your help and patience! I'm trying out variables for the regression now, with these hyptheses:

                          H1: The choice of genre is dependent on the number of existing top-performing games measured by revenue in a genre.
                          H1a: Game developers try to imitate top performers’ success by entering into the same genre.
                          H1b: Game developers try to avoid competition with high-selling games by entering into a different genre from top-performing games.

                          It currently looks like this:

                          Code:
                           mlogit sgenre_id3 c.release_date ps top_performer_rev_sum atari_p sega_p micro_p nin_p number rev_growth
                          (controlling for the release date, biggest platforms, number of new games and revenue growth per genre)
                          
                          Multinomial logistic regression                 Number of obs     =     10,475
                                                                          LR chi2(126)      =   11648.82
                                                                          Prob > chi2       =     0.0000
                          Log likelihood = -19497.003                     Pseudo R2         =     0.2300
                          
                          ------------------------------------------------------------------------------------------
                                        sgenre_id3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------------------+----------------------------------------------------------------
                          ACTION                   |  (base outcome)
                          -------------------------+----------------------------------------------------------------
                          ADVENTURE                |
                                      release_date |   .0498489   .0017833    27.95   0.000     .0463537     .053344
                                                ps |  -1.103009   .1437939    -7.67   0.000    -1.384839   -.8211778
                             top_performer_rev_sum |  -.5747549    .062992    -9.12   0.000    -.6982169   -.4512928
                                           atari_p |  -21.28778   28010.81    -0.00   0.999    -54921.47    54878.89
                                            sega_p |  -.5510671   .2650814    -2.08   0.038    -1.070617    -.031517
                                           micro_p |  -1.060421   .2115886    -5.01   0.000    -1.475127   -.6457145
                                             nin_p |  -1.158782   .1617027    -7.17   0.000    -1.475713   -.8418504
                                            number |  -.1286186   .0030603   -42.03   0.000    -.1346167   -.1226206
                                        rev_growth |   .5691262   .0739391     7.70   0.000     .4242083    .7140441
                                             _cons |  -14.28717   .8208489   -17.41   0.000      -15.896   -12.67833
                          -------------------------+----------------------------------------------------------------
                          ARCADE                   |
                                      release_date |   .0320185    .003533     9.06   0.000     .0250939    .0389431
                                                ps |  -.1778298   .3584196    -0.50   0.620    -.8803193    .5246596
                             top_performer_rev_sum |   .2259194   .1234846     1.83   0.067    -.0161059    .4679447
                                           atari_p |  -20.38943   53251.92    -0.00   1.000    -104392.2    104351.5
                                            sega_p |   .9951693    .446489     2.23   0.026     .1200668    1.870272
                                           micro_p |   .5521793    .599354     0.92   0.357     -.622533    1.726892
                                             nin_p |   .6415877   .3419701     1.88   0.061    -.0286614    1.311837
                                            number |  -.3190956    .013858   -23.03   0.000    -.3462568   -.2919344
                                        rev_growth |   .5903945   .0972139     6.07   0.000     .3998587    .7809303
                                             _cons |  -1.782089   1.836259    -0.97   0.332     -5.38109    1.816913
                          -------------------------+----------------------------------------------------------------
                          CHILDREN_S_ENTERTAINMENT |
                                      release_date |   .0150721   .0080371     1.88   0.061    -.0006803    .0308245
                                                ps |   .3108431   .6603469     0.47   0.638    -.9834131    1.605099
                             top_performer_rev_sum |   .6036564   .2705252     2.23   0.026     .0734367    1.133876
                                           atari_p |  -15.40619   98410.94    -0.00   1.000    -192897.3    192866.5
                                            sega_p |  -17.28768   7750.106    -0.00   0.998    -15207.22    15172.64
                                           micro_p |  -14.86823   2537.979    -0.01   0.995    -4989.215    4959.478
                                             nin_p |   1.369725   .6131861     2.23   0.025     .1679019    2.571547
                                            number |   -.572284   .0578387    -9.89   0.000    -.6856458   -.4589223
                                        rev_growth |   .4400669   .1514842     2.91   0.004     .1431633    .7369706
                                             _cons |   8.801733   4.555853     1.93   0.053    -.1275744    17.73104
                          -------------------------+----------------------------------------------------------------
                          FAMILY_ENTERTAINMENT     |
                                      release_date |   .0455727   .0017084    26.68   0.000     .0422244    .0489211
                                                ps |  -.4294972   .1278267    -3.36   0.001    -.6800329   -.1789615
                             top_performer_rev_sum |   -.126878   .0307522    -4.13   0.000    -.1871513   -.0666048
                                           atari_p |   .1598698   1.498964     0.11   0.915    -2.778046    3.097786
                                            sega_p |  -.3383123   .2835629    -1.19   0.233    -.8940853    .2174608
                                           micro_p |  -.9795381   .1886858    -5.19   0.000    -1.349356   -.6097206
                                             nin_p |  -.1098563   .1326103    -0.83   0.407    -.3697677    .1500551
                                            number |  -.0674744    .002224   -30.34   0.000    -.0718334   -.0631154
                                        rev_growth |   .9722408   .0661211    14.70   0.000     .8426459    1.101836
                                             _cons |  -17.21225   .7930431   -21.70   0.000    -18.76658   -15.65791
                          -------------------------+----------------------------------------------------------------
                          I have defined top performers as discussed above. If I'm interpreting the results correctly, top performers do have a significant effect on choice of genre; however, when I want to look at marginal effects, this significance seems to disappear:

                          Code:
                          . margins, dydx(top_performer_rev_sum )
                          
                          Average marginal effects                        Number of obs     =     10,475
                          Model VCE    : OIM
                          
                          dy/dx w.r.t. : top_performer_rev_sum
                          1._predict   : Pr(sgenre_id3==ACTION), predict(pr outcome(1))
                          2._predict   : Pr(sgenre_id3==ADVENTURE), predict(pr outcome(2))
                          3._predict   : Pr(sgenre_id3==ARCADE), predict(pr outcome(3))
                          4._predict   : Pr(sgenre_id3==CHILDREN_S_ENTERTAINMENT), predict(pr outcome(4))
                          5._predict   : Pr(sgenre_id3==FAMILY_ENTERTAINMENT), predict(pr outcome(5))
                          6._predict   : Pr(sgenre_id3==FIGHTING), predict(pr outcome(6))
                          7._predict   : Pr(sgenre_id3==FLIGHT), predict(pr outcome(7))
                          8._predict   : Pr(sgenre_id3==OTHER_GAMES_COMPILATIONS), predict(pr outcome(9))
                          9._predict   : Pr(sgenre_id3==RACING), predict(pr outcome(10))
                          10._predict  : Pr(sgenre_id3==ROLE_PLAYING), predict(pr outcome(11))
                          11._predict  : Pr(sgenre_id3==SHOOTER), predict(pr outcome(12))
                          12._predict  : Pr(sgenre_id3==SIMULATION), predict(pr outcome(13))
                          13._predict  : Pr(sgenre_id3==SPORT_GAMES), predict(pr outcome(15))
                          14._predict  : Pr(sgenre_id3==STRATEGY), predict(pr outcome(16))
                          15._predict  : Pr(sgenre_id3==17), predict(pr outcome(17))
                          
                          ---------------------------------------------------------------------------------------
                                                |            Delta-method
                                                |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          ----------------------+----------------------------------------------------------------
                          top_performer_rev_sum |
                                       _predict |
                                             1  |  -.0023884    .001716    -1.39   0.164    -.0057516    .0009749
                                             2  |  -.0233723   .2554385    -0.09   0.927    -.5240227     .477278
                                             3  |   .0099778    .286822     0.03   0.972     -.552183    .5721386
                                             4  |   .0033038   .0760032     0.04   0.965    -.1456598    .1522674
                                             5  |  -.0058774   .0144927    -0.41   0.685    -.0342827    .0225278
                                             6  |   .0181605   .4047448     0.04   0.964    -.7751247    .8114457
                                             7  |   .0006966   .0409284     0.02   0.986    -.0795215    .0809147
                                             8  |  -.0387957   1.294957    -0.03   0.976    -2.576866    2.499274
                                             9  |  -.0003055   .0085375    -0.04   0.971    -.0170387    .0164277
                                            10  |  -.0028814   .0237131    -0.12   0.903    -.0493581    .0435953
                                            11  |  -.0065901   .0042347    -1.56   0.120    -.0148899    .0017097
                                            12  |   .0328852   .0057612     5.71   0.000     .0215934    .0441769
                                            13  |   .0030331   .0182149     0.17   0.868    -.0326675    .0387338
                                            14  |   .0015708   .1059767     0.01   0.988    -.2061398    .2092814
                                            15  |   .0105829   .0502787     0.21   0.833    -.0879614    .1091273
                          ---------------------------------------------------------------------------------------
                          Am I doing something wrong or how do I interpret this?
                          Last edited by Wani Huang; 04 Nov 2018, 17:15.

                          Comment


                          • #14
                            Please start a new thread. The title of this thread no longer applies.

                            Comment

                            Working...
                            X