  • Increasing Efficiency of the Foreach Loop

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str35 imdbcastlink str36 imdbfilmlink str121 imdbgenre float(movies_tag gen_Action gen_Adult gen_Animation)
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0040021"  "Drama"                                               1 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0042727"  "Drama"                                               2 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0044081"  "Drama"                                               3 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0045296"  "Biography, Drama, History, Western"                  4 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0045943"  "Drama, History"                                      5 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0047677"  "Crime, Drama, Romance"                               6 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0047296"  "Crime, Drama, Thriller"                              7 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0046903"  "Biography, Drama, History, Romance"                  8 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0044284"  "Drama, History, Music"                               9 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0048140"  "Comedy, Crime, Musical, Romance"                    10 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0049830"  "Comedy, Drama"                                      11 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0050933"  "Drama, Romance"                                     12 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0052415"  "Action, Drama, War"                                 13 1 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0052832"  "Drama, Romance"                                     14 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0055257"  "Drama, Western"                                     15 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0056264"  "Adventure, Drama, History, Romance"                 16 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0056632"  "Adventure, Drama, Thriller"                         17 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0057878"  "Comedy"                                             18 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0059470"  "Action, Drama, Thriller, War"                       19 1 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0060232"  "Crime, Drama, Thriller"                             20 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0060120"  "Action, Drama, Romance, Western"                    21 1 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0061523"  "Comedy, Romance"                                    22 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0062185"  "Drama, Romance, Thriller"                           23 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0062776"  "Adventure, Comedy, Fantasy"                         24 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0064728"  "Action, Crime, Drama, Thriller"                     25 1 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0064866"  "Action, Drama, War"                                 26 1 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0069007"  "Drama, Horror, Thriller"                            27 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0068646"  "Crime, Drama"                                       28 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0070849"  "Drama, Romance"                                     29 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0074906"  "Drama, Western"                                     30 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0809488"  "Crime, Drama, Thriller"                             31 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0078346"  "Action, Adventure, Sci-Fi"                          32 1 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0078678"  "Biography, Drama, History, War"                     33 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0078788"  "Drama, Mystery, War"                                34 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0080754"  "Crime, Thriller"                                    35 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0839995"  "Action, Adventure, Romance, Sci-Fi"                 36 1 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0097243"  "Drama, Thriller"                                    37 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0099615"  "Comedy, Crime"                                      38 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0103962"  "Adventure, Biography, Drama, History"               39 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0112883"  "Comedy, Drama, Romance"                             40 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0116654"  "Horror, Sci-Fi, Thriller"                           41 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0118768"  "Drama"                                              42 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0120678"  "Comedy, Crime"                                      43 0 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0227445"  "Action, Crime, Drama, Thriller"                     44 1 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0442674"  "Action, Crime, Drama"                               45 1 0 0
    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt10905860" "Animation, Comedy"                                  46 0 0 1
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0014187"  "Crime"                                               1 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0015750"  "Adventure"                                           2 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016449"  "Western"                                             3 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016430"  "Western"                                             4 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016288"  "Action, Drama, Western"                              5 1 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0015766"  "Action, Comedy, Romance, Western"                    6 1 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016534"  "Western"                                             7 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016052"  "Western"                                             8 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016480"  "Western"                                             9 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0015772"  "Action, Adventure, Comedy, Drama, History, Romance" 10 1 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016453"  "Comedy, Drama, Western"                             11 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0177353"  "Western"                                            12 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017212"  "Drama"                                              13 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016641"  "Adventure, Drama, Romance"                          14 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017465"  "Romance"                                            15 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016823"  "Western"                                            16 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017010"  "Drama"                                              17 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017400"  "Western"                                            18 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017530"  "Drama, Comedy"                                      19 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0170688"  "Western"                                            20 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017567"  "Drama, Romance, Western"                            21 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017226"  "Drama, History"                                     22 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016618"  "Western"                                            23 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018033"  "Comedy, Romance"                                    24 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017751"  "Drama, Romance"                                     25 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017637"  "Western"                                            26 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018578"  "Drama, Romance, War, Action"                        27 1 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018080"  "Western"                                            28 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018199"  "Western"                                            29 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018681"  "Adventure, Romance"                                 30 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018846"  "Drama, Romance"                                     31 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0019080"  "Action, Drama, Romance, War"                        32 1 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018971"  "Adventure, Drama, Romance"                          33 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0019098"  "Drama, Romance, War"                                34 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018892"  "Drama, Romance"                                     35 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0019375"  "Drama, Romance, War"                                36 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0020595"  "Drama, Western"                                     37 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0019687"  "Drama"                                              38 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0020556"  "Romance, Western"                                   39 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021357"  "Adventure, Drama, War"                              40 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021219"  "Drama, Romance, War"                                41 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021232"  "Comedy, Music"                                      42 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021463"  "Western"                                            43 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021112"  "Action, Drama, Romance, War"                        44 1 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021412"  "Action, Western"                                    45 1 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021156"  "Drama, Romance"                                     46 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021861"  "Western"                                            47 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021750"  "Crime, Drama, Film-Noir, Romance"                   48 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021988"  "Drama, Romance"                                     49 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021963"  "Drama, Romance"                                     50 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0023175"  "Comedy, Drama, Romance"                             51 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0022814"  "Drama"                                              52 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0023049"  "Comedy, Drama"                                      53 0 0 0
    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0022879"  "Drama, Romance, War"                                54 0 0 0
    end
    Dear Statalisters,

    I am trying to run a loop to calculate, for each actor/actress, the similarity in the usage of genres over their career.
    As of now, I am using the following loop to go over each actorid:

    preserve
    forvalues i = 1(1)60000 {
    keep if actorid == `i'
    tempfile copy
    save `copy'
    rangejoin movies_tag 1 . using `copy', by (actorid) prefix(U_)
    reshape long gen_ U_gen_, i(actorid movies_tag U_movies_tag) j(genre) string
    by actorid movies_tag U_movies_tag, sort: egen both = total(gen_ & U_gen_)
    by actorid movies_tag U_movies_tag: egen either = total(gen_ | U_gen_)
    gen Jaccard = both/either
    by actorid movies_tag U_movies_tag, sort: keep if _n == 1

    by actorid (U_movies_tag movies_tag), sort: gen long seq = _n
    by actorid (U_movies_tag movies_tag), sort: gen wanted = sum(Jaccard)/seq
    by actorid U_movies_tag (seq), sort: keep if _n == _N
    keep actorid U_movies_tag wanted
    save "...\Actor_Filmography_`i'.dta", replace
    restore, preserve
    }

    While each iteration takes only about 20 seconds to run,
    the loop often stops partway through, and even at 20 seconds per iteration it would take about 2 weeks to get through all 60,000 actors.

    Would there be a way to more efficiently process the information I need?
    Thank you again for your valuable time.




  • #2
    I don't see why you need a loop for this at all. You are looping over actorid's, but every single command within the loop, except for the -save-, the -keep-, and the -gen Jaccard- is written so that it is already being done -by actorid-. As for the -keep- and -gen Jaccard- commands, they will work the same way regardless of whether they are in the loop. So the only thing you actually need to loop for is the -save- command.

    Now, my first instinct is to say: seriously, are you going to save 60,000 different filmography files? That seems a bit outlandish to me, and if it would suffice to simply retain a long file with all 60,000 filmographies in it, then you can just remove all the looping and be left with:
    Code:
    tempfile copy
    save `copy'
    rangejoin movies_tag 1 . using `copy', by (actorid) prefix(U_)
    reshape long gen_ U_gen_, i(actorid movies_tag U_movies_tag) j(genre) string
    by actorid movies_tag U_movies_tag, sort: egen both = total(gen_ & U_gen_)
    by actorid movies_tag U_movies_tag: egen either = total(gen_ | U_gen_)
    gen Jaccard = both/either
    by actorid movies_tag U_movies_tag, sort: keep if _n == 1
    
    by actorid (U_movies_tag movies_tag), sort: gen long seq = _n
    by actorid (U_movies_tag movies_tag), sort: gen wanted = sum(Jaccard)/seq
    by actorid U_movies_tag (seq), sort: keep if _n == _N
    keep actorid U_movies_tag wanted
    save all_filmographies, replace
    The above, which comprises all of the calculation you were doing in the loop, will run quickly as there are no I/O operations involved, and no looping.

    Now, supposing you really do need to save all 60,000 files, you can do that, omitting the -save all_filmographies- command, and follow it with:
    Code:
    capture program drop one_actorid
    program define one_actorid
        local i = actorid[1]
        save Actor_Filmography_`i', replace
        exit
    end
    
    runby one_actorid, by(actorid) status
    -runby- is written by Robert Picard and me, and is available from SSC.
    This will run much faster than looping over a bunch of -preserve-s and -restore-s. The file contents will be copied only once, not to disk, but to a Mata matrix, and will be fed back one actorid at a time into the data set, and then -save-d to the appropriate filename. The -status- option will cause Stata to give you a progress report, showing how many actorid's have been processed so far and an estimate of the time remaining to completion.
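
    For readers who do not yet have the community-contributed commands used in this thread, a minimal installation sketch follows. It assumes an internet connection to SSC, and it assumes that -rangejoin- needs -rangestat- to be installed as well (worth confirming in its help file).

    ```stata
    * Install the community-contributed commands from SSC (one-time setup).
    ssc install runby
    * Assumption: -rangejoin- depends on -rangestat-, so install both.
    ssc install rangestat
    ssc install rangejoin
    ```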

    Note: None of this is tested. Your example data, though well-meaning, does not even include an actorid variable!



    • #3
      My advice, for better advice, is to include a small example with the variables you're using. Then we can test it to see if it's what you need.



      • #4
        #2) Dear Clyde, Thank you for the advice. #3 Dear Jared, I have updated the variables to test on a small example.

        The only problem I have in running the whole dataset is the following:
        there are 61,000+ actors whose filmographies I need to process.

        The reshape long command (which is necessary) expands the dataset by an extreme amount (basically, it is the number of films ^2 * number of genres^2, so for a single actor with 50 films it would expand to 50 * 50 * 27 * 27 observations),
        so every time I try a subgroup of 1,000+ actors, Stata gives me "op. sys. refuses to provide memory".

        Would there be a way to avoid the memory issue, or to calculate the Jaccard coefficient measure without reshaping the data to long?
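
        One possible way to skip the -reshape- is sketched below; it is an untested illustration, not a method confirmed anywhere in this thread. It assumes the genre indicators are 0/1 variables named gen_*, that the paired copies carry the U_ prefix added by -rangejoin-, and that unmatched master rows show up with a missing U_movies_tag. The toy input stands in for the -rangejoin- output, and the variables both and either are counted directly across the wide genre dummies, so the data stay at one row per film pair instead of one row per film pair per genre.

        ```stata
        * Toy pairwise data standing in for the output of -rangejoin-:
        * one row per (earlier film, later film) pair, genre dummies in wide form.
        clear
        input float(actorid movies_tag U_movies_tag gen_Action gen_Drama U_gen_Action U_gen_Drama)
        1 1 2 1 1 0 1
        1 1 3 1 1 1 1
        1 2 3 0 1 1 1
        end

        * Assumption: rows with no later film have missing U_ values; drop them.
        drop if missing(U_movies_tag)

        * Count shared genres (both) and genres in at least one film (either)
        * by looping over the genre dummies instead of reshaping to long.
        gen byte both = 0
        gen byte either = 0
        foreach v of varlist gen_* {
            replace both   = both   + (`v' == 1 & U_`v' == 1)
            replace either = either + (`v' == 1 | U_`v' == 1)
        }
        gen double Jaccard = both/either

        * Running average of Jaccard over all pairs seen up to each career point,
        * following the same steps as the code posted earlier in the thread.
        by actorid (U_movies_tag movies_tag), sort: gen long seq = _n
        by actorid (U_movies_tag movies_tag): gen double wanted = sum(Jaccard)/seq
        by actorid U_movies_tag (seq), sort: keep if _n == _N
        keep actorid U_movies_tag wanted
        list
        ```

        With 27 genres this keeps roughly 1/27 of the observations that the long layout would need, which may or may not be enough to fit in memory for the full set of actors.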

        The following is the measure I am trying to get at, which I did with the help of your valuable input in my last post.
        - the only difference this time is that I am doing it on the actor population which is significantly large in terms of the number of actors & the number of films each actor participates in
        -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

        I want to measure how similar the genres used by each director are across her/his films, at each point of the director's career.
        The genre similarity scores will be calculated using two steps:

        1) I will measure the Jaccard Similarity between the movie pairs, which would be calculated by dividing the number of genres in both movies by the number of genres in either movie.
        2) I will take the average Jaccard Similarity of the movies for each career point of the director.


        For example, director 1 has directed 17 films in total.

        At the point where the director has directed only one movie (movies_tag == 1), the similarity score would be blank, because there need to be at least two movies to calculate the similarity score.
        At the point where the director has directed two movies (movies_tag == 2), the similarity score would be the Jaccard Similarity of the 1st and 2nd movie.
        At the point where the director has directed three movies (movies_tag == 3), the similarity score would be the average Jaccard Similarity of the 1st/2nd, 2nd/3rd, and 1st/3rd movies (three movie pairs).

        This would continue until the director's 17th film, where the similarity score would be the average Jaccard Similarity across the 17 movies, that is, over 17C2 = 136 movie pairs.
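
        In symbols, the two steps described above can be written as follows. The notation is introduced here for illustration only and does not appear in the original post: G_i is the set of genres of the i-th film, J is the Jaccard Similarity of a film pair, and S_t is the score at career point t (movies_tag == t).

        ```latex
        J(G_i, G_j) = \frac{|G_i \cap G_j|}{|G_i \cup G_j|},
        \qquad
        S_t = \binom{t}{2}^{-1} \sum_{1 \le i < j \le t} J(G_i, G_j),
        \quad t \ge 2 .
        ```

        For t = 17 the sum runs over \binom{17}{2} = 136 pairs, and S_1 is left undefined (blank) because no pair exists yet.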

        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float actorid str121 imdbgenre str35 imdbcastlink str36 imdbfilmlink float(gen_Action gen_Adventure movies_tag)
        1 "Drama"                                              "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0040021"  0 0  1
        1 "Drama"                                              "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0042727"  0 0  2
        1 "Drama"                                              "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0044081"  0 0  3
        1 "Biography, Drama, History, Western"                 "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0045296"  0 0  4
        1 "Drama, History"                                     "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0045943"  0 0  5
        1 "Crime, Drama, Romance"                              "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0047677"  0 0  6
        1 "Crime, Drama, Thriller"                             "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0047296"  0 0  7
        1 "Biography, Drama, History, Romance"                 "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0046903"  0 0  8
        1 "Drama, History, Music"                              "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0044284"  0 0  9
        1 "Comedy, Crime, Musical, Romance"                    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0048140"  0 0 10
        1 "Comedy, Drama"                                      "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0049830"  0 0 11
        1 "Drama, Romance"                                     "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0050933"  0 0 12
        1 "Action, Drama, War"                                 "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0052415"  1 0 13
        1 "Drama, Romance"                                     "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0052832"  0 0 14
        1 "Drama, Western"                                     "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0055257"  0 0 15
        1 "Adventure, Drama, History, Romance"                 "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0056264"  0 1 16
        1 "Adventure, Drama, Thriller"                         "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0056632"  0 1 17
        1 "Comedy"                                             "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0057878"  0 0 18
        1 "Action, Drama, Thriller, War"                       "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0059470"  1 0 19
        1 "Crime, Drama, Thriller"                             "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0060232"  0 0 20
        1 "Action, Drama, Romance, Western"                    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0060120"  1 0 21
        1 "Comedy, Romance"                                    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0061523"  0 0 22
        1 "Drama, Romance, Thriller"                           "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0062185"  0 0 23
        1 "Adventure, Comedy, Fantasy"                         "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0062776"  0 1 24
        1 "Action, Crime, Drama, Thriller"                     "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0064728"  1 0 25
        1 "Action, Drama, War"                                 "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0064866"  1 0 26
        1 "Drama, Horror, Thriller"                            "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0069007"  0 0 27
        1 "Crime, Drama"                                       "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0068646"  0 0 28
        1 "Drama, Romance"                                     "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0070849"  0 0 29
        1 "Drama, Western"                                     "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0074906"  0 0 30
        1 "Crime, Drama, Thriller"                             "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0809488"  0 0 31
        1 "Action, Adventure, Sci-Fi"                          "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0078346"  1 1 32
        1 "Biography, Drama, History, War"                     "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0078678"  0 0 33
        1 "Drama, Mystery, War"                                "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0078788"  0 0 34
        1 "Crime, Thriller"                                    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0080754"  0 0 35
        1 "Action, Adventure, Romance, Sci-Fi"                 "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0839995"  1 1 36
        1 "Drama, Thriller"                                    "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0097243"  0 0 37
        1 "Comedy, Crime"                                      "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0099615"  0 0 38
        1 "Adventure, Biography, Drama, History"               "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0103962"  0 1 39
        1 "Comedy, Drama, Romance"                             "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0112883"  0 0 40
        1 "Horror, Sci-Fi, Thriller"                           "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0116654"  0 0 41
        1 "Drama"                                              "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0118768"  0 0 42
        1 "Comedy, Crime"                                      "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0120678"  0 0 43
        1 "Action, Crime, Drama, Thriller"                     "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0227445"  1 0 44
        1 "Action, Crime, Drama"                               "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt0442674"  1 0 45
        1 "Animation, Comedy"                                  "http://www.imdb.com/name/nm0000008" "http://www.imdb.com/title/tt10905860" 0 0 46
        2 "Crime"                                              "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0014187"  0 0  1
        2 "Adventure"                                          "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0015750"  0 1  2
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016449"  0 0  3
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016430"  0 0  4
        2 "Action, Drama, Western"                             "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016288"  1 0  5
        2 "Action, Comedy, Romance, Western"                   "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0015766"  1 0  6
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016534"  0 0  7
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016052"  0 0  8
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016480"  0 0  9
        2 "Action, Adventure, Comedy, Drama, History, Romance" "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0015772"  1 1 10
        2 "Comedy, Drama, Western"                             "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016453"  0 0 11
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0177353"  0 0 12
        2 "Drama"                                              "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017212"  0 0 13
        2 "Adventure, Drama, Romance"                          "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016641"  0 1 14
        2 "Romance"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017465"  0 0 15
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016823"  0 0 16
        2 "Drama"                                              "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017010"  0 0 17
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017400"  0 0 18
        2 "Drama, Comedy"                                      "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017530"  0 0 19
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0170688"  0 0 20
        2 "Drama, Romance, Western"                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017567"  0 0 21
        2 "Drama, History"                                     "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017226"  0 0 22
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0016618"  0 0 23
        2 "Comedy, Romance"                                    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018033"  0 0 24
        2 "Drama, Romance"                                     "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017751"  0 0 25
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0017637"  0 0 26
        2 "Drama, Romance, War, Action"                        "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018578"  1 0 27
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018080"  0 0 28
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018199"  0 0 29
        2 "Adventure, Romance"                                 "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018681"  0 1 30
        2 "Drama, Romance"                                     "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018846"  0 0 31
        2 "Action, Drama, Romance, War"                        "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0019080"  1 0 32
        2 "Adventure, Drama, Romance"                          "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018971"  0 1 33
        2 "Drama, Romance, War"                                "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0019098"  0 0 34
        2 "Drama, Romance"                                     "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0018892"  0 0 35
        2 "Drama, Romance, War"                                "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0019375"  0 0 36
        2 "Drama, Western"                                     "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0020595"  0 0 37
        2 "Drama"                                              "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0019687"  0 0 38
        2 "Romance, Western"                                   "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0020556"  0 0 39
        2 "Adventure, Drama, War"                              "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021357"  0 1 40
        2 "Drama, Romance, War"                                "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021219"  0 0 41
        2 "Comedy, Music"                                      "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021232"  0 0 42
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021463"  0 0 43
        2 "Action, Drama, Romance, War"                        "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021112"  1 0 44
        2 "Action, Western"                                    "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021412"  1 0 45
        2 "Drama, Romance"                                     "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021156"  0 0 46
        2 "Western"                                            "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021861"  0 0 47
        2 "Crime, Drama, Film-Noir, Romance"                   "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021750"  0 0 48
        2 "Drama, Romance"                                     "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021988"  0 0 49
        2 "Drama, Romance"                                     "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0021963"  0 0 50
        2 "Comedy, Drama, Romance"                             "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0023175"  0 0 51
        2 "Drama"                                              "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0022814"  0 0 52
        2 "Comedy, Drama"                                      "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0023049"  0 0 53
        2 "Drama, Romance, War"                                "http://www.imdb.com/name/nm0000011" "http://www.imdb.com/title/tt0022879"  0 0 54
        end
        label values actorid actorid
        label def actorid 1 "http://www.imdb.com/name/nm0000008", modify
        label def actorid 2 "http://www.imdb.com/name/nm0000011", modify



        • #5
          I see your dilemma. -runby- can solve this problem for you, and the code below shows how. (-runby- and -rangejoin- are both available from SSC; -rangejoin- also requires -rangestat-.) I have assumed that the data set you are starting with is named actors_and_movies.dta. Wherever you see actors_and_movies in the code, replace it with the actual name of your data set. As written, the code produces both an individual file for each actor and one combined file. There is probably no reason to do both, so delete the -save- line for whichever one you don't need.

          Code:
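          capture program drop one_actor
          program define one_actor
              // -runby- calls this program once per actor; record the actor's id for the file name
              local i = actorid[1]
              // Pair each movie with every higher-numbered movie (movies_tag + 1 and up) of the same actor
              rangejoin movies_tag 1 . using actors_and_movies, by(actorid) prefix(U_)
              // One observation per movie pair and genre
              reshape long gen_ U_gen_, i(actorid movies_tag U_movies_tag) j(genre) string
              // Jaccard index = genres shared by both movies / genres in either movie
              by movies_tag U_movies_tag, sort: egen both = total(gen_ & U_gen_)
              by movies_tag U_movies_tag: egen either = total(gen_ | U_gen_)
              gen Jaccard = both/either
              // Reduce to one observation per movie pair
              by movies_tag U_movies_tag, sort: keep if _n == 1
              // Running mean of Jaccard over the pairs, in the current sort order
              gen long seq = _n
              gen wanted = sum(Jaccard)/seq
              // Keep the last pair for each U_movies_tag
              by U_movies_tag (seq), sort: keep if _n == _N
              keep actorid U_movies_tag wanted
              save Actor_Filmography_`i', replace // IF YOU WANT SEPARATE FILES FOR EACH ACTOR
              exit
          end
          
          runby one_actor, by(actorid) status
          save All_Actors_Filmographies, replace  // IF YOU WANT A SINGLE FILE FOR ALL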
          The trick here is that with -runby-, only the data from the current actor being processed is in active memory, so the -reshape- won't get you into memory trouble.
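
          To make the both/either step concrete, here is a minimal sketch of the Jaccard calculation for a single pair of movies, in the long layout that -reshape- produces. The four genres and their 0/1 values are made up for illustration (one movie tagged "Drama, Romance, War", the other "Drama, Romance"); only built-in commands are used.

          ```stata
          clear
          * Hypothetical toy data: one row per genre for a single movie pair
          * gen_   = 1 if the first movie has the genre
          * U_gen_ = 1 if the second movie has the genre
          input str8 genre byte(gen_ U_gen_)
          "Action"  0 0
          "Drama"   1 1
          "Romance" 1 1
          "War"     1 0
          end

          * Genres shared by both movies, and genres present in either movie
          egen both = total(gen_ & U_gen_)
          egen either = total(gen_ | U_gen_)

          * Jaccard index for the pair
          gen Jaccard = both/either
          list genre gen_ U_gen_ both either Jaccard
          ```

          Here both is 2 (Drama, Romance) and either is 3 (Drama, Romance, War), so Jaccard is 2/3. In the program above the same calculation runs within each movies_tag/U_movies_tag pair, which is what the by: prefix is for.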
