  • Increasing Efficiency on Multiple Loops

    Dear Statalisters,

    I would like to ask for your help with nested loops in Stata.

    I want to measure how similar the genres of each director's films are to one another, at each point in the director's career.
    The genre similarity score is calculated in two steps:

    1) I measure the Jaccard similarity of each pair of movies: the number of genres that appear in both movies divided by the number of genres that appear in either movie.
    2) I take the average Jaccard similarity over all pairs of movies the director has made up to that career point.


    For example, director 1 has directed 17 films in total.

    At the point where the director has directed only one movie (movies_tag == 1), the similarity score is blank, because at least two movies are needed to calculate a similarity score.
    At the point where the director has directed two movies (movies_tag == 2), the similarity score is the Jaccard similarity of the 1st and 2nd movies.
    At the point where the director has directed three movies (movies_tag == 3), the similarity score is the average Jaccard similarity of the 1st/2nd, 2nd/3rd, and 1st/3rd movies (three movie pairs).

    This continues until the director's 17th film, where the similarity score is the average Jaccard similarity over all pairs among the 17 movies, that is, 17C2 = 136 movie pairs.
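
    The movies_tag == 3 case can be checked by hand. The sketch below is only an illustration: it hard-codes the genre sets of director 1's first three movies as they appear in the 18 genre columns of the sample data further down in this post (movie 1 is Drama only; movies 2 and 3 are both Comedy and Musical). The scalar names j12, j13, j23 and avg3 are made up for this example.

    ```stata
    * Jaccard = (# genres in both movies) / (# genres in either movie)
    scalar j12 = 0/3    // movies 1 and 2: nothing shared, 3 genres in the union
    scalar j13 = 0/3    // movies 1 and 3: same situation
    scalar j23 = 2/2    // movies 2 and 3: identical genre sets
    * similarity score at movies_tag == 3 is the mean over the three pairs
    scalar avg3 = (j12 + j13 + j23)/3
    display scalar(avg3)
    ```

    Under these assumptions the score at the third movie is one third, since only one of the three pairs shares any genre.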

    The illustrative sample is given below, consisting of 10 directors with 18 different movie genres.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(directorid movies_tag gen_Action gen_Adult gen_Adventure gen_Animation gen_Biography gen_Comedy gen_Crime gen_Documentary gen_Drama gen_Family gen_Fantasy gen_FilmNoir gen_GameShow gen_History gen_Horror gen_Music gen_Musical gen_Mystery)
    1  1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    1  2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
    1  3 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
    1  4 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0
    1  5 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1  6 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0
    1  7 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    1  8 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0
    1  9 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    1 10 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    1 11 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    1 12 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    1 13 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0
    1 14 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
    1 15 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
    1 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1 17 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0
    2  1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
    2  2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
    2  3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    2  4 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
    2  5 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0
    2  6 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0
    2  7 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    2  8 0 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0
    2  9 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
    2 10 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    2 11 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    2 12 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
    2 13 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    2 14 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
    2 15 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    2 16 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
    3  1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
    3  2 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
    3  3 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3  4 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3  5 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3  6 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3  7 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3  8 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3  9 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    3 10 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 11 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 12 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 13 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 14 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 15 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
    3 16 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 17 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 18 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    3 19 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    3 20 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 21 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 22 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 23 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 24 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 25 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
    3 26 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 27 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
    3 28 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
    3 29 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
    3 30 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 31 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 32 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0
    3 33 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
    3 34 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1
    3 35 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 36 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
    3 37 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 38 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 39 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 40 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    3 41 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1
    3 42 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
    3 43 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 44 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 45 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 46 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
    3 47 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
    3 48 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 49 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 50 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0
    3 51 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
    3 52 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
    3 53 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 54 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    3 55 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 56 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
    3 57 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    4  1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
    4  2 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    4  3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
    4  4 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    4  5 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    5  1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    5  2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    5  3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
    5  4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
    5  5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    end

    Because I need the genre similarity score for each director at each point of the director's career,
    I have written a loop over director ids that contains a second loop over the director's career points, which in turn contains a third loop over the earlier movies.

    xtset directorid movies_tag
    su directorid, meanonly

    gen Temporary_Jaccard = 0
    gen Jaccard = 0
    gen denominator = 0
    gen consistency = 0

    forvalues i = 1/`r(max)' {
        levelsof movies_tag if directorid == `i', local(movielist)
        foreach j of local movielist {
            local consider = `j' - 1
            forvalues k = 1/`consider' {
                gen Temporary = 0
                gen Union = 0
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Action == 1 & L`k'.gen_Action == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Adult == 1 & L`k'.gen_Adult == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Adventure == 1 & L`k'.gen_Adventure == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Animation == 1 & L`k'.gen_Animation == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Biography == 1 & L`k'.gen_Biography == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Comedy == 1 & L`k'.gen_Comedy == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Crime == 1 & L`k'.gen_Crime == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Documentary == 1 & L`k'.gen_Documentary == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Drama == 1 & L`k'.gen_Drama == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Family == 1 & L`k'.gen_Family == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Fantasy == 1 & L`k'.gen_Fantasy == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_FilmNoir == 1 & L`k'.gen_FilmNoir == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_GameShow == 1 & L`k'.gen_GameShow == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_History == 1 & L`k'.gen_History == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Horror == 1 & L`k'.gen_Horror == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Music == 1 & L`k'.gen_Music == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Musical == 1 & L`k'.gen_Musical == 1
                replace Temporary = Temporary + 1 if directorid == `i' & gen_Mystery == 1 & L`k'.gen_Mystery == 1

                replace Union = Union + 1 if directorid == `i' & (gen_Action == 1 | L`k'.gen_Action == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Adult == 1 | L`k'.gen_Adult == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Adventure == 1 | L`k'.gen_Adventure == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Animation == 1 | L`k'.gen_Animation == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Biography == 1 | L`k'.gen_Biography == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Comedy == 1 | L`k'.gen_Comedy == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Crime == 1 | L`k'.gen_Crime == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Documentary == 1 | L`k'.gen_Documentary == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Drama == 1 | L`k'.gen_Drama == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Family == 1 | L`k'.gen_Family == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Fantasy == 1 | L`k'.gen_Fantasy == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_FilmNoir == 1 | L`k'.gen_FilmNoir == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_GameShow == 1 | L`k'.gen_GameShow == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_History == 1 | L`k'.gen_History == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Horror == 1 | L`k'.gen_Horror == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Music == 1 | L`k'.gen_Music == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Musical == 1 | L`k'.gen_Musical == 1)
                replace Union = Union + 1 if directorid == `i' & (gen_Mystery == 1 | L`k'.gen_Mystery == 1)

                replace Temporary_Jaccard = (Temporary/Union) + Temporary_Jaccard if directorid == `i' & movies_tag == `j'
                drop Temporary Union
            }
            egen Sum_Temporary_jaccard = sum(Temporary_Jaccard) if directorid == `i' & movies_tag <= `j'
            replace Jaccard = Sum_Temporary_jaccard
            replace denominator = `consider' * (`consider' + 1) / 2 if directorid == `i' & movies_tag == `j'
            replace consistency = Jaccard / denominator if directorid == `i' & movies_tag == `j'
            drop Sum_Temporary_jaccard
        }
    }


    In plain words, the code does the following:

    1. For each director (identified by 'directorid'),
    2. For each career point (the values of 'movies_tag', held in the local macro 'movielist'),
    3. Calculate the sum of the pairwise Jaccard similarity scores between the director's latest movie at that career point and each prior movie, and save it in the variable 'Temporary_Jaccard'.
    For example, if j = 3, the pairwise similarities of movies 1/3 and movies 2/3 are calculated.
    4. Calculate the sum of 'Temporary_Jaccard' up to the career point, which gives the sum of the pairwise Jaccard similarities over all movie pairs up to that career point.
    Save the value in the variable 'Sum_Temporary_jaccard', then divide it by the number of movie pairs, held in the variable 'denominator'. Save the final similarity score in the variable 'consistency'.
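
    The denominator in step 4 relies on a counting argument: adding up, at each career point, the number of "latest movie versus earlier movie" pairs gives the number of all pairs so far. A minimal sketch of that check, assuming a career of 17 movies as for director 1; the local macro pairs is introduced only for this illustration:

    ```stata
    local pairs = 0
    forvalues j = 2/17 {
        local consider = `j' - 1
        * movie j adds one new pair with each of the `consider' earlier movies
        local pairs = `pairs' + `consider'
        * same count as the formula used for the variable denominator
        assert `pairs' == `consider' * (`consider' + 1) / 2
        * and the same as "j choose 2"
        assert `pairs' == comb(`j', 2)
    }
    display `pairs'
    ```

    The final value is the 136 pairs mentioned above for the 17th film.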

    While my code returns the correct similarity score, it nests three loops (forvalues, foreach, forvalues), so it takes far too long to run on the whole dataset, which contains thousands of directors.
    Is there a way to get the desired output without nesting so many loops? I would really appreciate an alternative that is faster and more efficient.

    Thank you for reading.

















  • #2
    I don't think you need any loops at all to do this.

    Code:
    preserve
    tempfile copy
    save `copy'
    restore
    
    rangejoin movies_tag 1 . using `copy', by(directorid) prefix(U_)
    reshape long gen_ U_gen_, i(directorid movies_tag U_movies_tag) j(genre) string
    by directorid movies_tag U_movies_tag, sort: egen both = total(gen_ & U_gen_)
    by directorid movies_tag U_movies_tag: egen either = total(gen_ | U_gen_)
    gen Jaccard = both/either
    
    by directorid movies_tag U_movies_tag, sort: keep if _n == 1
    keep directorid *_tag Jaccard
    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    If your data set is very large, you may encounter a bottleneck at the -reshape- command. You can get a further improvement in performance by instead using Mauricio Caceres' -greshape-, part of his -gtools- suite, which is also available from SSC. It accepts the same syntax as -reshape- and runs much faster.
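
    If installing the SSC commands is not an option, or -reshape- remains the bottleneck, the same pairwise index can also be computed wide. The following is a minimal sketch, not the code above: it swaps in official -joinby- for -rangejoin- and a loop over the genre variables for -reshape-, and it assumes that every genre indicator (and nothing else) is named gen_*. The toy data are the first four movies of director 1 from #1, restricted to the three genres they use.

    ```stata
    clear
    input float(directorid movies_tag gen_Comedy gen_Drama gen_Musical)
    1 1 0 1 0
    1 2 1 0 1
    1 3 1 0 1
    1 4 1 1 1
    end

    * save a copy in which everything except directorid carries the prefix U_
    preserve
    foreach v of varlist movies_tag gen_* {
        rename `v' U_`v'
    }
    tempfile copy
    save `copy'
    restore

    * all pairs of movies within director, then keep each unordered pair once
    joinby directorid using `copy'
    keep if movies_tag < U_movies_tag

    * count shared genres and genres in the union, one genre variable at a time
    gen both = 0
    gen either = 0
    foreach v of varlist gen_* {
        replace both = both + (`v' == 1 & U_`v' == 1)
        replace either = either + (`v' == 1 | U_`v' == 1)
    }
    gen Jaccard = both/either

    sort directorid U_movies_tag movies_tag
    list directorid movies_tag U_movies_tag Jaccard, noobs
    ```

    Note that -joinby- first forms every ordered pair within a director, so for directors with many films it holds roughly twice as many observations in memory as -rangejoin- before the -keep-. Also, Jaccard is missing when neither movie has any genre, because either is then 0.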
    Last edited by Clyde Schechter; 02 Jan 2023, 09:17.



    • #3
      Re-reading what I wrote in #2, I see that I did not include the part about averaging the Jaccard index up to the present movie. That just requires one additional command at the end:
      Code:
      rangestat (mean) avg_Jaccard_to_present = Jaccard, by(directorid movies_tag) interval(U_movies_tag . 0)



      • #4
        Dear Clyde, thank you so much for your answers in #2 and #3.

        While it took some time to run the code on the whole dataset, the code in #2 worked perfectly to calculate the pairwise Jaccard Similarity between the movies, as in the sample dataset below:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(directorid movies_tag U_movies_tag Jaccard) double avg_Jaccard_to_present
        1  1  2         0                   0
        1  1  3         0                   0
        1  2  3         1                   1
        1  1  4        .2  .06666666766007741
        1  2  4       .75                .875
        1  3  4       .75                 .75
        1  1  5         0  .05000000074505806
        1  2  5        .2   .6500000009934107
        1  3  5        .2   .4750000014901161
        1  4  5 .16666667   .1666666716337204
        1  1  6  .3333333   .1066666692495346
        1  2  6         0  .48750000074505806
        1  3  6         0   .3166666676600774
        1  4  6         0   .0833333358168602
        1  5  6       .25                 .25
        1  1  7         0  .08888889104127884
        1  2  7  .3333333   .4566666692495346
        1  3  7  .3333333  .32083333656191826
        1  4  7       .25  .13888889054457346
        1  5  7         0                .125
        1  6  7         0                   0
        1  1  8       .25  .11190476374966758
        1  2  8         0   .3805555577079455
        1  3  8         0  .25666666924953463
        1  4  8 .16666667   .1458333358168602
        1  5  8         0  .08333333333333333
        1  6  8         0                   0
        1  7  8         0                   0
        1  1  9         0  .09791666828095913
        1  2  9  .6666667   .4214285761117935
        1  3  9  .6666667   .3250000054637591
        1  4  9        .5  .21666666865348816
        1  5  9         0               .0625
        1  6  9         0                   0
        1  7  9        .5                 .25
        1  8  9         0                   0
        1  1 10         0  .08703703847196367
        1  2 10  .3333333  .41041667200624943
        1  3 10  .3333333  .32619048229285647
        1  4 10       .25   .2222222238779068
        1  5 10         0                 .05
        1  6 10         0                   0
        1  7 10         1                  .5
        1  8 10         0                   0
        1  9 10        .5                  .5
        1  1 11         0   .0783333346247673
        1  2 11  .3333333   .4018518577019374
        1  3 11  .3333333   .3270833399146795
        1  4 11       .25   .2261904776096344
        1  5 11         0 .041666666666666664
        1  6 11         0                   0
        1  7 11         1                .625
        1  8 11         0                   0
        1  9 11        .5                  .5
        1 10 11         1                   1
        1  1 12         0   .0712121223861521
        1  2 12 .16666667   .3783333390951157
        1  3 12 .16666667   .3092592656612396
        1  4 12 .14285715  .21577381156384945
        1  5 12       .75  .14285714285714285
        1  6 12        .2 .033333333830038704
        1  7 12         0                  .5
        1  8 12         0                   0
        1  9 12         0   .3333333333333333
        1 10 12         0                  .5
        1 11 12         0                   0
        1  1 13         0  .06527777885397275
        1  2 13  .6666667  .40454546158963983
        1  3 13  .6666667  .34500000774860384
        1  4 13        .5  .24735449916786617
        1  5 13         0                .125
        1  6 13         0 .028571428997176036
        1  7 13        .5                  .5
        1  8 13         0                   0
        1  9 13         1                  .5
        1 10 13        .5                  .5
        1 11 13        .5                 .25
        1 12 13         0                   0
        1  1 14         0    .060256411249821
        1  2 14         0   .3708333397905032
        1  3 14         0  .31363637068054895
        1  4 14         0  .22261904925107956
        1  5 14       .25   .1388888888888889
        1  6 14  .3333333  .06666666828095913
        1  7 14         0  .42857142857142855
        1  8 14         0                   0
        1  9 14         0                  .4
        1 10 14         0                .375
        1 11 14         0  .16666666666666666
        1 12 14        .5                 .25
        1 13 14         0                   0
        1  1 15         0  .05595238187483379
        1  2 15       .75   .4000000059604645
        1  3 15       .75  .35000000645716983
        1  4 15        .6    .256926410577514
        1  5 15 .16666667  .14166666716337203
        1  6 15         0  .05925926069418589
        1  7 15       .25              .40625
        1  8 15         0                   0
        1  9 15        .5   .4166666666666667
        end
        One problem I am having with the code in #3, however, is that I need the average Jaccard index over
        all possible pairs of movies up to and including the present movie.

        That is, for the 4th movie (U_movies_tag == 4),
        I need to average not only #1 and #4, #2 and #4, and #3 and #4,
        but also #1 and #2, #1 and #3, and #2 and #3.

        Would it be possible to use -rangestat- to average over all possible movie pairs up to the present movie?

        Thank you again for your time and consideration.













        • #5
          Ah, #3 does not quite do that. It averages #1#2, and #1#3, but not #2#3. So I misunderstood.

          I don't know whether you want to reduce the data to a single observation per directorid#U_movies_tag holding the average of all "lesser" Jaccard indices, or to keep all the present observations and have the new variable hold that result in every observation of the directorid#U_movies_tag group. Assuming the latter, it's:
          Code:
          by directorid (U_movies_tag movies_tag), sort: gen long seq = _n
          rangestat (mean) wanted = Jaccard, by(directorid) interval(seq . 0)
          by directorid U_movies_tag (seq), sort: replace wanted = wanted[_N]
          If you want just a single observation per directorid#U_movies_tag, replace the final command with the following two commands:
          Code:
          by directorid U_movies_tag (seq), sort: keep if _n == _N
          keep directorid U_movies_tag wanted
          Last edited by Clyde Schechter; 02 Jan 2023, 16:08.



          • #6
            Actually, I just realized it's simpler than that. You don't even need -rangestat- for this part.

            Code:
            by directorid (U_movies_tag movies_tag), sort: gen long seq = _n
            by directorid (U_movies_tag movies_tag), sort: gen wanted = sum(Jaccard)/seq
            
            //  ENDING FOR A SINGLE OBSERVATION
            by directorid U_movies_tag (seq), sort: keep if _n == _N
            keep directorid U_movies_tag wanted
            
            //  ENDING FOR MULTIPLE OBSERVATIONS WITH COMMON VALUE OF wanted
            by directorid U_movies_tag (seq), sort: replace wanted = wanted[_N]
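
            The running-sum approach can be tried on a small hand-made example. This is only a sketch under stated assumptions: the six Jaccard values are toy numbers for one director with four movies (not taken from the listing in #4), and the divisor is a variant that counts only non-missing Jaccard values, so that pairs in which neither movie has any genre (where Jaccard is missing) are left out of the average rather than counted as zero.

            ```stata
            clear
            input float(directorid movies_tag U_movies_tag Jaccard)
            1 1 2 0
            1 1 3 0
            1 2 3 1
            1 1 4 .3333333
            1 2 4 .6666667
            1 3 4 .6666667
            end

            * order the pairs so that all pairs ending at or before a given movie come first
            by directorid (U_movies_tag movies_tag), sort: gen long seq = _n
            * running sum divided by the running count of non-missing values
            by directorid (U_movies_tag movies_tag): gen wanted = sum(Jaccard)/sum(!missing(Jaccard))

            * single observation per director and career point
            by directorid U_movies_tag (seq), sort: keep if _n == _N
            keep directorid U_movies_tag wanted
            list, noobs
            ```

            With no missing values, as here, the divisor equals seq and the result is the same as dividing by seq.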
