Hello Statalists,
I am trying to generate a variable, 'total_votes_cum', which holds the cumulative 'total_votes' received by earlier reviews of the same product ('product_parent'), that is, reviews posted on a strictly earlier date than the given review. Please note that some products have multiple reviews posted on the same date; reviews sharing a date do not count toward each other. I have provided sample data below, with the desired 'total_votes_cum' in the last column:
product_parent review_date total_votes total_votes_cum
1 3/2/2015 10 0
1 3/2/2015 5 0
1 3/3/2015 5 15
1 3/4/2015 5 20
1 4/1/2015 7 25
2 5/1/2015 8 0
2 5/2/2015 6 8
2 5/2/2015 6 8
2 5/5/2015 6 20
2 6/1/2015 7 26
2 7/1/2015 12 33
I have tested the following code to create 'total_votes_cum', and it gives the correct result. However, it compares every pair of observations, so it takes too long on a very large dataset with millions of reviews:
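* start from the review's own votes (removed again in the last line)
by product_parent: gen total_votes_cum = total_votes
local n = _N
* compare every observation i with every observation j
forval i = 1/`n' {
    forval j = 1/`n' {
        * add votes of same-product reviews posted on an earlier date
        if product_parent[`i'] == product_parent[`j'] & review_date[`i'] > review_date[`j'] {
            replace total_votes_cum = total_votes_cum + total_votes[`j'] if _n == `i'
        }
    }
}
* remove the review's own votes so only earlier reviews are counted
replace total_votes_cum = total_votes_cum - total_votes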
Is there any way that I can modify the code to expedite the process?
Thank you,
Sun
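
One possible way to speed this up, offered as a minimal sketch rather than a solution tested on the full data: sort once, take a running sum within each product, and then, within each product and date, subtract back to the total accumulated before that date. The sketch assumes that 'review_date' is (or can be converted to) a numeric Stata daily date and that 'total_votes' has no missing values; the names 'date_str' and 'running' are hypothetical helpers introduced only for this illustration.

```stata
* Sample data from the post; the date is read as a string and converted
clear
input product_parent str10 date_str total_votes
1 "3/2/2015" 10
1 "3/2/2015" 5
1 "3/3/2015" 5
1 "3/4/2015" 5
1 "4/1/2015" 7
2 "5/1/2015" 8
2 "5/2/2015" 6
2 "5/2/2015" 6
2 "5/5/2015" 6
2 "6/1/2015" 7
2 "7/1/2015" 12
end
gen review_date = date(date_str, "MDY")
format review_date %td

* One sort, then a running sum of votes within each product
sort product_parent review_date
by product_parent: gen double running = sum(total_votes)

* Within each product and date, the running sum at the first row minus that
* row's own votes is the total from strictly earlier dates; every row that
* shares the date gets the same value
by product_parent review_date: gen double total_votes_cum = running[1] - total_votes[1]

drop running
list product_parent review_date total_votes total_votes_cum, sepby(product_parent)
```

Because this relies on a single sort and two passes over the data instead of comparing every pair of observations, it should scale much better to millions of reviews. On the sample data it gives the same values as the 'total_votes_cum' column above.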