Dear Statalist,
I want to calculate the coefficient of variation in a large dataset (>60 million observations). My dataset contains individuals' performance within a given team and firm. For each team, I would like to calculate the coefficient of variation (or an alternative dispersion measure) of performance. Teams can have different sizes (2 or more members). In the following example, "team_performance_disparity" is the variable of interest.
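For reference, the coefficient of variation of team g is its standard deviation divided by its mean. The team_performance_disparity values in the example data appear to use the sample standard deviation (divisor n_g − 1). For instance, team "a" in firm "x" has performances 9 and 7, which gives .1767767 as in the example. The population divisor n_g would give 0.125 instead.

```latex
\mathrm{CV}_g = \frac{s_g}{\bar{x}_g}, \qquad
\bar{x}_g = \frac{1}{n_g}\sum_{i \in g} x_i, \qquad
s_g = \sqrt{\frac{1}{n_g - 1}\sum_{i \in g} (x_i - \bar{x}_g)^2}

\text{Team a, firm x: } \bar{x} = \frac{9 + 7}{2} = 8, \quad
s = \sqrt{\frac{(9-8)^2 + (7-8)^2}{2 - 1}} = \sqrt{2} \approx 1.4142, \quad
\mathrm{CV} = \frac{1.4142}{8} \approx 0.1768
```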
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 ind_ID str1(team_ID firm_ID) byte ind_performance double team_performance_disparity
"#1" "a" "x" 9  .1767767
"#2" "a" "x" 7  .1767767
"#3" "b" "x" 0 .91879348
"#4" "b" "x" 2 .91879348
"#5" "b" "x" 4 .91879348
"#6" "b" "x" 7 .91879348
"#1" "a" "y" 9 1.1313708
"#2" "a" "y" 1 1.1313708
"#3" "b" "y" 1 .95839372
"#4" "b" "y" 2 .95839372
"#5" "b" "y" 3 .95839372
"#6" "b" "y" 9 .95839372
end
Theoretically, the following command would do the trick:
Code:
by firm_ID team_ID: cv2 ind_performance
However, the command is too slow for large datasets. It also only shows the values in the output window and does not store them as a new column in the dataset.
If there is a loop that calculates the coefficient of variation and stores the results in a separate column, it would be extremely helpful.
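The following is a minimal Stata sketch that uses the example data from the post. It does not require a loop over teams, because the by: prefix processes every firm-team group in a single pass. Teams must be identified by firm_ID and team_ID together, since team "a" appears in both firms.

Part (1) uses the official egen functions sd() and mean(). egen's sd() uses the n − 1 divisor, which matches team_performance_disparity.

Part (2) is my assumption about what may run faster on 60 million observations. It sorts the data once and then builds counts, means, and sums of squared deviations with by-group running sums, which avoids the egen overhead. I have not timed it at that scale.

All new variable names (team_sd, team_mean, team_cv, fast_n, fast_mean, fast_ss, team_cv_fast) are hypothetical choices.

```stata
* Example data from the post (generated by -dataex-)
clear
input str2 ind_ID str1(team_ID firm_ID) byte ind_performance double team_performance_disparity
"#1" "a" "x" 9  .1767767
"#2" "a" "x" 7  .1767767
"#3" "b" "x" 0 .91879348
"#4" "b" "x" 2 .91879348
"#5" "b" "x" 4 .91879348
"#6" "b" "x" 7 .91879348
"#1" "a" "y" 9 1.1313708
"#2" "a" "y" 1 1.1313708
"#3" "b" "y" 1 .95839372
"#4" "b" "y" 2 .95839372
"#5" "b" "y" 3 .95839372
"#6" "b" "y" 9 .95839372
end

* New variable names below are hypothetical choices.

* (1) Official -egen-: team SD (n - 1 divisor) and team mean, then their ratio
bysort firm_ID team_ID: egen double team_sd   = sd(ind_performance)
bysort firm_ID team_ID: egen double team_mean = mean(ind_performance)
generate double team_cv = team_sd / team_mean

* (2) Sort once, then only -by:- passes with running sums (no -egen-)
sort firm_ID team_ID
* running count of nonmissing values and running total (sum() treats missing as 0)
by firm_ID team_ID: generate long   fast_n    = sum(!missing(ind_performance))
by firm_ID team_ID: generate double fast_mean = sum(ind_performance)
* the last observation of each group holds the group totals; spread them to the group
by firm_ID team_ID: replace fast_n    = fast_n[_N]
by firm_ID team_ID: replace fast_mean = fast_mean[_N] / fast_n
* running sum of squared deviations from the team mean
by firm_ID team_ID: generate double fast_ss = sum((ind_performance - fast_mean)^2)
* sample SD / mean; a team with fewer than 2 nonmissing values gets missing
by firm_ID team_ID: generate double team_cv_fast = sqrt(fast_ss[_N] / (fast_n - 1)) / fast_mean
drop fast_n fast_mean fast_ss

list firm_ID team_ID ind_performance team_performance_disparity team_cv team_cv_fast, sepby(firm_ID team_ID)
```

If the community-contributed gtools package is installed (ssc install gtools), its gegen command is intended as a faster replacement for egen and could be substituted in part (1). On data of this size, I would benchmark it rather than assume it is faster.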
Thanks,
Marvin