calculating the sq root of the summed squared differences between an individual and other members of the individual's group

johnmccarthyjr

Join Date: May 2014

Posts: 6
#1

23 Oct 2016, 14:54

Hi all,

I have a dataset where individuals (i) are nested within work units made up of other individuals (js). Each individual has a value for a particular attribute (e.g., age). Other members in their unit have their own values for the same attribute. A highly simplified illustration of the database structure is available on this Google Spreadsheet. In the real database, data are occasionally missing, i.e., respondents do not indicate a value for a given attribute.

If possible, I need to use Stata to calculate: "the square root of the summed squared differences between an individual Si's value on a specific demographic variable and the value on the same variable for every other individual Sj in the sample for the work unit, divided by the total number of respondents in the unit (n)" (per Anne Tsui et al.'s 1992 ASQ: http://www.jstor.org/stable/2393472?...n_tab_contents). The equation form is available here.

I'm not sure how to accomplish this in Stata. I'm hoping someone has done something similar and can point me in the right direction.

Thank you.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

23 Oct 2016, 16:15

Well, you can come close to this with a simple one-liner:

Code:

egen rssd = sd(age), by(work_unit)

However, the sd() function divides by n-1 rather than n under the square root. So to actually get what you want you have to follow that up with a correction:

Code:

by work_unit, sort: replace rssd = rssd*sqrt((_N-1)/_N)
johnmccarthyjr

Join Date: May 2014

Posts: 6
#3

23 Oct 2016, 16:58

Thank you, Clyde. I posted on here a long while ago and remember you helped me back then, too. I greatly appreciate your time.

Your code creates a unit level variable. The complicating aspect of this variable (and creating it) is that it should vary at the individual level -- it should be a function of an individual's characteristics relative to his or her unit. I've added a clearer description of the variable by taking a screen clipping from the article here.

I'm perplexed by how to accomplish this in Stata. My ideas are fuzzy, but, to create it, I think everyone would need an identifier reflecting their team, sans them.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

23 Oct 2016, 18:49

OK, I didn't understand your first post correctly, but now I see what you need. Actually, this is fairly straightforward, if you do some algebra first. The formula in the article can be "simplified" algebraically to an expression involving age, group mean age, and group age variance. It's painful to type out summations and subscripts and the like, so I'll skip it here. But in code it reduces to this:

Code:

by work_group, sort: egen age_bar = mean(age)
by work_group, sort: egen var_age = sd(age)
replace var_age = var_age^2
by work_group, sort: gen rssd = age^2 - 2*age*age_bar + ((_N-1)/_N)*var_age + age_bar^2

If you want to work out the algebra yourself, start by expanding (age_i-age_j)² with the binomial theorem, then sum over j and divide by N. The first term just becomes age^2, and the second term becomes -2*age*age_bar. The third term then gets expanded relying on the fact that the sum of squares is equal to N times the variance (calculated with N, not N-1, in the denominator) plus N times the square of the mean; dividing by N leaves the variance plus the square of the mean. Put that all together, adjust for the fact that Stata's variance is calculated with N-1, not N, in the denominator, and you get my formula.
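
For readers who want the skipped algebra written out, here is a sketch of the steps just described. The notation is an assumption added for illustration and is not from the article: x_i is the individual's age, n is the number of people in the work group, \bar{x} is the group mean, and s^2 is the variance as Stata computes it (n-1 in the denominator). D_i^2 denotes the quantity under the square root in the quoted definition.

```latex
\begin{align*}
D_i^2 &= \frac{1}{n}\sum_{j=1}^{n}\,(x_i - x_j)^2 \\
      &= x_i^2 \;-\; 2\,x_i\,\bar{x} \;+\; \frac{1}{n}\sum_{j=1}^{n} x_j^2 \\
      &= x_i^2 \;-\; 2\,x_i\,\bar{x} \;+\; \frac{n-1}{n}\,s^2 \;+\; \bar{x}^2 ,
\end{align*}
% using  \frac{1}{n}\sum_j x_j^2 = \frac{n-1}{n}\,s^2 + \bar{x}^2
```

The last line matches the right-hand side of the gen statement term by term. The measure in the quoted definition is the square root of this quantity.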
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#5

23 Oct 2016, 21:38

Oops, I just noticed I forgot to take the square root at the end. Also noticed a further simplification that comes from re-grouping terms and applying the binomial theorem in reverse. So replace that last line with:

Code:

by work_group, sort: gen rssd = sqrt((age-age_bar)^2 + ((_N-1)/_N)*var_age)
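
One caveat for data with missing values, which the first post mentions: _N counts every observation in the group, including those with a missing age, whereas egen's mean() and sd() use only the non-missing values. Below is a minimal sketch that takes n as the number of non-missing ages instead, assuming that is what "respondents in the unit" should mean here. The toy values and the names n_age, sd_age, and var_n are illustrative and not from the thread.

```stata
* Toy data (hypothetical values, not the spreadsheet from the thread)
clear
input work_group age
1 30
1 40
1 50
2 25
2 .
2 35
end

* Group-level pieces, each computed over non-missing ages only
bysort work_group: egen n_age   = count(age)
bysort work_group: egen age_bar = mean(age)
bysort work_group: egen sd_age  = sd(age)

* Convert Stata's divide-by-(n-1) variance to the divide-by-n variance
gen var_n = ((n_age - 1)/n_age) * sd_age^2
* With a single respondent sd() is missing; the divide-by-n variance is 0
replace var_n = 0 if n_age == 1

* Individual-level measure; stays missing where age is missing
gen rssd = sqrt((age - age_bar)^2 + var_n)
list work_group age rssd
```

In group 2 of the toy data, the row with missing age does not count toward n, so the two respondents are compared only with each other. If n should instead count everyone in the unit regardless of missingness, the _N version above is the one to use.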