Using reldist for distance/divergence of two distributions x two indicators -- how to aggregate?

Max Hartz

Join Date: Dec 2023

Posts: 6
#1

14 May 2025, 01:57

Dear Statalist,

I have the following setup: Two independent samples, one baseline (census) and one survey sample.

The baseline (census data) comes in aggregated form. I therefore aggregated the survey data likewise.

A mock example of the data structure is given below.
For these mock data I can estimate the distance between the two distributions on each of two indicators.
I can then average these distances.

Code:

** if not already installed, install reldist and other required packages
/*
ssc install reldist, replace // thanks to Ben Jann for providing these!!!
ssc install moremata, replace
ssc install kmatch, replace
ssc install kdens, replace
*/

** mock census and survey data in aggregate form, differing on two categorical variables (gender, agegroup)
sysuse pop2000, clear // mock census data
keep agegrp maletotal femtotal
expand 2, gen(male)
gen total = maletotal if male == 1
replace total = femtotal if male == 0
drop maletotal femtotal
expand 2, gen(sample) // mock survey data
set seed 1234
replace total = runiformint(200,400) if sample == 1 & male == 1
replace total = runiformint(250,450) if sample == 1 & male == 0

** comparison of chi2 distance between census baseline (sample==0) and survey (sample==1) for two categorical variables (agegrp; male)
reldist divergence agegrp [fweight=total], by(sample) categorical chi2
local est1 = e(b)[1,1]
reldist divergence male [fweight=total], by(sample) categorical chi2
local est2 = e(b)[1,1]

** average distance of sample 1 and sample 0 on both indicators
di (`est1' + `est2')/2

This gives me two chi2 distance estimates:
- estimate 1: 0.23 (se: 0.013), the distance between sample 0 and sample 1 on the agegroup indicator.
- estimate 2: 0.01 (se: 0.0019), the distance between sample 0 and sample 1 on the gender indicator.

If I understand correctly, these distances are additive.
Hence I can interpret their average of 0.12, i.e. (`est1' + `est2')/2, as a multivariate distance between sample 0 and sample 1.
But how can I get a confidence interval around this average of 0.12?

Any insight would be much appreciated! I need this confidence interval because in my real data I have one baseline and three survey samples. I compare each of the three survey samples to the baseline and then want to be able to say which survey sample is closest to or farthest from the baseline, and whether the three samples differ significantly in their distance from it. Hence, confidence intervals for the average distance would be key.

Thanks so much for any ideas.

Last edited by Max Hartz; 14 May 2025, 02:06.
Max Hartz

Join Date: Dec 2023

Posts: 6
#2

14 May 2025, 09:27

After researching a bit (see e.g. https://stats.stackexchange.com/ques...dard-deviation), I wonder whether

Code:

sqrt((`se1'^2 + `se2'^2)/4)

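would be appropriate for the standard error of the average, where `se1' and `se2' are the standard errors of the two estimates. That is, the sum of the squared standard errors divided by 2^2, then the square root. I hope this is correctly deduced...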
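
For reference, the reasoning behind this formula can be written out as the variance of an average. This is only a sketch: d_1 and d_2 stand for the two chi2 distance estimates, se_1 and se_2 for their standard errors, and \bar{d} for their average (notation introduced here for illustration, not taken from reldist). The second line assumes that the two estimates are uncorrelated. Since both are computed from the same two samples, that is an assumption and not a given; any nonzero covariance would enter through the extra term in the first line.

```latex
\begin{align*}
\operatorname{Var}(\bar{d})
  = \operatorname{Var}\!\left(\frac{d_1 + d_2}{2}\right)
  &= \frac{\operatorname{Var}(d_1) + \operatorname{Var}(d_2)
     + 2\operatorname{Cov}(d_1, d_2)}{4} \\
\operatorname{se}(\bar{d})
  &= \sqrt{\frac{se_1^2 + se_2^2}{4}}
     \qquad \text{if } \operatorname{Cov}(d_1, d_2) = 0 \\
  &= \sqrt{\frac{0.013^2 + 0.0019^2}{4}} \approx 0.0066
\end{align*}
```

With the rounded values from post #1, and additionally assuming approximate normality of the average, this would give a 95% interval of about 0.12 ± 1.96 × 0.0066, i.e. roughly 0.107 to 0.133.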