Measure of similarity between observed values on different variables

Erik Aadland

#1

28 Mar 2023, 03:18

Dear Statalist.

I am trying to generate a new variable (“wanted”) that captures, for each individual (“id”), the extent to which the observed characteristic “char” on “id” is similar to an observed characteristic “char_r” on “id” that is measured from a reference group.
The observed characteristics “char” and “char_r” may take either positive or negative values. I plan to use “wanted” in a regression model as a measure of how similar “id” is to its reference group, so the values of “wanted” need to be meaningful and comparable across different “id”. Here is a small, simplified toy dataset that illustrates the nature of the data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id double(char char_r)
1 -.1751724 -.0768999
2 -.1751724  .0705089
3  .0213727  .0642464
4  .0213727 -.0277636
end

Any and all suggestions or comments regarding this problem are very welcome.
Thanks!
Nick Cox

#2

28 Mar 2023, 04:22

I am not sure that I follow this, but it seems that each identifier occurs once, in which case the difference between value and reference value is to me the most obvious measure to use.

I don't know whether having a different sign is a big deal, or even a little deal, or whether other values mean that some relative measure makes more sense.
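A minimal sketch of the measures discussed in #2, run on the toy data from #1: the signed difference and its absolute value, plus, as one possible way to look at the sign question, an indicator for whether "char" and "char_r" share the same sign and a simple relative difference. The variable names diff, absdiff, samesign and reldiff are hypothetical. The relative version (dividing by abs(char_r)) is only one of many possible relative measures and is assumed here for illustration; it breaks down when "char_r" is zero or close to it.

```stata
* Toy data from #1
clear
input byte id double(char char_r)
1 -.1751724 -.0768999
2 -.1751724  .0705089
3  .0213727  .0642464
4  .0213727 -.0277636
end

* Signed difference from the reference value
generate double diff = char - char_r

* Absolute difference: 0 means identical, larger means less similar
generate double absdiff = abs(diff)

* Hypothetical flag: 1 if char and char_r have the same sign, 0 otherwise
generate byte samesign = sign(char) == sign(char_r)

* One possible relative measure: difference scaled by the size of the
* reference value; unstable when char_r is near zero
generate double reldiff = diff / abs(char_r)

list, noobs
```

In this toy example, ids 1 and 3 have the same sign as their reference value, and ids 2 and 4 do not.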
Erik Aadland

#3

28 Mar 2023, 05:05

Thank you, Nick, for making me realize that the previous example was probably too simplified. The data are actually panel data with one observation of "id" per period. The reference-group value is the same for all "id" in a given period but changes across periods. I also thought about taking the difference between the values by "id" and "period", but is this the best or only option? Revised data structure example:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id period) double(char char_r)
1 1 -.1751724 -.0768999
2 1  .0213727 -.0768999
3 1  .0413727 -.0768999
1 2  .2751724  .0705089
2 2  .3451724  .0705089
3 2  .0032599  .0705089
4 2  .012287   .0705089
end
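Before computing any measure, it may be worth confirming in the data themselves the structure described above. A minimal sketch, assuming the variable names in the example: isid checks that "id" and "period" jointly identify observations (one observation per id per period), and the bysort check confirms that "char_r" is constant within each period.

```stata
* Revised toy data from #3
clear
input byte(id period) double(char char_r)
1 1 -.1751724 -.0768999
2 1  .0213727 -.0768999
3 1  .0413727 -.0768999
1 2  .2751724  .0705089
2 2  .3451724  .0705089
3 2  .0032599  .0705089
4 2  .012287   .0705089
end

* Exits with an error unless id and period jointly identify observations
isid id period

* Within each period, sort on char_r: smallest and largest must be equal
bysort period (char_r): assert char_r[1] == char_r[_N]
```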

Thanks again!
Nick Cox

#4

28 Mar 2023, 07:19

OK, so this seems to turn into a twist on standard questions about measuring variability. You might want to work with, say,

Code:

egen variab1 = mean(abs(char - char_r)), by(id)
egen variab2 = mean((char - char_r)^2), by(id)
replace variab2 = sqrt(variab2)
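To see what the code in #4 produces, here is a sketch that runs it on the revised data from #3. The double storage type is an addition here, only to reduce rounding; otherwise the commands are as in #4. With by(id), each id gets one value summarizing its distance from the reference across all periods: variab1 is the mean absolute difference and variab2 the root mean squared difference.

```stata
* Revised toy data from #3
clear
input byte(id period) double(char char_r)
1 1 -.1751724 -.0768999
2 1  .0213727 -.0768999
3 1  .0413727 -.0768999
1 2  .2751724  .0705089
2 2  .3451724  .0705089
3 2  .0032599  .0705089
4 2  .012287   .0705089
end

* Mean absolute difference from the reference, per id across periods
egen double variab1 = mean(abs(char - char_r)), by(id)

* Root mean squared difference, per id across periods
egen double variab2 = mean((char - char_r)^2), by(id)
replace variab2 = sqrt(variab2)

sort id period
list id period char char_r variab1 variab2, sepby(id) noobs
```

The root mean squared difference is never smaller than the mean absolute difference and gives more weight to large deviations; for an id observed only once (id 4 here) the two coincide.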
Erik Aadland

#5

28 Mar 2023, 08:02

Thank you so much for your suggestions, Nick. I greatly appreciate it. Would it be possible to calculate something along these lines (not the mean) by id and period, so that I get a period-specific measure for each id? Thanks again.
Nick Cox

#6

28 Mar 2023, 08:14

Just change what you feed to by().
Nick Cox

#7

28 Mar 2023, 16:25

But if you have one observation per identifier per period, aren't we back where we were at #2?
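A sketch of #6 and #7 together, again on the revised data from #3: the egen calls from #4 with by(id period) instead of by(id). Because each id-period group holds a single observation, both the mean absolute difference and the root mean squared difference reduce to the plain absolute difference, which the assertions at the end check.

```stata
* Revised toy data from #3
clear
input byte(id period) double(char char_r)
1 1 -.1751724 -.0768999
2 1  .0213727 -.0768999
3 1  .0413727 -.0768999
1 2  .2751724  .0705089
2 2  .3451724  .0705089
3 2  .0032599  .0705089
4 2  .012287   .0705089
end

* One observation per id per period
isid id period

* Same calculations as #4, but grouped by id and period
egen double variab1 = mean(abs(char - char_r)), by(id period)
egen double variab2 = mean((char - char_r)^2), by(id period)
replace variab2 = sqrt(variab2)

* With singleton groups, both equal |char - char_r| up to rounding
assert reldif(variab1, abs(char - char_r)) < 1e-12
assert reldif(variab2, abs(char - char_r)) < 1e-12
```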
Erik Aadland

#8

29 Mar 2023, 01:00

Almost back to #2.
I am thinking about this tweak of your code, Nick, to get the positive value difference (distance) between the values for all observations (including the negative ones):

Code:

bys id period: gen variab = (char - char_r)^2
replace variab = sqrt(variab)

Unless it is a really bad idea on my part, I might try this one.
Thanks again.
Nick Cox

#9

29 Mar 2023, 02:57

The prefix bysort id period: does nothing there to change the result from what you would get directly with

Code:

gen variab = abs(char - char_r)
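A quick check of the point in #9, on the revised data from #3: the two-step version from #8, the same calculation without the bysort prefix, and the one-line abs() version all give the same values, apart from possible floating-point rounding in squaring and taking the square root. The names variab_a, variab_b and variab_c are hypothetical.

```stata
* Revised toy data from #3
clear
input byte(id period) double(char char_r)
1 1 -.1751724 -.0768999
2 1  .0213727 -.0768999
3 1  .0413727 -.0768999
1 2  .2751724  .0705089
2 2  .3451724  .0705089
3 2  .0032599  .0705089
4 2  .012287   .0705089
end

* #8 version: square, then square root, under a bysort prefix
bysort id period: generate double variab_a = (char - char_r)^2
replace variab_a = sqrt(variab_a)

* Same calculation without the prefix
generate double variab_b = sqrt((char - char_r)^2)

* #9 version
generate double variab_c = abs(char - char_r)

* The prefix changes nothing; reldif(x, y) = |x - y| / (|y| + 1)
assert variab_a == variab_b
assert reldif(variab_a, variab_c) < 1e-12
```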
Erik Aadland

#10

29 Mar 2023, 03:14

Thank you Nick!
I was not aware of the abs( ) option. Very nice.
Best regards.
Nick Cox

#11

29 Mar 2023, 04:13

Good, but it's a function and documented as such.

Code:

help abs()
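As a minimal illustration that abs() is a function usable inside any expression, not a command option, here it is applied to the first pair of values from #1; reldif() is used only to allow for rounding in the comparison.

```stata
* abs() returns the absolute value of its argument
display abs(-2.5)
display abs(-.1751724 - (-.0768999))

* Usable anywhere an expression is allowed, e.g. in assert
assert abs(-2.5) == 2.5
assert reldif(abs(-.1751724 - (-.0768999)), .0982725) < 1e-12
```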
Rabia Ibrar

#12

29 Mar 2023, 15:18

I am trying to estimate fixed effects by using country dummy variables, but when I run the regression I get the following results. Can anyone help me with this?
Attached Files