Matching observations with similar characteristics

Allan Andreotti

Join Date: Nov 2019

Posts: 14
#1

18 Sep 2020, 16:13

Hi, I'm trying to match observations with similar characteristics. By the way, this is the first time that I have used any matching method, so please be patient. For example, I have a dataset with countries and a bunch of characteristics, and I want to group countries that share similar characteristics.
+---------+--------+------------+
| Country | income | population |
+---------+--------+------------+
| US      | 100    | 1500       |
| UK      | 100    | 1200       |
| Spain   | 90     | 1100       |
+---------+--------+------------+
I have been reading some literature, and what I found is "propensity score matching" using the psmatch2 command in Stata. What I was thinking was to group countries with higher propensity scores.

Thank you
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

18 Sep 2020, 16:33

And your question is ...?

In addition to stating what help you would like, you should probably explain why you are trying to match observations: what use you will make of the resulting matches. That will shed light on what the best approach to matching the observations might be. Also, do you want matched pairs, a fixed n:1 match (n ≥ 2), or a variable n:1 match?
Allan Andreotti

Join Date: Nov 2019

Posts: 14
#3

18 Sep 2020, 17:43

Hi. The main question is whether I can use propensity score matching to group the countries with higher propensity scores. For example, I want to divide the sample into two groups: group A with a score higher than K, and group B with a score lower than K. So if I have 36 countries, I want to split them into two groups, using the propensity score to choose a cutoff so that I get 18 countries in group A and 18 countries in group B.
It is hard to explain why I want to match the observations. But in a few words, I want to compare the effect of a certain variable in countries that share similar characteristics. It is not the main purpose of my work; it is more like a robustness check to see if the effects that I found in some countries are still present in countries that share similar characteristics.
I know it is hard to explain, but that's basically the main idea.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

18 Sep 2020, 18:48

Well, I still don't have a clear sense of what you're trying to do. So I'll just give you a sense of what propensity score matching is typically used for. The usual application is that you want to test the effect of predictor X (a 0/1 variable) on outcome Y. But the data are observational, and you have concerns that the X = 0 units (countries in your case) differ from the X = 1 units in ways that may be relevant to outcome Y and thereby bias the estimation of the effect (i.e. confounding, aka omitted-variable bias). So under some strong assumptions, one way of reducing that confounding is to develop a model of the probability, based on other variables you have measured, that a given unit will have X = 1 rather than X = 0. That probability is called the propensity score, and it is typically estimated using a logistic or probit regression of X on those other variables. Propensity scores can be used in several ways; one of them is to pair up units that have similar propensity scores but opposite values of X and then perform a matched-pairs analysis of the effect of X on Y. Is this what you have in mind? If so, it is a pretty standard approach.
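
To make the mechanics concrete, here is a minimal sketch of the two steps just described: estimate the propensity score with a logistic regression of X on the covariates, then pair each X = 1 unit with the X = 0 unit whose score is closest. It is written in plain Python rather than Stata purely for illustration. The units, covariate values, and function names (`fit_logit`, `propensity`, `match_pairs`) are all made up for this example, and the greedy 1:1 matching without replacement is only one of several possible matching rules, not necessarily what psmatch2 does by default.

```python
import math

# Hypothetical toy data, invented for illustration: (unit, x, covariate 1, covariate 2).
units = [
    ("A", 1, 1.0, 1.5),
    ("B", 1, 0.9, 1.1),
    ("C", 1, 0.4, 0.6),
    ("D", 0, 1.0, 1.2),
    ("E", 0, 0.5, 0.7),
    ("F", 0, 0.3, 0.4),
    ("G", 0, 0.8, 1.0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(rows, steps=5000, lr=0.1):
    """Fit P(X = 1 | covariates) by plain gradient ascent on the log-likelihood."""
    n_cov = len(rows[0]) - 2
    beta = [0.0] * (n_cov + 1)  # intercept plus one coefficient per covariate
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for _, x, *cov in rows:
            feats = [1.0] + cov
            err = x - sigmoid(sum(b * f for b, f in zip(beta, feats)))
            for j, f in enumerate(feats):
                grad[j] += err * f
        beta = [b + lr * g / len(rows) for b, g in zip(beta, grad)]
    return beta

def propensity(beta, cov):
    """Predicted probability that a unit with these covariates has X = 1."""
    return sigmoid(beta[0] + sum(b * c for b, c in zip(beta[1:], cov)))

def match_pairs(rows, scores):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    treated = [u for u, x, *_ in rows if x == 1]
    controls = [u for u, x, *_ in rows if x == 0]
    pairs = []
    for t in treated:
        if not controls:
            break
        best = min(controls, key=lambda c: abs(scores[c] - scores[t]))
        pairs.append((t, best))
        controls.remove(best)  # each X = 0 unit is used at most once
    return pairs

beta = fit_logit(units)
scores = {u: propensity(beta, cov) for u, _, *cov in units}
pairs = match_pairs(units, scores)
print(pairs)
```

Note that nothing here can be computed without the 0/1 variable X: the score is by definition the predicted probability of X = 1.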

Now, part of what you say in #3 sounds instead like you want to just partition the data into two halves by splitting it at the median value of some similarity measure. You don't really say anything about studying an X and Y relationship of the kind I mentioned in the first paragraph. In that case, I don't even see how you would go about using propensity scores. Without that X, there is no such thing as a propensity score. If you are just looking to partition the data set into clusters that agree, within the clusters, on the variables in your data set but differ between the clusters, then that sounds like a task for cluster analysis, for which Stata has the -cluster- command.
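
As a rough illustration of the cluster-analysis route, the sketch below runs a two-cluster k-means on the three rows from the table in #1. It is plain Python rather than Stata, and it only shows the idea: the function names (`standardize`, `kmeans2`), the choice of k-means, the choice of two clusters, and the deterministic starting points are all assumptions of this sketch, not something the thread specifies.

```python
import math

# Rows copied from the table in #1; the units of income and population are not given.
data = {
    "US": (100.0, 1500.0),
    "UK": (100.0, 1200.0),
    "Spain": (90.0, 1100.0),
}

def standardize(rows):
    """Rescale each variable to mean 0 and SD 1 so that no variable dominates the distance.

    Assumes every variable actually varies (a zero SD would divide by zero).
    """
    cols = list(zip(*rows.values()))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return {n: tuple((v - m) / s for v, m, s in zip(vals, means, sds))
            for n, vals in rows.items()}

def kmeans2(points, max_iter=100):
    """Two-cluster k-means; returns a dict mapping each unit to cluster 0 or 1."""
    names = list(points)
    # Deterministic start (an assumption of this sketch): the two units farthest apart.
    a, b = max(((p, q) for p in names for q in names),
               key=lambda pq: math.dist(points[pq[0]], points[pq[1]]))
    centroids = [points[a], points[b]]
    labels = {}
    for _ in range(max_iter):
        # Assign every unit to its nearest centroid.
        new = {n: min((0, 1), key=lambda k: math.dist(points[n], centroids[k]))
               for n in names}
        if new == labels:
            break  # assignments stopped changing
        labels = new
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [points[n] for n in names if labels[n] == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

labels = kmeans2(standardize(data))
print(labels)
```

Unlike a propensity score, this needs no X variable at all: the grouping depends only on how close the units are on the standardized characteristics.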

"it is more like a robustness check to see if the effects that I found in some countries are still present in countries that share similar characteristics."

I'm sorry, but I really don't grasp what this means.
Nick Cox

Join Date: Mar 2014

Posts: 35721
#5

18 Sep 2020, 19:03

Cross-posted at https://stackoverflow.com/questions/...haracteristics

Please note our policy on cross-posting, which is that you are asked to tell us about it.

I am active there too and advise that your question there is not a good fit for Stack Overflow. It's really for questions where you have code already and a specific problem with it. It is not a help line when you are unsure about and totally new to any language.

You're better off here, but as Clyde Schechter points out it's very unclear what you're seeking.
Allan Andreotti

Join Date: Nov 2019

Posts: 14
#6

18 Sep 2020, 19:18

Thanks a lot for the explanation