Matching across three groups of a dependent variable

Tia Denek

Join Date: Jan 2021

Posts: 17
#1

28 Nov 2021, 05:37

My glucose measure has three groups: low, normal, and high glucose levels. I have an independent variable of interest whose effect on the glucose distribution I would like to measure. I have 5 other variables that I know are confounders, and I would like to match on them across my three glucose groups so that their effects are controlled for. One of these confounders is categorical (gender) and the others are continuous. How can I perform matching on all 5 variables across the three groups? Does caliper matching work if one of the confounding variables is categorical? Also, my dependent variable is not binary; would that be a problem for caliper matching?
Mike Lacy

Join Date: Apr 2014

Posts: 2450
#2

28 Nov 2021, 09:11

When you say "match across my three glucose groups," it sounds like you want each individual to be matched with others on glucose group and the various confounders. Doing that -- which I may have misunderstood -- would be a distinctly bad idea. This would reduce variation in the very thing you want to analyze, i.e., glucose level, and would bias your findings of effect toward zero. Using a categorized measure of glucose level likely has a similar effect. Analyzing a continuous variable after categorizing involves throwing away information, which is rarely a good choice. I understand, though, that the glucose variable may only be available to you in categorical form.

Regarding how to perform matching, what was suggested in response to your posting in a related thread still applies: there are a great many postings on StataList about programs and techniques for matching, and looking at some of those would help you. You might look at -ssc describe calipmatch- as a start.
Tia Denek

Join Date: Jan 2021

Posts: 17
#3

29 Nov 2021, 02:13

Yes, you are right, but I think I may have explained it vaguely. To make sure my method makes sense before I go ahead with it, let me try to explain more clearly. I have the exact glucose measurements for all these individuals. But I want to look at the influence of a variable (which I can call Var1) on the variation of glucose in this group. However, I don't want age, weight, etc. to influence this, so I would like to have a group of individuals who are matched on these confounders. I have a large cohort from which I am drawing a subsample (with equal numbers of people with low, normal, and high glucose levels, so that my group isn't composed mostly of people with high or low glucose). So, for instance, for a person whose glucose level is in the low range and who is in a certain age group, I would want there to be one person in the normal range and one in the high range of a similar age, and so on for all the relevant confounders. At the end, I hope to have a cohort of individuals with varying levels of glucose (not categorized, just as measured) but with their confounders matched, so that I can properly measure the influence of Var1 on glucose variability.

I am going to try working with calipmatch; thank you for that suggestion. I just wanted to clarify the above in case you find my technique flawed or have any suggestions. Thank you.
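The matching scheme described in this post can be sketched as a greedy 1:1:1 caliper match. This is a hypothetical Python illustration, not calipmatch or any other Stata command: the field names, the caliper widths, and the choice to anchor on the low-glucose group are all assumptions. The categorical confounder (gender) is matched exactly and the continuous ones within a caliper, which is one common way to handle a mix of categorical and continuous confounders. It only shows the mechanics and does not address the design concerns raised in the reply below.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    pid: int
    glucose_group: str  # "low", "normal" or "high"
    gender: str         # categorical confounder: matched exactly
    age: float          # continuous confounders: matched within a caliper
    weight: float

# Hypothetical caliper widths, in the units of each variable.
CALIPERS = {"age": 2.0, "weight": 5.0}

def within_caliper(a: Person, b: Person) -> bool:
    """Exact match on gender; every continuous confounder within its caliper."""
    if a.gender != b.gender:
        return False
    return all(abs(getattr(a, v) - getattr(b, v)) <= w for v, w in CALIPERS.items())

def distance(a: Person, b: Person) -> float:
    """Sum of caliper-scaled absolute differences, used to pick the closest match."""
    return sum(abs(getattr(a, v) - getattr(b, v)) / w for v, w in CALIPERS.items())

def match_triplets(people: list[Person]) -> list[tuple[Person, Person, Person]]:
    """Greedy 1:1:1 matching without replacement, anchored on the "low" group."""
    pools = {g: [p for p in people if p.glucose_group == g]
             for g in ("low", "normal", "high")}
    used: set[int] = set()
    triplets = []
    for anchor in pools["low"]:
        chosen = []
        for group in ("normal", "high"):
            candidates = [p for p in pools[group]
                          if p.pid not in used and within_caliper(anchor, p)]
            if not candidates:
                break  # no eligible partner in this group: anchor stays unmatched
            chosen.append(min(candidates, key=lambda p: distance(anchor, p)))
        if len(chosen) == 2:
            used.update(p.pid for p in chosen)
            triplets.append((anchor, chosen[0], chosen[1]))
    return triplets

# Hypothetical example data.
people = [
    Person(1, "low", "F", 40, 60),
    Person(2, "normal", "F", 41, 62),
    Person(3, "high", "F", 39, 58),
    Person(4, "low", "M", 55, 80),
    Person(5, "normal", "M", 70, 80),  # outside the age caliper for person 4
    Person(6, "high", "M", 56, 82),
]
triplets = match_triplets(people)
for low, normal, high in triplets:
    print(low.pid, normal.pid, high.pid)
```

Only persons 1, 2 and 3 form a triplet; person 4 has no normal-range partner of the same gender within the age caliper and is dropped. Greedy matching depends on the order in which anchors are processed, and each partner is checked only against the low-glucose anchor, so the normal and high members of a triplet can differ from each other by up to twice the caliper.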
Mike Lacy

Join Date: Apr 2014

Posts: 2450
#4

29 Nov 2021, 12:38

Your description of your plan and rationale still sounds to me like you have some confusion, and I think you may be planning to do some things that will work against your interests.

1) Stratifying on the *response* variable, i.e., glucose level, is probably not necessary and might produce problems. Doing so would lead to you having to do a weighted analysis, which is not always trivial. I'd further mention that, as the literature on case-control studies shows, matched samples with stratification on the response variable do *not* necessarily reduce bias the way they do in studies matched/stratified on predictor variables. (I know this is counter-intuitive.) If this is unfamiliar to you, I'd encourage you to be wary and to seek out direct, person-to-person consultation with someone with the relevant expertise.

2) Your concern that a simple random sample might be unrepresentative with respect to the glucose variable is probably unfounded, or at least somewhat misunderstood. A simple random sample will not be biased, but it could have random variation in the glucose variable that is undesirable for your purposes. If that's a concern to you, the easier and better choice would be to take a larger sample rather than stratifying. This also reduces random variation and is desirable for other reasons. Perhaps you are concerned that taking a larger sample would be expensive in time and money, in which case there *might* be some reason to stratify as you suggest. Or, if the distribution of glucose is very "lumpy," with only small numbers of people in the problem regions, then *perhaps* stratifying might be worthwhile. If expense or a lumpy distribution is your issue, my thinking is that a good choice of design will require one-on-one consultation with an expert about your research design. Spending money on expert advice now could save you a lot later in your study.

That being said, we *might* be able to advise you here if you gave a sense of how big a sample you were planning to take and how much cost considerations constrain the size of your sample, and if you gave us some information about the distribution of the glucose level variable.
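The remark in point 1 that stratifying on glucose leads to a weighted analysis can be made concrete. In this hypothetical Python sketch (all counts and proportions are made up, not from the thread), equal-sized samples are drawn from glucose strata of unequal size in the cohort, and each sampled person gets a design weight equal to the stratum's size in the cohort divided by its size in the sample. It only shows weighting a simple proportion of people with Var1 = 1; a regression of glucose on Var1 under outcome-based sampling raises the further issues point 1 warns about.

```python
# Hypothetical cohort and sample sizes per glucose stratum.
population_counts = {"low": 1_000, "normal": 7_000, "high": 2_000}
sample_counts = {"low": 300, "normal": 300, "high": 300}

# Design weight: how many cohort members each sampled person represents.
weights = {g: population_counts[g] / sample_counts[g] for g in population_counts}

def weighted_mean(values_by_stratum: dict[str, list[int]],
                  weights: dict[str, float]) -> float:
    """Mean of per-person values, each person weighted by their stratum's design weight."""
    total = 0.0
    total_weight = 0.0
    for stratum, values in values_by_stratum.items():
        w = weights[stratum]
        total += w * sum(values)
        total_weight += w * len(values)
    return total / total_weight

# Hypothetical binary Var1 in each sampled stratum (1 = exposed).
var1 = {
    "low":    [1] * 150 + [0] * 150,  # 50% exposed
    "normal": [1] * 60 + [0] * 240,   # 20% exposed
    "high":   [1] * 120 + [0] * 180,  # 40% exposed
}

unweighted = sum(sum(v) for v in var1.values()) / sum(len(v) for v in var1.values())
weighted = weighted_mean(var1, weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
# prints "unweighted: 0.367, weighted: 0.270"
```

The unweighted proportion overstates exposure because the small low and high strata are oversampled relative to the normal stratum; the weighted proportion matches the cohort-level value implied by these made-up numbers (0.1 × 0.5 + 0.7 × 0.2 + 0.2 × 0.4 = 0.27).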