Case Control Design: weighting

Daniel Allam

Join Date: Nov 2017

Posts: 13
#1

03 Jun 2021, 04:17

Hey guys,

I'm conducting a study that analyses the antecedents of corporate divestitures using a case-control design. The sampling is based on the dependent variable: cases are firms that engaged in a divestiture over a certain period of time, and controls are firms that did not. For each case firm I matched one control firm, which leaves me with a case-to-control ratio of 1:1.
I found studies that followed a similar approach and subsequently weighted the ratio of case to control firms to minimise biases in the model parameters. For example, the initial sample ratio was 1:1 and they ended up with a ratio of 1:5 using the Stata weight command.
I am a bit uncertain about which weight command to use and how to create a variable that incorporates a weight of, e.g., 1:5.

I would really appreciate your support.

Thank you very much

Daniel
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#2

03 Jun 2021, 08:35

Your description of weighting in a case-control study doesn't sound to me like anything I've heard of, which might be either the description *or* my ignorance. Citing one such article might be helpful here.

Also, regarding something you didn't ask about: Many people would argue for so-called "incidence density" sampling, in which controls are sampled from the risk set (cases and non-cases) that prevails at the time each case occurred. See, e.g.:
Greenland, S. and Thomas, D.C., 1982. On the need for the rare disease assumption in case-control studies. American Journal of Epidemiology, 116(3), pp.547-553.
Daniel Allam

Join Date: Nov 2017

Posts: 13
#3

03 Jun 2021, 10:02

The following paper did what I am trying to do: Shimizu, K., & Hitt, M. A. (2005). What constrains or facilitates divestitures of formerly acquired firms? The effects of organizational inertia. Journal of Management, 31(1), 50-72.

On page 58 it says:
"The sampling is based on the dependent variable, a case control design (Palepu, 1986). To control the potential biases in such a design, an equivalent number of companies that acquired another firm but did not divest it within the same period were randomly selected based on the year of acquisition (Seabright, Levinthal, & Fichman, 1992). Furthermore, following Palepu (1986), we weighted the ratio of control firms to sample firms to minimize the possible biases in estimating the model parameters. We assumed that 20% of the acquisitions were divested (i.e., we quadrupled the control group, using the STATA weight command)..."
Daniel Allam

Join Date: Nov 2017

Posts: 13
#4

03 Jun 2021, 12:52

I assume it works with the pweight command, but I'm not really sure how to set it up. What do you guys think?
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#5

03 Jun 2021, 17:14

I took a look at the article you cited (thanks), and I would not rely on it as an exemplar of case-control methods. (Epidemiologists and biostatisticians, which I'm not, are the experts here, and they were thinking about, theorizing, and practicing case-control designs for about 30-50 years before other social scientists got onto them. That article and the Palepu article do not seem to engage with that literature.) My best guess from those articles is that you want to weight to adjust for the disproportionate stratified sampling on the response variable. Doing so would require that you know the marginal distribution of cases and controls in the population. (In many cases in which one *does* know that marginal, it's just as easy to collect data without stratifying on the response variable, depending on the costs of data collection, and thus avoid such issues.) If that adjustment is your goal and you do know the probability of selection for cases and controls, it should be just like weighting for any other stratified sample, so a pweight should work. Note, though, that if you are using logistic regression, the odds ratios (though not the predicted probabilities) are valid without weighting, a finding that goes back about 70 years.

One source you might find useful, both for Stata software and explanation, is https://gking.harvard.edu/relogit . My recollection is that that software has facilities for adjusting for unequal probabilities of selection so as to yield correct predicted probabilities, as well as for handling some other issues.
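To make the point about predicted probabilities concrete, here is a minimal sketch of the intercept "prior correction" described in King and Zeng's rare-events work, which shifts the fitted intercept by ln[((1 - tau)/tau) * (ybar/(1 - ybar))]. It is not the relogit implementation. The value of tau (the assumed population share of cases) is hypothetical, and the auto data set only stands in for a real case-control sample so that the mechanics can be run.

```stata
* Sketch only: tau is an assumed population share of cases (hypothetical value)
sysuse auto, clear
quietly logit foreign price length          // unweighted fit; slopes are usable as-is
quietly summarize foreign
scalar ybar   = r(mean)                     // share of "cases" in the sample
scalar tau    = 0.20                        // assumption, not an estimate
scalar offset = ln(((1 - tau)/tau) * (ybar/(1 - ybar)))
predict double xb, xb                       // linear predictor from the unweighted fit
generate double p_uncorrected = invlogit(xb)
generate double p_corrected   = invlogit(xb - offset)   // intercept shifted by the offset
summarize p_uncorrected p_corrected
```

Only the intercept is shifted, so the slopes and odds ratios from the unweighted fit are unchanged; the predicted probabilities move with whatever value of tau is assumed.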

You might also look at my article, https://doi.org/10.2307/1389496 , which is now out of date in certain ways, but is regarded by some people <grin> as useful in its clarity.
Daniel Allam

Join Date: Nov 2017

Posts: 13
#6

04 Jun 2021, 04:47

Thanks Mike!

"My best guess from those articles is that you want to weight to adjust for the disproportionate stratified sampling on the response variable. Doing so would require that you know the marginal distribution of cases and controls in the population"

Yes, exactly! The distribution in my sample is 50% cases and 50% controls (or 1:1). I don't know the exact distribution in the population; however, I can make a very good assumption based on studies that are similar to mine. So let's assume the distribution of cases to controls in the population is 1:4. How can I calculate the weights that I need to insert into the pweight command?

Thanks and best regards,

Daniel
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#7

04 Jun 2021, 10:45

The distribution in the population is what you lack, and you need it in order to know the probability of selection, as would be required to weight any other sample with unequal probabilities of selection.

Per -help weight-:
"pweights, or sampling weights, are weights that denote the inverse of the probability that the observation is included because of the sampling design."

My recollection is that there is a literature on using sample-estimated information about the population distribution of the response to estimate probabilities of selection and incorporate them into a case-control study. On that subject and other related matters, you might check the articles in the gking link I gave above. If your weights derive from a *guess* rather than an empirically-estimated quantity, something I haven't seen before but for which I have not been searching, I'd encourage you to try some different weights to see how sensitive your predicted probabilities are to the choice of weight. I think that choice would matter a lot, based on some little experiments I just did with simulated data.
Daniel Allam

Join Date: Nov 2017

Posts: 13
#8

04 Jun 2021, 11:35

Sorry Mike, I'm a bit lost. Let's say I want to replicate what the paper I cited in post #3 did.

"We assumed that 20% of the acquisitions were divested (i.e., we quadrupled the control group, using the STATA weight command)..."

How would I derive the pweight? Is it just 4 for the control group then, assuming that the case-control ratio in my sample is 1:1?
Also, would pweights affect my summary statistics?

Thanks a lot!

Daniel
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#9

04 Jun 2021, 13:37

From the sentence you give, I don't know how to tell whether the article is correct or not, as it doesn't talk about sampling fractions, i.e., probabilities of selection. Again, from scanning that article and the one by Palepu, I don't see them talking in ways that connect with the longstanding case-control literature, and I think they may have skipped over potentially important issues, so I would not necessarily rely on them being useful. To my understanding, talking about the ratio of cases and controls is possibly wrong, since that ratio does not by itself determine the sampling fractions.

So, let's try this example and see if it helps:

Let's say that you selected *all* cases of divestiture in your time frame, so the probability of selection has to be 1, making the pweight as explained above = 1/1 = 1. Then, you make some *guess* that there were 10,000 corporations in your control population, and let's assume for illustration that Ncontrols = Ncases = 275, making the probability of selection for controls = 275/10,000. The pweight = 1/(275/10000) = 36.36 for the controls. With that information you could do this:

Code:

gen MyPweight = cond(case == 1, 1.0, 36.36)
logit case X1 X2 X3 [pweight = MyPweight]

Note that the scale of pweights is generally arbitrary, as the PDF documentation for -help weight- (u.pdf, p. 89) explains.
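A quick way to check the point about scale is to fit the same model twice with weights that differ only by a constant factor. This is a sketch under assumptions: the auto data set stands in for a case-control sample, `foreign` plays the role of the case indicator, and the 275/10,000 sampling fraction is the illustrative guess from above.

```stata
* Sketch: rescaling pweights by a constant should not change the coefficients
sysuse auto, clear
generate double w1 = cond(foreign, 1, 1/(275/10000))   // 1 for "cases", about 36.36 otherwise
generate double w2 = w1 * (275/10000)                   // same weights, multiplied by a constant
logit foreign price length [pweight = w1]
logit foreign price length [pweight = w2]
```

The two sets of coefficient estimates should agree, since only the ratio of the case weight to the control weight matters here.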

Perhaps you can work out how your original source made its guesses, and how 20% (1/5) translates to "quadrupled" because I don't know the answer to either of those.
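One possible reading, offered as an assumption rather than something the cited paper confirms: if the sample is 1:1 and the population share of cases is taken to be 20%, the implied population ratio of cases to controls is 1:4, so each control would count four times as much as each case. The sketch below uses the weighting form tau/ybar for cases and (1 - tau)/(1 - ybar) for controls; `tau`, `ybar`, and the variable name `case` are illustrative.

```stata
* Sketch: relative weights implied by an assumed population share of cases
scalar tau  = 0.20                          // assumed population share of cases (a guess)
scalar ybar = 0.50                          // share of cases in a 1:1 sample
scalar w_case    = tau / ybar               // 0.4
scalar w_control = (1 - tau) / (1 - ybar)   // 1.6
display "weight of a control relative to a case = " w_control / w_case
* With a 0/1 variable named case in memory, the weight variable could be built as:
* generate double MyPweight = cond(case == 1, w_case, w_control)
```

Because the scale of pweights is arbitrary, weights of 0.4 and 1.6 are equivalent to weights of 1 and 4, which would match "quadrupled" under this reading.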

In relation to my uneasiness about guessing the probability of selection and hence the pweight values, here's an example you might try, in which there are substantial differences in predicted probabilities with different weights but relatively small differences in estimated slopes (omitted from display below):

Code:

clear
sysuse auto
gen wt = cond(foreign, 1, 36)
logit foreign price length [pweight = wt]
predict p36
replace wt = cond(foreign, 1, 100)
logit foreign price length [pweight = wt]
predict p100
list p36 p100 in 1/10
// snip, snip
     +---------------------+
     |      p36       p100 |
     |---------------------|
  1. | .0026844   .0009899 |
  2. | .0142682    .005189 |
  3. | .0164634   .0059759 |
  4. | .0012259   .0004568 |
  5. | .0002573   .0000983 |
     |---------------------|
  6. | .0001693   .0000648 |
  7. | .0174023    .006322 |
  8. | .0009286   .0003475 |
  9. | .0037826   .0014027 |
 10. | .0005848   .0002197 |
     +---------------------+
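The two-weight comparison above could be extended to a small grid of guessed weights, which is roughly what a sensitivity analysis would involve. This is a sketch only: the grid values 4, 10, 36, and 100 are arbitrary illustrations, and the variable names wt4, p4, and so on are created by the loop.

```stata
* Sketch: refit the weighted logit over a grid of guessed control weights
sysuse auto, clear
foreach w of numlist 4 10 36 100 {
    generate double wt`w' = cond(foreign, 1, `w')   // weight 1 for "cases", `w' otherwise
    quietly logit foreign price length [pweight = wt`w']
    predict double p`w', pr                          // predicted probability under this weight
}
summarize p4 p10 p36 p100
list p4 p10 p36 p100 in 1/5
```

Comparing the columns side by side shows how strongly the predicted probabilities depend on the guessed weight.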
Daniel Allam

Join Date: Nov 2017

Posts: 13
#10

04 Jun 2021, 15:28

Thank you very much, Mike! I will look into this and try it based on your example.
I am aware that the results will be highly dependent on the weights assigned. I might provide some kind of sensitivity analysis.
Once again, thank you so much, really appreciated! I'll keep you updated.