How to make STATA estimate different coefficients in the same regression model using different sample sizes?

yangyang liu

Join Date: Mar 2019

Posts: 2
#1


05 Mar 2019, 15:56

Hi All,

I am running a regression model in Stata. The regression equation is as follows:

y=a+b1*X1+b2*X2+b3*X3+...

In my case, it makes no sense to impute missing values for one variable, X3, because the missing values in this variable result from logical skip patterns. However, I don't want the missing values in X3 to reduce the sample size when I estimate b1 and b2. Does anyone know how I can tell Stata to use a different sample when estimating b1 and b2 than when estimating b3?

Your input will be greatly appreciated!
Bruce Weaver

Join Date: May 2014

Posts: 1141
#2

05 Mar 2019, 16:28

As I read your post, I found myself wondering if this might be one situation where the missing indicator method of dealing with missing data might actually work fairly well. (Generally, it is frowned on, because it produces biased estimates. But see this CMAJ article.) After a bit of Googling, I found these notes by Richard Williams. See page 5, where Richard quotes a footnote in Paul Allison's Sage monograph, Missing Data. It says that dummy variable adjustment (i.e., the missing indicator method) "may still be appropriate in cases where the unobserved value simply does not exist" (emphasis on may added).

Richard, are you aware of any further developments since you wrote those notes?

Cheers,
Bruce

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
yangyang liu

Join Date: Mar 2019

Posts: 2
#3

07 Mar 2019, 12:23

Hi Bruce,

Thank you so much for your input! I know that the missing-dummy approach is now heavily criticized in the literature. But I think I will run separate models for the samples with and without missing values and use the dummy-variable approach as a robustness check. Thank you very much for your response!
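A minimal sketch of the separate-models plan, using simulated stand-in data. All variable names (x1, x2, x3, applies, y) and the data-generating values are assumptions made for illustration, not taken from the original poster's data.

```stata
* Simulated stand-in data; every name and number here is hypothetical.
clear
set seed 2019
set obs 200
generate double x1 = rnormal()
generate double x2 = rnormal()
generate byte applies = runiform() < 0.6        // 1 = the X3 question was asked
generate double x3 = rnormal() if applies       // skipped items stay missing
generate double y = 1 + x1 + x2 + cond(applies, x3, 0) + rnormal()

* Respondents who skipped X3: only x1 and x2 can enter the model.
regress y x1 x2 if missing(x3)

* Respondents with X3 observed: all three predictors.
regress y x1 x2 x3 if !missing(x3)
```

Each model uses only its own subsample, so b1 and b2 are estimated twice, once per group, rather than once from the pooled data.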

Best,
Yangyang
Bruce Weaver

Join Date: May 2014

Posts: 1141
#4

07 Mar 2019, 13:45

Hello Yangyang. I've just written to Paul Allison to ask about his current thoughts on this issue. If I get a response, I'll share it here (if he is agreeable to that).

Cheers,
Bruce

FernandoRios

Join Date: Apr 2014

Posts: 2492
#5

07 Mar 2019, 19:00

I think it would be possible to estimate your original model using -ml-.
Being possible doesn't mean it would make sense, though.
I'm thinking you could estimate a model as follows:

Code:

program mywols
    args lnf xb1 xb2
    replace `lnf' = -($ML_y1-`xb1')^2 if $ML_y2==1
    replace `lnf' = -($ML_y1-`xb1'-`xb2')^2 if $ML_y2==2
end

Here the second dependent variable ($ML_y2) flags which sample each observation belongs to, so you can fit one model across both the sample with complete data and the sample with missing data.

My concern, however, is that while the above code may estimate something, it may be different from what you actually want to find.
HTH
Fernando
Joseph Coveney

Join Date: Apr 2014

Posts: 4453
#6

08 Mar 2019, 00:47

Originally posted by yangyang liu

it makes no sense to impute missing values for one variable, X3, because the missing values in this variable result from logical skip patterns.

If it's part of a logical skip pattern, then wouldn't X3 at least implicitly be part of an interaction involving another predictor? Expand that interaction and then fit the regression model with the one (expanded-interaction) predictor. See below.

.
. version 15.1

.
. clear *

.
. set seed `=strreverse("1486772")'

.
. quietly set obs 6

. generate byte X2 = _n > _N / 2

. label define Sexes 0 M 1 F

. label values X2 Sexes

.
. generate byte X3 = mod(_n, 2) if X2 == "F":Sexes
(3 missing values generated)

. label define Status 0 "Not pregnant" 1 Pregnant

. label values X3 Status

.
. generate double y = 1 + X2 + cond(!mi(X3), X3, 0) + rnormal()

.
. list, noobs sepby(X2)

  +-------------------------------+
  | X2             X3           y |
  |-------------------------------|
  |  M              .   1.0530543 |
  |  M              .   .06728675 |
  |  M              .   .86216538 |
  |-------------------------------|
  |  F   Not pregnant   1.2566586 |
  |  F       Pregnant   3.3965953 |
  |  F   Not pregnant   3.1896777 |
  +-------------------------------+

.
. *
. * Begin here
. *
. generate byte X23 = cond(X2 == "F":Sexes, X3, 3)

. label copy Status Expanded

. label define Expanded 3 Male, add

. label values X23 Expanded

. list, noobs sepby(X2)

  +----------------------------------------------+
  | X2             X3           y            X23 |
  |----------------------------------------------|
  |  M              .   1.0530543           Male |
  |  M              .   .06728675           Male |
  |  M              .   .86216538           Male |
  |----------------------------------------------|
  |  F   Not pregnant   1.2566586   Not pregnant |
  |  F       Pregnant   3.3965953       Pregnant |
  |  F   Not pregnant   3.1896777   Not pregnant |
  +----------------------------------------------+

.
. regress y i.X23

      Source |       SS           df       MS      Number of obs   =         6
-------------+----------------------------------   F(2, 3)         =      4.13
       Model |  6.64205142         2  3.32102571   Prob > F        =    0.1377
    Residual |  2.41495079         3  .804983596   R-squared       =    0.7334
-------------+----------------------------------   Adj R-squared   =    0.5556
       Total |  9.05700221         5  1.81140044   Root MSE        =    .89721

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         X23 |
   Pregnant  |   1.173427   1.098852     1.07   0.364     -2.32361    4.670464
       Male  |  -1.562333   .8190358    -1.91   0.152     -4.16887    1.044205
             |
       _cons |   2.223168   .6344224     3.50   0.039     .2041529    4.242183
------------------------------------------------------------------------------

.
. exit

end of do-file

.

What you end up with is what the old ANOVA literature calls a "cell means model". For example, they would take a 2 × 2 factorial with a structurally empty cell and lay it out as a one-way ANOVA with three levels of the one factor. They would then make the sensible comparisons using contrasts afterward. You could do the same here with a postestimation contrast command.
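A sketch of what such postestimation contrasts might look like. It rebuilds the six-observation example without the value labels (so X2 == 1 stands for F and level 3 of X23 stands for Male); the contrast weights below are illustrative choices, not part of the original post.

```stata
* Rebuild the toy data from the log above, omitting value labels.
version 15.1
clear *
set seed `=strreverse("1486772")'
quietly set obs 6
generate byte X2 = _n > _N / 2
generate byte X3 = mod(_n, 2) if X2 == 1
generate double y = 1 + X2 + cond(!mi(X3), X3, 0) + rnormal()
generate byte X23 = cond(X2 == 1, X3, 3)    // 0 = not pregnant, 1 = pregnant, 3 = male

regress y i.X23

* Estimated cell means for the three levels.
margins X23

* Pregnant vs. not pregnant (weights follow the level order 0, 1, 3).
contrast {X23 -1 1 0}, effects

* Male vs. the unweighted average of the two female cells.
contrast {X23 -.5 -.5 1}, effects
```

The second contrast weights the two female cells equally; a different weighting, such as one proportional to cell sizes, would answer a slightly different question.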
Bruce Weaver

Join Date: May 2014

Posts: 1141
#7

08 Mar 2019, 07:11

Joseph, I think that what you're suggesting in #6 works well when x3 is a categorical variable with a relatively small number of categories. But what if x3 is a quantitative variable (e.g., severity of morning sickness on a scale from 1 to 100)? That is the situation for which I was wondering if the missing (or not applicable) indicator method might work, despite its known limitations in other situations.
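For concreteness, a minimal sketch of the missing (not applicable) indicator method with a quantitative x3, on simulated stand-in data. The variable names (applies, x3_na, x3_fill) and the data-generating values are assumptions for illustration only.

```stata
* Simulated stand-in data; every name and number here is hypothetical.
clear
set seed 2019
set obs 200
generate double x1 = rnormal()
generate double x2 = rnormal()
generate byte applies = runiform() < 0.6              // 1 = the x3 item applies
generate double x3 = runiformint(1, 100) if applies   // 1-100 score, missing if skipped
generate double y = 1 + x1 + x2 + cond(applies, 0.02*x3, 0) + rnormal()

* Missing-indicator coding: flag the not-applicable cases, then fill x3 with a constant.
generate byte x3_na = missing(x3)
generate double x3_fill = cond(missing(x3), 0, x3)

* All 200 observations enter the model.
regress y x1 x2 x3_fill x3_na
```

Because x3_na absorbs the level shift for the not-applicable group, the coefficient on x3_fill does not depend on which constant is used as the fill value; only the coefficient on x3_na changes. Whether the resulting estimates are the ones wanted is the open question raised in this thread.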

Joseph Coveney

Join Date: Apr 2014

Posts: 4453
#8

08 Mar 2019, 07:23

Originally posted by Bruce Weaver

But what if x3 is a quantitative variable (e.g., severity of morning sickness on a scale from 1 to 100)?

Bruce, acknowledged, but consider: what is the logical skip pattern that would give rise to missing values in such cases? In the case you cite, a logical skip pattern would be something like "Morning sickness? (Y/N); if N, then skip the next item." This would (logically) give a nonmissing value (0) for morning sickness severity when the answer is N.
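One way to write down the model implied by this coding, as a sketch. The symbols G (gate item, 1 = reports morning sickness) and S (severity, set to 0 when G = 0) are labels introduced here for illustration; they do not appear in the thread.

```latex
y_i = a + b_1 X_{1i} + b_2 G_i + b_3 \,(G_i \cdot S_i) + \varepsilon_i ,
\qquad
E[y_i \mid G_i = 0] = a + b_1 X_{1i} ,
\qquad
E[y_i \mid G_i = 1] = a + b_2 + b_1 X_{1i} + b_3 S_i .
```

Under this coding every observation contributes to the estimate of b_1, while b_3 is informed only by observations with G = 1, since the severity term vanishes whenever G = 0.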
Bruce Weaver

Join Date: May 2014

Posts: 1141
#9

11 Mar 2019, 12:03

Originally posted by Bruce Weaver

I've just written to Paul Allison to ask about his current thoughts on this issue.

Dr. Allison has replied to my message. His thoughts are still as expressed in that footnote. He has not written further on that particular issue. And he is not aware of any other discussions of it.
