  • How to make STATA estimate different coefficients in the same regression model using different sample sizes?

    Hi All,

    I am running a regression model in Stata. The regression equation is as follows:

    y=a+b1*X1+b2*X2+b3*X3+...

    In my case, it makes no sense to impute missing values for one variable X3 because missings in this variable are logical skipping. However, I don't want the missing values in X3 to reduce the sample size when I estimate b1 and b2. Does anyone know how I can tell Stata to use a different sample when estimating b1 and b2 than when estimating b3?

    Your input will be greatly appreciated!

  • #2
    As I read your post, I found myself wondering if this might be one situation where the missing indicator method of dealing with missing data might actually work fairly well. (Generally, it is frowned on, because it produces biased estimates. But see this CMAJ article.) After a bit of Googling, I found these notes by Richard Williams. See page 5, where Richard quotes a footnote in Paul Allison's Sage monograph, Missing Data. It says that dummy variable adjustment (i.e., the missing indicator method) "may still be appropriate in cases where the unobserved value simply does not exist" (emphasis on may added).
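
    For concreteness, here is a minimal sketch of the missing indicator (dummy variable adjustment) method. It assumes variables named y, X1, X2 and X3 as in the original post; X3_na and X3_fill are hypothetical names created only for illustration. X3 is set to a constant (here 0) where it does not apply, and an indicator for those observations enters the model, so every observation contributes to b1 and b2 while the variation in X3 comes only from cases where it exists.

    ```stata
    * Assumed variable names: y, X1, X2, X3 (X3 is missing where it does not apply)
    generate byte X3_na = missing(X3)              // 1 = X3 does not apply
    generate double X3_fill = cond(X3_na, 0, X3)   // constant placeholder where X3 does not apply
    regress y X1 X2 X3_fill X3_na                  // uses all observations with y, X1 and X2 present
    ```

    The choice of 0 as the fill value is arbitrary: with the indicator in the model, a different constant changes only the coefficient on X3_na.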

    Richard, are you aware of any further developments since you wrote those notes?

    Cheers,
    Bruce
    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 19.5 (Windows)



    • #3
      Hi Bruce,

      Thank you so much for your input! I know that the missing dummy approach is now critiqued a lot in the literature. But I think I will run separate models for samples with/without missing and use this dummy variable approach as a robustness check. Thank you very much for your response!
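
      In case it helps later readers, the separate-models plan could look roughly like this. It is only a sketch, assuming the variable names from my first post (y, X1, X2, X3); the stored-estimate names with_X3 and without_X3 are made up for illustration.

      ```stata
      * Assumed variable names from the original post: y, X1, X2, X3
      regress y X1 X2 X3 if !missing(X3)   // subsample where X3 applies
      estimates store with_X3
      regress y X1 X2 if missing(X3)       // subsample where X3 was skipped
      estimates store without_X3
      estimates table with_X3 without_X3, se   // coefficients side by side with standard errors
      ```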

      Best,
      Yangyang



      • #4
        Hello Yangyang. I've just written to Paul Allison to ask about his current thoughts on this issue. If I get a response, I'll share it here (if he is agreeable to that).

        Cheers,
        Bruce
        --
        Bruce Weaver



        • #5
          I think it would be possible to estimate your original model using -ml-, although being possible doesn't mean it would make sense. I'm thinking you could write the evaluator as follows:
          Code:
          program mywols
              args lnf xb1 xb2
              * The second dependent variable flags the sample
              * (assumed here: 1 = X3 missing, 2 = complete data)
              quietly replace `lnf' = -($ML_y1 - `xb1')^2 if $ML_y2 == 1
              quietly replace `lnf' = -($ML_y1 - `xb1' - `xb2')^2 if $ML_y2 == 2
          end
          This way, you can estimate a single model that distinguishes the sample with complete data from the sample with missing data.

          My concern, however, is that while the above code may estimate something, it may be different from what you actually want to find.
          HTH,
          Fernando



          • #6
            Originally posted by yangyang liu
            it makes no sense to impute missing values for one variable X3 because missings in this variable are logical skipping.
            If it's part of a logical skip pattern, then wouldn't X3 at least implicitly be part of an interaction involving another predictor? Expand that interaction and then fit the regression model with the one (expanded-interaction) predictor. See below.

            .
            . version 15.1

            .
            . clear *

            .
            . set seed `=strreverse("1486772")'

            .
            . quietly set obs 6

            . generate byte X2 = _n > _N / 2

            . label define Sexes 0 M 1 F

            . label values X2 Sexes

            .
            . generate byte X3 = mod(_n, 2) if X2 == "F":Sexes
            (3 missing values generated)

            . label define Status 0 "Not pregnant" 1 Pregnant

            . label values X3 Status

            .
            . generate double y = 1 + X2 + cond(!mi(X3), X3, 0) + rnormal()

            .
            . list, noobs sepby(X2)

              +-------------------------------+
              | X2             X3           y |
              |-------------------------------|
              |  M              .   1.0530543 |
              |  M              .   .06728675 |
              |  M              .   .86216538 |
              |-------------------------------|
              |  F   Not pregnant   1.2566586 |
              |  F       Pregnant   3.3965953 |
              |  F   Not pregnant   3.1896777 |
              +-------------------------------+

            .
            . *
            . * Begin here
            . *
            . generate byte X23 = cond(X2 == "F":Sexes, X3, 3)

            . label copy Status Expanded

            . label define Expanded 3 Male, add

            . label values X23 Expanded

            . list, noobs sepby(X2)

              +----------------------------------------------+
              | X2             X3           y            X23 |
              |----------------------------------------------|
              |  M              .   1.0530543           Male |
              |  M              .   .06728675           Male |
              |  M              .   .86216538           Male |
              |----------------------------------------------|
              |  F   Not pregnant   1.2566586   Not pregnant |
              |  F       Pregnant   3.3965953       Pregnant |
              |  F   Not pregnant   3.1896777   Not pregnant |
              +----------------------------------------------+

            .
            . regress y i.X23

                  Source |       SS           df       MS      Number of obs   =         6
            -------------+----------------------------------   F(2, 3)         =      4.13
                   Model |  6.64205142         2  3.32102571   Prob > F        =    0.1377
                Residual |  2.41495079         3  .804983596   R-squared       =    0.7334
            -------------+----------------------------------   Adj R-squared   =    0.5556
                   Total |  9.05700221         5  1.81140044   Root MSE        =    .89721

            ------------------------------------------------------------------------------
                       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     X23 |
               Pregnant  |   1.173427   1.098852     1.07   0.364     -2.32361    4.670464
                   Male  |  -1.562333   .8190358    -1.91   0.152     -4.16887    1.044205
                         |
                   _cons |   2.223168   .6344224     3.50   0.039     .2041529    4.242183
            ------------------------------------------------------------------------------

            .
            . exit

            end of do-file


            .


            What you end up with is what the old ANOVA literature calls a "cell means model". For example, they would take a 2 × 2 factorial with a structurally empty cell and lay it out as a one-way ANOVA with three levels of the one factor. They would then make the sensible comparisons using a contrast afterward. You could do the same here with a postestimation contrast command.
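
            As a rough sketch of such postestimation comparisons, the block below retypes the six observations from the listing above (so results agree only to the displayed precision) and uses -margins- and -lincom-. The two comparisons shown are illustrative choices, not the only sensible ones.

            ```stata
            clear
            input byte X23 double y
            3 1.0530543
            3 .06728675
            3 .86216538
            0 1.2566586
            1 3.3965953
            0 3.1896777
            end
            label define Expanded 0 "Not pregnant" 1 "Pregnant" 3 "Male"
            label values X23 Expanded

            regress y i.X23

            margins X23                           // the three cell means
            lincom _b[1.X23]                      // pregnant vs. not pregnant, among females
            lincom 0.5*_b[1.X23] - _b[3.X23]      // average of the two female cells vs. males
            ```

            With "Not pregnant" as the base level, the last line is (NP + P)/2 - M rewritten in terms of the fitted coefficients, because the constant cancels.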



            • #7
              Joseph, I think that what you're suggesting in #6 works well when X3 is a categorical variable with a relatively small number of categories. But what if X3 is a quantitative variable (e.g., severity of morning sickness on a scale from 1 to 100)? That is the situation for which I was wondering if the missing (or not applicable) indicator method might work, despite its known limitations in other situations.
              --
              Bruce Weaver



              • #8
                Originally posted by Bruce Weaver
                But what if x3 is a quantitative variable (e.g., severity of morning sickness on a scale from 1 to 100)?
                Bruce, acknowledged, but consider: what is the logical skip pattern that would give rise to missing values in such cases? In the case you cite, a logical skip pattern would be something like "Morning sickness? (Y/N); if N, then skip the next item." This would (logically) give a nonmissing value (0) for morning sickness severity when the answer is N.
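
                A minimal sketch of that recoding, assuming hypothetical variables sick (1 = morning sickness, 0 = none) and severity (1 to 100, skipped when sick is 0), plus y, X1 and X2 from the original post:

                ```stata
                * Hypothetical variable names: sick, severity; severity0 is created here
                generate double severity0 = cond(sick == 0, 0, severity)   // skipped item -> logical zero
                regress y X1 X2 severity0
                ```

                Whether to also include i.sick is a modeling decision; adding it gives the same fit as the missing indicator setup, since the skip flag is just the complement of sick.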



                • #9
                  Originally posted by Bruce Weaver
                  I've just written to Paul Allison to ask about his current thoughts on this issue.
                  Dr. Allison has replied to my message. His thoughts are still as expressed in that footnote. He has not written further on that particular issue. And he is not aware of any other discussions of it.
                  --
                  Bruce Weaver
