  • margins after sem, method(mlmv)

    I'm using Stata 15. I have four continuous and normally distributed variables, which I'll call y, x1, x2, and x3. 20% of the cases have missing values on x3, and I suspect missingness is MAR. I'd like to fit a regression model using FIML, i.e.,
    Code:
    sem(y <- x1 x2 x3), method(mlmv)
    This part isn't a problem. The problem is, after fitting the model, I'd like to use the -margins- command to obtain predictive margins, i.e.,
    Code:
    margins, at(x3=(50(10)50))
    When I do so, margins defaults to the subset of cases with non-missing values, producing what I believe to be biased estimates. I suspect there's no way around this issue, but I thought I'd ask here in case anyone has ideas on how to proceed.

    Thanks in advance,

    IYH

  • #2
    In the specific case you describe above, you could probably do the following:

    Code:
    sem ...
    replace x3 = -123456789 if mi(x3)
    margins , at(x3 = ...)
    Unless I am missing something, this should give you valid results given the exact situation and the exact code that you describe. It does not necessarily generalize to situations where any of these aspects differ.

    Best
    Daniel



    • #3
      Thanks, Daniel. That does work, but I'm wondering if you might provide a bit of intuition as to why? I don't quite follow the logic / mechanics behind what's going on.



      • #4
        It seems margins (or perhaps predict, on which the former is based) omits observations with missing values on the predictors in the model. This usually makes sense, because you cannot predict values if the predictor is missing. The suggested code merely plugs in an arbitrary (non-missing) value in place of the missing values to trick margins. I believe the results are valid [but see Edit below] because you explicitly set values for the predictor that had missing values using the at() option. This implies that the values we have plugged in are never used, which is why you may choose any value. Even better would probably be

        Code:
        generate missing_x3 = mi(x3) if e(sample)
        replace x3 = 42 if (missing_x3 == 1)
        margins ...
        (using the ultimate answer 42 as an arbitrary value) to make sure you are really using only the observations used during estimation and so you can afterwards easily

        Code:
        replace x3 = . if (missing_x3 == 1)
        Important edit:

        On second thought, I am not quite sure whether the standard errors that margins reports will be correct. The uncertainty in the estimation that stems from the missing values in the original data should probably be reflected somehow. If it is not, the reported standard errors would likely be too small.
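A minimal sketch of the claim above that the plugged-in value is never used, on simulated data. The variable names follow the thread; the seed, sample size, coefficients, at() values, and the 20% completely-at-random missingness are illustrative assumptions. It runs margins with two different placeholders and compares the point estimates. It says nothing about whether the standard errors are right.

```stata
* Illustrative data (all values here are assumptions, not from the thread)
clear
set seed 2017
set obs 500
generate x1 = rnormal(50, 10)
generate x2 = rnormal(50, 10)
generate x3 = rnormal(50, 10)
generate y  = 0.5*x1 + 0.3*x2 + 0.4*x3 + rnormal(0, 10)
replace x3 = . if runiform() < 0.2      // MCAR, for simplicity

* FIML fit; e(sample) includes cases with missing x3
sem (y <- x1 x2 x3), method(mlmv)
generate byte missing_x3 = mi(x3) if e(sample)

* Placeholder 1
replace x3 = 42 if missing_x3 == 1
margins, at(x3 = (40 50 60))
matrix b_42 = r(b)

* Placeholder 2
replace x3 = -123456789 if missing_x3 == 1
margins, at(x3 = (40 50 60))
matrix b_big = r(b)

* Restore the missing values and compare the two sets of margins
replace x3 = . if missing_x3 == 1
display "relative difference: " mreldif(b_42, b_big)
```

If the placeholder really is irrelevant, the relative difference should be zero up to rounding.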

        Best
        Daniel
        Last edited by daniel klein; 19 Sep 2017, 09:58. Reason: Concerns about standard error estimation



        • #5
          I had a similar concern about the SEs, so I set up a quick simulation where I first generated the data and fit the model/obtained predictive margins, then injected missingness (where the probability of missingness was proportionate to y, thus MAR), and then refit the model using sem ..., method(mlmv). The SEs on the second set of margins were larger and roughly comparable to the SEs I get using mimrgns after multiply imputing.
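A rough sketch of a simulation along these lines, not the code actually used. The seed, sample size, coefficients, missingness model, and at() value are all assumptions made for illustration, and the mimrgns comparison is left out.

```stata
* Hypothetical setup: every number below is an illustrative assumption
clear
set seed 12345
set obs 1000
generate x1 = rnormal(50, 10)
generate x2 = rnormal(50, 10)
generate x3 = rnormal(50, 10)
generate y  = 10 + 0.5*x1 + 0.3*x2 + 0.4*x3 + rnormal(0, 10)

* Step 1: complete-data fit and predictive margin
sem (y <- x1 x2 x3)
margins, at(x3 = 50)
matrix V_full = r(V)

* Step 2: make x3 missing with probability depending on observed y (MAR)
egen zy = std(y)
replace x3 = . if runiform() < invlogit(-1.5 + zy)

* Step 3: refit with FIML, plug in a placeholder, rerun margins
sem (y <- x1 x2 x3), method(mlmv)
generate byte missing_x3 = mi(x3) if e(sample)
replace x3 = 42 if missing_x3 == 1
margins, at(x3 = 50)
matrix V_mlmv = r(V)
replace x3 = . if missing_x3 == 1

* Compare the standard errors of the two margins
display "SE, complete data: " sqrt(V_full[1,1])
display "SE, mlmv:          " sqrt(V_mlmv[1,1])
```

A single draw like this is only suggestive; wrapping the steps in a program and repeating them with simulate would give a better picture of how the two standard errors compare.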



          • #6

            I am hoping I am making a basic mistake in specifying margins after a multi-equation sem using mlmv, as in the following (based on: Williams, Richard, Paul D Allison and Enrique Moral-Benito. 2018. "Linear Dynamic Panel-Data Estimation Using Maximum Likelihood and Structural Equation Modeling." The Stata Journal 18(2):293-326.)

            Here is the model:

            . #delimit ;
            delimiter now ;
            . sem (cESDW2 <- cESDW1@b1 b_cESDW1@b2 smokeNowW2@b3
            > smokeNowW1@b4 b_smokeNowW1@b5 smoke_cESDW1@b6
            > b_smoke_cESDW1@b7 black@b8 Alpha@1 E2@1 ) (cESDW3 <-
            > cESDW2@b1 b_cESDW2@b2 smokeNowW3@b3 smokeNowW2@b4
            > b_smokeNowW2@b5 smoke_cESDW2@b6 b_smoke_cESDW2@b7 black@b8
            > Alpha@1 E3@1 ) (cESDW4 <- cESDW3@b1 b_cESDW3@b2
            > smokeNowW4@b3 smokeNowW3@b4 b_smokeNowW3@b5
            > smoke_cESDW3@b6 b_smoke_cESDW3@b7 black@b8 Alpha@1 E4@1 )
            > (cESDW5 <- cESDW4@b1 b_cESDW4@b2 smokeNowW5@b3
            > smokeNowW4@b4 b_smokeNowW4@b5 smoke_cESDW4@b6
            > b_smoke_cESDW4@b7 black@b8 Alpha@1 ), var(e.cESDW2@0
            > e.cESDW3@0 e.cESDW4@0) var(Alpha) cov(Alpha*(black)@0
            > Alpha*(E2 E3 E4)@0 _OEx*(E2 E3 E4)@0 E2*(E3 E4)@0
            > E3*(E4)@0 smokeNowW3*(E2) smokeNowW4*(E2 E3)
            > smokeNowW3*(E2) b_smokeNowW3*(E2) smoke_cESDW3*(E2)
            > b_smoke_cESDW3*(E2) smokeNowW5*(E2 E3 E4) smokeNowW4*(E2
            > E3) b_smokeNowW4*(E2 E3) smoke_cESDW4*(E2 E3)
            > b_smoke_cESDW4*(E2 E3)) difficult iterate(750)
            > technique(nr 25 bhhh 25) noxconditional method(mlmv) vce(robust)
            >
            > ;

            I am able to get estimates. However, this is what I get when I try various syntax for margins:


            . margins black
            factor 'black' not found in list of covariates
            r(322);


            . display e(oxvars)
            cESDW1 b_cESDW1 smokeNowW2 smokeNowW1 b_smokeNowW1 smoke_cESDW1 b_smoke_cESDW1 black b_cESDW2 sm
            > okeNowW3 b_smokeNowW2 smoke_cESDW2 b_smoke_cESDW2 b_cESDW3 smokeNowW4 b_smokeNowW3 smoke_cESDW
            > 3 b_smoke_cESDW3 b_cESDW4 smokeNowW5 b_smokeNowW4 smoke_cESDW4 b_smoke_cESDW4


            and the following suggests that perhaps
            I should specify


            . margins [cESDW2:black]
            weights not allowed
            r(101);



                             |      Coef.  Legend
            -----------------+---------------------------------------------
            Structural       |
              cESDW2         |
                      cESDW1 |   .4948291  _b[cESDW2:cESDW1]
                    b_cESDW1 |  -.7296562  _b[cESDW2:b_cESDW1]
                  smokeNowW2 |    .044272  _b[cESDW2:smokeNowW2]
                  smokeNowW1 |   .2934091  _b[cESDW2:smokeNowW1]
                b_smokeNowW1 |  -.1923866  _b[cESDW2:b_smokeNowW1]
                smoke_cESDW1 |   -.767546  _b[cESDW2:smoke_cESDW1]
              b_smoke_cESDW1 |   .7351711  _b[cESDW2:b_smoke_cESDW1]
                       black |   .4025573  _b[cESDW2:black]
                       Alpha |          1  _b[cESDW2:Alpha]
                          E2 |          1  _b[cESDW2:E2]
                       _cons |   .1765367  _b[cESDW2:_cons]
            -----------------+---------------------------------------------
              cESDW3         |
                      cESDW2 |   .4948291  _b[cESDW3:cESDW2]
                  smokeNowW2 |   .2934091  _b[cESDW3:smokeNowW2]
                       black |   .4025573  _b[cESDW3:black]
                    b_cESDW2 |  -.7296562  _b[cESDW3:b_cESDW2]
                  smokeNowW3 |    .044272  _b[cESDW3:smokeNowW3]
                b_smokeNowW2 |  -.1923866  _b[cESDW3:b_smokeNowW2]
                smoke_cESDW2 |   -.767546  _b[cESDW3:smoke_cESDW2]
              b_smoke_cESDW2 |   .7351711  _b[cESDW3:b_smoke_cESDW2]
                       Alpha |          1  _b[cESDW3:Alpha]
                          E3 |          1  _b[cESDW3:E3]
                       _cons |   .1377348  _b[cESDW3:_cons]
            <CUT>


            also
            . margins _b[cESDW2:black]
            variable _b not found
            r(111);


            Any advice appreciated. Thanks, Bill



            • #7
              I know the last specification doesn't make sense --- I meant to say I also tried cESDW2:black with no [ ]



              • #8
                You have not specified black as a factor variable (e.g. i.black), nor could you, since sem does not allow factor variables. Further, only factor variables are allowed to the left of the comma with the margins command. You would have to do something like

                margins, at(black=(0,1))

                I haven't tried margins after multi-equation sem, so there may be other issues.
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam



                • #9
                  Thanks (again), Richard. I did try that, and let it spin for an hour (the model itself estimates in about 10 minutes) before hitting Break.

                  I also tried graphing predictions from the model (twoway lfit). The problem there is that missingness in the x variables across waves produces a different graph for each wave, even though the model estimates are constant across waves (except for the intercept), so the graphs should differ only in their intercepts.

                  You have been a great help, answering my questions in a number of threads (and other fora) thanks again. I am inching towards buying Mplus for these analyses, if I can find the funds...



                  • #10
                    I don't know if Mplus can do what you want either. There is a very limited demo version you can try if you want.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    EMAIL: [email protected]
                    WWW: https://www3.nd.edu/~rwilliam
