  • Missing Standard Errors in XTREG, mle

    Hi Stata users,

    I'm trying to fit a model with school mean test scores as the outcome variable and school characteristics as explanatory variables. The school ID is repeated over 4 waves.

    Therefore I'm using:

    xtreg depvar [indepvars] [if] [in] [weight] , mle [MLE_options]

    My problem: some variables appear in the output with missing standard errors (see the output attached to this message).

    Here is my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(sc_mean_tsmat_mz year_pb_or) byte(urban federal state south midwest northeast north) float(sc_pmale sc_pwhi sc_mean_pcahamofae sc_ptuni sc_ptpostedu sc_bfloor75 sc_afloor75 sc_insufund sc_shortadmstaf sc_lackpedaresour sc_teahighabs)
      .10982365 0 1 0 1 0 0 0 1 47 34   2.405395        1  30 1 1 1 3 1 1
      .10982365 2 1 0 1 0 0 0 1 47 34   2.405395        1  30 1 1 2 3 2 1
      -.2093975 0 1 0 1 0 0 0 1 42 26  1.1467584 .7721239  76 1 0 1 2 1 3
      -.2093975 1 1 0 1 0 0 0 1 42 26  1.1467584 .7721239  76 1 0 1 2 1 2
      -.2093975 2 1 0 1 0 0 0 1 42 26  1.1467584 .7721239  76 1 0 1 3 1 2
      -.2093975 3 1 0 1 0 0 0 1 42 26  1.1467584 .7721239  76 1 0 1 3 1 2
     -.27584708 2 0 0 0 0 0 0 1 39 52 -.09321615        1 100 1 0 1 3 1 2
      -.1081955 0 1 0 1 0 0 0 1 36 28   .8743236 .9893617  25 1 1 1 3 1 1
      -.1081955 1 1 0 1 0 0 0 1 36 28   .8743236 .9893617  25 1 1 2 2 1 2
      -.1081955 2 1 0 1 0 0 0 1 36 28   .8743236 .9893617  25 1 1 1 1 1 1
      -.1081955 3 1 0 1 0 0 0 1 36 28   .8743236 .9893617  25 1 1 1 1 1 1
      -.3622141 0 1 0 1 0 0 0 1 44 22   .4408509 .9886792  99 1 0 2 3 1 1
      -.3622141 1 1 0 1 0 0 0 1 44 22   .4408509 .9886792  99 1 0 2 1 1 1
      -.3622141 2 1 0 1 0 0 0 1 44 22   .4408509 .9886792  99 1 0 2 3 1 1
      -.3622141 3 1 0 1 0 0 0 1 44 22   .4408509 .9886792  99 1 0 1 1 1 2
      -.4329991 1 0 0 1 0 0 0 1 38 25   .1797918 .7068965  69 1 1 2 2 2 2
    end
    Stata command I used:

    xtreg sc_mean_tsmat_mz year_pb_or urban federal state south midwest northeast north sc_pmale sc_pwhi sc_mean_pcahamofae sc_ptuni sc_ptpostedu sc_bfloor75 sc_afloor75 sc_insufund sc_shortadmstaf sc_lackpedaresour sc_teahighabs, i(id_sc) mle
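
    An equivalent way to write this command, as a sketch that assumes id_sc is present in the dataset in memory: declare the group variable once with xtset and then drop the i() option. The line continuations (///) only work when this is run from a do-file.

    ```stata
    * Declare the panel (group) variable; id_sc is the school identifier.
    xtset id_sc

    * Random-effects model fitted by maximum likelihood.
    xtreg sc_mean_tsmat_mz year_pb_or urban federal state south midwest   ///
        northeast north sc_pmale sc_pwhi sc_mean_pcahamofae sc_ptuni      ///
        sc_ptpostedu sc_bfloor75 sc_afloor75 sc_insufund sc_shortadmstaf  ///
        sc_lackpedaresour sc_teahighabs, mle
    ```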

    The Stata output with the missing standard errors is attached.

    Any help in that would be highly appreciated!

  • #2
    I think we need more information here. You show the output table of coefficients, but there is a header above that which shows things like number of observations, number of clusters, and other statistics that are descriptive of the regression as a whole. Sometimes there are important clues there. It would also be helpful to know what the distribution of sc_teahighabs looks like, and, if it is a discrete variable, how the distribution of sc_mean_tsmat_mz looks in each of its levels.

    Also, your posted data are not compatible with the command you say you used, because there is no variable id_sc.
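
    The distribution checks requested above could be produced along these lines. This is only a sketch; the last command assumes the xtreg model has just been run, so that e(sample) is available.

    ```stata
    * Frequency of each level of the predictor, including missing values.
    tabulate sc_teahighabs, missing

    * Summary of the outcome within each level of sc_teahighabs.
    tabstat sc_mean_tsmat_mz, by(sc_teahighabs) statistics(n mean sd min max)

    * Same summary, restricted to the observations used in the estimation.
    tabstat sc_mean_tsmat_mz if e(sample), by(sc_teahighabs) statistics(n mean sd min max)
    ```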


    • #3
      Dear Clyde,

      Please see attached the header of the output table of coefficients.

      sc_teahighabs is an ordinal variable with values: 1 2 3

      The outcome sc_mean_tsmat_mz is a continuous variable:

          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
      sc_mean_ts~z |     114976   -.1653762    .3425609  -2.009945   2.116442

      Here is the distribution of the outcome sc_mean_tsmat_mz in each of the levels of sc_teahighabs:

      tab sc_teahighabs if sc_mean_tsmat_mz

      sc_teahigha |
               bs |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |     59,378       54.17       54.17
                2 |     35,924       32.77       86.94
                3 |     14,322       13.06      100.00
      ------------+-----------------------------------
            Total |    109,624      100.00


      I'm wondering whether the problem is that the cluster in this model is the school (id_sc), while the outcome and all explanatory variables are also measured at the school level.

      Sorry, here is my data in the dataex format with the ID variable id_sc included:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long id_sc float(sc_mean_tsmat_mz year_pb_or) byte(urban federal state south midwest northeast north) float(sc_pmale sc_pwhi sc_mean_pcahamofae sc_ptuni sc_bfloor75 sc_afloor75 sc_insufund sc_shortadmstaf sc_lackpedaresour sc_teahighabs)
      11000260 .10982365 0 1 0 1 0 0 0 1 47 34 2.405395 1 1 1 1 3 1 1
      11000260 .10982365 2 1 0 1 0 0 0 1 47 34 2.405395 1 1 1 2 3 2 1
      11000317 -.2093975 0 1 0 1 0 0 0 1 42 26 1.1467584 .7721239 1 0 1 2 1 3
      11000317 -.2093975 1 1 0 1 0 0 0 1 42 26 1.1467584 .7721239 1 0 1 2 1 2
      11000317 -.2093975 2 1 0 1 0 0 0 1 42 26 1.1467584 .7721239 1 0 1 3 1 2
      11000317 -.2093975 3 1 0 1 0 0 0 1 42 26 1.1467584 .7721239 1 0 1 3 1 2
      11000368 -.27584708 2 0 0 0 0 0 0 1 39 52 -.09321615 1 1 0 1 3 1 2
      11000376 -.1081955 0 1 0 1 0 0 0 1 36 28 .8743236 .9893617 1 1 1 3 1 1
      11000376 -.1081955 1 1 0 1 0 0 0 1 36 28 .8743236 .9893617 1 1 2 2 1 2
      11000376 -.1081955 2 1 0 1 0 0 0 1 36 28 .8743236 .9893617 1 1 1 1 1 1
      11000376 -.1081955 3 1 0 1 0 0 0 1 36 28 .8743236 .9893617 1 1 1 1 1 1
      11000384 -.3622141 0 1 0 1 0 0 0 1 44 22 .4408509 .9886792 1 0 2 3 1 1
      11000384 -.3622141 1 1 0 1 0 0 0 1 44 22 .4408509 .9886792 1 0 2 1 1 1
      11000384 -.3622141 2 1 0 1 0 0 0 1 44 22 .4408509 .9886792 1 0 2 3 1 1
      11000384 -.3622141 3 1 0 1 0 0 0 1 44 22 .4408509 .9886792 1 0 1 1 1 2
      11000457 -.4329991 1 0 0 1 0 0 0 1 38 25 .1797918 .7068965 1 1 2 2 2 2
      end


      • #4
        Well, if it were really true that all of your model variables were constant within id_sc it would raise questions about the model, but I don't think it would result in a missing standard error for any one particular variable. But it isn't even true: year_pb_or, sc_insufund, sc_shortadmstaf, sc_lackpedaresour, and sc_teahighabs all show some amount of within-school variation in your example data.
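
        One way to check within-school variation on the full data, as a sketch: xtsum splits each variable's standard deviation into between and within components, and a within standard deviation of zero means the variable is constant within id_sc. The variable name sd_teahighabs is only illustrative.

        ```stata
        * Declare the group variable so that xtsum knows the panel structure.
        xtset id_sc

        * Between/within decomposition for the variables that appear to vary.
        xtsum year_pb_or sc_insufund sc_shortadmstaf sc_lackpedaresour sc_teahighabs

        * Per-school standard deviation of one predictor (hypothetical name).
        bysort id_sc: egen sd_teahighabs = sd(sc_teahighabs)

        * Number of observations in schools where the predictor varies over waves.
        count if sd_teahighabs > 0 & !missing(sd_teahighabs)
        ```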

        The header information doesn't betray any problems: the convergence went smoothly. Your sample size is adequate, as is the number of groups. The overall chi2 statistic was calculated without difficulty. So there is something about the distribution of sc_teahighabs that is problematic here. Perhaps one of the levels occurs in only singleton observations. What do you get if you run this:
        Code:
        by id_sc, sort: gen n_obs = _N
        tab sc_teahighabs if n_obs == 1, miss


        • #5
          Thanks for your feedback.

          If I run the code you have suggested, I get this:

          by id_sc, sort: gen n_obs = _N

          . tab sc_teahighabs if n_obs == 1, miss

          sc_teahigha |
                   bs |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    1 |      3,901       64.66       64.66
                    2 |      1,470       24.37       89.03
                    3 |        357        5.92       94.94
                    . |        305        5.06      100.00
          ------------+-----------------------------------
                Total |      6,033      100.00


          • #6
            Maybe the fact that the outcome variable sc_mean_tsmat_mz has no within variation is a problem for this model. Please check xtsum sc_mean_tsmat_mz attached to this message.


            • #7
              That shouldn't actually be a problem. And even if it were, I wouldn't expect it to show up as a missing standard error for one particular predictor. I would consider that an explanation if the standard errors of the sigma_e random effect and rho were missing. But not sc_teahighabs.

              I have to say I'm at a loss here. A couple of thoughts, which may or may not work out:

              1. Is there an approximate collinearity relationship between sc_teahighabs and the other regressors? An exact collinearity would cause Stata to omit something (and warn you about it)--but a "very near miss" could produce a result such as you are getting. In that case you might consider dropping sc_teahighabs from the model, or dropping one or more of the variables with which it is nearly collinear. (I think this is unlikely, because in this situation typically one or more of the other variables participating in the collinearity would also have a missing standard error or show a very large standard error, and that is not the case in your output.)

              2. Since your model converges after 4 iterations, you might consider rerunning it with the -iterate()- option successively specified for 1, 2, or 3 iterations. You might see that before the end there is some estimate for the standard error, and it is perhaps on a trajectory heading towards infinity or something like that.

              3. I would try rerunning the model excluding sc_teahighabs (but restricted to e(sample) from your original model) and do a likelihood ratio test. Given your enormous sample size, if that likelihood ratio test did not reject the hypothesis that the models are the same, I would feel comfortable with permanently dropping sc_teahighabs from the model and moving on from there.

              4. Since sc_teahighabs is an ordinal variable with 3 levels, try re-running the model with i.sc_teahighabs specified instead. That eliminates the ordinality of the variable, but also if there is something odd happening with one particular level, it might show up that way.
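
              Suggestions 2 to 4 could be sketched roughly as follows, to be run as a do-file. The local macro controls, the stored estimate names full and reduced, and the marker variable in_full are illustrative names, not anything from the original command.

              ```stata
              * All regressors except sc_teahighabs (hypothetical macro name).
              local controls year_pb_or urban federal state south midwest        ///
                  northeast north sc_pmale sc_pwhi sc_mean_pcahamofae sc_ptuni   ///
                  sc_ptpostedu sc_bfloor75 sc_afloor75 sc_insufund               ///
                  sc_shortadmstaf sc_lackpedaresour

              xtset id_sc

              * Suggestion 2: stop after 1, 2, and 3 iterations and watch how the
              * standard error of sc_teahighabs behaves before convergence.
              forvalues k = 1/3 {
                  xtreg sc_mean_tsmat_mz `controls' sc_teahighabs, mle iterate(`k')
              }

              * Suggestion 3: likelihood-ratio test on the same estimation sample.
              xtreg sc_mean_tsmat_mz `controls' sc_teahighabs, mle
              estimates store full
              generate byte in_full = e(sample)

              xtreg sc_mean_tsmat_mz `controls' if in_full, mle
              estimates store reduced

              lrtest full reduced

              * Suggestion 4: enter sc_teahighabs as a factor variable so that each
              * level gets its own coefficient.
              xtreg sc_mean_tsmat_mz `controls' i.sc_teahighabs, mle
              ```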

              By the way, in dismissing your concerns about the non-variation of so many of your variables, including the outcome variable, within id_sc, let me be clear that I'm only saying that I don't think it should cause the particular problem you're getting. It does make me question whether your model is sensible, but that is a separate issue. Sensible or not, I still think its parameters should be estimable, and that what you are encountering reflects some other peculiarity of the distributions of the variables in the data.


              • #8
                Thank you for your time. I will go through the points you've written and take the issues to my supervisor.

                When I re-run my model without sc_teahighabs, another variable appears in the output without a standard error, and so on. So I have to run some more tests, as there does not seem to be a straightforward solution.
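
                If dropping one variable only moves the missing standard error to another, a nearly collinear set of regressors (point 1 in #7) is one possible explanation, though not the only one. A rough sketch of some checks, to be run as a do-file; the local macro name rhs is only illustrative, and the pooled regression is used purely to obtain variance inflation factors, not as a substitute for the xtreg model:

                ```stata
                * Full list of regressors (hypothetical macro name).
                local rhs year_pb_or urban federal state south midwest northeast   ///
                    north sc_pmale sc_pwhi sc_mean_pcahamofae sc_ptuni             ///
                    sc_ptpostedu sc_bfloor75 sc_afloor75 sc_insufund               ///
                    sc_shortadmstaf sc_lackpedaresour sc_teahighabs

                * Report any exactly collinear variables among the regressors.
                _rmcoll `rhs'

                * Pooled OLS only as a vehicle for variance inflation factors;
                * large VIFs point to near-collinear regressors.
                regress sc_mean_tsmat_mz `rhs'
                estat vif

                * Pairwise correlations among the regressors.
                correlate `rhs'
                ```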