
  • Multi level xtmelogit command

    Hi,

    I am implementing a multilevel model in Stata and have some questions about whether my command is right.

    I am analysing the use of school discipline. The dependent variable (W1ExcludeYP) is binary, i.e. whether or not a student was disciplined. I want to see how individual-level variables (truancy, substance use, delinquency, poverty, etc.), school-level variables (private vs. public school) and region-level variables (region deprivation) affect the dependent variable, with a random intercept for schools and for regions.

    Should the command be:

    xtmelogit W1ExcludeYP i.W1ethgrpYP i.W1truantYP substance_use delinquency i.in_poverty || school: IndSchool || region: IDACIRSCORE

    or

    xtmelogit W1ExcludeYP i.W1ethgrpYP i.W1truantYP substance_use delinquency i.in_poverty IndSchool IDACIRSCORE || school: || region:

    Also, is there a way to speed up estimation of multilevel regressions in Stata?

  • #2
    The short answer to your question is that for the model you have described, your second command is the better implementation.

    At greater length, it is important to understand that in Stata syntax for multilevel models, the level at which a variable is defined (school vs region vs student) has nothing at all to do with where it should appear in the command. Variables are mentioned in the higher levels of the model only if you wish to include a random slope for that variable. As you describe your model as having random intercepts, not random slopes, no variables should appear in the higher levels of the model.

    As an aside, unless you are using a very old version of Stata, -xtmelogit- has been renamed -meqrlogit-. While the older name still works, in the future it may not, so you would be better off using the modern name.
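
    To illustrate the two points above, here is a minimal sketch that reuses the variable names from the question. The random slope on delinquency in the second command is purely hypothetical and is there only to show where such a term would go. The level order assumes schools are nested in regions.

    ```stata
    * Random intercepts only: every covariate, whatever level it is measured at,
    * goes in the fixed part, and the random-effects equations stay empty.
    meqrlogit W1ExcludeYP i.W1ethgrpYP i.W1truantYP substance_use delinquency ///
        i.in_poverty IndSchool IDACIRSCORE || region: || school:

    * Hypothetical variant: the effect of delinquency may differ across schools,
    * so delinquency is named again after "school:" to request a random slope.
    meqrlogit W1ExcludeYP i.W1ethgrpYP i.W1truantYP substance_use delinquency ///
        i.in_poverty IndSchool IDACIRSCORE || region: || school: delinquency
    ```

    In both commands the school-level and region-level covariates (IndSchool, IDACIRSCORE) sit in the fixed part. Only the second command names a variable after a level, because only it asks for a random slope.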

    As for speeding things up, there may be potential for a great speedup in your case. Many of your independent variables are evidently discrete, as you have prefixed them with i. The ones you have not prefixed with i. may also be discrete: the names substance_use, delinquency, IndSchool and IDACIRSCORE sound as if they could be. If so, and if your data set is large, you can -collapse- your data set and then run -meqrlogit- on that with the -binomial()- option. Like this:
    Code:
    collapse (count) freq = W1ExcludeYP (sum) dv = W1ExcludeYP, by(W1ethgrpYP W1truantYP ///
        substance_use delinquency in_poverty IndSchool IDACIRSCORE school region)
    
    meqrlogit dv i.(W1ethgrpYP W1truantYP substance_use delinquency in_poverty IndSchool IDACIRSCORE) ///
        || region: || school:, binomial(freq)
    Notes:
    I have changed the order in which region and school appear in the equation compared to what you wrote. Stata reads the random-effects equations from the highest level to the lowest, and in most definitions of region I have encountered we would have schools nested in regions, not the other way around. Your equation models regions nested in schools, which would be an unusual situation. If you really have regions nested in schools then stay with the order you used. By the way, it's important to note that if you use the wrong order for your actual situation you will, at best, get incorrect results. In addition, the model may be very slow to converge, or fail to converge altogether.

    If any of these variables is continuous you can't do this, and if your data set is small, it wouldn't produce a noticeable speedup.
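
    A quick way to check both conditions before collapsing is sketched below. It assumes the original student-level data set is in memory; the variable name cell is made up for the illustration. Keep in mind that the i. prefix only accepts non-negative integer values, so a covariate holding non-integer scores could not be used with i. as written above.

    ```stata
    * Count the distinct non-missing values of each covariate that was not
    * prefixed with i. in the original command. (bysort re-sorts the data.)
    foreach v of varlist substance_use delinquency IndSchool IDACIRSCORE {
        tempvar first
        bysort `v': generate byte `first' = (_n == 1) & !missing(`v')
        quietly count if `first'
        display as text "`v': " as result r(N) as text " distinct values"
    }

    * Number of covariate cells versus the number of observations.
    * egen's group() skips observations with a missing value in any listed
    * variable, so this is the cell count among complete cases.
    egen long cell = group(W1ethgrpYP W1truantYP substance_use delinquency ///
        in_poverty IndSchool IDACIRSCORE school region)
    quietly summarize cell
    display as text "cells: " as result r(max) as text "  observations: " as result _N
    drop cell
    ```

    If the number of cells is close to the number of observations, collapsing buys little. If it is much smaller, the collapsed data set will be correspondingly quicker to fit.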

    Last edited by Clyde Schechter; 04 Nov 2023, 10:13.



    • #3
      Hi Clyde, massive thanks for your detailed explanation! Very informative and useful, as always.

      I have several higher-level variables, such as school sector (binary), urbanicity (categorical) and district socio-economic level (continuous). In your opinion, does it make sense to include a random intercept for both schools and regions? Or would a single random intercept for schools suffice?



      • #4
        I have no special knowledge of education, and especially not about how schools might vary in their disciplinary approaches. As an informed layman, though, my intuition is that there would be important differences at both the school and region level, so I would be inclined to include both. If the results of that analysis suggest that one of the levels contributes negligibly to outcome variation, then for the sake of simplicity, I would drop that level. But absent that kind of evidence to the contrary, I would expect both levels to be important and I would include both.
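
        One possible way to gather that kind of evidence is a likelihood-ratio comparison of the model with both levels against a model with school intercepts only. This is a sketch under assumptions, not a prescribed procedure: it reuses the variable names from post #1, assumes schools are nested in regions, and assumes school identifiers are unique across regions. The stored-estimate names both_levels and school_only are arbitrary.

        ```stata
        * Three-level model: students within schools within regions.
        meqrlogit W1ExcludeYP i.W1ethgrpYP i.W1truantYP substance_use delinquency ///
            i.in_poverty IndSchool IDACIRSCORE || region: || school:
        estimates store both_levels

        * Two-level model: the region random intercept is dropped.
        meqrlogit W1ExcludeYP i.W1ethgrpYP i.W1truantYP substance_use delinquency ///
            i.in_poverty IndSchool IDACIRSCORE || school:
        estimates store school_only

        * Likelihood-ratio test of the region-level variance component.
        lrtest both_levels school_only
        ```

        Because the null hypothesis puts the region variance on the boundary of the parameter space, the reported p-value is conservative. The estimated variance components in the first model's output are also worth reading directly.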
