
  • lose statistical significance after running crossed random effects model

    Hi everyone,

    I am running a two-level crossed random-effects logistic regression model in Stata (crossed random effects for the variables id and dataset), which I think is the right type of model given my data. However, I originally ran a random-intercept logistic regression model (level 1 units clustered within the id variable) and found that almost all the variables were statistically significant. After running the two-level crossed random-effects model, none of the variables are statistically significant, even though the odds ratios from the two models are similar and reasonably large in some cases (e.g. OR=9 or OR=6). Below is some more information about my data:
    • Distribution of the dependent variable: 1 in 102 cases, 0 in 6,146 cases.
    • Number of level 1 units: n = 6,248
    • Number of level 2 units for the id variable (first crossed random effect): n = 3,834
    • Number of level 2 units for the dataset variable (second crossed random effect): n = 16 groups (note, however, that for the outcome variable, several of these groups have only 1 observation where the dependent variable = 1)
    Does anyone have a possible explanation for why I lose all statistical significance when I run the two-level crossed random-effects model? I am wondering whether, for example, the fact that the second crossed random effect has only 16 groups (and some of the groups have only 1 observation where the dependent variable is 1) could make this analysis approach problematic.
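The concern about sparse groups can be quantified before fitting anything, by counting how many outcome events each dataset group contains. A minimal Python sketch, assuming the data are available as two parallel lists (one group id and one 0/1 outcome per level 1 unit); the function names and example ids are hypothetical, not from the original post.

```python
from collections import Counter


def events_per_group(group_ids, outcome):
    """Return {group id: number of observations with outcome == 1}.

    Groups that contain no events are included with a count of 0.
    """
    if len(group_ids) != len(outcome):
        raise ValueError("group_ids and outcome must have the same length")
    counts = Counter({g: 0 for g in group_ids})  # start every group at 0
    for g, y in zip(group_ids, outcome):
        if y == 1:
            counts[g] += 1
    return dict(counts)


def sparse_groups(group_ids, outcome, max_events=1):
    """Return the sorted ids of groups with at most max_events events."""
    per_group = events_per_group(group_ids, outcome)
    return sorted(g for g, n in per_group.items() if n <= max_events)


# Hypothetical example: three dataset groups, one row per level 1 unit.
groups = ["d1", "d1", "d2", "d2", "d3"]
y = [1, 1, 0, 1, 0]
print(events_per_group(groups, y))  # {'d1': 2, 'd2': 1, 'd3': 0}
print(sparse_groups(groups, y))     # ['d2', 'd3']
```

The threshold `max_events=1` simply mirrors the "only 1 observation where the dependent variable = 1" remark above; it is not a formal cutoff.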

    I am trying to make a decision about which results to present. Any advice would be much appreciated.

    Many thanks,
    Caroline

  • #2
    Well, these are two different designs, and the decision which set of results to present depends on the one thing you haven't explained: the actual co-occurrence relationships among the level 1 units and the level 2 units. It means nothing that variables change significance from one analysis to the other: (at least) one of these analyses is simply a misrepresentation of reality and should be discarded.

    So the question is whether the level 1 units are nested in the level 2 units or crossed with them. To say that they are crossed means that each level 1 unit can occur in combination with each level 2 unit (in principle; in practice, not every pairing may actually be present in the data set). To say that they are nested means that each level 1 unit can occur in only a single level 2 unit.

    One way to think about this is that in the nested analysis, the level 1 variable represents a within ID factor, whereas in the crossed analysis, the level 1 variable is a factor that is orthogonal to the ID variable. So it's like the difference between a paired t-test and an unpaired t-test: one of them is always wrong for the data. (And sometimes they both are, but that's a different issue.)
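With data in long form (one row per observation), nesting in the sense defined above can be checked directly: one identifier is nested in another if each of its values only ever occurs with a single value of the other. A minimal Python sketch under that assumption; `is_nested` and the example ids are hypothetical. The data can only show whether the observed pairings are nested; if they are not, choosing between crossed and other non-nested designs still depends on how the data arose.

```python
from collections import defaultdict


def is_nested(lower_ids, upper_ids):
    """Return True if every lower-level id occurs with exactly one upper-level id.

    lower_ids and upper_ids are parallel sequences with one entry per row.
    """
    if len(lower_ids) != len(upper_ids):
        raise ValueError("lower_ids and upper_ids must have the same length")
    uppers_seen = defaultdict(set)  # lower id -> set of upper ids it appears with
    for lower, upper in zip(lower_ids, upper_ids):
        uppers_seen[lower].add(upper)
    return all(len(uppers) == 1 for uppers in uppers_seen.values())


# Hypothetical rows: id p1 appears with two different dataset values.
ids = ["p1", "p1", "p2", "p3"]
dataset = ["d1", "d2", "d1", "d2"]
print(is_nested(ids, dataset))                           # False
print(is_nested(["p1", "p2", "p3"], ["d1", "d1", "d2"]))  # True
```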



    • #3
      Thank you very much for your help.
      I am working with data on job applications. My dependent variable is whether or not the application led to a hire. Some applicants submitted multiple job applications to multiple announcements. I was thinking that I need to take into account which announcement the application went to since some announcements resulted in more hires than others (e.g. a few announcements only had 1 hire, while others had 15).

      Therefore, the dataset seems to have the structure of applications nested in applicants, and also applications nested in announcements. A given applicant can apply to more than one announcement. So I was thinking a cross-classified model might be appropriate, but this doesn’t seem to correspond exactly to what you are saying about level 1 units nested in level 2 units.



      • #4
        So it sounds like you have applications nested in applicants. As for applicants and announcements, since a given applicant could apply to several announcements, it is definitely not nesting. It isn't full cross-classification either. This is called a multiple-membership structure. For the purposes of representing it in your mixed-effects regression command, you should treat it as cross-classified.

        So, it seems the answer to your original question is that the cross-classified version of the model is correct and you should disregard the spurious results from the nested version.



        • #5
          Thank you so much, this is extremely helpful!!



          • #6
            I am sorry, I have another question. I decided it made more sense to split the sample into two groups. After running the cross-classified version of the model on the first group, the random effect for the announcement variable (which I called dataset2 in my first post) comes back as 0. For the second group, the number of clusters for the announcement variable is 7, and I wonder if this is too few clusters to run the cross-classified version of the model.

            However, I have a couple of announcement-level variables that I would like to include in the model.

            Given these factors, is it still advisable to continue with the cross-classified version of the model? Thank you very much for any input.



            • #7
              If you only have 7 clusters for the announcement variable, you are certainly pushing your luck estimating variance components at that level. You really don't have much of a sample of that level variation to go on. That's true regardless of whether the random effects are nested, multiple-membership, or crossed.

              It might make more sense in this situation to turn announcement into just a fixed effect in the model and eliminate its level entirely. Announcement-level covariates can still be included in the model.
              Last edited by Clyde Schechter; 22 Apr 2016, 15:04. Reason: Correct typo.



              • #8
                Wonderful, thank you for your invaluable advice!!



                • #9
                  I'm sorry, I thought of a few more questions about this:

                  1. I realized that splitting my sample into two has led to a small number of values in the '1' category of the dependent variable. More specifically, in the first model, 62 of 5,807 observations (1%) had a value of 1; the rest had a value of 0. For the second model, 41 of 2,347 observations (1.7%) had a value of 1; the rest had a value of 0. Can I run a standard multilevel model with such low numbers in the '1' category? It seems like a rare events issue. Note, I created a thread today asking about this (http://www.statalist.org/forums/foru...nd-rare-events), but decided to repost the question here since this thread contains information about the data.

                  2. As mentioned above, applications are clustered within applicants (it is common for a single applicant to put an application forward to more than one announcement). However, only ONE of the applicant's applications can lead to an actual hire, which means that, if the applicant got hired, the values of the dependent variable for that applicant's OTHER applications must be 0. Does this create a problem with my analysis?

                  I wonder if it would be easier to do this analysis at the person level rather than the application level, but that would not seem to account for the fact that some announcements hire more people than others, so an applicant's likelihood of getting hired is influenced by which announcement(s) they apply to.

                  Any advice?
                  Last edited by Caroline Wilson; 07 Oct 2016, 18:46.



                  • #10
                    Can I run a standard multilevel model with such low numbers in the '1' category? It seems like a rare events issue. Note, I created a thread today asking about this (http://www.statalist.org/forums/foru...nd-rare-events), but decided to repost the question here since this thread contained information about the data.
                    I would say it's iffy. Try it and see what happens. You may well run into convergence issues. And even if you don't, there is a decent chance that some of your effect estimates will be of absurdly large magnitude. But you might get lucky. There is no bright line that defines troublesome rare events. I'm not aware of any other logistic regression commands in Stata that handle rare events well and also accommodate multi-level modeling.

                    If you can't get convergence (and don't forget to try -meqrlogit- if -melogit- fails), or get obviously absurd results, you might consider switching from a logistic model to a Poisson model (-mepoisson-). With rare events, this often works well, and the assumptions of the Poisson model are reasonably well satisfied when the event rate is very low but the total N is large, as in your case.
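One way to see why estimates from the two models end up comparable for rare outcomes: exponentiated Poisson coefficients are rate ratios, and the odds ratio equals the risk ratio times (1 - p0)/(1 - p1), a factor that is close to 1 when both probabilities are small. A minimal Python sketch with made-up probabilities (not the data in this thread) comparing the two for a rare and a common outcome; the helper names are hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio(p1, p0):
    """Odds ratio comparing probability p1 with baseline probability p0."""
    return odds(p1) / odds(p0)


def risk_ratio(p1, p0):
    """Risk (probability) ratio comparing p1 with baseline probability p0."""
    return p1 / p0


# Rare outcome: baseline probability 1%, doubled to 2%.
print(risk_ratio(0.02, 0.01), odds_ratio(0.02, 0.01))  # about 2.0 and 2.02
# Common outcome: baseline probability 30%, doubled to 60%.
print(risk_ratio(0.6, 0.3), odds_ratio(0.6, 0.3))      # about 2.0 and 3.5
```

At event rates around 1 to 2%, as in this thread, the two ratios differ only slightly; at 30 to 60% they diverge sharply.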

                    As mentioned above, applications are clustered within applicants (it is common for a single applicant to put an application forward to more than one announcement). However, only ONE of the applicant's applications can lead to an actual hire, which means that, if the applicant got hired, the values of the dependent variable for that applicant's OTHER applications must be 0. Does this create a problem with my analysis?
                    No, not at all. In fact this is a very common situation when within-entity logistic regressions are carried out.
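Although the one-hire-per-applicant constraint is not a modeling problem, it does give a cheap consistency check on the data: an applicant with more than one application coded as a hire points to a coding error. A minimal Python sketch, assuming parallel lists of applicant ids and 0/1 hire outcomes, one entry per application; the function name and example ids are hypothetical.

```python
from collections import Counter


def applicants_with_multiple_hires(applicant_ids, hired):
    """Return sorted applicant ids that have more than one application coded hired == 1."""
    if len(applicant_ids) != len(hired):
        raise ValueError("applicant_ids and hired must have the same length")
    hires = Counter(a for a, h in zip(applicant_ids, hired) if h == 1)
    return sorted(a for a, n in hires.items() if n > 1)


# Hypothetical rows: a1 applied twice and was hired once; a2 applied twice, no hire.
print(applicants_with_multiple_hires(["a1", "a1", "a2", "a2"], [1, 0, 0, 0]))  # []
# A coding error: a3 has two applications marked as hires.
print(applicants_with_multiple_hires(["a3", "a3"], [1, 1]))  # ['a3']
```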

                    I wonder if it would be easier to do this analysis at the person level rather than the application level, but that would not seem to account for the fact that some announcements hire more people than others, so an applicant's likelihood of getting hired is influenced by which announcement(s) they apply to.
                    It seems to me that your data are inherently multi-level, and I would be extremely reluctant not to model them as such. If you succeed in fitting a multi-level model and the results show that there is no material amount of variance at certain levels, then, sure, you could simplify the model and exclude those levels. But I wouldn't start out with that approach.



                    • #11
                      Thank you! I will check the results again, but both models did converge and none of the ORs looked absurdly large. Some variables that I expected to be statistically significant based on the descriptive statistics were not, but other variables were.

                      Thanks again for this invaluable advice... It's wonderful that this forum has such experts!
