
  • Random Effect Ordered Logit Bayesian


    I am trying to estimate the group residuals (RE) of the following simple model:

    HTML Code:
    Y_{ij} = f( X_{ij}b + a_{j} + e_{ij})
    where Y_{ij} is an ordered categorical variable.

    My idea is to run a random-effects ordered logit regression, estimate the predicted posterior means of the a_{j}, and then use simulations to get
    HTML Code:
    E[a_{j} | Y_{ij}, X_{ij}b].
    Any idea on how to do that?


  • #2
    In the conventional iterative maximum likelihood implementation for a hierarchical ordered-logistic regression model, isn't the random effect taken to be distributed as N(0, σ²_u)? So, wouldn't its expectation be zero?

    Are you trying to do some kind of predictive check on this model assumption? Why not just take the average of the empirical Bayes estimates?
    bysort A: generate byte first = _n == 1   // flag one observation per group
    meologit Y i.x || A:                      // random-intercept ordered logit
    predict double A_u, reffects              // empirical Bayes estimates of the a_j
    summarize A_u if first, meanonly          // mean over one estimate per group
    generate double A_res = A_u - r(mean)     // center the estimates
    histogram A_res if first                  // inspect their distribution


    • #3
      Hi Joseph,

      Thanks a lot for your suggestion. I agree with you that the random effects are distributed with mean 0. But I am interested in the conditional mean, which I believe I can obtain through Bayes' rule and simulation. I am just not sure that this is what I obtain with predict ..., reffects, since I get different random effects for two individuals who had identical observations. Do you know why?
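
      To make the Bayes'-rule idea above concrete, here is a minimal numerical sketch (in Python rather than Stata, purely as an illustration; the function name, cutpoints, and data below are all made up) of the conditional mean E[a_j | Y_j, X_j b] for a single group under a random-intercept ordered logit, computed by brute-force numerical integration over a_j:

```python
import numpy as np

def posterior_mean_re(y, eta, cuts, sigma_u, grid_size=2001):
    """Posterior mean E[a_j | y_j, X_j b] for one group of a
    random-intercept ordered logit, via numerical integration.

    y    : observed ordinal outcomes for the group (0-based categories)
    eta  : fixed-effect linear predictors X_ij * b, same length as y
    cuts : increasing cutpoints; K categories need K - 1 cuts
    """
    def logistic_cdf(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Augment the cutpoints with -inf/+inf so that category k has
    # P(y = k) = F(c_{k+1} - eta - a) - F(c_k - eta - a).
    c = np.concatenate(([-np.inf], np.asarray(cuts, dtype=float), [np.inf]))
    a = np.linspace(-6 * sigma_u, 6 * sigma_u, grid_size)  # integration grid

    # Likelihood of the group's observations at each grid value of a
    lik = np.ones_like(a)
    for yi, ei in zip(y, eta):
        lik *= logistic_cdf(c[yi + 1] - ei - a) - logistic_cdf(c[yi] - ei - a)

    prior = np.exp(-0.5 * (a / sigma_u) ** 2)  # N(0, sigma_u^2), unnormalized
    post = lik * prior
    return np.sum(a * post) / np.sum(post)     # E[a_j | data]
```

      With no observations the posterior mean is the prior mean (zero); observations concentrated in the top category pull it up, and observations in the bottom category pull it down symmetrically. Note that two groups with identical observed data get identical posterior means here, which is why differing estimates in practice point to a data or code issue.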



      • #4
        Originally posted by Luis Arellan View Post
        I get different random effects for two individuals who had identical observations. Do you know why?
        Without your dataset and code, I can only guess. Off the top of my head, the possibilities are:

        1. there is something about your dataset that you don't know about, or about which you are assuming incorrectly

        2. an error somewhere in your code

        3. the production of the empirical Bayes estimates didn't converge, but (i) there would probably be a warning message and (ii) this would imply #1, #2, or both of the above


        • #5
          Hi Joseph,

          Thank you a lot for your explanation. Actually, I found where my mistake was coming from: I was using a subsample to run the meologit (yes, I am running an ordered logit regression on a three-category dependent variable) and the whole sample for the prediction. Now I have fixed it.

          In your example, this would be:

          quietly meologit pos i.tim if tim<4 || pid: , nolrtest nolog
          I have another question. I have two subsamples (sample1 and sample2), and I do not observe the dependent variable Y in sample2. For this reason, what I am going to do is run my meologit in sample1, compute the random effects using Y and X in sample1, and then impute them into sample2 through the individual ID. Here is the code:

          quietly predict double re if sample1==1, reffects ebmeans
          egen re2 = mean(re), by(id)
          Do you have in mind a better way to do that? Would it make sense to predict the RE using only the X and not also the Y, if possible?



          • #6
            I'm not really sure what you're trying to do, but predict . . ., reffects will do out-of-sample predictions, so you don't need the if sample1==1 qualifier or the egen patch-up afterward.

            As to whether it makes sense, you can do a quick exploration of how, and by how much, the empirical Bayes estimates differ between full-sample and partial-sample conditions. And you can explore whether model fit (a poorly fitting versus a well-fitting model) makes a difference.

            I show some code below that generates fictional data with known parameters, fits a hierarchical ordered-logistic regression model to only the time points earlier than 4 (as you indicate), predicts empirical Bayes estimates, and then fits the model to the complete dataset (all time points), predicts empirical Bayes estimates, and then graphically compares the two.

            It repeats this exercise when the regression model includes a poorly explanatory predictor.

            You can do the same with fictional datasets that mimic your dataset's peculiarities and determine for yourself whether what you propose seems acceptable for your intended purpose.
            version 15.1
            clear *
            set seed `=strreverse("1483381")'
            program define simem
                version 15.1
                syntax , Fit(string)
                drop _all
                quietly set obs 250
                generate int pid = _n
                generate double pid_u = rnormal()
                quietly expand 5
                bysort pid: generate byte tim = _n
                if "`fit'" == "good" generate double xbu = (tim - 3) / 5 + pid_u
                else generate double xbu = pid_u
                grologit xbu, generate(sco) cuts(`=logit(`=1/3')' `=logit(`=2/3')')
                quietly generate byte sco_sample1 = sco if tim < 4
                quietly meologit sco_sample1 i.tim || pid: , nolrtest nolog
                quietly predict double ebe_sample1, reffects
                quietly meologit sco i.tim || pid: , nolrtest nolog
                quietly predict double ebe, reffects
                label variable ebe_sample1 "EBEs with incomplete data"
                label variable ebe "EBEs with complete data"
                local title = strproper("`fit'") + " Fit"
                graph twoway ///
                    scatter ebe_sample1 ebe if tim == 4, mcolor(black) msize(vsmall) || ///
                    line ebe ebe if tim == 4, sort lcolor(black) lpattern(dash) ///
                        title("`title'", position(12) ring(0)) ///
                        ytitle(`: variable label ebe_sample1') ///
                            ylabel( , angle(horizontal) nogrid)
            end

            pause on
            // Good fit
            simem , fit(good)
            pause
            // Poor fit
            simem , fit(poor)
            The command grologit that is called above is an ado-file (attached) that generates ordered-categorical logistic / probit data that conform to proportional-odds / parallel-lines assumptions.


            • #7
              Hi Joseph,

              Thank you for your answer and the example.

              I agree with you that if I have missing values for the y variable outside the sample, the prediction does the same job as predicting and then imputing.

              Your example looks good. I might divide sample1 into different parts and test my prediction within it, since I do not observe the y in sample2. Even so, I am not sure that this will ensure that my prediction will also be good for sample2.

              Assuming I will be able to have a good predictor, I am interested in categorizing the random effect into three categories, like the dependent variable. I would like to know if there is a standard way to do that.



              • #8
                Originally posted by Luis Arellan View Post
                I am interested in categorizing the random effect into three categories, like the dependent variable. I would like to know if there is a standard way to do that.
                I don't know. It sounds as if you're trying to reify the random effect. Maybe there is some merit in the exercise, but I get the impression that others going down that path have run into a lot of grief when they try to make too much of empirical Bayes estimators.


                • #9

                  You are right, but if the assumptions are satisfied it should work. I want to compute the unexplained part of the dependent variable that is attributable to the individuals and use it to predict what individuals would do outside the sample, where I do not observe the dependent variable.

                  Going back to your testing example: are you aware of any algorithm that splits the sample, makes predictions, tests them, and repeats this until obtaining the best estimates?
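
                  For what it's worth, the split-predict-test loop described above is essentially Monte Carlo cross-validation. A bare skeleton of the loop (in Python purely as an illustration; the function name is made up, and the score_fn callback stands in for the fit-predict-compare step, which would happen in Stata):

```python
import numpy as np

def repeated_split_score(ids, score_fn, n_reps=20, frac=0.5, seed=12345):
    """Monte Carlo cross-validation skeleton: repeatedly split the
    group IDs at random, let the caller refit/predict on one part and
    evaluate on the other, and collect the per-repetition scores.

    score_fn(train_ids, test_ids) is a user-supplied callback that
    fits the model on the training groups and returns a scalar
    accuracy/agreement measure on the test groups.
    """
    rng = np.random.default_rng(seed)
    ids = np.asarray(ids)
    n_train = int(len(ids) * frac)
    scores = []
    for _ in range(n_reps):
        perm = rng.permutation(ids)          # fresh random split each repetition
        scores.append(score_fn(perm[:n_train], perm[n_train:]))
    return np.array(scores)
```

                  The distribution of the returned scores (rather than a single split) is what indicates whether the prediction rule is stable; with a fixed seed the splits are reproducible.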

                  Last edited by Luis Arellan; 14 Feb 2019, 10:20.