
  • Out-of-Sample Prediction using Multinomial Logit Model

    Hi, I have a question about out-of-sample prediction using a multinomial logit model.

    <Situation>
    I have a cross-sectional data set covering eleven countries.
    Let the dependent variable be y (a nominal variable taking three values: 1, 2, and 3), and suppose that we have only two regressors, x1 and x2.

    <What I want to show>
    The response of the choice probability of alternative one to changes in x1 is different in the 11th country.

    <What I have done>
    First, I ran the multinomial logit regression using only the first ten countries, as below (for simplicity, I omit the details of the options).
    Code:
    mlogit y x1 x2 if 1 <= country_id & country_id <= 10
    Second, I predicted the choice probability of alternative one in the 11th country at each value of x1 (from 0 to 5), using the previous estimates.
    Code:
    keep if country_id == 11
    * country 11 is outside the estimation sample, so margins needs noesample
    margins, at(x1=(0(1)5)) predict(outcome(1)) noesample
    Third, I computed the sample mean of a dummy for alternative one at each value of x1.
    Code:
    gen y1 = (y == 1)
    summarize y1 if x1 == 0
    summarize y1 if x1 == 1
    .
    .
    summarize y1 if x1 == 5
    Finally, I compared the predicted choice probability of alternative one with the sample mean at each value of x1.
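
    A minimal sketch of the four steps above that avoids dropping observations, assuming the variable names used in this post (the new names p1 and y1 are illustrative). It uses predict, which by default also fills in observations outside the estimation sample. It averages predictions over the observations actually observed at each x1 value, rather than fixing x1 for every observation as margins with at() does, so the numbers need not match the margins output exactly:

    ```stata
    * fit on countries 1-10 only
    mlogit y x1 x2 if inrange(country_id, 1, 10)
    * predicted Pr(y == 1) for country 11, using the ten-country estimates
    predict double p1 if country_id == 11, pr outcome(1)
    * indicator for alternative one, defined only in country 11
    generate byte y1 = (y == 1) if country_id == 11 & !missing(y)
    * mean predicted probability next to the observed share, by x1
    tabstat p1 y1 if country_id == 11, by(x1) statistics(mean n)
    ```

    Each row of the tabstat output then shows the mean predicted probability and the observed share of alternative one side by side for one value of x1.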

    The results show that the predicted choice probabilities differ considerably from the sample means at each value of x1.
    Can I conclude from this that the response of the choice probability is different in the 11th country? (My reasoning is that the other countries' estimates do not predict the choice probability in the 11th country well.)

    I have simplified the explanation, so it may be too brief.
    I am sorry if anything is unclear.

    Thank you for taking the time to read this question.
    Last edited by Minchul Park; 20 Nov 2020, 00:23.

  • #2
    Originally posted by Minchul Park
    can I conclude that the response of choice probability are different in the 11th country? (Since we cannot well predict the choice probability in 11th country by using the other countries)
    Either that, or the eleventh country's choice probabilities don't actually differ and your model instead omits one or more important explanatory variables that, if included, would have enabled the fitted model to predict the eleventh country's choices well.



    • #3
      Joseph Coveney, thank you for your response! The answer is very informative. Thank you again.



      • #4
        I have a question for you.

        What you went through above seems like a lot of work to show that the regression coefficients for the two predictors for all of the other ten countries don't predict the choice profile for the eleventh.

        My question: could you accomplish the same objective (or an equivalent objective) in the following manner?
        Code:
        mlogit y c.x2 ib11.country_id##i.x1
        and scanning down the list of coefficients (and their standard errors) for the interaction terms, which by virtue of Stata's factor-variable notation are the differences (on the log-odds scale) between each other country's x1 coefficients and country eleven's, for each of the nonredundant outcome choices.

        Maybe even compute a test statistic as a summary of the lack of generalization to country eleven from the other ten, using something like
        Code:
        testparm i.country_id#i.x1
        immediately after fitting the multinomial logistic regression model above.
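
        As a cross-check on the Wald test from testparm, a likelihood-ratio version of the same joint test could be sketched as follows. This is only a sketch under assumptions: the stored-estimate names full and restricted are illustrative, and both models must be fit on the same observations for the comparison to be valid.

        ```stata
        * model with country-specific x1 effects
        mlogit y c.x2 ib11.country_id##i.x1
        estimates store full
        * nested model: country intercepts, but common x1 effects
        mlogit y c.x2 ib11.country_id i.x1
        estimates store restricted
        * joint test of all country-by-x1 interaction terms
        lrtest full restricted
        ```

        Like testparm, this tests all the interaction terms at once, so a rejection says that x1 effects differ somewhere among the countries, not specifically in country eleven.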



        • #5
          Joseph Coveney, your advice is clever; I had not thought of this approach. I ran the regression and found statistically significant differences. However, I ran into a problem: some variables are omitted. If I can solve this problem, this approach is what I was looking for. Thank you very much.



          • #6
            Joseph Coveney, I want to ask an additional question. What do you think about the code below?
            Code:
            gen d11 = (country_id == 11)
            mlogit y i.x1 c.x2 d11#i.x1
            That is, we simply assume that the other ten countries share the same coefficients.
            Of course, the code you recommended is more general, but I am facing an omission problem with it.

            Thank you.
            Last edited by Minchul Park; 20 Nov 2020, 08:55.



            • #7
              I think that you mean
              Code:
              mlogit y i.d11 i.x1 c.x2 i.d11#i.x1
              or more succinctly
              Code:
              mlogit y c.x2 i.d11##i.x1
              That you have an omission problem would give me pause. I would wonder whether the dataset is adequate to assess similarity of profile of choices between countries across levels of x1.
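
              One way to look into this, assuming the omissions come from empty or sparse cells (the thread does not show the output, so that is a guess), is to count observations before fitting the model:

              ```stata
              * number of observations per country at each level of x1
              tabulate country_id x1
              * within country 11, outcome choices at each level of x1
              tabulate x1 y if country_id == 11
              ```

              Levels of x1 that are missing or nearly empty in some country, or outcomes that never occur at some level of x1, would leave the corresponding interaction terms unidentified or very imprecisely estimated.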

