  • cmp for instrumental multinomial probit when there are many categories

    Dependent variable Y1 has three categories 1, 2, 3

    Dependent variable Y2 has four categories 1, 2, 3, 4

    Y2 is created simply by splitting category 3 of Y1 into categories 3 and 4.

    The code with Y1

    Code:
    cmp (Y1=X C, iia) (X=Z C) if C<100, ind($cmp_mprobit $cmp_cont) vce(cluster D)
    ran quickly without error, whereas the following code

    Code:
    cmp (Y2=X C, iia) (X=Z C) if C<100, ind($cmp_mprobit $cmp_cont) vce(cluster D)
    throws me the following:

    Fitting full model.
    Likelihoods for 522614 observations involve cumulative normal distributions above dimension 2.
    Using ghk2() to simulate them. Settings:
    Sequence type = halton
    Number of draws per observation = 1446
    Include antithetic draws = no
    Scramble = no
    Prime bases = 2 3
    Each observation gets different draws, so changing the order of observations in the data set would change the results.
    and it is taking forever to run.

    I have a few questions.

    First, when it says "Likelihoods for 522614 observations involve cumulative normal distributions above dimension 2", does "above dimension 2" refer to Y2 having 4 categories? If I don't want a single regression to take this long, should I stick to at most 3 categories, as in Y1?

    Second, it says "Each observation gets different draws, so changing the order of observations in the data set would change the results." Then is it safe to believe the regression results?

    Third, I read this from help cmp.

    "If the estimation problem requires the GHK algorithm (see above), change the number of draws per observation in the simulation sequence using the ghkdraws() option. By default, cmp uses twice the square root of the number of observations for which the GHK algorithm is needed, i.e., the number of observations that are censored in at least three equations. Raising simulation accuracy by increasing the number of draws is sometimes necessary for convergence and can even speed it by improving search precision. On the other hand, especially when the number of observations is high, convergence can be achieved, at some loss in precision with remarkably few draws per observations--as few as 5 when the sample size is 10,000 (Cappellari and Jenkins 2003). And taking more draws can also greatly extend execution time."
    How can I reconcile

    Sentence 1 "increasing the number of draws is sometimes necessary for convergence and can even speed it by improving search precision."
    and

    Sentence 2 "On the other hand, especially when the number of observations is high, convergence can be achieved, at some loss in precision, with remarkably few draws per observations (...) And taking more draws can also greatly extend execution time."
    ?

    Sentence 1 is saying increase # in ghkdraws(#) to speed it up, and Sentence 2 is saying decrease it. Can I reconcile these two as "When N is big, choose low #, when N is small, choose high #"?

    Also, if, as the guideline says, 5 draws are enough for 10,000 observations, will ghkdraws(5) also be enough for my 522614 observations? If so, why does cmp use 1446 draws per observation by default?
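The 1446 in the log is consistent with the default quoted from help cmp: twice the square root of the number of observations that need the GHK algorithm. The short Python check below reproduces that arithmetic. The function name `default_ghk_draws` is hypothetical, and rounding up to a whole number of draws is an assumption, since the help text does not state the rounding rule.

```python
import math

def default_ghk_draws(n_obs: int) -> int:
    """Twice the square root of n_obs, rounded up (rounding rule is an assumption)."""
    return math.ceil(2 * math.sqrt(n_obs))

# Number of observations reported in the cmp log above.
print(default_ghk_draws(522614))

# Sample size used in the help text's "as few as 5 draws" remark.
print(default_ghk_draws(10000))
```

Under this rule the default for 10,000 observations would be 200 draws, far above the 5 draws that the help text says can suffice, so the default appears to be a conservative starting point rather than a minimum.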

  • #2
    You reconcile them by observing that one says "sometimes" and the other "can be". They're both possible. Try stuff and see how it goes. Start with fewer draws because it's better to start simple.
    Yes, in a multinomial probit, there's one equation in the model for every outcome possibility other than the base case. So fewer possible outcomes means fewer dimensions.
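The counting rule in this reply can be written out as a small Python sketch. It assumes, as the log message and help text suggest, that simulation is needed only when the normal CDF has dimension above 2, and that the fully observed continuous equation for X does not add to that dimension. The function names are hypothetical.

```python
def mprobit_dimension(n_outcomes: int) -> int:
    """One equation per outcome other than the base case."""
    return n_outcomes - 1

def needs_ghk(n_outcomes: int, max_exact_dimension: int = 2) -> bool:
    """True when the CDF dimension exceeds what is evaluated without simulation."""
    return mprobit_dimension(n_outcomes) > max_exact_dimension

for name, categories in (("Y1", 3), ("Y2", 4)):
    print(name, mprobit_dimension(categories), needs_ghk(categories))
```

With 3 categories the dimension is 2 and no simulation is needed; with 4 categories it is 3, which matches the "above dimension 2" message.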



    • #3
      Thank you for your advice again. I set ghkdraws(1), and it doesn't take much more time than the 3-category dependent variable: 200 seconds. ghkdraws(10) was also fine: 360 seconds. But I guess cmp wouldn't have set this to 1446 if it could be set to 1 or 10 and still give correct results. So the fact that 1 or 10 draws run quickly doesn't necessarily mean the results are correct.

      Indeed, ghkdraws(1) and ghkdraws(10) produced different coefficients and standard errors.

      As per your advice, I will try various numbers for ghkdraws(#) and see how it goes. But how can I tell which is correct and which is not?
      Last edited by James Park; 23 Mar 2019, 01:14.
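One common way to judge a draw count is to raise it until the results stop changing at the precision that matters. The Python sketch below illustrates the idea on a toy problem with a known answer. It is not cmp's or ghk2()'s code: it is a minimal GHK-style simulator for a three-dimensional normal probability, using Halton points in bases 2 and 3 as in the log above. The function names, the equicorrelated test case, and the draw counts are all illustrative assumptions. For correlation 0.5 the exact probability is 0.25.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def halton(index: int, base: int) -> float:
    """Radical-inverse (Halton) point for a 1-based index in a prime base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def ghk_orthant_probability(n_draws: int, rho: float = 0.5) -> float:
    """Simulate P(Z1 < 0, Z2 < 0, Z3 < 0) for equicorrelated standard normals."""
    # Cholesky factor of the 3x3 correlation matrix (l11 = 1).
    l21 = l31 = rho
    l22 = sqrt(1.0 - l21 ** 2)
    l32 = (rho - l31 * l21) / l22
    l33 = sqrt(1.0 - l31 ** 2 - l32 ** 2)

    total = 0.0
    for i in range(1, n_draws + 1):
        u1, u2 = halton(i, 2), halton(i, 3)
        # Probability that the first component is below 0.
        p1 = STD_NORMAL.cdf(0.0)
        # Draw the first innovation from its truncated distribution.
        e1 = STD_NORMAL.inv_cdf(u1 * p1)
        # Conditional probability for the second component, given e1.
        p2 = STD_NORMAL.cdf(-l21 * e1 / l22)
        e2 = STD_NORMAL.inv_cdf(u2 * p2)
        # Conditional probability for the third component, given e1 and e2.
        p3 = STD_NORMAL.cdf(-(l31 * e1 + l32 * e2) / l33)
        total += p1 * p2 * p3
    return total / n_draws

# Compare the simulated probability across increasing draw counts.
for draws in (1, 10, 100, 1000):
    print(draws, ghk_orthant_probability(draws))
```

The analogous check with cmp would be to rerun the model with progressively larger ghkdraws(#) values and compare coefficients and standard errors across runs. If they still move noticeably between two settings, the smaller one is probably too few.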
