  • Multiple-answer questions: When to reduce multivariate to univariate probit?

    Problem description

    Assume there are individuals j who each choose from two alternatives i ∈ {1,2}. Individuals can select any combination of alternatives (no alternative, alternative 1, alternative 2, or both alternatives 1 and 2). All choices are explained by a single attribute x. Each individual is shown a pair of alternatives whose values of x are randomly generated. Here is an example:
    [Image 1.jpg: example of a randomly generated pair of alternatives]


    Can this be written as a univariate probit model
    [Image 5.JPG: univariate probit model]
    with standard errors clustered by individual, rather than as a multivariate probit? In the example above, x_ij is the randomly generated number shown as the i'th alternative to individual j.


    In the literature, a multiple-answer choice problem is called a menu-based choice problem (Liechty et al. 2001; Manchanda et al. 1999). It can be modelled by a multivariate (here, bivariate) probit model:
    [Image 2.JPG: bivariate probit model]
    β_11 and β_12 are the main-effect coefficients for x, and β_21 and β_22 are the cross-effect coefficients for x, for alternatives 1 and 2 respectively. The justification for the model is as follows:
    • Why fit a simultaneous-equation model? The two choices are connected through the correlated error terms ε_1 and ε_2 (i.e. the parameter for cov(ε_1, ε_2) is free). The correlated errors effectively control for the residual ε_2j of equation 2 in equation 1 (i.e. they control for the choice y_2j, except for the part already explained by x_2j and x_1j). That means that if alternative 2 has any fixed characteristics that influence the choice of alternative 1, these are taken care of. For instance, shoppers who buy tomatoes are more likely to buy onions too, because onions are an important ingredient of tomato salad; thus cov(ε_1, ε_2)>0 in this case.
    • Why control for cross effects? Because we control for x_2 in equation 2, x_2 is removed from the error term ε_2. Because x_2 is not part of ε_2, it is not carried over as a control into equation 1. Thus, if x_2 is correlated with both x_1 and y_1, it would otherwise be an omitted variable in the first equation.
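    The model equations themselves appear only as an image in this post. A form consistent with the coefficient definitions given above would be the following; it is a reconstruction under that assumption, not a transcription of the image, and the symbol ρ for the error correlation is introduced here for illustration:

```latex
\begin{aligned}
y^*_{1j} &= \beta_{01} + \beta_{11} x_{1j} + \beta_{21} x_{2j} + \varepsilon_{1j}, \\
y^*_{2j} &= \beta_{02} + \beta_{12} x_{2j} + \beta_{22} x_{1j} + \varepsilon_{2j}, \\
y_{ij} &= \mathbf{1}\!\left[\, y^*_{ij} > 0 \,\right], \qquad
(\varepsilon_{1j}, \varepsilon_{2j}) \sim
N\!\left( \mathbf{0},\ \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
\end{aligned}
```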
    Assumption 1: x_1j and x_2j are generated randomly.
    Thus cov(x_1, x_2)=0, and removing the cross effects does not bias our coefficients of interest β_11 and β_12.
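
    Assumption 1 can also be checked against the data. The lines below are a minimal Stata sketch, assuming a dataset in wide form with one row per individual and the hypothetical variable names y1, y2, x1, x2 (these names are not from the post). It reports the sample correlation of the two attribute draws and then tests whether the cross effects are jointly zero:

```stata
* Assumed wide layout: one row per individual, y1 y2 are 0/1 choices,
* x1 x2 are the attribute values shown. All variable names are hypothetical.

* Sample correlation between the two attribute draws, with its p-value
pwcorr x1 x2, sig

* Bivariate probit including the cross effects
biprobit (y1 = x1 x2) (y2 = x2 x1)

* Joint Wald test that both cross-effect coefficients are zero
test ([y1]x2 = 0) ([y2]x1 = 0)
```

    A correlation near zero and a non-significant Wald test would be in line with dropping the cross effects, though neither proves the assumption.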

    Assumption 2: Alternative 2 (1) does not have any varying characteristics that could plausibly be correlated with x_1 (x_2). This is quite a strong assumption, which typically holds only in a lab experiment.
    The choice for alternative 2 has an influence on the coefficient of interest β_11 if and only if cov(x_1, ε_2) ≠ 0. We can substitute ε_2 with y*_2 - β_02 - β_12*x_2, where y*_2 is the latent utility of alternative 2. Thus,
    [Image 3.JPG: expansion of cov(x_1, ε_2)]

    As we assumed that alternative 2 has no varying characteristics that could in any way be correlated with x_1, we can safely assume that cov(x_1, y*_2)=0. Because Assumption 1 already means that cov(x_1, x_2)=0, it follows that cov(x_1, ε_2)=0. Consequently, β_11 and β_12 are unbiased.
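
    Written out step by step, using only the substitution ε_2 = y*_2 - β_02 - β_12*x_2 and Assumptions 1 and 2 as stated above (the constant β_02 drops out because the covariance with a constant is zero):

```latex
\begin{aligned}
\operatorname{cov}(x_1, \varepsilon_2)
  &= \operatorname{cov}\!\left(x_1,\; y^*_2 - \beta_{02} - \beta_{12} x_2\right) \\
  &= \operatorname{cov}(x_1, y^*_2) - \beta_{12}\,\operatorname{cov}(x_1, x_2) \\
  &= 0 - \beta_{12} \cdot 0 \;=\; 0 .
\end{aligned}
```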

    Assumption 3: Alternatives 1 and 2 are completely identical except for attribute x.
    As the alternatives are completely identical, x should have the same effect on y for both alternatives. That means β_01=β_02 and β_11=β_12. Thus, we obtain the new multivariate probit model
    [Image 4.JPG: multivariate probit model with equal coefficients across alternatives]
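
    The restricted model is shown only as an image. Under Assumptions 1 to 3 as stated above, a consistent form would be the following reconstruction, where β_0 and β_1 are labels introduced here for the common intercept and slope and ρ for the within-individual error correlation:

```latex
\begin{aligned}
y^*_{1j} &= \beta_0 + \beta_1 x_{1j} + \varepsilon_{1j}, \\
y^*_{2j} &= \beta_0 + \beta_1 x_{2j} + \varepsilon_{2j}, \qquad
\operatorname{cov}(\varepsilon_{1j}, \varepsilon_{2j}) = \rho, \\
\text{stacked over } i:\quad
y^*_{ij} &= \beta_0 + \beta_1 x_{ij} + \varepsilon_{ij}, \qquad i \in \{1, 2\}.
\end{aligned}
```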

    The only reason not to merge the multivariate probit model into the univariate probit model
    [Image 5.JPG: univariate probit model]
    is the correlated error terms ε_1 and ε_2. However, this can easily be dealt with by fitting the univariate probit model with standard errors clustered by individual j. The Stata code is simply probit y x, vce(cluster j).
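
    A minimal Stata sketch of the comparison, assuming a dataset in wide form with one row per individual and the hypothetical variable names j, y1, y2, x1, x2 (none of these names come from the post). It first fits the bivariate probit without cross effects and tests the equality restrictions of Assumption 3, then stacks the data and fits the pooled probit with cluster-robust standard errors:

```stata
* Assumed wide layout: one row per individual j, y1 y2 are 0/1 choices,
* x1 x2 are the attribute values shown. All variable names are hypothetical.

* Bivariate probit without cross effects; coefficients free across equations
biprobit (y1 = x1) (y2 = x2)

* Wald test of Assumption 3: equal slopes and equal constants
test ([y1]x1 = [y2]x2) ([y1]_cons = [y2]_cons)

* Stack to long form: one row per individual-alternative pair
reshape long y x, i(j) j(alt)

* Pooled probit; standard errors clustered on the individual identifier j
probit y x, vce(cluster j)
```

    If the Wald test does not reject and the pooled coefficients are close to the biprobit ones, that would be in line with the reduction argued for above; the estimate of the error correlation from biprobit shows how much dependence the clustering has to absorb.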


    Admittedly, the above survey example does not make much sense. But I need exactly such a model to analyse data from a behavioural experiment on the knapsack problem (called the capital allocation problem in finance), where I find that people are strongly biased towards picking smaller projects/investments after controlling for their return-on-investment ratio (also called value for money or bang for the buck).

    I would be very grateful if somebody could give me quick feedback on whether I am on the right track with the econometric model.
    Last edited by Tom Pape; 25 Nov 2019, 09:52.