**Problem description**

Assume there are individuals j who each choose from two alternatives i ∈ {1,2}. Individuals can select any combination of alternatives (no alternative, alternative 1, alternative 2, or both alternatives 1 and 2). All choices are explained by a single attribute x alone. Each individual is shown a pair of alternatives whose values of x are randomly generated. Here is an example:

**Question**

Can this be written as a univariate probit model with clustered standard errors, rather than a multivariate probit? Using the example above, x_ij is the randomly generated number shown as the i-th alternative to individual j.
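As a sketch of what the pooled (univariate) model could look like, assuming a single slope β shared by both alternatives, an intercept α (an assumption; Stata's `probit` includes a constant by default), and standard-normal errors that may be correlated within an individual j:

$$
y^*_{ij} = \alpha + \beta x_{ij} + \varepsilon_{ij}, \qquad
y_{ij} = \mathbf{1}\left[\, y^*_{ij} > 0 \,\right], \qquad i \in \{1, 2\}
$$

Here y_ij = 1 if individual j selects alternative i, and the standard errors would be clustered on j to allow for cov(ε_1j, ε_2j) ≠ 0.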

**Rationale**

A multi-answer choice problem is called a menu-based choice problem in the literature (Liechty et al. 2001, Manchanda et al. 1999) and can be modelled by a multivariate (here bivariate) probit model, in which β_11 and β_12 are the main-effect coefficients of x and β_21 and β_22 the cross-effect coefficients of x for alternatives 1 and 2, respectively. The justification for the model is as follows:

*Why fit a simultaneous-equation model?* The two choices are connected through the correlated error terms ε_1 and ε_2 (i.e. the parameter cov(ε_1, ε_2) is free). The correlated errors effectively control for the residual ε_2j of equation 2 in equation 1 (i.e. they control for the choice y_2j, except for the part already explained by x_2j and x_1j). This means that if alternative 2 has any fixed characteristics which influence the choice of alternative 1, this is taken care of. For instance, shoppers who buy tomatoes are more likely to buy onions too, because onions are an important ingredient of tomato salad; thus cov(ε_1, ε_2) > 0 in this case.

*Why control for cross effects?* Because we control for x_2 in equation 2, the variable of interest is removed from the error term ε_2. Because x_2 is not part of ε_2, it is not carried over as a control into equation 1. Thus, if x_2 were correlated with both x_1 and y_1, it would otherwise become an omitted variable in the first equation.
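For reference, the bivariate model described above can be sketched as follows. The index convention (first subscript: 1 = main effect, 2 = cross effect; second subscript: alternative) is inferred from the description of the coefficients; the absence of intercepts and the normalisation of the error variances to 1 are assumptions:

$$
\begin{aligned}
y^*_{1j} &= \beta_{11}\, x_{1j} + \beta_{21}\, x_{2j} + \varepsilon_{1j} \\
y^*_{2j} &= \beta_{12}\, x_{2j} + \beta_{22}\, x_{1j} + \varepsilon_{2j}
\end{aligned}
\qquad
\begin{pmatrix} \varepsilon_{1j} \\ \varepsilon_{2j} \end{pmatrix}
\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)
$$

with y_ij = 1 if y*_ij > 0 and 0 otherwise, and ρ = cov(ε_1, ε_2) left free.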

*Assumption 1: x_1j and x_2j are generated randomly.*

*Assumption 2: Alternative 2 (1) does not have any varying characteristics whatsoever which could plausibly be correlated with x_1 (x_2). This is quite a strong assumption, which typically holds only in a lab experiment.*

As we assumed that alternative 2 has no varying characteristics which could in any way be correlated with x_1, we can safely assume that cov(x_1, y*_2) = 0. Because Assumption 1 already implies cov(x_1, x_2) = 0, it follows that cov(x_1, ε_2) = 0. Consequently, β_11 and β_12 are unbiased.

*Assumption 3: Alternatives 1 and 2 are completely identical except for attribute x.*

The only reason for not merging the multivariate probit model into the univariate probit model is the correlation between the error terms ε_1 and ε_2. However, this can easily be dealt with by fitting the univariate probit model with standard errors clustered by individual j. The Stata code is simply `probit y x, cluster(j)`.
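To make the pooled approach concrete, here is a minimal standard-library Python sketch (not the Stata routine itself). It simulates choices under Assumptions 1 to 3 with correlated errors, stacks both alternatives of each individual, fits a pooled probit by Fisher scoring, and computes cluster-robust (sandwich) standard errors by individual. The parameter values, the intercept `alpha`, the absence of cross effects, and all function names are illustrative assumptions; the sandwich uses the expected information and no small-sample correction, so the numbers would not match Stata's output exactly.

```python
import math
import random


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def simulate(n_individuals, alpha, beta, rho, seed=0):
    """Stack both alternatives of each individual j as rows (j, x, y)."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_individuals):
        # Errors of the two alternatives are correlated within individual j.
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        for eps in (e1, e2):
            x = rng.uniform(-1.0, 1.0)  # Assumption 1: x is random
            y = 1 if alpha + beta * x + eps > 0 else 0
            rows.append((j, x, y))
    return rows


def score_and_weight(a, b, x, y):
    """Score w.r.t. the linear index and the expected-information weight."""
    z = a + b * x
    p = min(max(norm_cdf(z), 1e-10), 1.0 - 1e-10)
    phi = norm_pdf(z)
    return phi * (y - p) / (p * (1.0 - p)), phi * phi / (p * (1.0 - p))


def fit_probit(rows, iterations=25):
    """Pooled probit (intercept + slope) fitted by Fisher scoring."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for _j, x, y in rows:
            s, w = score_and_weight(a, b, x, y)
            g0 += s
            g1 += s * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def cluster_se(rows, a, b):
    """Sandwich standard errors with scores summed within each individual."""
    h00 = h01 = h11 = 0.0
    cluster_scores = {}
    for j, x, y in rows:
        s, w = score_and_weight(a, b, x, y)
        h00 += w
        h01 += w * x
        h11 += w * x * x
        g = cluster_scores.setdefault(j, [0.0, 0.0])
        g[0] += s
        g[1] += s * x
    m00 = sum(g[0] * g[0] for g in cluster_scores.values())
    m01 = sum(g[0] * g[1] for g in cluster_scores.values())
    m11 = sum(g[1] * g[1] for g in cluster_scores.values())
    det = h00 * h11 - h01 * h01
    i00, i01, i11 = h11 / det, -h01 / det, h00 / det  # inverse information
    t00 = i00 * m00 + i01 * m01
    t01 = i00 * m01 + i01 * m11
    t10 = i01 * m00 + i11 * m01
    t11 = i01 * m01 + i11 * m11
    return math.sqrt(t00 * i00 + t01 * i01), math.sqrt(t10 * i01 + t11 * i11)


rows = simulate(n_individuals=2000, alpha=-0.2, beta=1.0, rho=0.6, seed=1)
alpha_hat, beta_hat = fit_probit(rows)
se_alpha, se_beta = cluster_se(rows, alpha_hat, beta_hat)
print(f"alpha_hat={alpha_hat:.3f} (se {se_alpha:.3f}), "
      f"beta_hat={beta_hat:.3f} (se {se_beta:.3f})")
```

In this simulated setting the point estimates come from the pooled likelihood, while the within-individual error correlation only enters through the clustered standard errors.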

**Context**

Admittedly, the above survey example does not make much sense. But I need exactly such a model to analyse data from a behavioural experiment on the knapsack problem (called the capital allocation problem in finance), where I find that people are strongly biased towards picking smaller projects/investments after controlling for their return-on-investment ratio (also called value for money or bang for the buck).

I would be very grateful if somebody could give me quick feedback on whether I am on the right track in terms of the econometric model.