Dear Statalist:
I'm looking at running a McFadden choice-style model. It is unclear to me whether what I am proposing is technically a McFadden choice model or a variant of it, and it is similarly unclear whether -cmclogit- is the right choice here. I'm wondering if other -cm- commands, -clogit-, or even -xt- commands are better choices. I have two stumbling blocks.
1) This is a silly example so please don't ask "what is the scientific purpose of this?" Let's say we have the carchoice dataset in the cmclogit example (-webuse carchoice-) and I merge in data on the cost of each car nationality (cost). This is probably not accurate since Korean automakers now have viable electric vehicles, but let's assume that this is the right data.
I'm going to create a new variable that is the distance between cost and income, which is roughly analogous to the "dealership" data where the value is officially outcome-specific but really is jointly the result of the consumer and the outcome. And then I'm going to regress purchase choice on this distance.
I thought that -cmclogit- would be a way to do this, but when I run the code for it I get a long stretch of "not concave" iterations, followed by an error I have never seen before when I break out of the estimation: dimension of beta incorrect, r(503).
Code:
clear
webuse carchoice

* Example from manual
cmset consumerid car
cmclogit purchase dealers, casevars(i.gender income)

* Creating example data of cost for each car nationality and difference with respondent income
gen cost=15 if car==4
replace cost=20 if car==2
replace cost=25 if car==1
replace cost=30 if car==3
gen inc_cost_diff=income-cost

cmclogit purchase inc_cost_diff
At first I thought I must have created an identification issue for the model, since all of the values of the new variable are mechanically the distance between cost and income. If that were the case, I think it might work if I had the distance between income and cost at the actual purchase-option level rather than relying on the proxy? But I'm also starting to wonder whether I am simply misunderstanding what can vary inside -cmclogit-. Dealerships vary in a similar way, so I thought it would work, but maybe I am missing something. Is there something about -cmclogit- that would make this not work, compared to running it in other -cm- commands or potentially -xt- commands?
Or, more generally: How exactly do people set up -cmclogit- to look at the distance between different options and an individual-level variable? Because I have seen models like that in the past.
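For what it's worth, here is my attempt to write down the identification worry. This is only a sketch under my own assumptions: a standard conditional logit utility, with alternative-specific constants $\alpha_j$ (which, as far as I understand, -cmclogit- includes unless noconstant is specified). The symbols are mine, not from the manual.

```latex
\begin{align*}
U_{ij} &= \alpha_j + \beta\,(\text{income}_i - \text{cost}_j) + \varepsilon_{ij} \\
P_{ij} &= \frac{\exp\!\big(\alpha_j + \beta\,\text{income}_i - \beta\,\text{cost}_j\big)}
               {\sum_{k} \exp\!\big(\alpha_k + \beta\,\text{income}_i - \beta\,\text{cost}_k\big)}
        = \frac{\exp\!\big(\alpha_j - \beta\,\text{cost}_j\big)}
               {\sum_{k} \exp\!\big(\alpha_k - \beta\,\text{cost}_k\big)}
\end{align*}
```

If this is right, income cancels because it is constant within a consumer, and since cost only takes one value per car nationality, $\alpha_j - \beta\,\text{cost}_j$ acts as a single alternative-level term, so $\beta$ cannot be separated from the constants. That would not happen with dealers, which varies across consumers within the same alternative.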
2) One of the things I've seen with choice models is that it's not possible to estimate all of the options at one time. For example, if you're trying to predict car choice at the make / model level you would have hundreds (thousands?) of different options available. There's no way that would converge, so instead of estimating one out of 300 options an analyst will randomly select 10-20% of the alternatives. This would lead to something analogous to an unbalanced panel dataset. Would I just...randomly select 10-20% of the alternatives and drop the rest? Or is there something in the -cmclogit- command that will make this easier?
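To make the question concrete, here is a rough sketch of what I mean by dropping alternatives by hand. It is only an illustration on the carchoice data (which has just four alternatives, so I keep the chosen car plus two others); the seed, the helper variable u, and the cutoff of three are arbitrary choices of mine, and I am assuming the chosen alternative always has to be retained in the sampled set.

```stata
webuse carchoice, clear
set seed 12345

* One random draw per consumer-alternative row (u is a hypothetical helper)
gen double u = runiform()

* Force the chosen alternative to sort first so it is never dropped
replace u = -1 if purchase == 1

* Keep the chosen car plus two randomly selected non-chosen cars per consumer
bysort consumerid (u): keep if _n <= 3

* Re-declare the choice data; choice sets now differ across consumers
cmset consumerid car
cmclogit purchase dealers, noconstant
```

I used noconstant here only to keep the sketch simple; I do not know whether this is how people actually handle very large choice sets.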
If what I'm doing just makes more sense with a different command I would love to hear it.
Thanks in advance,
Jonathan
