
  • Help with lclogit

    Hello Everyone,

    I have conducted a discrete choice experiment in multiple countries. Each participant answered 12 choice sets (total 24 choice sets in two blocks). Each choice set had 3 alternatives (2 options and a none). There were 4 attributes: price (4 levels), production a (yes/no), production b (yes/no), and production type (categorical variable with 4 levels). I have a total of around 3000 participants. Also, there was a constraint in place that when production a was yes (=1) production b also had to be yes (=1). Is there anything I need to do to account for this constraint?

    I have some questions about the best way to run the analysis, specifically regarding lclogit, which I think is the best way forward because there seems to be a lot of heterogeneity in the data. Pooling the data across countries also looks to be best.

    Here is an example of the dataset and the code I have used:

    Code:
     
    ID country choiceset alternative choice price ProdA ProdB ProdType Level1 Level2 Level3 Level4 ASC_none identifier
    1 1 1 1 0 1.55 0 0 1 1 0 0 0 0 101
    1 1 1 2 1 1.15 1 0 2 0 1 0 0 0 101
    1 1 1 3 0 0 0 0 0 0 0 0 0 1 101
    1 1 2 1 1 1.15 1 1 4 0 0 0 1 0 102
    1 1 2 2 0 0.95 1 0 3 0 0 1 0 0 102
    1 1 2 3 0 0 0 0 0 0 0 0 0 1 102
    lclogit choice price ASC_none prodA prodB level2 level3 level4, group(identifier) id(ID) nclasses(2)
    lclogitml, iterate(40)
    wtp price prodA prodB level2 level3 level4, equation(choice1) krinsky reps(10000)
    1. What is meant by the share constant estimates? See example:

    Code:
    -------------+----------------------------------------------------------------
    share1       |
           _cons |  -.0617537   .1364026    -0.45   0.651     -.329098    .2055905
    -------------+----------------------------------------------------------------
    share2       |
           _cons |  -.7815727   .2272384    -3.44   0.001    -1.226952   -.3361935
    ------------------------------------------------------------------------------
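
    My working assumption (please correct me if I am wrong) is that these are multinomial-logit parameters for the class shares, so the implied shares would come from normalizing the exponentiated constants. As a sketch, using the _cons values from the output above:

    Code:
    * sketch, assuming share_c = exp(_cons_c) / sum of exp(_cons) over classes
    display "share1 = " exp(-.0617537)/(exp(-.0617537) + exp(-.7815727))
    display "share2 = " exp(-.7815727)/(exp(-.0617537) + exp(-.7815727))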

    2. After getting the class estimates I use the following code for probabilities:

    Code:
    * posterior class-membership probabilities (creates cp1 and cp2)
    lclogitpr cp, cp
    egen double cpmax = rowmax(cp1-cp2)
    summarize cpmax, sep(0)
    
    // create the class membership based on the highest probability
    // (the loop runs to 2 to match nclasses(2) in the model above)
    gen byte class = .
    forvalues c = 1/2 {
        replace class = `c' if cpmax==cp`c'
    }
    
    // choice probabilities: pr and pr1-pr2 (check help lclogitpr for options)
    lclogitpr pr
    forvalues c = 1/2 {
        quietly summarize pr if class == `c' & choice==1
        local n = r(N)
        local a = r(mean)
        quietly summarize pr`c' if class == `c' & choice==1
        local b = r(mean)
        matrix pr = nullmat(pr) \ `n', `c', `a', `b'
    }
    matrix colnames pr = "Obs" "Class" "Uncond_Pr" "Cond_PR"
    matlist pr, name(columns)
    Is it then correct to report these posterior probabilities as estimates of who belongs to each class?

    3. Is it correct to profile the classes using cross tabs (tab varname class, column)? I would like to know the best way to profile the classes. When I compare with the output from other programs, the interpretation seems to differ (i.e., the profiling variables are included in the class-membership models).
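
    One alternative I have been considering is to profile with the posterior probabilities as analytic weights instead of the modal assignment. A sketch (age is a hypothetical demographic variable; cp1 and cp2 are from lclogitpr cp, cp, and the if condition keeps one row per respondent):

    Code:
    * sketch: posterior-probability-weighted class profiles
    forvalues c = 1/2 {
        quietly summarize age [aw=cp`c'] if choiceset==1 & alternative==1
        display "Class `c' mean age: " r(mean)
    }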

    4. How does the seed affect the results? Is it necessary to include? If so, what would be an appropriate seed value?

    Also, for one of the countries I get different results. For example, the price variable is not significant, and the latent class analysis does not shed any insight. Is it possible that there are many outliers, or that people simply did not care about any of the choices or are not price sensitive? Most of the other attributes are also not significant.

    Is it better to use effects coding or dummy coding? I get slightly different results between the two, and I am a little confused about how exactly to interpret effects coding. Is each coefficient the difference from the mean utility across all levels rather than from the base level?

    Thank you in advance! Any help on any of the above questions is much appreciated.

    Best,
    Megan

  • #2
    Megan, I'm not familiar with lclogit, nor am I familiar with choice analysis in general (or whatever the proper terminology is; pardon my imprecision, as I'm not an economist). Since you haven't gotten any responses, I will make one point.

    What you are doing in the quoted bit of code is modal class assignment, i.e. you assign people to the latent class they are most likely to be in.

    Code:
     
     //create the class membership based on the highest probability
     gen byte class=.
     forval c=1/class#{
     replace class= `c' if cpmax==cp`c'
     }
     
     forvalues c = 1/class# {
     quietly summarize pr if class == `c' & choice==1
     local n=r(N)
     local a=r(mean)
     quietly summarize pr`c' if class == `c' & choice==1
     local b=r(mean)
     matrix pr = nullmat(pr) \ `n', `c', `a', `b'
     }
     matrix colnames pr = "Obs" "Class" "Uncond_Pr" "Cond_PR"
     matlist pr, name(columns)
    In latent class analysis outside of economics, best practice is not to do this. (I have no idea if it is more broadly accepted in econ.) It ignores the uncertainty inherent in our class assignments. Remember, we might think that Mrs. Smith has a 90% probability of being in class 1, Mrs. Chen has an 80% probability, Mrs. Williams has an 85% probability, and so on. Alternatively, if your classes are pretty similar, you might see the modal probability be, for example, only 40% in a 3-class solution.

    If lclogit has a method to produce model-based class characteristics, I would highly recommend using it. If your latent classes are well separated and lclogit lacks this facility, modal assignment may be an acceptable approximation, but you will want to check the model entropy. It describes how well separated the classes are, a bit like a Herfindahl-Hirschman concentration index (note: I believe Simpson was first to the concept measured by Hirschman and Herfindahl).
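
    As a rough sketch of that calculation (I have not verified this against lclogit; cp1 and cp2 are the posterior class probabilities from your code, and the if condition is meant to keep one row per respondent):

    Code:
    * relative entropy: E = 1 + [sum of p*ln(p)] / (N*ln(C)), with C = 2 classes
    gen double plogp = cond(cp1>0, cp1*ln(cp1), 0) + cond(cp2>0, cp2*ln(cp2), 0)
    quietly summarize plogp if choiceset==1 & alternative==1
    display "Relative entropy = " 1 + r(sum)/(r(N)*ln(2))

    Values near 1 suggest well-separated classes; rules of thumb vary, but modal assignment becomes questionable as entropy falls much below about 0.8.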

    I did not know that lclogit used a random seed. Here's what I suspect (keeping in mind that I don't know the package). Latent class likelihoods are multimodal. The current best practice is to try many widely varied starting parameter values and to use the solution with the highest log likelihood that converges consistently (i.e., you replicated it at least a few times). I suspect the seed controls the parameter start values. It's just a number used to initialize a pseudorandom number generator; the particular value isn't important, it just enables you to replicate your results. What is important is that you do the parameter search. With 2 or 3 classes, chances are it's not that critical. With 4 or more classes, you will want to make sure you do it.
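
    If lclogit exposes a seed option for its starting values (I have not verified this; check help lclogit), a multi-start search might look something like this sketch, which keeps the seed that yields the highest log likelihood:

    Code:
    * sketch of a multi-start search; assumes a seed() option exists and
    * that the maximized log likelihood is stored in e(ll)
    local best = .
    forvalues s = 1/20 {
        quietly lclogit choice price ASC_none prodA prodB level2 level3 level4, ///
            group(identifier) id(ID) nclasses(3) seed(`s')
        if missing(`best') | e(ll) > `best' {
            local best = e(ll)
            local bestseed = `s'
        }
    }
    display "Best log likelihood " `best' " from seed `bestseed'"

    You would then refit with the winning seed and check that the same maximum turns up from several different seeds before settling on it.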

    I'm afraid I have no clue about your other questions. If you still get no feedback, I would contact the package authors.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Hello Weiwen,

      Thank you for your reply! I will look more into the class assignment issues and measuring the model entropy. And thank you for clarifying the seed. Right now I am looking at 3 or 4 classes as optimal. Could you explain a bit more about the parameter search? I am new to latent class analysis and am still learning the correct way to assess and analyze the classes.

      Appreciate the help!

      Megan



      • #4
        Megan,

        Here's a very good read by Kathryn Masyn.

        https://www.statmodel.com/download/Masyn_2013.pdf



        • #5
          Thank you!

