Good morning everybody,
I have some questions about latent class models. I am approaching them for the first time.
Below is the dataset used for the analysis.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte Codice_soggetto int(A1 D1 A2 D2 A3 D3) double(A4 D4) byte(t1 t2 t3 t4)
 1   .   .  70  89 107 113 100 108 1 2 3 4
 2   .   .  91 110 112 108   .   . 1 2 3 4
 3   .   . 100 118  96 108 116 120 1 2 3 4
 4  80  65  80  65  96  96 116 114 1 2 3 4
 5   .   .   .   . 102 113 118  96 1 2 3 4
 6   .   .   .   .  76  64  76  64 1 2 3 4
 7   .   .   .   .  91 108 100  99 1 2 3 4
 8   .   .   .   .  98  97 100  70 1 2 3 4
 9  53  74  91  83 100  93   .   . 1 2 3 4
10  70  62  74  72  84 104   .   . 1 2 3 4
11   .   .  74  50  90 104   .   . 1 2 3 4
12   .   .  91  78 150 113   .   . 1 2 3 4
13   .   .   .   . 107 126  92  98 1 2 3 4
14   .   .   .   .  95 104 102  99 1 2 3 4
15   .   .   .   . 115  88 100  88 1 2 3 4
16   .   .  92 118   .   . 102 115 1 2 3 4
17   .   .   .   . 118 117 100 100 1 2 3 4
18   .   . 112 112 118 126 119 113 1 2 3 4
19   .   .  87  98  95 126   .   . 1 2 3 4
20   .   .  97 106  78  89   .   . 1 2 3 4
21   .   .  58  50   .   .  76  71 1 2 3 4
22   .   .  85  78  90 113   .   . 1 2 3 4
23   .   .  99 106 119 113   .   . 1 2 3 4
24   .   .  96 100 102  96   .   . 1 2 3 4
25  98  84  78  72  90  87 100  85 1 2 3 4
26 111  94  95 112 100 103 127 100 1 2 3 4
27  87  93  70  88 100  98  79  87 1 2 3 4
28  89  99 103  89 107 118  79  92 1 2 3 4
29  89  95 101  84   .   . 116 102 1 2 3 4
30 107   . 108 112 107 122 107 107 1 2 3 4
31 107 104 124 100 107 113 116  92 1 2 3 4
32 107 104 120  92 119 118 135 112 1 2 3 4
33 111 119 121 100 107 137   .   . 1 2 3 4
34   .   .  58 116  90 104  88 112 1 2 3 4
35   .   . 120  95 102 122   .   . 1 2 3 4
36  80  99  61  60 124  73 126 102 1 2 3 4
37  93  99  91  78  96  73 107  97 1 2 3 4
38   .   .  97 100   .   . 107  92 1 2 3 4
39   .   . 103  89 102  78 116 123 1 2 3 4
40  98  89  82 100 106 126  98 102 1 2 3 4
41 102  83  74  83 107 113  98 107 1 2 3 4
42 107 113 109 133 118 126 116 102 1 2 3 4
43   .   .  95  78  90  96 126  76 1 2 3 4
44  93  99  69  84  83  75   .   . 1 2 3 4
45   .   .  97 111 118 108   .   . 1 2 3 4
46   .   . 103  95 102  91  98  71 1 2 3 4
47   .   . 105  90 118 131 126  92 1 2 3 4
48   .   . 108  89 106  93 107 102 1 2 3 4
49 124  99 108 112  84 113   .   . 1 2 3 4
50  95  91 105 111 113 122 126 107 1 2 3 4
51   .   .  91 106 102 117 116 112 1 2 3 4
52 107 130  78  72   .   .  51  50 1 2 3 4
53   .   .  78 106  89 103 116 118 1 2 3 4
54  98   .  66  72  89 108  98 107 1 2 3 4
55  89 119  95 118 107 127 107 118 1 2 3 4
56   .   .  82  95 106  84  98 107 1 2 3 4
57   .   . 104  98 112 108 107  97 1 2 3 4
58 127 118 105  95  96 100   .   . 1 2 3 4
59   .   . 116 106 130 104 126  81 1 2 3 4
60 117 113 117  84 102  86   .   . 1 2 3 4
61   .   .  91  83 123 126   .   . 1 2 3 4
62 111 114 109  68 105  83   .   . 1 2 3 4
63 132 113 103 140 102 122   .   . 1 2 3 4
64   .   . 108  78 113  87   .   . 1 2 3 4
65   .   .   .   . 125 116 107 102 1 2 3 4
66   .   . 117  95 102  96   .   . 1 2 3 4
67   .   .  81  68  83  89   .   . 1 2 3 4
68   .   .  87  86  96 100   .   . 1 2 3 4
69   .   .   .   .  96 122 135 118 1 2 3 4
70   .   .  87  98   .   . 100.375 87.91 1 2 3 4
end
A1, A2, A3, A4 and D1, D2, D3, D4 are variables that hold the values of a test at four different time points. The logit model is for binary 0/1 data and ZIP is for count data with excess zeros, so I chose the cnorm (censored normal) model.
My questions:
1) Is the model choice correct?
2) I used the syntax shown below. Is it correct to start with a linear polynomial (order 1) for all 4 groups, or is something else recommended? Is the syntax I used to check the choice of the best model (number of groups), which I found on the web, also correct? If it is not, how can I check the odds of correct classification (> 5 in all groups), the APPA (0.7 threshold for each group), the mismatch between estimated and assigned group probabilities, and the percentage of individuals estimated to be assigned to the smallest group (cutoff of 1%)?
Code:
* Note: the posted data include A4 = 51 (subject 52), which lies below min1(53).
traj, multgroups(4) ///
    var1(A1 A2 A3 A4) indep1(t1 t2 t3 t4) order1(1 1 1 1) model1(cnorm) min1(53) max1(150) ///
    var2(D1 D2 D3 D4) indep2(t1 t2 t3 t4) order2(1 1 1 1) model2(cnorm) min2(50) max2(140)

* Maximum posterior probability for each subject
gen Mp = 0
foreach i of varlist _traj_ProbG* {
    replace Mp = `i' if `i' > Mp
}
sort _traj_Group

*and the odds of correct classification
by _traj_Group: gen countG = _N
by _traj_Group: egen groupAPP = mean(Mp)
by _traj_Group: gen counter = _n
gen n = groupAPP/(1 - groupAPP)
gen p = countG/_N
gen d = p/(1 - p)
gen occ = n/d

*Estimated proportion for each group
scalar c = 0
gen TotProb = 0
foreach i of varlist _traj_ProbG* {
    scalar c = c + 1
    quietly summarize `i'
    replace TotProb = r(sum)/_N if _traj_Group == c
}
gen d_pp = TotProb/(1 - TotProb)
gen occ_pp = n/d_pp

*This displays the group number [_traj_~p],
*the count per group (based on the max post prob), [countG]
*the average posterior probability for each group, [groupAPP]
*the odds of correct classification (based on the max post prob group assignment), [occ]
*the odds of correct classification (based on the weighted post. prob), [occ_pp]
*and the observed probability of groups versus the probability [p]
*based on the posterior probabilities [TotProb]
list _traj_Group countG groupAPP occ occ_pp p TotProb if counter == 1
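For reference, here is a sketch of what the diagnostic code above appears to compute, written as formulas. It is inferred from the code, not taken from the traj documentation. The symbols $p_{ij}$, $g_i$, $N_j$, $P_j$ and $\hat{\pi}_j$ are notation introduced here for illustration. It assumes that `_traj_ProbGj` holds the posterior probability of group $j$, that `_traj_Group` is the group with the largest posterior probability, and that no posterior probabilities are missing.

```latex
% p_{ij}: posterior probability that subject i belongs to group j (_traj_ProbGj)
% g_i   : assigned group of subject i (_traj_Group), assumed to be argmax_j p_{ij}
% N_j   : number of subjects assigned to group j (countG); N: total number of subjects
\begin{align*}
\mathrm{APP}_j &= \frac{1}{N_j} \sum_{i:\, g_i = j} \max_k \, p_{ik}
  && \text{(groupAPP)} \\
P_j &= \frac{N_j}{N}
  && \text{(p, assigned proportion)} \\
\hat{\pi}_j &= \frac{1}{N} \sum_{i=1}^{N} p_{ij}
  && \text{(TotProb, estimated proportion)} \\
\mathrm{OCC}_j &= \frac{\mathrm{APP}_j / (1 - \mathrm{APP}_j)}{P_j / (1 - P_j)}
  && \text{(occ)} \\
\mathrm{OCC}^{pp}_j &= \frac{\mathrm{APP}_j / (1 - \mathrm{APP}_j)}{\hat{\pi}_j / (1 - \hat{\pi}_j)}
  && \text{(occ\_pp)} \\
\text{mismatch}_j &= \hat{\pi}_j - P_j
\end{align*}
```

Read this way, the checks listed in question 2 map onto the columns of the final `list` output: `groupAPP` against the 0.7 threshold, `occ` or `occ_pp` against 5, the difference between `TotProb` and `p` for the mismatch, and the smallest value of `p` against the 1% cutoff.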