
  • Latent profile model. Inconsistent mean value

    Dear Stata users,
    I am running a latent profile model on data from a questionnaire administered to a sample of artists. Among other things, the questionnaire asks respondents to indicate the percentage of their income that comes from different sources (government, artistic work, other), which gives 3 observed variables. Similarly, we ask for the percentage of time devoted to different activities (art, education, other), which gives three more variables. I normalize the percentages so that each set sums to 100. After estimating the model, I looked at the estimated mean of each variable in each class. Summing the estimated means of the three income variables, I obtain 100% in every class. For the time variables, however, the sum exceeds 100 in every class (ranging from 104 to 108). These values seem inconsistent to me. Is that the case, or is this plausible?
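A minimal sketch of the normalization step described above, written in Python rather than Stata and using made-up responses: each respondent's three shares are divided by their row total and multiplied by 100, so every row sums to 100. The function name `normalize_to_100` is hypothetical, and treating an all-zero row as missing is an assumption, not something stated in the thread.

```python
def normalize_to_100(rows):
    """Rescale each row of shares so that it sums to 100.

    Rows whose total is zero cannot be rescaled; they are returned as None
    (an assumption: treat them as missing).
    """
    out = []
    for row in rows:
        total = sum(row)
        if total == 0:
            out.append(None)
            continue
        out.append([100.0 * v / total for v in row])
    return out


# Hypothetical raw answers for the time set (art, education, other);
# the first two rows do not add up to 100 as reported.
raw_time = [[50, 30, 30], [60, 20, 10], [40, 40, 20]]
time_pct = normalize_to_100(raw_time)

for row in time_pct:
    if row is not None:
        # Each printed total is 100 up to floating-point rounding.
        print([round(v, 2) for v in row], round(sum(row), 6))
```

Checking the row totals after normalizing, for both sets of variables, is a cheap way to confirm that every observation entering the model really sums to 100.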
    Thank you very much for your help

  • #2

    Assuming I'm reading you correctly, you have two sets of 3 indicators (also called observed variables or manifest variables). In each set, the percentages should sum to 100%. It's as if you really have two Dirichlet random variables, or fractional multinomial random variables, or something like that.

    The problem is that Stata doesn't know that each set of indicators must sum to 100%. It treats each individual indicator as independent of the other two in its set. Thus, your results seem entirely plausible to me. This is a bit different from the conditional independence assumption that is key to latent class analysis, but that assumption can be relaxed in latent profile analysis by allowing the indicators' error terms to be correlated (e.g. use the option covstructure(_OEn, unstructured) or similar).

    As the aphorism attributed to George Box states, "All models are wrong, but some are useful". The issue above is definitely a limitation, but I don't think it can be addressed within Stata's existing architecture and set of supported distributions.


    • #3
      Thank you, Weiwen, for your reply! What surprises me is that for the set of income variables, the sum is exactly 100.000 in all the classes. Anyway, I'll use the covstructure option for consistency. Thank you again!
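One way to see why the income means can come out at exactly 100 is sketched below, under a stated assumption: for Gaussian indicators whose class-specific means are freely estimated, with no missing values, the maximum-likelihood mean of each indicator in a class is that indicator's average weighted by each respondent's posterior probability of belonging to the class. Because the same weight applies to all three indicators of a given respondent, the three class means then sum to the same constant as every row. The data, the weights, and the "missing" pattern below are all hypothetical; the second calculation only illustrates one way the sum could drift from 100 (one indicator averaged over fewer respondents than the others). It is not a diagnosis of the data in this thread.

```python
import random

random.seed(1)


def normalize(row):
    """Rescale a row of shares so it sums to 100."""
    total = sum(row)
    return [100.0 * v / total for v in row]


def weighted_mean(values, weights):
    """Average of values weighted by weights."""
    return sum(v * u for v, u in zip(values, weights)) / sum(weights)


# Hypothetical data: 200 respondents, three shares each summing to 100.
data = [normalize([random.random() for _ in range(3)]) for _ in range(200)]

# Hypothetical posterior probabilities of membership in one class. In a
# mixture model these come from the fitted model; any nonnegative weights
# illustrate the point, because the identity does not depend on their values.
w = [random.random() for _ in data]

# Complete data: every indicator is averaged with the same weights,
# so the class means inherit the row-sum constraint.
means = [weighted_mean([row[j] for row in data], w) for j in range(3)]
print("complete data:", round(sum(means), 6))

# Hypothetical missingness: indicator 3 is averaged only over respondents
# 51-200, while indicators 1 and 2 use everyone. The sum no longer has to be 100.
means_partial = [
    weighted_mean([row[0] for row in data], w),
    weighted_mean([row[1] for row in data], w),
    weighted_mean([row[2] for row in data[50:]], w[50:]),
]
print("one indicator on a subset:", round(sum(means_partial), 6))
```

Under this assumption, a gap like 104 to 108 would point toward checking whether every observation used for the time indicators sums to 100 and whether all three time indicators are observed for the same respondents.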