
  • Two step cluster analysis and its coefficient

    Hey all,

    hopefully this is the right place for my questions! :-) I've got the following situation:

    I'm working with a relatively large data set (~5000 cases) whose variables range from nominal to metric. It covers typical parameters such as gender, age and education level, but also, for example, media usage and employment status. I want to build indices and then run a two-step cluster analysis, since important variables such as gender or employment status cannot be treated as metric.

    Now I know that with "normal" cluster analysis, you can choose among various coefficients for comparing cases. Some count shared absences (both cases have a 0) as a similarity; others count only shared presences (both cases have a 1). This matters when dummies are used: For example, two people are both NOT a member of the Republicans, NOT a member of the Democrats, NOT a member of the Green Party, but both are members of the Libertarian Party. If only positive values are considered, they have one thing in common; if both negative and positive values are considered, they have four things in common (although it is really just one).
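
    To make that example concrete, here is a small Python sketch of the two ways of counting. It is only my own illustration of the idea, not what any statistics package does internally. The function names are made up, and I'm assuming the two kinds of coefficient correspond to what is usually called the simple matching coefficient (shared absences count) and the Jaccard coefficient (shared absences are ignored). I also added a third, hypothetical person who is a Green Party member.

```python
# Party-membership dummies, in this order:
# Republican, Democrat, Green, Libertarian
person_a = [0, 0, 0, 1]  # Libertarian
person_b = [0, 0, 0, 1]  # Libertarian
person_c = [0, 0, 1, 0]  # Green (hypothetical third case)


def match_counts(x, y):
    """Return (shared presences, shared absences) for two 0/1 vectors."""
    both_present = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    both_absent = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return both_present, both_absent


def simple_matching(x, y):
    """Share of variables on which both cases agree (1/1 or 0/0)."""
    present, absent = match_counts(x, y)
    return (present + absent) / len(x)


def jaccard(x, y):
    """Share of agreeing variables after dropping the 0/0 pairs."""
    present, absent = match_counts(x, y)
    considered = len(x) - absent
    return present / considered if considered else 0.0


present, absent = match_counts(person_a, person_b)
print(present, present + absent)            # 1 4
print(simple_matching(person_a, person_c))  # 0.5
print(jaccard(person_a, person_c))          # 0.0
```

    The two Libertarians share one presence but four agreements in total. The Libertarian and the Green member still look half similar under simple matching, purely because of the two parties neither belongs to, while Jaccard gives them no similarity at all.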

    Thus, I've got the following questions regarding two-step cluster analysis:

    => Question #1: Can I choose the coefficient used for binary variables when I run a two-step cluster analysis?

    => Question #2: If not, which coefficient does that analysis use? Are common non-values considered a similarity?

    => Question #3: If common non-values are considered a similarity: Is there a way to reduce the inflated similarity shown in the example above? Transforming the binary variables into metric ones is not feasible. Is there anything else I could do about it?

    => Question #4: Does it "confuse" the algorithm if some variables are coded 0/1 and others 1/2? Or does it merely assess the distance between cases and not care about the coding at all?

    => Question #5: Should there be roughly as many binary variables as metric ones? I use 3 binary variables, but many more metric ones. Will a single binary variable (one of only a few) influence the cluster formation more than a single metric variable (one of many)?

    I would be VERY happy if any of you could help me with these questions! I've already searched the literature on them, but sadly I haven't found any answers yet.

    Thank you all! :-)
    -AR