Dear Community,
I have a question regarding the treatment of compositional independent variables, i.e. variables that are each a relative contribution to a whole and are therefore subject to a sum constraint (they sum to 1 or 100%).
In an earlier post, I considered simply dropping one variable as the reference, or dropping the constant term, to avoid multicollinearity. After going a bit deeper into the topic, however, I came across some further problems that come with compositional independent variables. First, following Aitchison (1986), compositional data can be seen as singular, meaning that their covariance matrix is singular. This makes the interpretation of the coefficients difficult: because of the constant-sum constraint, it is impossible to change one proportion without changing at least one of the others. Hron, Filzmoser and Thompson (2009) speak of a singularity problem of the data. They therefore suggest log-transforming the data so that they are represented in standard Euclidean space (untransformed, they lie in a simplex).
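To make the singularity concrete, here is a minimal Python sketch on simulated shares. The data, the variable names and the choice of the additive log-ratio (alr) transform are my own assumptions for illustration; alr is just the simplest log-ratio transform and is not necessarily the one used in the cited paper.

```python
import numpy as np

# Simulated data (an assumption for illustration): 200 observations, 4 parts.
rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, size=(200, 4))
shares = raw / raw.sum(axis=1, keepdims=True)  # each row sums to 1

# Design matrix with a constant plus all four shares: the shares add up to
# the constant column, so one column is redundant.
X_full = np.column_stack([np.ones(len(shares)), shares])
rank_full = np.linalg.matrix_rank(X_full)  # 5 columns, but rank is only 4


def alr(comp):
    """Additive log-ratio: log of each part relative to the last part."""
    return np.log(comp[:, :-1] / comp[:, [-1]])


# After the transform there are D - 1 = 3 unconstrained real-valued columns.
Z = alr(shares)
X_alr = np.column_stack([np.ones(len(Z)), Z])
rank_alr = np.linalg.matrix_rank(X_alr)  # 4 columns, full column rank

print(X_full.shape[1], rank_full)
print(X_alr.shape[1], rank_alr)
```

Dropping one share or the constant also restores full rank, but the remaining coefficients still have to be read relative to the omitted part, which is the interpretation problem described above.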
All of this leaves me somewhat confused. Is it correct that I have to treat compositional independent variables differently and log-transform them? Further, if some of my compositional variables are zero (say, a household with a zero share of spending on tobacco), a log transformation cannot be applied. Aitchison (1986) proposed replacing zero values with very small positive values.
This seems to be a rarely discussed topic, because I cannot find much about it. Does anybody know more about it and can suggest how to treat compositional independent variables when some values are zero? Can I just stick to dropping one compositional part or the constant, or do I really have to find a solution that log-transforms the variables?
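As a sketch of what such a zero replacement could look like, the snippet below uses a simple multiplicative replacement: each zero becomes a small positive value and the non-zero parts are shrunk so that every row still sums to 1. This is one common variant, not necessarily Aitchison's original proposal, and the function name `replace_zeros` and the value `delta=1e-3` are hypothetical choices of mine.

```python
import numpy as np


def replace_zeros(comp, delta=1e-3):
    """Replace zeros by `delta` and rescale non-zero parts to keep row sums at 1.

    Assumes each row of `comp` sums to 1. `delta` is an arbitrary
    illustrative value, not a recommendation.
    """
    comp = np.asarray(comp, dtype=float)
    zeros = comp == 0
    n_zeros = zeros.sum(axis=1, keepdims=True)
    # Shrink the non-zero parts by the total mass handed to the zeros.
    shrunk = comp * (1.0 - delta * n_zeros)
    return np.where(zeros, delta, shrunk)


# Hypothetical household budget shares; the first household spends nothing
# on the third category.
shares = np.array([[0.6, 0.4, 0.0],
                   [0.5, 0.3, 0.2]])
fixed = replace_zeros(shares)
log_shares = np.log(fixed)  # now finite everywhere
```

Rows without zeros are left unchanged, and the ratios between the non-zero parts of a row are preserved. The results can be sensitive to the choice of `delta`, so it is worth checking how the regression changes across a few values.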
I hope for some helpful pointers on this.
Thank you very much in advance!
Literature:
Hron, Filzmoser and Thompson (2009) "A linear regression with compositional explanatory variables". Journal of Applied Statistics, Vol 00, No 00, pp. 1-15.
Aitchison, J. (2003 [1986]) "The statistical analysis of compositional data". Blackburn Press.