*how to* multiple variables (% shares) which add up to 100 in total

Tim Goetz

Join Date: Dec 2017

Posts: 8
#1

19 Feb 2018, 02:03

Dear statalist colleagues,

I am currently working on a research project for which I need some help, with the logic as well as the syntax. The case is as follows:
- I have 7 variables (x1 to x7) in my dataset (each continuous) which together add up to 100%, meaning that every single variable represents a share. (E.g. x as the share of movement types at the workplace; 1: % of time walking at work; 2: % of time standing at work; 3: % of time sitting at work; etc.; up to variable 7: others.)
- Now I want to check the effect of e.g. variable 3 ("time sitting at work") on health conditions (y) in a regression.

Problem:
How can I check for this? Every variable depends on the size of the other variables, so they are not independent of each other.
For now I simply ran: reg y x3 controls, robust

By doing so, my beta coefficient describes the potential effect on y of increasing x3 by one unit while holding all other conditions constant (which is simply not possible in this case, because a one-unit increase in x3 necessarily leads to a one-unit decrease elsewhere among the other x variables).

My questions:
1.) Does it make sense to include all other x variables as controls in the example above? (I don't think so, because together they always add up to 100%, which may lead to a variable being omitted, but maybe I am wrong.)

2.1.) Is there a way to identify the "best set of x1 to x7 share sizes" that leads to the best y (health) outcome on average across the whole sample? E.g. as a result: x1: 20%, x2: 20%, x3: 10%, x4: 15%, x5: 10%, x6: 5%, x7: 20% leads on average to the best health condition perceived by employees across the whole sample.

2.2.) Additionally: is it possible to analyze this in a way that supports a statement such as: "In order to improve the health condition by 10 units (or 10%, not important for now), you must increase x3 by 3 units and decrease x4 by 3 units"? (That is, shifting between x conditions, which again relates to the "independence violation" here.)

3.) Is there a specific name for this "problem" or kind of analysis? Do you know what it is called, of any paper that did something like this, or how to code it in this situation?

I would be very grateful for any help, because I feel a bit lost.
Also feel free to add any comments about your thoughts, hints, first impressions, etc.; all may be helpful.

Thanks a lot for reading and thinking this through.
Best regards, Tim

Last edited by Tim Goetz; 19 Feb 2018, 02:05.
Nick Cox

Join Date: Mar 2014

Posts: 35646
#2

19 Feb 2018, 04:42

Such data are often called compositional and there are several books on the field.

Simplest answer is that you don't (or shouldn't) use all those predictors, because one is redundant. Note that rounding and measurement error may blur this, but it's the principle.

This is on all fours with not using both % alive and % dead (say) because you know one variable suffices to carry all the information whenever there are just two categories.
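
A minimal sketch of that principle in Stata, assuming the variable names y and x1 to x7 from the question and arbitrarily treating x7 ("others") as the share that is left out. The variable name total is hypothetical, and the other control variables are omitted for brevity:

```stata
* Check that the shares really sum to 100 (rounding may blur this)
egen total = rowtotal(x1 x2 x3 x4 x5 x6 x7)
summarize total

* Leave out one share (here x7, an arbitrary choice) as the reference.
* Under a linear model, each coefficient is then the difference in y
* associated with shifting one percentage point from x7 to that share,
* with the other included shares held fixed.
regress y x1 x2 x3 x4 x5 x6, vce(robust)

* Shifting one percentage point from x4 to x3, all other shares unchanged
lincom x3 - x4
```

Which share is left out changes the individual coefficients but not the fitted values, so the choice is mainly one of convenient interpretation.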

What's quite likely is that the response can't behave linearly because those predictors are bounded (each share lies between 0 and 100), but I'd explore that graphically before I decided whether it really bites and how best to model it.
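
One possible way to do that graphical exploration, sketched here with the question's variable names y and x3 (the axis titles are only illustrative):

```stata
* Scatter of the response against one share, with a lowess smooth overlaid
* to show whether the relationship looks roughly linear over 0 to 100
twoway (scatter y x3) (lowess y x3), ///
    xtitle("% of time sitting at work") ytitle("Health (y)")
```

If the smooth bends or flattens noticeably near 0 or 100, that would be a hint that a straight-line term is too simple.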