Distribution of values among multiple variables

Kerstin Panz

Join Date: Nov 2019

Posts: 2
#1


04 Nov 2019, 07:00

Hello there,

I have a question concerning my data set. I want to find the overall distribution of values that occur across several variables.

My sample is structured as follows:
A participant can hold up to 4 roles at a time, which are stored in 4 variables (RC1 (rolecode1), RC2, RC3, RC4). Each role is one of 30 different rolecodes (e.g. Secretary, President, Vice President and so on). These values can change from year to year, depending on changes of employment or promotion.

Therefore the dataset looks something like this:

participantid year RC1 RC2 RC3 RC4

1 2017 S VP P .
1 2018 S P . .
1 2019 S P . .
2 2017 CB EVP P D
2 2018 CB EVP P .
2 2019 CEO D . .

Now my problem is that I can look at the distribution of only 1 variable at a time by using: tabulate rolecode1
But I don't know how to find the overall distribution of, for example, the rolecode "P" in the entire dataset. In my short example, for rolecode2 it would be 0.33 (2 of 6 non-missing values), for rolecode3 it would be 1 (3 of 3), and for the entire example it would be 0.3125 (5 of 16).

Does anybody have an idea how I can create an overview like the one the command 'tabulate' gives, but for the distribution of all rolecodes across the 4 distinct variables?

Like this:
Rolecodes Frequency Percent
S 3 0.1875
CB 2 0.125
CEO 1 0.0625
VP 1 0.0625
P 5 0.3125
EVP 2 0.125
D 2 0.125

Your help is very much appreciated.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#2

04 Nov 2019, 07:11

Kerstin:
welcome to this forum.
This seems a job for -collapse-.

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 35810
#3

04 Nov 2019, 07:14

Please read and act on https://www.statalist.org/forums/help#stata in giving data examples.

I don't quite agree with Carlo Lazzaro. At the simplest you want to pool RC? in a table, and for that tabm offers one basic method.

Code:

ssc install tab_chi
help tabm

tabm is thus a search term to find many similar questions here on Statalist.
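
For readers who prefer official commands only, the same pooling can be sketched with reshape long followed by tabulate. This is a minimal sketch and not a description of how tabm works internally. The data are retyped from #1 with empty strings standing in for the missing cells, and the variable name slot is made up for illustration.

```stata
* Recreate the six-row example from #1 (string role codes; "" is missing)
clear
input participantid year str3(RC1 RC2 RC3 RC4)
1 2017 "S"   "VP"  "P" ""
1 2018 "S"   "P"   ""  ""
1 2019 "S"   "P"   ""  ""
2 2017 "CB"  "EVP" "P" "D"
2 2018 "CB"  "EVP" "P" ""
2 2019 "CEO" "D"   ""  ""
end

* Stack RC1-RC4 into a single variable RC:
* one row per participant, year and slot (slot is a hypothetical name)
reshape long RC, i(participantid year) j(slot)

* Pooled one-way table; empty strings are treated as missing and left out
tabulate RC, sort
```

On the six rows from #1 this should count 16 non-missing role codes, with P occurring 5 times (31.25%), in line with the figures given there.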
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#4

04 Nov 2019, 07:19

Nick:
thanks for the hint.
Admittedly, I was not aware of (your) community-contributed programme -tabm-.

Kind regards,
Carlo
(Stata 19.0)
Kerstin Panz

Join Date: Nov 2019

Posts: 2
#5

04 Nov 2019, 08:20

Thank you both very much. It worked perfectly with the tabm command!

Nick Cox: Sorry for not providing a proper data example (I hoped my example was sufficient, and I didn't realize that the spaces in my example tables had disappeared and made them confusing).

May I add one question?
Having seen the total distribution, I can now see that about half of the 30 rolecodes each appear less than 1% of the time. Is it possible to combine these (string) values from the 4 variables into 1 value (something like "other job titles") if their total frequency is less than 1%?
Because they are spread across the 4 variables, I wouldn't know how to tag and rename them.
Nick Cox

Join Date: Mar 2014

Posts: 35810
#6

04 Nov 2019, 08:50

It's not the spacing that is problematic in #1. It's the use of undefined value labels.

tabm has a documented replace option. You may most easily aggregate categories in the version of the data it produces.
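
As an alternative that does not depend on the layout tabm leaves behind, the aggregation can be sketched with official commands on stacked data. This is a sketch under stated assumptions: the toy data are retyped from #1, and the names slot, share, RC_grouped, the local macro cutoff and the label "Other" are all hypothetical choices, not anything tabm defines.

```stata
* Toy data retyped from #1 (string role codes; "" is missing)
clear
input participantid year str3(RC1 RC2 RC3 RC4)
1 2017 "S"   "VP"  "P" ""
1 2018 "S"   "P"   ""  ""
1 2019 "S"   "P"   ""  ""
2 2017 "CB"  "EVP" "P" "D"
2 2018 "CB"  "EVP" "P" ""
2 2019 "CEO" "D"   ""  ""
end

* Stack the four variables so every role code sits in one variable
reshape long RC, i(participantid year) j(slot)

* Share of each role code among all non-missing entries
bysort RC: gen double share = _N if RC != ""
count if RC != ""
replace share = share / r(N)

* Pool every code whose overall share is below the cutoff (1% here)
local cutoff = 0.01
gen RC_grouped = RC
replace RC_grouped = "Other" if share < `cutoff' & RC != ""

tabulate RC_grouped, sort

* Optional: return to the original wide layout with the pooled codes
drop RC share
rename RC_grouped RC
reshape wide RC, i(participantid year) j(slot)
```

With only 16 entries in the toy data, no code falls below 1%, so nothing is recoded; raising cutoff to 0.10 would pool CEO and VP (0.0625 each). Because the share is computed over the stacked variable, a code is judged by its total frequency regardless of which of the 4 variables it appears in.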