Frequencies of Multiple Variables in one Table

Julian Emmler

Join Date: Mar 2017

Posts: 7
#1

Frequencies of Multiple Variables in one Table

22 Mar 2017, 11:25

Hello everyone,

I have a problem concerning displaying frequencies of different variables in a table. The problem sounds rather straightforward, but I haven't found a way to solve it yet. The (simplified) setting is as follows: I have two variables (and a person identifier) which contain the industry affiliation of a person before and after a certain event. Thus each line of the dataset contains the person identifier, a number (industry-code) indicating the industry the person was affiliated with before the event and a number indicating the industry a person was affiliated with after a certain event. What I would like to have now is a table which shows me how the people were distributed accross industries before and after the event, i.e. a table which shows me the share of people that was affiliated with the different industries before and after the event. To make it a little difficult, there might also be industries that no person was affiliated with before the event but there are persons affiliated with the industry after the event or the other way around. In a second step i would then also see the difference between the shares before and after the event for each industry. I have tried multiple things with tabulate and tabstat, but was not able to produce the desired result. If anyone has an idea or if anything is unclear I'm happy to clarify. Thank you for your help,

cheers,
J. Emmler
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35713
#2

22 Mar 2017, 11:35

Only a moment ago I posted (almost) this in another thread.

Your description will make sense to you, but it's much harder work for anyone else to read it, absorb it and imagine your situation.

One good data example is worth a volume of exegesis.

Please do read the FAQ Advice, as you were asked to do before posting, and come back with a data example.
Comment
Julian Emmler

Join Date: Mar 2017

Posts: 7
#3

22 Mar 2017, 12:00

Sorry, for the confusing post. The data I'm working on is confidential so I can't post an excerpt but I will provide a "fake example". I hope the following example will clarify.

The data that I have looks like this:
Pers (personal identifier) industry_b industry_a

5001 4 3

5002 3 3

5003 4 2

5004 1 5

5005 6 7

5006 6 2

5007 4 2

5008 4 4

5009 9 8

5010 3 3

- tabulate industry_b - would produce:

Frequency Percent

1 1 0.1

3 2 0.2

4 4 0.4

6 2 0.2

9 1 0.1

- tabulate industry_a - would produce:

Frequency Percent

2 3 0.3

3 3 0.3

4 1 0.1

5 1 0.1

7 1 0.1

8 1 0.1

What I would like now is a table like this: The first row displays the industry code, the second row displays how many individuals worked in the industry before the event and the third row shows how many individuals worked in the industry after the event and the fourth column the difference between the two shares (this is optional, just for ease of seeing how the shares evolved):

Industry Before After Difference

1 0,1 0 0,1

2 0 0,3 -0,3

3 0,2 0,3 -0,1

4 0,4 0,1 0,3

5 0 0,1 -0,1

6 0,2 0 0,2

7 0 0,1 -0,1

8 0 0,1 -0,1

9 0,1 0 0,1

I tried -tabstat- with the count option, but I can't handle the fact the there are industrie codes in "industry_b" that aren't in "industry_a" and vice versa. This is a lot of tables, I hope it became clearer what I want to accomplish.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35713

22 Mar 2017, 12:29

A fake example is fine. This works. There may well be a more direct way. (What you are labelling percents I see as proportions.)

Code:

. clear

. input Pers industry_b   industry_a

          Pers  industr~b  industr~a
  1. 5001    4       3
  2. 5002    3       3
  3. 5003    4       2
  4. 5004    1       5
  5. 5005    6       7
  6. 5006    6       2
  7. 5007    4       2
  8. 5008    4       4
  9. 5009    9       8
 10. 5010    3       3
 11. end

.
. reshape long industry_, i(Pers) j(type) string
(note: j = a b)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       10   ->      20
Number of variables                   3   ->       3
j variable (2 values)                     ->   type
xij variables:
                  industry_a industry_b   ->   industry_
-----------------------------------------------------------------------------

. contract industry type

. reshape wide _freq, i(industry) j(type) string
(note: j = a b)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       11   ->       9
Number of variables                   3   ->       3
j variable (2 values)              type   ->   (dropped)
xij variables:
                                  _freq   ->   _freqa _freqb
-----------------------------------------------------------------------------

. mvencode _*, mv(0)
      _freqa: 3 missing values recoded
      _freqb: 4 missing values recoded

. rename (_freq*) (freq_*)

. egen pr_a = pc(freq_a) , prop

. egen pr_b = pc(freq_b) , prop

. gen diff = pr_b - pr_a

. l

     +-------------------------------------------------+
     | indust~_   freq_a   freq_b   pr_a   pr_b   diff |
     |-------------------------------------------------|
  1. |        1        0        1      0     .1     .1 |
  2. |        2        3        0     .3      0    -.3 |
  3. |        3        3        2     .3     .2    -.1 |
  4. |        4        1        4     .1     .4     .3 |
  5. |        5        1        0     .1      0    -.1 |
     |-------------------------------------------------|
  6. |        6        0        2      0     .2     .2 |
  7. |        7        1        0     .1      0    -.1 |
  8. |        8        1        0     .1      0    -.1 |
  9. |        9        0        1      0     .1     .1 |
     +-------------------------------------------------+

Here is the code in one:

Code:

clear
input Pers industry_b    industry_a
5001    4    3
5002    3    3
5003    4    2
5004    1    5
5005    6    7
5006    6    2
5007    4    2
5008    4    4
5009    9    8
5010    3    3
end

reshape long industry_, i(Pers) j(type) string
contract industry type
reshape wide _freq, i(industry) j(type) string
mvencode _*, mv(0)
rename (_freq*) (freq_*)
egen pr_a = pc(freq_a) , prop
egen pr_b = pc(freq_b) , prop
gen diff = pr_b - pr_a
l, sep(0)

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

22 Mar 2017, 12:37

Added in edit: crossed with Nick's post #4, a complete solution, but do see the advice in the last paragraph below.

Here is some technique that may start you toward the solution you want. The key is using the reshape command to convert your data from a wide layout to a long one. This makes the tabulation straightforward, and will almost certainly make other analysis with this data easier. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(pers ind_b ind_a)
5001 4 3
5002 3 3
5003 4 2
5004 1 5
5005 6 7
5006 6 2
5007 4 2
5008 4 4
5009 9 8
5010 3 3
end
dataex
exit
reshape long ind_, i(pers) j(period) string
rename ind_ industry
list, sepby(pers)
tabulate industry period, column

You should run this example in your copy of Stata to understand what is happening. The output from the tabulate command is as follows.

Code:

. tab industry period, column

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |        period
  industry |         a          b |     Total
-----------+----------------------+----------
         1 |         0          1 |         1
           |      0.00      10.00 |      5.00
-----------+----------------------+----------
         2 |         3          0 |         3
           |     30.00       0.00 |     15.00
-----------+----------------------+----------
         3 |         3          2 |         5
           |     30.00      20.00 |     25.00
-----------+----------------------+----------
         4 |         1          4 |         5
           |     10.00      40.00 |     25.00
-----------+----------------------+----------
         5 |         1          0 |         1
           |     10.00       0.00 |      5.00
-----------+----------------------+----------
         6 |         0          2 |         2
           |      0.00      20.00 |     10.00
-----------+----------------------+----------
         7 |         1          0 |         1
           |     10.00       0.00 |      5.00
-----------+----------------------+----------
         8 |         1          0 |         1
           |     10.00       0.00 |      5.00
-----------+----------------------+----------
         9 |         0          1 |         1
           |      0.00      10.00 |      5.00
-----------+----------------------+----------
     Total |        10         10 |        20
           |    100.00     100.00 |    100.00

Obviously more work is needed, and perhaps a different command than tabulate.

With that said, let me recommend that before your next post you review the Statalist FAQ linked to from the top of the page, as Nick recommended. See especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, and to use the dataex command to provide sample data, as described in section 12 of the FAQ. In particular, dataex ensures that the data as read faithfully reproduce the data types, formats, labels, and value labels in your data.

Comment

castro rene

Join Date: Mar 2017

Posts: 7
#6

23 Mar 2017, 00:49

Hi, I have a problem with percetages. This is my first post.

How can I get row percentages with the contract command, similar to the obtained with the tab command??

In the auto dataset, I try "contract rep78 foreign , percent(percent)" but the result is very similar to "tab rep78 foreign, col", and I need the row percentages to store them in a new dataset.

I hope somebody can help me.

Thank you!!
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#7

23 Mar 2017, 02:39

You could do something like the following.

Code:

contract rep78 foreign, freq(count) reshape wide count, i(rep78) j(foreign) forvalues i = 0/1 { generate double pct`i' = 100 * count`i' / (count0 + count1) }

There might be an official or user-written command that will do what you want in one step, but I can't think of any at the moment.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35713

23 Mar 2017, 03:22

The example given happens to have a binary column variable, so it can be done directly:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. egen pcforeign = mean(100 * foreign), by(rep78) 

. tabdisp rep78, c(pcforeign) format(%4.2f)

----------------------
Repair    |
Record    |
1978      |  pcforeign
----------+-----------
        1 |       0.00
        2 |       0.00
        3 |      10.00
        4 |      50.00
        5 |      81.82
        . |      20.00
----------------------

Not a general solution, but worth a mention.

Comment

Julian Emmler

Join Date: Mar 2017

Posts: 7
#9

23 Mar 2017, 04:47

Thank you very much Nick and William, that was exactly what i wanted.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35713
#10

23 Mar 2017, 05:29

Note that in #4 using

Code:

contract industry type, zero

would make the mvencode statement unnecessary.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#11

23 Mar 2017, 05:55

castro rene at post #6: Welcome to Statalist.

Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question.

In particular, while this topic did discuss the contract command, your question appears to have been overlooked in the discussion of the question that initiated this topic.

Instead of posting a new question in an existing topic, you should start a new topic with a descriptive title so that you attract members to read it who may be able to offer advice.

With that said, I do not think contract offers row percentages. Perhaps the table command with the replace option can do what you seek. Or perhaps more easily, you can add code to calculate the row percentages yourself following the contract command.
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35713
#12

23 Mar 2017, 07:13

#7 and #8 are answers to #6.
Comment
castro rene

Join Date: Mar 2017

Posts: 7
#13

25 Apr 2017, 10:44

Thanks!!! @Nick Cox and @Joseph Coveney

I resolved the problem temporary, creating two new variables with "collapse (count)" for the main variable and his category of interest respectively, and then dividing both into a new third variable.

But your suggestions looks simpler and faster. I will try them.
Comment

	Frequency	Percent
1	1	0.1
3	2	0.2
4	4	0.4
6	2	0.2
9	1	0.1

	Frequency	Percent
2	3	0.3
3	3	0.3
4	1	0.1
5	1	0.1
7	1	0.1
8	1	0.1

Industry	Before	After	Difference
1	0,1	0	0,1
2	0	0,3	-0,3
3	0,2	0,3	-0,1
4	0,4	0,1	0,3
5	0	0,1	-0,1
6	0,2	0	0,2
7	0	0,1	-0,1
8	0	0,1	-0,1
9	0,1	0	0,1

Announcement