How to measure correlation for dummy variables?

Can Deniz

Join Date: Mar 2021

Posts: 16
#1

02 Sep 2021, 13:08

Dear experts,

In my study, I want to examine whether the effective tax rate (ETR) of companies depends on company size or industry sector. I have a panel dataset in which the independent variables enter as dummy (factor) variables: i.size (small, medium, large) and i.industry (industry 1, industry 2, ..., industry 20).

How can I measure the correlation between i.size and ETR, and between i.industry and ETR?

Many thanks.

Best wishes,
Can

Last edited by Can Deniz; 02 Sep 2021, 13:16.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

02 Sep 2021, 13:38

For some reason -correlate- and -pwcorr- do not accept factor variable notation.

Generate your dummies manually, and calculate the correlations this way.
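A minimal sketch of that approach, using Stata's bundled auto dataset because the poster's data are not available. Here rep78 stands in for a categorical variable such as size or industry, price stands in for ETR, and the rep_ stub is an arbitrary name chosen for illustration:

```stata
sysuse auto, clear

* Create one 0/1 indicator per level of the categorical variable.
* rep78 has five levels, so this creates rep_1 ... rep_5.
tabulate rep78, generate(rep_)

* correlate takes the generated indicators as ordinary variables
* (the thread notes that i.rep78 is not accepted here).
correlate price rep_1-rep_5
```

Observations with a missing category get missing indicators, and correlate drops them casewise.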
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#3

02 Sep 2021, 13:48

Dear Can,

If you manually write out all size and industry variables, type something like this:

Code:

spearman industry1 industry2 industry3... ETR, stats(rho p obs)

This will give you Spearman's correlation coefficient, which relaxes the assumption of normal distribution of the variables made by Pearson's correlation coefficient.
Can Deniz

Join Date: Mar 2021

Posts: 16
#4

02 Sep 2021, 14:37

I have now created the dummies manually.

But how can I interpret the correlation coefficients if they are dummy variables?

For example, the dummy for large companies is negatively correlated with ETR (r = -0.06).

"The larger a large company is, the lower the ETR" can't be right, can it?

Not to mention the interpretation for the industry dummies.

Code:

             |      ETR
-------------+---------
         ETR |   1.0000
       large |  -0.0563
       small |   0.0340
      medium |   0.0069
  industry_1 |  -0.0058
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

02 Sep 2021, 18:30

Correlations are defined among indicator variables so long as both variables in a pair have some zeros and some ones. (If not, then at least one SD is zero and the correlation is indeterminate.)

If you have dummy variables or indicator variables then Spearman and Pearson correlations are necessarily identical. The reason is that instead of dealing with several zeros and several ones we are just dealing with the average rank of the zeros and the average rank of the ones. So, the ranks are just a linear transformation of the zeros and ones (that also preserves sign, with a convention that 0 ranks lower than 1) and the correlation is identical.

Here is a silly example.

Code:

. sysuse auto, clear
(1978 automobile data)

. gen foo = _n <= 37

. correlate foo foreign
(obs=74)

             |      foo  foreign
-------------+------------------
         foo |   1.0000
     foreign |  -0.6504   1.0000

. spearman foo foreign

 Number of obs =      74
Spearman's rho =      -0.6504
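The claim that the ranks are just a linear transformation of the zeros and ones can also be checked directly. The following is a sketch on the same data; the variable name rank_foo is an arbitrary choice for illustration, and egen's rank() function is used because it assigns average ranks to ties by default:

```stata
sysuse auto, clear
gen foo = _n <= 37

* Average ranks: the 37 zeros share rank 19, the 37 ones share rank 56,
* so rank_foo = 19 + 37*foo, a linear function with positive slope.
egen rank_foo = rank(foo)
tabulate rank_foo foo

* A positive linear transformation leaves the correlation at exactly 1.
correlate rank_foo foo
```

Because Spearman's rho is the Pearson correlation of the ranks, replacing an indicator by its ranks cannot change the result.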

So in this special case using Spearman isn't using different information, and gains nothing extra. Normality of the marginal distributions is unattainable here, as is bivariate normality, and is indeed a red herring, as for decades now we have had ways of getting confidence intervals and P-values for correlations if we really care.

The scatter plot is easily thought about. If x = y then we have, with the condition from the first sentence, points at (0, 0) and (1, 1) and the correlation is identically 1, and vice versa. If x = 1 - y, the points are at (0, 1) and (1, 0) and the correlation is identically -1. And so on.

See also https://stats.stackexchange.com/ques...een-two-boolea, which shows some measure of dissatisfaction with the arguments given and, if it is new to you, a community in which voting on the merits of different answers and comments is allowed once contributors themselves have been upvoted moderately.

Last edited by Nick Cox; 02 Sep 2021, 18:58.
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#6

03 Sep 2021, 02:32

A very informative answer, Nick Cox. I learned something.
Can Deniz

Join Date: Mar 2021

Posts: 16
#7

03 Sep 2021, 10:14

Nick Cox, so what do I need to do in my case? How can I measure the correlation, and how should it be interpreted for dummy variables?
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

03 Sep 2021, 10:24

Your question has already been answered so far as I can see. My reply was mostly aimed at the last sentence in #3. You can use correlate or spearman to calculate the correlations (it doesn't matter which), and their interpretation isn't different from that of any other correlation, except that bivariate normality is not a pertinent reference case. Correlation quantifies how close a relationship is to linear. If any variable is constant, the correlation is indeterminate, but that is always true.
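As an illustration of that interpretation when one variable is a 0/1 indicator and the other is continuous, here is a sketch on the auto data, with foreign standing in for a dummy such as large and price standing in for ETR (the scalar name rho_xy is arbitrary). The sign of the correlation matches the sign of the difference in group means, and its square equals the R-squared from regressing the continuous variable on the indicator:

```stata
sysuse auto, clear

* Pearson correlation between a continuous variable and a 0/1 indicator
correlate price foreign
scalar rho_xy = r(rho)

* Group means of price for foreign == 0 and foreign == 1
tabulate foreign, summarize(price) means

* Regression on the indicator: the slope is the difference in group means
regress price i.foreign
display "correlation squared = " scalar(rho_xy)^2
display "R-squared           = " e(r2)
display "difference in means = " _b[1.foreign]
```

Read this way, a small negative correlation between a large-company dummy and ETR says only that the mean ETR of large companies is slightly lower than that of the other companies, not that ETR falls as a large company grows.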
Can Deniz

Join Date: Mar 2021

Posts: 16
#9

03 Sep 2021, 13:17

Nick Cox, in my case, does it then make sense to use the correlation coefficients of the dummy variables with the tax rate (ETR)? I want to examine whether the ETR depends on firm size and industry.

If not, should I just do a multivariate regression analysis and use the significance to evaluate the influence?

And could you please give an example for the interpretation based on my output in #4?

Last edited by Can Deniz; 03 Sep 2021, 13:22.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#10

04 Sep 2021, 02:15

When threads seem to be asking "what should I do next in my project?", my reaction is that you need specialist advice from people in your field, not least your supervisor or advisor if you have one.

I am not in your field, so I have no advice on modelling ETR. I suspect that you say multivariate regression and perhaps mean multiple regression.
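For readers unsure of the distinction, a multiple regression has one outcome and several predictors. A minimal sketch with factor-variable notation follows, again on the auto data, where i.rep78 and i.foreign are stand-ins for i.size and i.industry and price is a stand-in for ETR. It is not a modelling recommendation and ignores the panel structure of the original data:

```stata
sysuse auto, clear

* One outcome, several categorical predictors entered as factor variables.
* Each coefficient is a difference from that factor's base level.
regress price i.rep78 i.foreign

* Joint test that all rep78 indicator coefficients are zero
testparm i.rep78
```

Unlike correlate and pwcorr, regress accepts the i. notation directly, so no manually generated dummies are needed here.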

Similarly, I don't know what advice you want on interpretation of correlations. The economic interpretation of correlations is a matter for ... economists.

Sorry, that is just "no idea what you want here" repeated.