
  • How to measure correlation for dummy variables?

    Dear experts,

    In my study, I want to examine whether the effective tax rate (ETR) of companies depends on company size or industry sector. I have a panel dataset in which the independent variables enter as dummy (factor) variables: i.size (small, medium, large) and i.industry (industry 1, industry 2, ..., industry 20).

    How can I measure the correlation between i.size and ETR, and between i.industry and ETR?

    Many thanks.

    Best wishes,
    Can
    Last edited by Can Deniz; 02 Sep 2021, 13:16.

  • #2
    For some reason -correlate- and -pwcorr- do not accept factor variable notation.

    Generate your dummies manually, and calculate the correlations this way.
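
    A minimal sketch of that step, using Stata's bundled auto data rather than the poster's panel. Here rep78 stands in for a categorical variable such as size or industry and price stands in for ETR; both are assumptions for illustration only.

    ```stata
    * Bundled example data; rep78 and price are stand-ins, not the poster's variables
    sysuse auto, clear

    * tabulate with generate() creates one 0/1 indicator per category: rep_1 ... rep_5
    tabulate rep78, generate(rep_)

    * correlate accepts the generated indicators because they are ordinary variables
    correlate rep_* price
    ```

    With the real data the same pattern would presumably be applied to size and industry, giving one indicator per category.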

    • #3
      Dear Can,

      If you manually write out all size and industry variables, type something like this:

      Code:
      spearman industry1 industry2 industry3... ETR, stats(rho p obs)
      This will give you Spearman's correlation coefficient, which relaxes the assumption, made by Pearson's correlation coefficient, that the variables are normally distributed.

      • #4
        I have now created the dummies manually.

        But how can I interpret the correlation coefficients if they are dummy variables?

        For example, the dummy for large companies is negatively correlated with ETR (r = -0.06).

        "The larger a large company is, the lower the ETR" can't be right, can it?

        Not to mention the interpretation for the industry dummies.

        Code:
                     |      ETR
        -------------+---------
                 ETR |   1.0000
               large |  -0.0563
               small |   0.0340
              medium |   0.0069
          industry_1 |  -0.0058

        • #5
          Correlations are defined among indicator variables so long as both variables in a pair have some zeros and some ones. (If not, then at least one SD is zero and the correlation is indeterminate.)

          If you have dummy variables or indicator variables then Spearman and Pearson correlations are necessarily identical. The reason is that instead of dealing with several zeros and several ones we are just dealing with the average rank of the zeros and the average rank of the ones. So, the ranks are just a linear transformation of the zeros and ones (that also preserves sign, with a convention that 0 ranks lower than 1) and the correlation is identical.

          Here is a silly example.

          Code:
          . sysuse auto, clear
          (1978 automobile data)
          
          . gen foo = _n <= 37
          
          . correlate foo foreign
          (obs=74)
          
                       |      foo  foreign
          -------------+------------------
                   foo |   1.0000
               foreign |  -0.6504   1.0000
          
          
          . spearman foo foreign
          
           Number of obs =      74
          Spearman's rho =      -0.6504

          So in this special case using Spearman isn't using different information, and gains nothing extra. Normality of the marginal distributions is unattainable here, as is bivariate normality, and is indeed a red herring, as for decades now we have had ways of getting confidence intervals and P-values for correlations if we really care.
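
          The rank argument can be checked directly. This is a sketch that reuses the foo indicator from the example above; rank_foo is a name introduced here for illustration.

          ```stata
          sysuse auto, clear
          gen foo = _n <= 37

          * egen's rank() assigns midranks by default: the 37 zeros share rank 19
          * and the 37 ones share rank 56
          egen rank_foo = rank(foo)
          tabulate rank_foo foo

          * Two distinct values on each side, so rank_foo is an exact increasing
          * linear function of foo and the correlation is 1
          correlate rank_foo foo
          ```

          Because the ranks are a linear transformation of the indicator, replacing either variable by its ranks leaves the Pearson correlation unchanged, which is why the two commands agree.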

          The scatter plot is easily thought about. If x = y then we have, with the condition from the first sentence, points at (0, 0) and (1, 1) and the correlation is identically 1, and vice versa. If x = 1 - y (the indicator analogue of x = -y) the analysis is also easy: the points are at (0, 1) and (1, 0) and the correlation is identically -1. And so on.
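
          A small sketch of the two extreme cases, again on the auto data. The variable names x, y_same and y_flip are hypothetical, and the mirrored case is taken to mean y = 1 - x.

          ```stata
          sysuse auto, clear
          gen x = _n <= 37

          * Identical indicators: all points lie at (0, 0) or (1, 1)
          gen y_same = x
          correlate x y_same
          scalar r_same = r(rho)

          * Mirrored indicators: all points lie at (0, 1) or (1, 0)
          gen y_flip = 1 - x
          correlate x y_flip
          scalar r_flip = r(rho)

          display "same: " scalar(r_same) "   flipped: " scalar(r_flip)
          ```

          Any other pairing of two indicators puts points on three or four of the corners and gives a correlation strictly between these limits.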

          See also https://stats.stackexchange.com/ques...een-two-boolea which shows some measure of dissatisfaction with the arguments given and, if it is new to you, a community in which voting on the merits of different answers and comments is allowed once contributors themselves have been upvoted moderately.
          Last edited by Nick Cox; 02 Sep 2021, 18:58.

          • #6
            A very informative answer, Nick Cox. I learned something.

            • #7
              Nick Cox, so what do I need to do in my case? How can I measure the correlation, and how is the correlation to be interpreted for dummy variables?

              • #8
                Your question has already been answered, so far as I can see. My reply was mostly aimed at the last sentence in #3. You can use correlate or spearman to calculate the correlations -- it doesn't matter which -- and their interpretation isn't different from that of any other correlation, except that bivariate normality is not a pertinent reference case. Correlation quantifies how nearly linear a relationship is. If any variable is constant, the correlation is indeterminate, but that is always true.
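
                One way to read a correlation between an indicator and a continuous variable, sketched with the bundled auto data rather than the ETR data (price and foreign are stand-ins, an assumption for illustration): the sign of the correlation matches the sign of the difference in group means, and its square equals the R-squared from regressing the continuous variable on the indicator.

                ```stata
                sysuse auto, clear

                * Pearson correlation between a continuous variable and a 0/1 indicator
                correlate price foreign
                scalar r_pf = r(rho)

                * Group means: the group with the higher mean fixes the sign of the correlation
                tabstat price, by(foreign) statistics(mean n)

                * With a single indicator as regressor, the slope is the difference in group means
                regress price foreign
                display "rho^2 = " scalar(r_pf)^2 "   R-squared = " e(r2)
                ```

                Read this way, a small negative correlation for a size dummy would say only that the mean of the outcome is slightly lower in that group than in the rest of the sample, with the dummy accounting for very little of the variation.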

                • #9

                  Nick Cox, in my case, does it then make sense to use the correlation coefficient of the dummy variables with the tax rate (ETR)? I want to examine whether the ETR depends on firm size and industry.

                  If not, should I just do a multivariate regression analysis and use the significance to evaluate the influence?

                  And could you please give an example for the interpretation based on my output in #4?
                  Last edited by Can Deniz; 03 Sep 2021, 13:22.

                  • #10
                    When threads seem to be asking "what should I do next in my project?", my reaction is that you need specialist advice from people in your field, not least your supervisor or advisor if you have one.

                    I am not in your field, so I have no advice on modelling ETR. I suspect that you say multivariate regression and perhaps mean multiple regression.

                    Similarly, I don't know what advice you want on interpretation of correlations. The economic interpretation of correlations is a matter for ... economists.

                    Sorry, that is just "no idea what you want here" repeated.
