  • Problem with variables in the correlation matrix

    Hello there,

    I am trying to create a correlation matrix with several variables. One of the variables in my data set contains the GIC industry code. In my matrix, the correlation refers to the variable "Industry" as a whole (the code has been truncated so that only its first two digits, which represent the industry category, remain). Is it possible to split this variable into sub-variables in the correlation matrix (see the first picture), so that I can see the correlations for each industry rather than for the variable "Industry" as a whole?

    I am using Stata 16.1.

    Thanks for your help!
    Best regards
    Sven

    Code:
        *Calculate R&D Intensity (RDI)
        gen rdi=xrd/revt
    
        *Set a panel for the data 
        xtset gvkey fyear 
    
        *Generate lagged variables
        by gvkey: generate patent_app_count_l3=patent_app_count[_n-3]    
        by gvkey: generate cit_forw_l3= cit_forw[_n-3]    
        by gvkey: generate c_ma_deal_l1= c_ma_deal[_n-1]
        
        *Winsorize revenue at 1% level
        winsor revt, gen(revt_w) p(0.01)    
    
        *Winsorize R&D Intensity at 1% level
        winsor rdi, gen(rdi_w) p(0.01)
    
        *Calculate logarithms for R&D Intensity, revenue 
        gen ln_rdi_w = ln(rdi_w)
        gen ln_revt_w = ln(revt_w)
        
        *Simplification of the “Gind” (GIC Industry) variable
        gen industry_1 = substr(gind, 1,2)
        encode industry_1, generate(industry)
    
        *Tabulate the correlations matrix    
        pwcorr c_cvc_deal patent_app_count cit_forw ln_revt_w ln_rdi_w c_ma_deal_l1 industry, obs sig star(0.05)
    [Image: Bildschirmfoto 2020-12-04 um 12.07.28.png]

    [Image: Bildschirmfoto 2020-12-04 um 12.12.41.png]

  • #2
    Sure; this possibility is documented in the help for pwcorr (see the commented quotation in the code below).

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . * by is allowed with correlate and pwcorr; see [D] by.
    
    . bysort foreign : pwcorr mpg weight price
    
    -----------------------------------------------------------------------------------
    -> foreign = Domestic
    
                 |      mpg   weight    price
    -------------+---------------------------
             mpg |   1.0000 
          weight |  -0.8759   1.0000 
           price |  -0.5043   0.6724   1.0000 
    
    -----------------------------------------------------------------------------------
    -> foreign = Foreign
    
                 |      mpg   weight    price
    -------------+---------------------------
             mpg |   1.0000 
          weight |  -0.6829   1.0000 
           price |  -0.6313   0.8855   1.0000
    
    I am not sure how helpful it is to have 9 correlation matrices rather than 1, but the code is easy enough.

    • #3
      Thanks for your answer!
      How can I create the correlation matrix as one table instead of nine, so that the "total" variable "industry" is broken down into its sub-categories (10, 15, 20, 25, 30, 35, 40, 45, 60)?
      Like this (just a quick example created in Excel):
      [Image: Bildschirmfoto 2020-12-04 um 14.17.55.png]


      Thanks a lot!

      • #4
        So you want 9 correlations and 9 P-values in each cell? I don't have Stata code suggestions for that.

        • #5
          Thank you for your feedback, and I'm sorry if my description of the problem was unclear.
          The variable "Industry" contains different industry categories (e.g. 10, 20). In my correlation matrix (2nd picture), the variable as a whole is correlated with the other variables (e.g. CVC deals). My goal is to correlate not the single variable "Industry" but each of its subcategories (1st picture), so that in the end I can see, for example, that "industry sector 10" has a more significant correlation with CVC deals than "industry sector 20" (3rd picture: an Excel example of how it should look).
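
          One possible way to get a layout like this, offered as a sketch and not as something confirmed in this thread, is to turn each industry category into its own 0/1 indicator variable and then list the indicators in pwcorr. The example below uses the auto data shipped with Stata so that it runs on its own. The stubs rep_ and gic_ are made-up names, and the commented lines at the end assume the variables from post #1 exist as defined there.

          Code:
              * Self-contained illustration with the auto data shipped with Stata
              sysuse auto, clear

              * One 0/1 indicator per category of rep78: rep_1, rep_2, ..., rep_5
              tabulate rep78, generate(rep_)

              * Each indicator now gets its own row and column in a single matrix
              pwcorr price mpg weight rep_*, obs sig star(0.05)

              * Same idea with the variables from post #1 (gic_ is a hypothetical stub):
              * tabulate industry, generate(gic_)
              * pwcorr c_cvc_deal patent_app_count cit_forw ln_revt_w ln_rdi_w c_ma_deal_l1 gic_*, obs sig star(0.05)

          The indicators are numbered in the sorted order of the categories, so it is worth checking their variable labels (for example with describe) to see which category each one stands for. Indicators built from the same variable are negatively correlated with each other by construction, so only their correlations with the other variables are of interest.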
