
  • Creating bar graph with two groups: industries and centiles

    I am currently using Stata 16.1 on Windows 10.

    I'm trying to create a bar graph like the one attached below. However, I cannot seem to get the code right.

    The y variable is PD_12m, which is continuous.
    ESG is on the x axis, grouped by industry and by low/high percentiles.

    The numbers on top of the bars are the averages of variable PD_12m, in industry x, in the low or high percentile (50/50) of ESG scores.

    I have for example tried using a dummy for ESG lower percentile being 1:

    Code:
    gen byte dummy = ESG_score <= r(p50)
    
    graph bar (mean) PD_12m, over(i_id) over(dummy)
    However, the graph only depicts the mean per industry and not per "low" and "high" ESG percentile. Does anyone have a solution?

    [Attached image: Bar graph.png]


  • #2
    Your code looks fine so far as it goes. But you need to run summarize with the detail option first to get r(p50) as a saved result. My guess is that you omitted that step. If so, r(p50) evaluates to missing; as Stata treats missing as larger than any number, every value satisfies <= r(p50), so all values go in one bin and are not separated into two bins.

    The division wanted includes (1) less than or equal to median and (2) above median. I wouldn't describe those bins as low or high percentile myself.
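
    The point about r(p50) can be checked directly. What follows is only a minimal sketch on the auto data; the variable names all_one and at_or_below are made up for illustration. It assumes nothing beyond the fact that summarize stores r(p50) only when detail is specified.

```stata
sysuse auto, clear

* Without detail, summarize does not store r(p50), so r(p50) is missing
summarize price
display r(p50)

* price <= missing is true for every observation: a single bin
* (all_one is an illustrative name)
generate byte all_one = price <= r(p50)
tabulate all_one

* With detail, r(p50) holds the median and the split gives two bins
* (the if qualifier keeps missing prices out of the indicator)
summarize price, detail
generate byte at_or_below = price <= r(p50) if !missing(price)
tabulate at_or_below
```

    The if !missing() qualifier makes no difference for price in the auto data, but it matters whenever the variable being split has missing values.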

    There is nothing we can check otherwise as you don't give a data example and your graph is explained to be for yet other data.

    Here is some technique with a reproducible example.

    Code:
    sysuse auto, clear 
    
    su price, detail 
    gen low_or_high_price = price <= r(p50)
    
    egen mean = mean(mpg), by(rep78 low_or_high_price) 
    egen count = count(mpg), by(rep78 low_or_high_price) 
    
    tabdisp rep78 low_or_high_price, c(mean count)
    
    graph bar (mean) mpg , over(low_or_high_price) over(rep78)
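
    The original question also asked for the means to be shown on top of the bars. One possible way to do that, sketched here on the same auto data, is the blabel() option of graph bar. The variable name at_or_below_median and the value label name median_grp are illustrative, not taken from the thread, and the /// continuation assumes the code is run from a do-file.

```stata
sysuse auto, clear

summarize price, detail

* at_or_below_median and median_grp are illustrative names
generate byte at_or_below_median = price <= r(p50) if !missing(price)
label define median_grp 0 "above median" 1 "at or below median"
label values at_or_below_median median_grp

* asyvars gives the two bins different colours and a legend;
* blabel(bar) prints each bar's height, here the group mean, above the bar
graph bar (mean) mpg, over(at_or_below_median) over(rep78) ///
    asyvars blabel(bar, format(%4.1f)) ytitle("Mean of mpg")
```

    Value labels that say "at or below median" and "above median" also avoid describing the bins as low or high percentiles.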



    • #3
      Thank you Nick, with your help I have been able to create a very nice graph. You are right; in the text around the graph I will mention that low and high refer to less than or equal to the median and higher than the median, respectively.

      Would you know of a way to add (append) the following to the graph?
      Code:
       tabdisp low_or_high_ESG, c(mean)
      Which gives the overall mean:
      Code:
       
      ----------------------
      low_or_hi |
      gh_ESG    |       mean
      ----------+-----------
              0 |   .0012439
              1 |   .0004687
      ----------------------
      [Attached image: Graph2.png]



      • #4
        You can add yline() for those levels, or apply the trick in https://journals.sagepub.com/doi/pdf...867X1401400117
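
        As a rough sketch of the yline() route on the auto data: compute the two overall means first, then pass them to yline(). The names at_or_below, mean0 and mean1 are illustrative, the line patterns are an arbitrary choice, and the /// continuation assumes a do-file.

```stata
sysuse auto, clear

summarize price, detail
generate byte at_or_below = price <= r(p50) if !missing(price)

* overall mean of mpg in each bin, stored in locals (illustrative names)
summarize mpg if at_or_below == 0, meanonly
local mean0 = r(mean)
summarize mpg if at_or_below == 1, meanonly
local mean1 = r(mean)

* one horizontal reference line per overall mean
graph bar (mean) mpg, over(at_or_below) over(rep78) ///
    yline(`mean0', lpattern(dash)) yline(`mean1', lpattern(solid))
```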



        • #5
          I tried to apply the trick from the journal, but I am doing something wrong. I have:
          Code:
          su ESG_score, detail
          gen low_or_high_ESG = ESG_score >= r(p50)
          
          egen mean = mean(PD_12m), by(i_id low_or_high_ESG)
          egen count = count(PD_12m), by(i_id low_or_high_ESG)
          
          tabdisp i_id low_or_high_ESG, c(mean count)
          
          separate low_or_high_ESG, by(1)
          
          preserve
          expand 2
          
          replace i_id = .z if _n > _N/2
          
          label define i_id .z "all"
          label values i_id i_id
          
          graph bar (mean) PD_12m, over(low_or_high_ESG) over (low_or_high_ESG1) over(i_id) ytitle(Mean of PD_12m)nofill missing
          I am now getting a bar with a dot, which should not appear, and the bars above "all" do not show the overall mean values described above. I am sorry if I am doing something obviously wrong, but I cannot seem to figure it out.

          [Attached image: Graph.png]



          • #6
            When you ran expand 2 you copied all observations, but some copies have 0 and some have 1 on your low or high indicator, just as in the original. So you get two extra means, not one. That's the main story.

            You need to do more work!

            Code:
            sysuse auto, clear 
            
            su price, detail 
            gen low_or_high_price = price <= r(p50)
            
            egen mean = mean(mpg), by(rep78 low_or_high_price) 
            egen count = count(mpg), by(rep78 low_or_high_price) 
            
            tabdisp rep78 low_or_high_price, c(mean count)
            
            graph bar (mean) mpg , over(low_or_high_price) over(rep78)
            
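            * duplicate every observation; the copies are added after the originals
            expand 2 
            
            * recode the copies as one extra category "all" on both grouping variables
            replace rep78 = 6 if _n > _N/2 
            label def rep78 6 "all"
            label val rep78 rep78 
            replace low_or_high_price = 2 if _n > _N/2 
            label def which 2 "0 or 1"
            label val low_or_high_price which 
            
            * nofill omits the empty combinations of the two grouping variables
            graph bar (mean) mpg , over(low_or_high_price) over(rep78) nofill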
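
            If what is wanted under "all" is instead two bars, the overall mean for each bin as in the tabdisp output in #3, one possible variant is to leave the indicator untouched in the copies and recode only the outer grouping variable. This is a sketch under that assumption, not a tested solution for the ESG data; at_or_below and is_copy are illustrative names.

```stata
sysuse auto, clear

summarize price, detail
generate byte at_or_below = price <= r(p50) if !missing(price)

* generate() flags the duplicates: 0 for originals, 1 for copies
expand 2, generate(is_copy)

* only the outer grouping variable changes in the copies,
* so the extra category keeps both values of the indicator
replace rep78 = 6 if is_copy
label define rep78 6 "all"
label values rep78 rep78

graph bar (mean) mpg, over(at_or_below) over(rep78) nofill
```

            Using the generate() option of expand avoids relying on the copies sitting in the second half of the dataset.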



            • #7
              Thanks Nick, you really helped me out and I have managed now.
