Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculation of average (mean) sorted by various variable group

    Hello,
    I would like to calculate the mean of my average stock returns grouped by another variable, which splits my oberservations in 3 groups.

    Basically i want to replicate this formula:

    EMI = efficient minus inefficient

    I tried something like:

    bysort SOX_size3 re_year Month : gen EMI1_SOX_e = avgSOXreturn100 if SOX_size3==1

    and then take this as a new generated variable and tried to substract it from

    bysort SOX_size3 re_year Month : gen EMI1_SOX_ie = avgSOXreturn100 if SOX_size3==3

    as a new generated variable, to get something like: EMI_SOX

    But this is not working because the data looks in a similar way like this, if I sort my data on my criteria (SOX_size3, re_year and Month):
    SOX_size3 avgSOXreturn EMI_SOX_e EMISOX_ie EMI_SOX
    1 123 123 missing values 123
    1 123 123 missing values 123
    1 123 123 missing values 123
    1 123 123 missing values 123
    1 123 123 missing values 123
    2 123 missing values missing values 123
    2 123 missing values missing values 123
    2 123 missing values missing values 123
    2 123 missing values missing values 345
    2 123 missing values missing values 345
    3 123 missing values 345 345
    3 123 missing values 345 345
    3 123 missing values 345 345
    3 123 missing values 345 345
    3 123 missing values 345 345
    3 123 missing values 345 345
    For SOX_size3: 1 means efficient and 3 means inefficient.

    EMI_SOX should be the average of EMI_SOX_e minus EMI_SOX_ie.
    Missing Values can be ignored.

    I am pretty sure to use something with bysort, generate and sort in Stata, but I cannot figure it out. I hope someone of you can help.

    Below you find my dataex extract.

    Thanks.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long RIC_n int re_year byte Month double SOX_size3 float(avgSOXreturn100 EMI1_SOX_e EMI1_SOX_ie)
     3 2009 1 1 .09465913 .09465913 .
     4 2009 1 1 .09465913 .09465913 .
     6 2009 1 1 .09465913 .09465913 .
     7 2009 1 1 .09465913 .09465913 .
     9 2009 1 1 .09465913 .09465913 .
    11 2009 1 1 .09465913 .09465913 .
    14 2009 1 1 .09465913 .09465913 .
    23 2009 1 1 .09465913 .09465913 .
    26 2009 1 1 .09465913 .09465913 .
    27 2009 1 1 .09465913 .09465913 .
    34 2009 1 1 .09465913 .09465913 .
    42 2009 1 1 .09465913 .09465913 .
    46 2009 1 1 .09465913 .09465913 .
    48 2009 1 1 .09465913 .09465913 .
    50 2009 1 1 .09465913 .09465913 .
    59 2009 1 1 .09465913 .09465913 .
    60 2009 1 1 .09465913 .09465913 .
     3 2009 2 1 -.4017774 -.4017774 .
     4 2009 2 1 -.4017774 -.4017774 .
     6 2009 2 1 -.4017774 -.4017774 .
     7 2009 2 1 -.4017774 -.4017774 .
     9 2009 2 1 -.4017774 -.4017774 .
    11 2009 2 1 -.4017774 -.4017774 .
    14 2009 2 1 -.4017774 -.4017774 .
    23 2009 2 1 -.4017774 -.4017774 .
    26 2009 2 1 -.4017774 -.4017774 .
    27 2009 2 1 -.4017774 -.4017774 .
    34 2009 2 1 -.4017774 -.4017774 .
    42 2009 2 1 -.4017774 -.4017774 .
    46 2009 2 1 -.4017774 -.4017774 .
    48 2009 2 1 -.4017774 -.4017774 .
    50 2009 2 1 -.4017774 -.4017774 .
    59 2009 2 1 -.4017774 -.4017774 .
    60 2009 2 1 -.4017774 -.4017774 .
     3 2009 3 1 .24362627 .24362627 .
     4 2009 3 1 .24362627 .24362627 .
     6 2009 3 1 .24362627 .24362627 .
     7 2009 3 1 .24362627 .24362627 .
     9 2009 3 1 .24362627 .24362627 .
    11 2009 3 1 .24362627 .24362627 .
    14 2009 3 1 .24362627 .24362627 .
    23 2009 3 1 .24362627 .24362627 .
    26 2009 3 1 .24362627 .24362627 .
    27 2009 3 1 .24362627 .24362627 .
    34 2009 3 1 .24362627 .24362627 .
    42 2009 3 1 .24362627 .24362627 .
    46 2009 3 1 .24362627 .24362627 .
    48 2009 3 1 .24362627 .24362627 .
    50 2009 3 1 .24362627 .24362627 .
    59 2009 3 1 .24362627 .24362627 .
    60 2009 3 1 .24362627 .24362627 .
     3 2009 4 1  .8238223  .8238223 .
     4 2009 4 1  .8238223  .8238223 .
     6 2009 4 1  .8238223  .8238223 .
     7 2009 4 1  .8238223  .8238223 .
     9 2009 4 1  .8238223  .8238223 .
    11 2009 4 1  .8238223  .8238223 .
    14 2009 4 1  .8238223  .8238223 .
    23 2009 4 1  .8238223  .8238223 .
    26 2009 4 1  .8238223  .8238223 .
    27 2009 4 1  .8238223  .8238223 .
    34 2009 4 1  .8238223  .8238223 .
    42 2009 4 1  .8238223  .8238223 .
    46 2009 4 1  .8238223  .8238223 .
    48 2009 4 1  .8238223  .8238223 .
    50 2009 4 1  .8238223  .8238223 .
    59 2009 4 1  .8238223  .8238223 .
    60 2009 4 1  .8238223  .8238223 .
     3 2009 5 1 .59562814 .59562814 .
     4 2009 5 1 .59562814 .59562814 .
     6 2009 5 1 .59562814 .59562814 .
     7 2009 5 1 .59562814 .59562814 .
     9 2009 5 1 .59562814 .59562814 .
    11 2009 5 1 .59562814 .59562814 .
    14 2009 5 1 .59562814 .59562814 .
    23 2009 5 1 .59562814 .59562814 .
    26 2009 5 1 .59562814 .59562814 .
    27 2009 5 1 .59562814 .59562814 .
    34 2009 5 1 .59562814 .59562814 .
    42 2009 5 1 .59562814 .59562814 .
    46 2009 5 1 .59562814 .59562814 .
    48 2009 5 1 .59562814 .59562814 .
    50 2009 5 1 .59562814 .59562814 .
    59 2009 5 1 .59562814 .59562814 .
    60 2009 5 1 .59562814 .59562814 .
     3 2009 6 1 -.2812882 -.2812882 .
     4 2009 6 1 -.2812882 -.2812882 .
     6 2009 6 1 -.2812882 -.2812882 .
     7 2009 6 1 -.2812882 -.2812882 .
     9 2009 6 1 -.2812882 -.2812882 .
    11 2009 6 1 -.2812882 -.2812882 .
    14 2009 6 1 -.2812882 -.2812882 .
    23 2009 6 1 -.2812882 -.2812882 .
    26 2009 6 1 -.2812882 -.2812882 .
    27 2009 6 1 -.2812882 -.2812882 .
    34 2009 6 1 -.2812882 -.2812882 .
    42 2009 6 1 -.2812882 -.2812882 .
    46 2009 6 1 -.2812882 -.2812882 .
    48 2009 6 1 -.2812882 -.2812882 .
    50 2009 6 1 -.2812882 -.2812882 .
    end
    label values RIC_n RIC_n
    label def RIC_n 3 "AKERBP.OL", modify
    label def RIC_n 4 "BASFn.F", modify
    label def RIC_n 6 "BWO.OL", modify
    label def RIC_n 7 "CLB", modify
    label def RIC_n 9 "ENAG.MC", modify
    label def RIC_n 11 "EQNR.OL", modify
    label def RIC_n 14 "EUAV.BR", modify
    label def RIC_n 23 "HBR.L", modify
    label def RIC_n 26 "LTSP.WA", modify
    label def RIC_n 27 "LUNE.ST", modify
    label def RIC_n 34 "OMVV.VI", modify
    label def RIC_n 42 "REC.OL", modify
    label def RIC_n 46 "ROSNP.BX", modify
    label def RIC_n 48 "SBMO.AS", modify
    label def RIC_n 50 "SGREN.MC", modify
    label def RIC_n 59 "VLLP.PA", modify
    label def RIC_n 60 "VOPA.AS", modify
    label values SOX_size3 SOX_size3l
    label def SOX_size3l 1 "1 (Efficient within CO2)", modify
    Last edited by Peter Dot; 30 Sep 2021, 07:39.

  • #2
    Your fake data are a good starting place.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte criteriaforsort int(averagereturn avgsoxreturn)
    1 123 123
    1 123 123
    1 123 123
    1 123 123
    1 123 123
    2 123   .
    2 123   .
    2 123   .
    2 123   .
    2 123   .
    3 123 345
    3 123 345
    3 123 345
    3 123 345
    3 123 345
    3 123 345
    end
    
    . egen mean3 = mean(cond(crit == 3, avgsoxreturn, .))
    
    . egen mean1 = mean(cond(crit == 1, avgsoxreturn, .))
    
    . gen wanted = mean3 - mean1
    
    
    . l, sepby(crit)
    
         +---------------------------------------------------------+
         | criter~t   averag~n   avgsox~n   mean3   mean1   wanted |
         |---------------------------------------------------------|
      1. |        1        123        123     345     123      222 |
      2. |        1        123        123     345     123      222 |
      3. |        1        123        123     345     123      222 |
      4. |        1        123        123     345     123      222 |
      5. |        1        123        123     345     123      222 |
         |---------------------------------------------------------|
      6. |        2        123          .     345     123      222 |
      7. |        2        123          .     345     123      222 |
      8. |        2        123          .     345     123      222 |
      9. |        2        123          .     345     123      222 |
     10. |        2        123          .     345     123      222 |
         |---------------------------------------------------------|
     11. |        3        123        345     345     123      222 |
     12. |        3        123        345     345     123      222 |
     13. |        3        123        345     345     123      222 |
     14. |        3        123        345     345     123      222 |
     15. |        3        123        345     345     123      222 |
     16. |        3        123        345     345     123      222 |
         +---------------------------------------------------------+
    To take this further, you need a by() option or a by: prefix on the egen calls.

    See also https://www.stata-journal.com/articl...article=dm0055 especially Section 9.

    Comment


    • #3
      Hi Nick, thanks for your reply.

      I tried the following:

      bysort SOX_size3 re_year Month : egen EMI1_SOX_e = mean(cond(SOX_size3 == 1, avgSOXreturn100, .))

      bysort SOX_size3 re_year Month : egen EMI1_SOX_ie = mean(cond(SOX_size3 == 3, avgSOXreturn100, .))

      gen EMI1 = EMI1_SOX_e - EMI1_SOX_ie



      and it is not working, cause my EMI1 is over my 7.200 observation empty.

      I think it was my fault on how to describe the problem. My fake data should describe, that I want to substract from e.g. observation#1 (with EMI_SOX_e) - obvervation#5624 (with EMI_SOX_ie) because after sorting (with my above criteria) there are values inside within my variables.

      Attached again my datex:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long RIC_n byte Month int re_year double SOX_size3 float(avgSOXreturn100 EMI1_SOX_e EMI1_SOX_ie EMI1)
      46 1 2009 1 .09465913 .09465913 . .
       4 1 2009 1 .09465913 .09465913 . .
      60 1 2009 1 .09465913 .09465913 . .
      50 1 2009 1 .09465913 .09465913 . .
      11 1 2009 1 .09465913 .09465913 . .
       6 1 2009 1 .09465913 .09465913 . .
      27 1 2009 1 .09465913 .09465913 . .
      26 1 2009 1 .09465913 .09465913 . .
      34 1 2009 1 .09465913 .09465913 . .
      48 1 2009 1 .09465913 .09465913 . .
       9 1 2009 1 .09465913 .09465913 . .
       3 1 2009 1 .09465913 .09465913 . .
       7 1 2009 1 .09465913 .09465913 . .
      14 1 2009 1 .09465913 .09465913 . .
      23 1 2009 1 .09465913 .09465913 . .
      42 1 2009 1 .09465913 .09465913 . .
      59 1 2009 1 .09465913 .09465913 . .
      42 2 2009 1 -.4017774 -.4017774 . .
      50 2 2009 1 -.4017774 -.4017774 . .
      14 2 2009 1 -.4017774 -.4017774 . .
       7 2 2009 1 -.4017774 -.4017774 . .
       9 2 2009 1 -.4017774 -.4017774 . .
      11 2 2009 1 -.4017774 -.4017774 . .
      60 2 2009 1 -.4017774 -.4017774 . .
       4 2 2009 1 -.4017774 -.4017774 . .
       3 2 2009 1 -.4017774 -.4017774 . .
       6 2 2009 1 -.4017774 -.4017774 . .
      23 2 2009 1 -.4017774 -.4017774 . .
      48 2 2009 1 -.4017774 -.4017774 . .
      27 2 2009 1 -.4017774 -.4017774 . .
      34 2 2009 1 -.4017774 -.4017774 . .
      46 2 2009 1 -.4017774 -.4017774 . .
      59 2 2009 1 -.4017774 -.4017774 . .
      26 2 2009 1 -.4017774 -.4017774 . .
       4 3 2009 1 .24362627 .24362627 . .
      14 3 2009 1 .24362627 .24362627 . .
       7 3 2009 1 .24362627 .24362627 . .
      26 3 2009 1 .24362627 .24362627 . .
      23 3 2009 1 .24362627 .24362627 . .
       6 3 2009 1 .24362627 .24362627 . .
      34 3 2009 1 .24362627 .24362627 . .
      46 3 2009 1 .24362627 .24362627 . .
      27 3 2009 1 .24362627 .24362627 . .
      48 3 2009 1 .24362627 .24362627 . .
      11 3 2009 1 .24362627 .24362627 . .
       9 3 2009 1 .24362627 .24362627 . .
      60 3 2009 1 .24362627 .24362627 . .
      59 3 2009 1 .24362627 .24362627 . .
      50 3 2009 1 .24362627 .24362627 . .
      42 3 2009 1 .24362627 .24362627 . .
       3 3 2009 1 .24362627 .24362627 . .
      48 4 2009 1  .8238223  .8238223 . .
      27 4 2009 1  .8238223  .8238223 . .
      23 4 2009 1  .8238223  .8238223 . .
       6 4 2009 1  .8238223  .8238223 . .
       3 4 2009 1  .8238223  .8238223 . .
      11 4 2009 1  .8238223  .8238223 . .
       9 4 2009 1  .8238223  .8238223 . .
      46 4 2009 1  .8238223  .8238223 . .
      34 4 2009 1  .8238223  .8238223 . .
      26 4 2009 1  .8238223  .8238223 . .
      60 4 2009 1  .8238223  .8238223 . .
      59 4 2009 1  .8238223  .8238223 . .
      50 4 2009 1  .8238223  .8238223 . .
      42 4 2009 1  .8238223  .8238223 . .
       4 4 2009 1  .8238223  .8238223 . .
       7 4 2009 1  .8238223  .8238223 . .
      14 4 2009 1  .8238223  .8238223 . .
      50 5 2009 1 .59562814 .59562814 . .
       4 5 2009 1 .59562814 .59562814 . .
      60 5 2009 1 .59562814 .59562814 . .
      59 5 2009 1 .59562814 .59562814 . .
      14 5 2009 1 .59562814 .59562814 . .
      26 5 2009 1 .59562814 .59562814 . .
      48 5 2009 1 .59562814 .59562814 . .
      11 5 2009 1 .59562814 .59562814 . .
       7 5 2009 1 .59562814 .59562814 . .
       9 5 2009 1 .59562814 .59562814 . .
      42 5 2009 1 .59562814 .59562814 . .
       3 5 2009 1 .59562814 .59562814 . .
       6 5 2009 1 .59562814 .59562814 . .
      23 5 2009 1 .59562814 .59562814 . .
      27 5 2009 1 .59562814 .59562814 . .
      34 5 2009 1 .59562814 .59562814 . .
      46 5 2009 1 .59562814 .59562814 . .
      48 6 2009 1 -.2812882 -.2812882 . .
      27 6 2009 1 -.2812882 -.2812882 . .
       6 6 2009 1 -.2812882 -.2812882 . .
      59 6 2009 1 -.2812882 -.2812882 . .
      60 6 2009 1 -.2812882 -.2812882 . .
      46 6 2009 1 -.2812882 -.2812882 . .
      23 6 2009 1 -.2812882 -.2812882 . .
       9 6 2009 1 -.2812882 -.2812882 . .
      11 6 2009 1 -.2812882 -.2812882 . .
       3 6 2009 1 -.2812882 -.2812882 . .
      42 6 2009 1 -.2812882 -.2812882 . .
      34 6 2009 1 -.2812882 -.2812882 . .
       7 6 2009 1 -.2812882 -.2812882 . .
       4 6 2009 1 -.2812882 -.2812882 . .
      26 6 2009 1 -.2812882 -.2812882 . .
      end
      label values RIC_n RIC_n
      label def RIC_n 3 "AKERBP.OL", modify
      label def RIC_n 4 "BASFn.F", modify
      label def RIC_n 6 "BWO.OL", modify
      label def RIC_n 7 "CLB", modify
      label def RIC_n 9 "ENAG.MC", modify
      label def RIC_n 11 "EQNR.OL", modify
      label def RIC_n 14 "EUAV.BR", modify
      label def RIC_n 23 "HBR.L", modify
      label def RIC_n 26 "LTSP.WA", modify
      label def RIC_n 27 "LUNE.ST", modify
      label def RIC_n 34 "OMVV.VI", modify
      label def RIC_n 42 "REC.OL", modify
      label def RIC_n 46 "ROSNP.BX", modify
      label def RIC_n 48 "SBMO.AS", modify
      label def RIC_n 50 "SGREN.MC", modify
      label def RIC_n 59 "VLLP.PA", modify
      label def RIC_n 60 "VOPA.AS", modify
      label values SOX_size3 SOX_size3l
      label def SOX_size3l 1 "1 (Efficient within CO2)", modify
      I hope you can help. Thank you.
      Last edited by Peter Dot; 30 Sep 2021, 08:01.

      Comment


      • #4
        Not quite, as


        Code:
        bysort SOX_size3
        alone defines groups in which that variable is constant and you must have some observations with 1 and some with 3 within the same group to be able to compare.

        But you're almost there. Although it may seem redundant, a monthly date variable

        Code:
        gen mdate = ym(re_year, Month) 
        format mdate %tm
        is likely to be needed sooner or later and with that you can go


        Code:
        bysort mdate :
        and then the rest of the egen code should be good as the rest of that statement.

        Comment


        • #5
          Hey Nick, thanks but your idea helps not at all.

          Your suggestion leads me to the following data structure (sorted by SOX_size and mdate):
          Again EMI_SOX_e have values specified by SOX_size3 ==1 and EMI_SOX_ie have values specified by SOX_size3==3.
          SOX_size3 mdate avgSOXreturn100 EMI_SOX_e EMI_SOX_ie EMI
          1 2009m1 123 123 missing ==123-345
          … … … … … ==…. - ….
          1 2019m12 123 123 missing ==123-345
          2 2009m1 567 missing missing
          … … … … …
          2 2019m12 567 missing missing
          3 2009m1 345 missing 345
          … … … … …
          3 2019m12 345 missing 345

          Basically my idea is to have at the end the following:

          I would like to take the subtraction of EMI_SOX_e and EMI_SOX_ie and generate a new variable namely EMI and take after that the mean of EMI.

          I hope you can help again, thanks.


          Comment


          • #6
            Sorry, but I can't see what went wrong or how this question differs from those I have answered already.

            Concrete data examples and specific code that you ran are essential to make progress.

            Comment


            • #7
              Nick, sorry for my misunderstanding. Maybe the following screenshot will describe my problem better:

              Click image for larger version

Name:	Bildschirmfoto 2021-10-01 um 17.50.50.png
Views:	1
Size:	40.5 KB
ID:	1629885


              As you mentioned in #2 of your post, I would like to calculate the 123 minus the 345 and get rid of the middle part. So basically my observation of initially 16 will result in 5 or 6 after the operation. In my dataset the first part of the calculation has 2304 obs. and the second part 2076.

              Comment


              • #8
                If you want your result to be missing for observations with criterion 2 you can just go something like

                Code:
                 
                 gen wanted = mean3 - mean1 if inlist(crit, 1, 3)
                To see each group of repeated values just once, check out the tag() function in egen.

                Comment


                • #9
                  Nick, replying to your #6 Post I want to do the following:

                  I want to substract the above part minus the bottom part for each observations. (The data of the screenshot were sorted by: mdate and SOX_size3.)


                  Sometimes the number of the observations for each "package (see yellow box)" are not the same, I mean sometimes it is missing and the size are not the same (see: variable n_s)

                  Any suggestions to code this?

                  Sorry for my bad describing skills...
                  Attached Files
                  Last edited by Peter Dot; 04 Oct 2021, 04:45.

                  Comment


                  • #10
                    Sorry to be stupid but I can't understand

                    1. what your latest image is intended to show

                    2. what your puzzlement might be as the code I am recommending calculates a mean and in doing so ignores missing values and automatically takes account of the number of non-missing values.

                    If you look back to #2 you will see that the code had no problem in dealing with 5 values of 123 and 6 values of 345 and yielding a difference between means of 222.

                    Comment


                    • #11
                      I am so sorry Nick, your suggestions worked! I was completely wrong! Thanks for your time and patience. Thread can be closed.

                      Comment


                      • #12
                        Thanks for that. Pleased that you got there.

                        Comment

                        Working...
                        X