Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate Table, with rows/columns dependent on number in variable name.

    Dear community,

    I have currently a list of 270 stock portfolio returns (monthly data), who are constructed by sorting on specific firm characteristics.
    The sorting is done on 4 different characteristics, abbreviated by "ME", "BM", "INV", and "RMA"
    The first three are sorted into 3 categories, while the fourth is sorted on 5.
    The columns in my dataset are, therefore, labeled from R_ME1BM1INV1RMA1 to ME3BM3INV3RMW5

    I would like to create a table with mean returns dependent on the numbers in the variable name, which looks something like this:
    ME BM INV RMW-> 1 2 3 4 5
    1 1 1 mean(R_ME1BM1INV1RMA1) ...
    1 1 2 ...
    1 1 3
    1 2 1
    1 2 2
    1 2 3
    ...
    I assume that this specific format is not easy to accomplish, but I would be very happy to get a tip how I can make the number in the variable name determining the rows.
    I did not find a similar question here on Statalist.

    Thank you in advance!

  • #2
    I think this is the simplest way to do this:

    Code:
    capture program drop one_mean
    program define one_mean
        local me = me[1]
        local bm = bm[1]
        local inv = inv[1]
        local rma = rma[1]
        summ R_me`me'bm`bm'inv`inv'rma`rma', meanonly
        gen mean = r(mean)
        keep me bm inv rma mean
        keep in 1
        exit
    end
    
    runby one_mean, by(me bm inv rma)
    rename mean mean_rma
    reshape wide mean_rma, i(me bm inv) j(rma)
    To use this code you will need the program -runby-, by Robert Picard and me, available from SSC.

    This code is not tested because you did not provide an example of your data to work with. Beware of typos, errors, and possible misfits to your data. You will have to modify the code accordingly.

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    I urge you to learn and use Stata terminology. Stata data sets are not spreadsheets and we do not speak of rows and columns when talking about them. The "rows" are observations, and the "columns" are variables. While it is pedantic to make the point, it is important. If you think about Stata as if it were a spreadsheet, you will be tempted to try to do analysis as if you were working in a spreadsheet. Your instincts from spreadsheet experience are not helpful in doing Stata analysis. Often they will get you into trouble. So it is important to banish all aspects of the spreadsheet metaphor from your mind when working with Stata. Not using spreadsheet-appropriate terminology is one start in that direction.

    Comment


    • #3
      It dawns on me that your data layout may in fact be radically different from what I imagined when writing #2. So here's a different approach that is appropriate to a different initial data layout (one in which separate variables me bm inv and rm1 do not exist, just variables R_*.

      Code:
      capture postutil clear
      tempfile building
      postfile handle byte (me bm inv rmw) float mean_R using `building'
      
      forvalues i = 1/3 {
          forvalues j = 1/3 {
              forvalues k = 1/3 {
                  forvalues m = 1/5 {
                      summ R_me`i'bm`j'inv`k'rma`m', meanonly
                      post handle (`i') (`j') (`k') (`m') (`r(mean)')
                  }
              }
          }
      }
      postclose handle
      
      use `building', clear
      rename mean_R mean_R_rma
      reshape wide mean_R_rma, i(me bm inv) j(rmw)
      This uncertainty on my part about your data layout underscores the importance of always showing example data when asking for help with code. Code is like driving directions: you can't get them right without knowing both the starting point and the destination.

      Comment


      • #4
        Dear Clyde,

        thank you very much for the support, I will try both!
        Find below a short extract from the data using -dataex-. Maybe that makes it more clear and you can give a suggestion which one to use!

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long date float(R_ME1BM1OP1RMW1 R_ME1BM1OP1RMW2 R_ME1BM1OP1RMW3 R_ME1BM1OP1RMW4 R_ME1BM1OP1RMW5 R_ME1BM1OP2RMW1 R_ME1BM1OP2RMW2 R_ME1BM1OP2RMW3 R_ME1BM1OP2RMW4 R_ME1BM1OP2RMW5 R_ME1BM1OP3RMW1)
        20000131 8.9373865 1.3906333 24.94088 19.299826 13.011122 -1.268156 1.1190631 .6036164 -.9459738 2.497578 -1.98488
        20000229 35.22927 34.94847 38.75097 35.481968 27.182663 6.657858 3.017326 1.7236083 3.302876 7.542269 3.156618
        20000331 -14.443842 -18.436178 -30.43669 -16.43674 -10.641294 -6.325729 -2.780201 -2.582879 .9272819 -2.860675 -.6665761
        20000428 -27.631174 -18.369743 -28.280313 -20.25849 -20.667427 -5.697472 -1.788506 -3.3564355 -1.567505 -2.6838365 -3.053039
        20000531 -12.100137 -9.86248 -7.880253 -10.873996 -8.364392 -4.01419 -2.157115 -.9140872 -1.4205456 -1.6339346 -4.832917
        20000630 19.299923 16.517431 27.3564 22.04087 16.068064 4.2703156 2.5752914 .6615145 2.9706404 5.293447 5.286972
        20000731 -7.549302 -3.98921 -4.1969085 -2.705402 -2.7176976 -.14799568 -.4703484 .06822845 -2.6943986 -1.9468787 .6108273
        20000831 7.224619 7.431433 7.626074 5.843343 6.180289 3.893407 2.1313396 1.5850357 3.04021 2.266343 1.711844
        20000929 -5.315658 1.8555487 -3.7677474 .849898 -5.463445 -1.1064572 -1.9216635 -1.2137762 2.597101 -1.8449255 -2.7891026
        20001031 -8.265444 -8.731771 -8.372848 -2.061017 -9.604306 -1.5189612 -1.9500257 -.9537592 -2.102842 -.54998535 -1.5104102
        end
        Best regards,
        John

        Comment


        • #5
          Well, that data layout is appropriate to the code in #3. I note, however, that the variable names in your data example are only loosely similar to the ones you referred to in #1, so the code in #3 will require some tinkering to change the variable name schemes in the code match what you actually have in your data.

          Also, the code in #3 assumes that every combination of me from 1 to 3, bm 1 to 3, inv (now called op?) 1 to 3 and rma (no called rmw?) 1 to 5 appear in the data. That is not true of your example, though you may well have chosen to just select a small number of variables to illustrate. If your data does not contain all those combinations, then the code will need another modification deep inside the loop to check for the existence of the variable before processing it.

          Comment


          • #6
            Here's a mashup of Clyde's #2 and #3 solutions, assuming that the data example in #4 is saved in "dataex.dta" and is in Stata's current directory.

            Code:
            * load data and make a dataset of variables that will appear in the table
            clear all
            use "dataex.dta"
            describe R_*, replace
            keep name
            
            * calculate the mean and extract the numbers from the variable name
            program mean_table
                local vname = name
                use `vname' using "dataex.dta", clear
                sum `vname', meanonly
                clear
                set obs 1
                foreach v in ME BM OP RMW {
                    gen `v' = regexs(1) if regexm("`vname'", "`v'([0-9]+)")
                }
                gen mean = r(mean)
                gen meanof = "`vname'"
            end
            runby mean_table, by(name) verbose
            
            list, noobs sepby(ME BM OP)
            and the results
            Code:
            . list, noobs sepby(ME BM OP)
            
              +--------------------------------------------------+
              | ME   BM   OP   RMW        mean            meanof |
              |--------------------------------------------------|
              |  1    1    1     1   -.4614357   R_ME1BM1OP1RMW1 |
              |  1    1    1     2    .2754135   R_ME1BM1OP1RMW2 |
              |  1    1    1     3    1.573956   R_ME1BM1OP1RMW3 |
              |  1    1    1     4    3.118026   R_ME1BM1OP1RMW4 |
              |  1    1    1     5    .4983575   R_ME1BM1OP1RMW5 |
              |--------------------------------------------------|
              |  1    1    2     1   -.5257381   R_ME1BM1OP2RMW1 |
              |  1    1    2     2   -.2224839   R_ME1BM1OP2RMW2 |
              |  1    1    2     3   -.4378934   R_ME1BM1OP2RMW3 |
              |  1    1    2     4    .4106844   R_ME1BM1OP2RMW4 |
              |  1    1    2     5    .6079401   R_ME1BM1OP2RMW5 |
              |--------------------------------------------------|
              |  1    1    3     1   -.4070664   R_ME1BM1OP3RMW1 |
              +--------------------------------------------------+
            
            .

            Comment


            • #7
              Wow! Both Code from #3 and #6 work greatly!

              Thank you veeeeeeery much Clyde and Robert, you saved me countless hours!

              Comment

              Working...
              X