Generate Table, with rows/columns dependent on number in variable name.

John Hanser

Join Date: Jan 2018

Posts: 24
#1

Generate Table, with rows/columns dependent on number in variable name.

19 Mar 2018, 14:29

Dear community,

I have currently a list of 270 stock portfolio returns (monthly data), who are constructed by sorting on specific firm characteristics.
The sorting is done on 4 different characteristics, abbreviated by "ME", "BM", "INV", and "RMA"
The first three are sorted into 3 categories, while the fourth is sorted on 5.
The columns in my dataset are, therefore, labeled from R_ME1BM1INV1RMA1 to ME3BM3INV3RMW5

I would like to create a table with mean returns dependent on the numbers in the variable name, which looks something like this:

ME BM INV RMW-> 1 2 3 4 5

1 1 1 mean(R_ME1BM1INV1RMA1) ...

1 1 2 ...

1 1 3

1 2 1

1 2 2

1 2 3

...

I assume that this specific format is not easy to accomplish, but I would be very happy to get a tip how I can make the number in the variable name determining the rows.
I did not find a similar question here on Statalist.

Thank you in advance!
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

19 Mar 2018, 15:25

I think this is the simplest way to do this:

Code:

capture program drop one_mean program define one_mean local me = me[1] local bm = bm[1] local inv = inv[1] local rma = rma[1] summ R_me`me'bm`bm'inv`inv'rma`rma', meanonly gen mean = r(mean) keep me bm inv rma mean keep in 1 exit end runby one_mean, by(me bm inv rma) rename mean mean_rma reshape wide mean_rma, i(me bm inv) j(rma)

To use this code you will need the program -runby-, by Robert Picard and me, available from SSC.

This code is not tested because you did not provide an example of your data to work with. Beware of typos, errors, and possible misfits to your data. You will have to modify the code accordingly.

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

I urge you to learn and use Stata terminology. Stata data sets are not spreadsheets and we do not speak of rows and columns when talking about them. The "rows" are observations, and the "columns" are variables. While it is pedantic to make the point, it is important. If you think about Stata as if it were a spreadsheet, you will be tempted to try to do analysis as if you were working in a spreadsheet. Your instincts from spreadsheet experience are not helpful in doing Stata analysis. Often they will get you into trouble. So it is important to banish all aspects of the spreadsheet metaphor from your mind when working with Stata. Not using spreadsheet-appropriate terminology is one start in that direction.
1 like
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#3

19 Mar 2018, 15:35

It dawns on me that your data layout may in fact be radically different from what I imagined when writing #2. So here's a different approach that is appropriate to a different initial data layout (one in which separate variables me bm inv and rm1 do not exist, just variables R_*.

Code:

capture postutil clear tempfile building postfile handle byte (me bm inv rmw) float mean_R using `building' forvalues i = 1/3 { forvalues j = 1/3 { forvalues k = 1/3 { forvalues m = 1/5 { summ R_me`i'bm`j'inv`k'rma`m', meanonly post handle (`i') (`j') (`k') (`m') (`r(mean)') } } } } postclose handle use `building', clear rename mean_R mean_R_rma reshape wide mean_R_rma, i(me bm inv) j(rmw)

This uncertainty on my part about your data layout underscores the importance of always showing example data when asking for help with code. Code is like driving directions: you can't get them right without knowing both the starting point and the destination.
1 like
Comment

John Hanser

Join Date: Jan 2018
Posts: 24

20 Mar 2018, 02:31

Dear Clyde,

thank you very much for the support, I will try both!
Find below a short extract from the data using -dataex-. Maybe that makes it more clear and you can give a suggestion which one to use!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long date float(R_ME1BM1OP1RMW1 R_ME1BM1OP1RMW2 R_ME1BM1OP1RMW3 R_ME1BM1OP1RMW4 R_ME1BM1OP1RMW5 R_ME1BM1OP2RMW1 R_ME1BM1OP2RMW2 R_ME1BM1OP2RMW3 R_ME1BM1OP2RMW4 R_ME1BM1OP2RMW5 R_ME1BM1OP3RMW1)
20000131 8.9373865 1.3906333 24.94088 19.299826 13.011122 -1.268156 1.1190631 .6036164 -.9459738 2.497578 -1.98488
20000229 35.22927 34.94847 38.75097 35.481968 27.182663 6.657858 3.017326 1.7236083 3.302876 7.542269 3.156618
20000331 -14.443842 -18.436178 -30.43669 -16.43674 -10.641294 -6.325729 -2.780201 -2.582879 .9272819 -2.860675 -.6665761
20000428 -27.631174 -18.369743 -28.280313 -20.25849 -20.667427 -5.697472 -1.788506 -3.3564355 -1.567505 -2.6838365 -3.053039
20000531 -12.100137 -9.86248 -7.880253 -10.873996 -8.364392 -4.01419 -2.157115 -.9140872 -1.4205456 -1.6339346 -4.832917
20000630 19.299923 16.517431 27.3564 22.04087 16.068064 4.2703156 2.5752914 .6615145 2.9706404 5.293447 5.286972
20000731 -7.549302 -3.98921 -4.1969085 -2.705402 -2.7176976 -.14799568 -.4703484 .06822845 -2.6943986 -1.9468787 .6108273
20000831 7.224619 7.431433 7.626074 5.843343 6.180289 3.893407 2.1313396 1.5850357 3.04021 2.266343 1.711844
20000929 -5.315658 1.8555487 -3.7677474 .849898 -5.463445 -1.1064572 -1.9216635 -1.2137762 2.597101 -1.8449255 -2.7891026
20001031 -8.265444 -8.731771 -8.372848 -2.061017 -9.604306 -1.5189612 -1.9500257 -.9537592 -2.102842 -.54998535 -1.5104102
end

Best regards,
John

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#5

20 Mar 2018, 08:11

Well, that data layout is appropriate to the code in #3. I note, however, that the variable names in your data example are only loosely similar to the ones you referred to in #1, so the code in #3 will require some tinkering to change the variable name schemes in the code match what you actually have in your data.

Also, the code in #3 assumes that every combination of me from 1 to 3, bm 1 to 3, inv (now called op?) 1 to 3 and rma (no called rmw?) 1 to 5 appear in the data. That is not true of your example, though you may well have chosen to just select a small number of variables to illustrate. If your data does not contain all those combinations, then the code will need another modification deep inside the loop to check for the existence of the variable before processing it.
1 like
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

20 Mar 2018, 09:40

Here's a mashup of Clyde's #2 and #3 solutions, assuming that the data example in #4 is saved in "dataex.dta" and is in Stata's current directory.

Code:

* load data and make a dataset of variables that will appear in the table
clear all
use "dataex.dta"
describe R_*, replace
keep name

* calculate the mean and extract the numbers from the variable name
program mean_table
    local vname = name
    use `vname' using "dataex.dta", clear
    sum `vname', meanonly
    clear
    set obs 1
    foreach v in ME BM OP RMW {
        gen `v' = regexs(1) if regexm("`vname'", "`v'([0-9]+)")
    }
    gen mean = r(mean)
    gen meanof = "`vname'"
end
runby mean_table, by(name) verbose

list, noobs sepby(ME BM OP)

and the results

Code:

. list, noobs sepby(ME BM OP)

  +--------------------------------------------------+
  | ME   BM   OP   RMW        mean            meanof |
  |--------------------------------------------------|
  |  1    1    1     1   -.4614357   R_ME1BM1OP1RMW1 |
  |  1    1    1     2    .2754135   R_ME1BM1OP1RMW2 |
  |  1    1    1     3    1.573956   R_ME1BM1OP1RMW3 |
  |  1    1    1     4    3.118026   R_ME1BM1OP1RMW4 |
  |  1    1    1     5    .4983575   R_ME1BM1OP1RMW5 |
  |--------------------------------------------------|
  |  1    1    2     1   -.5257381   R_ME1BM1OP2RMW1 |
  |  1    1    2     2   -.2224839   R_ME1BM1OP2RMW2 |
  |  1    1    2     3   -.4378934   R_ME1BM1OP2RMW3 |
  |  1    1    2     4    .4106844   R_ME1BM1OP2RMW4 |
  |  1    1    2     5    .6079401   R_ME1BM1OP2RMW5 |
  |--------------------------------------------------|
  |  1    1    3     1   -.4070664   R_ME1BM1OP3RMW1 |
  +--------------------------------------------------+

.

Comment

John Hanser

Join Date: Jan 2018

Posts: 24
#7

20 Mar 2018, 13:33

Wow! Both Code from #3 and #6 work greatly!

Thank you veeeeeeery much Clyde and Robert, you saved me countless hours!
Comment

ME	BM	INV	RMW->	1	2	3	4	5
1	1	1		mean(R_ME1BM1INV1RMA1)	...
1	1	2		...
1	1	3
1	2	1
1	2	2
1	2	3
...

Announcement

Generate Table, with rows/columns dependent on number in variable name.

Comment

Comment

Comment

Comment

Comment

Comment