Compute Arithmetic Operation on the Elements of a List

Lois Fisher

Join Date: Nov 2016

Posts: 23
#1


23 Nov 2016, 11:36

This is my first post. I am asking a basic question but I would appreciate your help and understanding. I have tried looking and looking through posts via web searches.

The general issue is whether Stata can perform an operation on a list (versus a variable) without resorting to Mata or writing a program. Or perhaps I am simply too inexperienced to see another way.

I have a nested loop in which I generate a series of simple calculations. The results of the calculation are stored in a local macro (as a list).
My question: how can I take the average across this list of numeric results? Is it better to switch to Mata? Or have I missed a better strategy?

In "WHAT NOW" below, I have tried to use functions that act on variables without success. I tried collapse and summarize, but perhaps I am not understanding how to best use a return result within a loop.

forval i = 1 (1) 5 {
    foreach x in `fives' {
        while !missing(ligne`i'_`x'v) & !missing(ligne`i'_`x'h){
            local graines ligne`i'_`x'v * ligne`i'_`x'h)
            local grain_list `grain_list' `graines'
            cap gen drop line_avg_`i'
            gen line_avg_`i' = WHAT NOW? `grain_list'
        }
    }
}

Thank you in advance for any support you could offer.

Nick Cox

Join Date: Mar 2014
Posts: 35699
#2

23 Nov 2016, 11:46

The code isn't self-contained enough for us even to try to understand it. What is in the local macros you don't define? What is in the variables you don't show us?

But the question might be answered by this:

Code:

local given 2 3 5 7 11 13 17 19

* check the answer 
di (2 + 3 + 5 + 7 + 11 + 13 + 17 + 19) / 8
9.625

* slow way 
local ngiven : word count `given'
tokenize "`given'"

local total = 0

forval i = 1/`ngiven' {
     local total = `total' + ``i''
}

di `total'/`ngiven'
9.625

* better way 
mata : mean(strtoreal(tokens(st_local("given"))'))
  9.625


Lois Fisher

Join Date: Nov 2016

Posts: 23
#3

24 Nov 2016, 09:07

Thank you, and my sincere apologies for the initial entirely opaque post. Let me try again, as I have been working on it given your advice but I am stuck again.

My problem: I am estimating a corn harvest from a sample portion of a field. The sample from which I will extrapolate a yield can be thought of as a matrix on a portion of the field.

My sample of the corn field consists of five lines of plants. Each line has a slightly different density of plants (6-20+ plants).
The essential unit that will be calculated is the average number of grains per cob for the entire sample or matrix.
For each cob: total grains/cob ("graines") = vertical count of grains ("v") x horizontal count of grains ("h").

Starting on the first of five lines, I take the first plant and calculate its total grains (ligne1_1v x ligne1_1h)
I measure the grain count for every fifth plant in the line ("fives" = 1 6 11 ..).

At the end of each of 5 lines I am calculating the average number of grains per cob for the entire line. Because I will eventually take the grand average across five lines this could also be thought of as a matrix average. (However I was motivated to do the calculation per line to imitate what was done in the past in excel output.)

Here is my adapted code, which is still not hitting the mark:

forval i = 1 (1) 5 {
    foreach x in `fives' {
        while !missing(ligne`i'_`x'v) & !missing(ligne`i'_`x'h){
            local graines ligne`i'_`x'v * ligne`i'_`x'h
            local graine_list `graine_list' `graines'
            local ngraine_list: word count `graine_list'
            tokenize "'graine_list'"
            local totl = 0
            forval k = 1/`ngraine_list' {
                local totl = `totl' + ``k''
                capture drop line_avg_`i'
                gen line_avg_`i' = `totl'/`ngraine_list'
            }
        }
    }
}

Thank you again for your kind response, and thank you for all that you do for the Stata community. I have profited from many of your lessons.
Nick Cox

Join Date: Mar 2014

Posts: 35699
#4

24 Nov 2016, 09:15

Thanks for your generous comments, which I do appreciate.

Unfortunately without sample data and the definitions of local macros, I can't follow this either.

But my instinct is that you shouldn't show us your code at all. You should show us a sample dataset small enough to be clear and large enough to show what you need and explain what it is you want. Essentially you want a mean and in using a statistical program there should be no need to write your own code de novo for calculating a mean.

There will be, almost certainly, a much simpler solution.
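
As a minimal sketch of that point, assuming the values sit in a variable rather than in a local macro, the built-in -summarize- returns the mean directly. The variable name x is made up for illustration, and the values are the toy list from #2.

```stata
clear
set obs 8
gen double x = .

* fill the hypothetical variable x with the toy list used in #2
local given 2 3 5 7 11 13 17 19
forvalues i = 1/8 {
    replace x = real(word("`given'", `i')) in `i'
}

* the mean is left behind in r(mean); no hand-written summation needed
summarize x, meanonly
display r(mean)
```

With real data the values would normally already be in variables, so only the last two lines would be needed.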
Lois Fisher

Join Date: Nov 2016

Posts: 23
#5

25 Nov 2016, 10:14

Thank you. I agree there must be a simpler way.

Below is the dataex code for 30 observations and an abbreviated set of variables. I read that tenured members prefer the dataex output (input codes) to be pasted directly into the post.

Two goals: 1) make a calculation within a 4 x 2 matrix that yields a 2 x 2 matrix; and 2) take the average of the 4 (calculated) numbers in the 2 x 2 matrix.

Calculation for the 1st goal: by row, capture the product of columns 1 & 2. Likewise, by row, capture the product of columns 3 and 4.
Calculation for the 2nd goal: Take an average of the 4 numbers calculated in step 1.

Context of the 4 x 2 matrix:
In the problem here, I am calculating the number of grains in a sample of corn cobs.
Total grains for one corn cob = vertical count of grains x horizontal count of grains
The sample of corn cobs:
The 4 corn cobs sampled are taken from the 1st and 6th plants on each of two lines of corn plants.
The variables:
number1_cobs_line1 = number of plants in the first line of corn plants
line1_cob1_vert = vertical count of grains for the 1st cob on the 1st line of plants
line1_cob1_horiz = horizontal count of grains for the 1st cob on the 1st line of plants
number2_cobs_line2 = number of plants in the second line of corn plants
line2_cob1_vert = vertical count of grains for the 6th cob on the 2nd line of plants
line2_cob1_horiz = horizontal count of grains for the 6th cob on the 2nd line of plants
A nuance:
In the actual data set, each line of plants in the sample area of each field contains a different number of plants. The actual range is ~5 to ~25 plants per line.
The actual data reflect a sample cob from every fifth plant in each line (1st plant, 6th plant, 11th plant, 16th plant, 21st plant).
My previous code had a "while" loop that accounted for the varying lengths of each line.

Thank you, Dr Cox and other members, for any advice you could provide. It's just a box but I struggle with construction and syntax. I appreciate your instruction here; I promise it will be generalized to many similar issues with macros, loops, and/or matrices.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double id float(number1_cobs_line1 line1_cob1_vert line1_cob1_horiz line1_cob6_vert line1_cob6_horiz number2_cobs_line2 line2_cob1_vert line2_cob1_horiz line2_cob6_vert line2_cob6_horiz)
35207 15 16 14 14 10 15 15 12 21 14
26223 14 15 10 19 12 18 16 10 17 14
18711 10 14 10 15 12 11 15 10 14 10
26280  9 17 14 19 14 13 14 10 17 12
35221 15 22 14 16 12 18 19 12 17 14
18250 22 24 16 15 14 11 20 14 18 14
35244 14 20 16 19 14 16 15 12 20 16
28924 21 24 20 24 16 21 23 16 20 14
18101 13 17 12 20 16 13 15 12 16 12
18575 13 22 14 18 12 13 20 12 14 10
19819 14 23 16 19 14 12 24 16 19 16
18669 18 18 12 25 12  9 25 14 21 14
35166 11 15 16 20 14 12 17 14  9 16
35159 11 16 16 19 16 22 17 16 31 12
40479 22 31 16 17 16 23 27 16 18 16
18640 22 22 14  9 14 12 14 12 17 18
35150 11 17 12 27 12 12 25 14 10 12
34519 14  6 14 19 16 21 29 16 17 12
18671 11 17 12 29 14 11 15 16 19 12
34505 11 22 18 25 14 15 27 12 25 12
40466 12 15 12 17 18 15 17 16 32 18
28401 15 12 14 13 14 26  7 14 18 16
22724  9 13 14 16 18 13 39 14 32 14
22726 15 26 12 20 14 10 30 14 30 16
18490 12 28 14 33 14 10 21 14 24 14
35441 12 29 16 18 14 15 19 12 17 14
35467 20 22 12 16 14 20 21  8 23 12
34188 20 11 14 17 12 16 17 14 16 12
35419 18 11 16 25 14 17 18 12 21 10
18933 15 19 10 21 10 18 22 14 13 14
end

Last edited by Lois Fisher; 25 Nov 2016, 10:29.

Clyde Schechter

Join Date: Apr 2014
Posts: 30101
#6

25 Nov 2016, 10:43

I am confused by your description of what you want to do, so forgive me if this code entirely misses the point. I've made some "educated guesses" about what you want.

Even if I have your goals wrong, one thing I will suggest is that whatever you want to do, in Stata it will probably be easier to do it with the data in long layout. So my code begins with that. This is somewhat complicated because of the variable names you are using: so I have to prepare for the double-reshaping with some -rename-s.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double id float(number1_cobs_line1 line1_cob1_vert line1_cob1_horiz line1_cob6_vert line1_cob6_horiz number2_cobs_line2 line2_cob1_vert line2_cob1_horiz line2_cob6_vert line2_cob6_horiz)
35207 15 16 14 14 10 15 15 12 21 14
26223 14 15 10 19 12 18 16 10 17 14
18711 10 14 10 15 12 11 15 10 14 10
26280  9 17 14 19 14 13 14 10 17 12
35221 15 22 14 16 12 18 19 12 17 14
18250 22 24 16 15 14 11 20 14 18 14
35244 14 20 16 19 14 16 15 12 20 16
28924 21 24 20 24 16 21 23 16 20 14
18101 13 17 12 20 16 13 15 12 16 12
18575 13 22 14 18 12 13 20 12 14 10
19819 14 23 16 19 14 12 24 16 19 16
18669 18 18 12 25 12  9 25 14 21 14
35166 11 15 16 20 14 12 17 14  9 16
35159 11 16 16 19 16 22 17 16 31 12
40479 22 31 16 17 16 23 27 16 18 16
18640 22 22 14  9 14 12 14 12 17 18
35150 11 17 12 27 12 12 25 14 10 12
34519 14  6 14 19 16 21 29 16 17 12
18671 11 17 12 29 14 11 15 16 19 12
34505 11 22 18 25 14 15 27 12 25 12
40466 12 15 12 17 18 15 17 16 32 18
28401 15 12 14 13 14 26  7 14 18 16
22724  9 13 14 16 18 13 39 14 32 14
22726 15 26 12 20 14 10 30 14 30 16
18490 12 28 14 33 14 10 21 14 24 14
35441 12 29 16 18 14 15 19 12 17 14
35467 20 22 12 16 14 20 21  8 23 12
34188 20 11 14 17 12 16 17 14 16 12
35419 18 11 16 25 14 17 18 12 21 10
18933 15 19 10 21 10 18 22 14 13 14
end

rename number?_* number_line_*
rename *_vert vert_*
rename *_horiz horiz_*

reshape long vert_line1_cob horiz_line1_cob vert_line2_cob horiz_line2_cob, i(id) j(cob_num)
rename *_cob *
reshape long number_line_ vert_line horiz_line, i(id cob_num) j(line_num)
rename number_line_ number_line
rename *_line *

gen grains = vert*horiz

by line_num, sort: egen avg_grains_this_line = mean(grains)

//    AVERAGE NUMBER OF GRAINS FOR ENTIRE SAMPLE
summ grains, meanonly
local grand_mean_grains = r(mean)

//    UNWEIGHTED AVERAGE OF LINE AVERAGES
egen line_flag = tag(line_num)
summ avg_grains_this_line if line_flag, meanonly
local mean_of_line_means = r(mean)
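
If what is wanted instead is one average per id across its four sampled cobs (one possible reading of the second goal in #5, and only a guess), the wide layout can be kept. A minimal sketch, using just the first two observations of the example data; the names grains_l1_c1 and so on, and mean_grains, are made up for illustration.

```stata
clear
input double id float(line1_cob1_vert line1_cob1_horiz line1_cob6_vert line1_cob6_horiz line2_cob1_vert line2_cob1_horiz line2_cob6_vert line2_cob6_horiz)
35207 16 14 14 10 15 12 21 14
26223 15 10 19 12 16 10 17 14
end

* one product (vertical x horizontal) per sampled cob
foreach l in 1 2 {
    foreach c in 1 6 {
        gen grains_l`l'_c`c' = line`l'_cob`c'_vert * line`l'_cob`c'_horiz
    }
}

* row-wise mean of the four products, ignoring any missing values
egen mean_grains = rowmean(grains_l*)
list id grains_l* mean_grains
```

This works for a fixed, small set of cobs; with lines of varying length the long layout is likely to be easier to manage.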

I hope this is helpful.


Lois Fisher

Join Date: Nov 2016

Posts: 23
#7

26 Nov 2016, 14:20

ANSWER COMPLETE.

Voilà! Thank you, Dr. Schechter, for reminding me that panel data are not always time-denominated! And thank you so much for taking the time to carefully illustrate this idea using my example. I really appreciate it.

FOR MY FELLOW NOVICES WHO HAVE CODED PANEL DATA JUST A FEW TIMES AND/OR NOT RECENTLY:
Always think of restructuring multi-level data from wide to long (or vice-versa).
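
A minimal sketch of that idea, as I understand it: the variable names vert1, horiz1, vert6, horiz6 are made up for illustration, and the numbers are the line 1 counts from the first two observations of my example data.

```stata
clear
input id vert1 horiz1 vert6 horiz6
35207 16 14 14 10
26223 15 10 19 12
end

* wide to long: one row per id and cob; cob_num takes the values 1 and 6
reshape long vert horiz, i(id) j(cob_num)

* with one cob per row, the product and its mean are one line each
gen grains = vert * horiz
bysort id: egen mean_grains = mean(grains)
list, sepby(id)
```

Once the data are long, the same two lines work no matter how many cobs were sampled per line.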

For your convenience, here is a link to Nick Cox's tips on how to use the -reshape- command, which is worth reading if you want a quick refresher before diving into the full details.
http://www.stata.com/support/faqs/da...-with-reshape/

Dr. Schechter, I see you have such a wonderful background in simulation and modeling in medicine and have also done some studies on telemedicine. I am currently in Bamako, Mali, where, as you may know, they have experimented a great deal with telemedicine. Here we have a context that promises to be truly valuable. I hope to learn more when I am done counting corn cobs!
