
  • Compute Arithmetic Operation on the Elements of a List

    This is my first post. I am asking a basic question, but I would appreciate your help and understanding. I have searched through many posts via web searches.

    The general issue is whether Stata can perform an operation on a list (versus a variable) without resorting to Mata or writing a program. Or perhaps I am simply too inexperienced to see another way.

    I have a nested loop in which I generate a series of simple calculations. The results of the calculation are stored in a local macro (as a list).
    My question: how can I take the average across this list of numeric results? Is it better to switch to Mata? Or have I missed a better strategy?

    In "WHAT NOW" below, I have tried to use functions that act on variables, without success. I tried collapse and summarize, but perhaps I do not understand how best to use a returned result within a loop.

    forval i = 1 (1) 5 {
    foreach x in `fives' {
    while !missing(ligne`i'_`x'v) & !missing(ligne`i'_`x'h){
    local graines ligne`i'_`x'v * ligne`i'_`x'h
    local grain_list `grain_list' `graines'
    capture drop line_avg_`i'
    gen line_avg_`i' = WHAT NOW? `grain_list'
    }
    }
    }

    Thank you in advance for any support you could offer.

  • #2
    The code isn't self-contained enough for us even to try to understand it. What is in the local macros you don't define? What is in the variables you don't show us?

    But the question might be answered by this:

    Code:
    local given 2 3 5 7 11 13 17 19
    
    * check the answer 
    di (2 + 3 + 5 + 7 + 11 + 13 + 17 + 19) / 8
    * displays 9.625
    
    * slow way 
    local ngiven : word count `given'
    tokenize "`given'"
    
    local total = 0
    
    forval i = 1/`ngiven' {
         local total = `total' + ``i''
    }
    
    di `total'/`ngiven'
    * displays 9.625
    
    * better way 
    mata : mean(strtoreal(tokens(st_local("given"))'))
    * displays 9.625

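    One further route that stays entirely in Stata, offered here as a sketch rather than something from the thread: rewrite the list as an arithmetic expression with the macro extended function subinstr, then let display evaluate it. It assumes every element of the list is numeric; list retokenize is applied first so that each gap is exactly one space (a double space would otherwise produce "+ +").

```stata
local given 2 3 5 7 11 13 17 19

* collapse any runs of spaces so that each gap is exactly one space
local given : list retokenize given

* count the elements, then rewrite "2 3 5 ..." as "2 + 3 + 5 ..."
local ngiven : word count `given'
local expr : subinstr local given " " " + ", all

* display evaluates the expression, giving the mean 77/8 = 9.625
display (`expr') / `ngiven'
```

    This avoids both the loop and Mata, at the cost of relying on the list being a clean sequence of numbers.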


    • #3
      Thank you, and my sincere apologies for the entirely opaque initial post. Let me try again: I have been working on it following your advice, but I am stuck again.

      My problem: I am estimating a corn harvest from a sample portion of a field. The sample from which I will extrapolate a yield can be thought of as a matrix on a portion of the field.

      My sample of the corn field consists of five lines of plants. Each line has a slightly different density of plants (6-20+ plants).
      The essential unit that will be calculated is the average number of grains per cob for the entire sample or matrix.
      For each cob: total grains/cob ("graines") = vertical count of grains ("v") x horizontal count of grains ("h").

      Starting on the first of five lines, I take the first plant and calculate its total grains (ligne1_1v x ligne1_1h).
      I measure the grain count for every fifth plant in the line ("fives" = 1 6 11 ...).

      At the end of each of the 5 lines, I calculate the average number of grains per cob for that line. Because I will eventually take the grand average across the five lines, this could also be thought of as a matrix average. (However, I chose to do the calculation per line to reproduce what was done previously in Excel.)
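      A caveat on the grand average, written out with notation introduced here rather than in the post: let n_l be the number of cobs sampled on line l, and v_lk, h_lk the vertical and horizontal counts for cob k on that line. The average over all sampled cobs is a weighted mean of the line averages, so it equals the plain average of the five line averages only when every line contributes the same number of cobs, which need not hold when line lengths vary.

```latex
g_{\ell k} = v_{\ell k}\, h_{\ell k}, \qquad
\bar g_\ell = \frac{1}{n_\ell} \sum_{k=1}^{n_\ell} g_{\ell k}

\bar g_{\text{all}}
  = \frac{\sum_{\ell=1}^{5} \sum_{k=1}^{n_\ell} g_{\ell k}}{\sum_{\ell=1}^{5} n_\ell}
  = \frac{\sum_{\ell=1}^{5} n_\ell\, \bar g_\ell}{\sum_{\ell=1}^{5} n_\ell}
\qquad \text{versus} \qquad
\bar g_{\text{lines}} = \frac{1}{5} \sum_{\ell=1}^{5} \bar g_\ell

\bar g_{\text{all}} = \bar g_{\text{lines}} \quad \text{whenever } n_1 = n_2 = \dots = n_5
```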

      Here is my adapted code, which is still not hitting the mark:

      forval i = 1 (1) 5 {
      foreach x in `fives' {
      while !missing(ligne`i'_`x'v) & !missing(ligne`i'_`x'h){
      local graines ligne`i'_`x'v * ligne`i'_`x'h
      local graine_list `graine_list' `graines'
      local ngraine_list: word count `graine_list'
      tokenize "`graine_list'"
      local totl = 0
      forval k = 1/`ngraine_list' {
      local totl = `totl' + ``k''
      capture drop line_avg_`i'
      gen line_avg_`i' = `totl'/`ngraine_list'
      }
      }
      }
      }


      Thank you again for your kind response, and thank you for all that you do for the Stata community. I have profited from many of your lessons.



      • #4
        Thanks for your generous comments, which I do appreciate.

        Unfortunately without sample data and the definitions of local macros, I can't follow this either.

        But my instinct is that you shouldn't show us your code at all. You should show us a sample dataset small enough to be clear and large enough to show what you need, and explain what it is you want. Essentially, you want a mean, and in a statistical program there should be no need to write your own code de novo to calculate a mean.

        There will be, almost certainly, a much simpler solution.



        • #5
          Thank you. I agree there must be a simpler way.

          Below is the dataex code for 30 observations and an abbreviated set of variables. I read that long-standing members prefer the dataex output (the input commands) to be pasted directly into the post.

          Two goals: 1) make a calculation within a 2 x 4 matrix (two rows, one per line of plants; four columns: cob 1 vertical, cob 1 horizontal, cob 6 vertical, cob 6 horizontal) that yields a 2 x 2 matrix; and 2) take the average of the 4 calculated numbers in the 2 x 2 matrix.

          Calculation for the 1st goal: by row, take the product of columns 1 and 2, and likewise the product of columns 3 and 4.
          Calculation for the 2nd goal: take the average of the 4 numbers calculated in step 1.

          Context of the 2 x 4 matrix:
          In the problem here, I am calculating the number of grains in a sample of corn cobs.
          Total grains for one corn cob = vertical count of grains x horizontal count of grains
          The sample of corn cobs:
          The 4 corn cobs sampled are cobs taken from the 1st and 6th plant on each of two lines of corn plants.
          The variables:
          number1_cobs_line1 = number of plants in the first line of corn plants
          line1_cob1_vert = vertical count of grains for the 1st cob on the 1st line of plants
          line1_cob1_horiz = horizontal count of grains for the 1st cob on the 1st line of plants
          line1_cob6_vert, line1_cob6_horiz = the same two counts for the 6th cob on the 1st line of plants
          number2_cobs_line2 = number of plants in the second line of corn plants
          line2_cob1_vert = vertical count of grains for the 1st cob on the 2nd line of plants
          line2_cob1_horiz = horizontal count of grains for the 1st cob on the 2nd line of plants
          line2_cob6_vert, line2_cob6_horiz = the same two counts for the 6th cob on the 2nd line of plants
          A nuance:
          In the actual data set, each line of plants in the sample area of each field will contain a different number of plants. The actual range is roughly 5 to 25 plants per line.
          The actual data reflect a sample cob from every fifth plant in each line: (1st plant, 6th plant, 11th plant, 16th plant, 21st plant)
          My previous code had a "while" loop that accounted for the varying lengths of each line.
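          Taken on their own, the two goals need no loops, macros, or matrices. Below is a minimal sketch in wide layout (not the approach the thread eventually settled on), using the first three rows of the dataex listing in this post: generate computes each row-by-row product, and egen's rowmean() averages them. The grains_l*_c* variable names are hypothetical. Because rowmean() ignores missing values, a cob that was never sampled on a shorter line would simply drop out of the average instead of breaking it.

```stata
* first three rows of the -dataex- listing in this post
clear
input double id float(number1_cobs_line1 line1_cob1_vert line1_cob1_horiz line1_cob6_vert line1_cob6_horiz number2_cobs_line2 line2_cob1_vert line2_cob1_horiz line2_cob6_vert line2_cob6_horiz)
35207 15 16 14 14 10 15 15 12 21 14
26223 14 15 10 19 12 18 16 10 17 14
18711 10 14 10 15 12 11 15 10 14 10
end

* goal 1: grains per cob = vertical count x horizontal count, one variable per cob
foreach l in 1 2 {
    foreach c in 1 6 {
        generate grains_l`l'_c`c' = line`l'_cob`c'_vert * line`l'_cob`c'_horiz
    }
}

* goal 2: mean of the four products in each row; rowmean() skips missing values
egen avg_grains = rowmean(grains_l*)

list id grains_l* avg_grains
```

          For id 35207, for example, the four products are 224, 140, 180, and 294, and avg_grains is 209.5.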

          Thank you, Dr Cox and other members, for any advice you can provide. It's just a box, but I struggle with construction and syntax. I appreciate your instruction here; I promise it will be generalized to many similar issues with macros, loops, and/or matrices.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double id float(number1_cobs_line1 line1_cob1_vert line1_cob1_horiz line1_cob6_vert line1_cob6_horiz number2_cobs_line2 line2_cob1_vert line2_cob1_horiz line2_cob6_vert line2_cob6_horiz)
          35207 15 16 14 14 10 15 15 12 21 14
          26223 14 15 10 19 12 18 16 10 17 14
          18711 10 14 10 15 12 11 15 10 14 10
          26280  9 17 14 19 14 13 14 10 17 12
          35221 15 22 14 16 12 18 19 12 17 14
          18250 22 24 16 15 14 11 20 14 18 14
          35244 14 20 16 19 14 16 15 12 20 16
          28924 21 24 20 24 16 21 23 16 20 14
          18101 13 17 12 20 16 13 15 12 16 12
          18575 13 22 14 18 12 13 20 12 14 10
          19819 14 23 16 19 14 12 24 16 19 16
          18669 18 18 12 25 12  9 25 14 21 14
          35166 11 15 16 20 14 12 17 14  9 16
          35159 11 16 16 19 16 22 17 16 31 12
          40479 22 31 16 17 16 23 27 16 18 16
          18640 22 22 14  9 14 12 14 12 17 18
          35150 11 17 12 27 12 12 25 14 10 12
          34519 14  6 14 19 16 21 29 16 17 12
          18671 11 17 12 29 14 11 15 16 19 12
          34505 11 22 18 25 14 15 27 12 25 12
          40466 12 15 12 17 18 15 17 16 32 18
          28401 15 12 14 13 14 26  7 14 18 16
          22724  9 13 14 16 18 13 39 14 32 14
          22726 15 26 12 20 14 10 30 14 30 16
          18490 12 28 14 33 14 10 21 14 24 14
          35441 12 29 16 18 14 15 19 12 17 14
          35467 20 22 12 16 14 20 21  8 23 12
          34188 20 11 14 17 12 16 17 14 16 12
          35419 18 11 16 25 14 17 18 12 21 10
          18933 15 19 10 21 10 18 22 14 13 14
          end
          Last edited by Lois Fisher; 25 Nov 2016, 10:29.



          • #6
            I am confused by your description of what you want to do, so forgive me if this code entirely misses the point. I've made some "educated guesses" about what you want.

            Even if I have your goals wrong, one thing I will suggest is that whatever you want to do, in Stata it will probably be easier with the data in long layout. So my code begins with that. This is somewhat complicated because of the variable names you are using, so I have to prepare for the double reshape with some -rename-s.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double id float(number1_cobs_line1 line1_cob1_vert line1_cob1_horiz line1_cob6_vert line1_cob6_horiz number2_cobs_line2 line2_cob1_vert line2_cob1_horiz line2_cob6_vert line2_cob6_horiz)
            35207 15 16 14 14 10 15 15 12 21 14
            26223 14 15 10 19 12 18 16 10 17 14
            18711 10 14 10 15 12 11 15 10 14 10
            26280  9 17 14 19 14 13 14 10 17 12
            35221 15 22 14 16 12 18 19 12 17 14
            18250 22 24 16 15 14 11 20 14 18 14
            35244 14 20 16 19 14 16 15 12 20 16
            28924 21 24 20 24 16 21 23 16 20 14
            18101 13 17 12 20 16 13 15 12 16 12
            18575 13 22 14 18 12 13 20 12 14 10
            19819 14 23 16 19 14 12 24 16 19 16
            18669 18 18 12 25 12  9 25 14 21 14
            35166 11 15 16 20 14 12 17 14  9 16
            35159 11 16 16 19 16 22 17 16 31 12
            40479 22 31 16 17 16 23 27 16 18 16
            18640 22 22 14  9 14 12 14 12 17 18
            35150 11 17 12 27 12 12 25 14 10 12
            34519 14  6 14 19 16 21 29 16 17 12
            18671 11 17 12 29 14 11 15 16 19 12
            34505 11 22 18 25 14 15 27 12 25 12
            40466 12 15 12 17 18 15 17 16 32 18
            28401 15 12 14 13 14 26  7 14 18 16
            22724  9 13 14 16 18 13 39 14 32 14
            22726 15 26 12 20 14 10 30 14 30 16
            18490 12 28 14 33 14 10 21 14 24 14
            35441 12 29 16 18 14 15 19 12 17 14
            35467 20 22 12 16 14 20 21  8 23 12
            34188 20 11 14 17 12 16 17 14 16 12
            35419 18 11 16 25 14 17 18 12 21 10
            18933 15 19 10 21 10 18 22 14 13 14
            end
            
            rename number?_* number_line_*
            rename *_vert vert_*
            rename *_horiz horiz_*
            
            reshape long vert_line1_cob horiz_line1_cob vert_line2_cob horiz_line2_cob, i(id) j(cob_num)
            rename *_cob *
            reshape long number_line_ vert_line horiz_line, i(id cob_num) j(line_num)
            rename number_line_ number_line
            rename *_line *
            
            gen grains = vert*horiz
            
            by line_num, sort: egen avg_grains_this_line = mean(grains)
            
            //    AVERAGE NUMBER OF GRAINS FOR ENTIRE SAMPLE
            summ grains, meanonly
            local grand_mean_grains = r(mean)
            
            //    UNWEIGHTED AVERAGE OF LINE AVERAGES
            egen line_flag = tag(line_num)
            summ avg_grains_this_line if line_flag, meanonly
            local mean_of_line_means = r(mean)
            I hope this is helpful.
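            The code above pools all 30 ids when it averages by line_num. If each id is a separate field (an assumption; the thread does not say so explicitly), the same long layout gives per-field results by adding id to the by-groups. A minimal sketch, with ids 35207 and 26223 from the dataex listing typed in by hand in the long form that the reshapes produce; the names avg_grains_field and mean_of_line_means are illustrative only.

```stata
* long layout for ids 35207 and 26223, hand-copied from the dataex listing
clear
input double id byte(line_num cob_num) float(vert horiz)
35207 1 1 16 14
35207 1 6 14 10
35207 2 1 15 12
35207 2 6 21 14
26223 1 1 15 10
26223 1 6 19 12
26223 2 1 16 10
26223 2 6 17 14
end

* grains per cob = vertical count x horizontal count
generate grains = vert * horiz

* mean grains per cob within each line of each field
bysort id line_num: egen avg_grains_this_line = mean(grains)

* mean grains per cob over all sampled cobs in each field
bysort id: egen avg_grains_field = mean(grains)

* unweighted mean of the line means within each field:
* tag one observation per id-line pair and average only those
egen line_flag = tag(id line_num)
bysort id: egen mean_of_line_means = mean(cond(line_flag, avg_grains_this_line, .))

list id line_num cob_num grains avg_grains_this_line avg_grains_field mean_of_line_means, sepby(id)
```

            Here every line contributes two cobs, so the two field-level averages coincide (209.5 for 35207, 194 for 26223); with unequal numbers of sampled cobs per line they would generally differ.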



            • #7
              ANSWER COMPLETE.

              Voilà! Thank you, Dr. Schechter, for reminding me that panel data need not be indexed by time! And thank you so much for taking the time to carefully illustrate this idea using my example. I sure appreciated it.

              FOR MY FELLOW NOVICES WHO HAVE CODED PANEL DATA JUST A FEW TIMES AND/OR NOT RECENTLY:
              Always think of restructuring multi-level data from wide to long (or vice-versa).

              For your convenience, here is a link to Nick Cox's tips on how to use the reshape command, which is required reading if you want a quick refresher before diving into the full details.
              http://www.stata.com/support/faqs/da...-with-reshape/

              Dr. Schechter, I see you have such a wonderful background in simulation and modeling in medicine, and that you have also done some studies on telemedicine. I am currently in Bamako, Mali, where, as you may know, there has been a great deal of experimentation with telemedicine. This is a context that promises to be truly valuable. I hope to learn more when I am done counting corn cobs!

