Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Nested foreach loops

    Dear Statalists,

    I am fairly new to Stata and am facing some issues regarding nested foreach loops.


    My aim is to calculate arith. means (and later on also harmonic and geometric means) of financial rations for several firm-year-activity combinations using a holdout procedure:
    Per year, I am starting from a number n of firms within the same activity. I am then "excluding" one firm. For this excluded firm I want to know the arith. mean of the remaining n-1 ones. (so I aggregate the financial ratios of the remaining n-1 firms using the arith. mean).

    As my dataset is rather large (140,000 obs, 40 financial ratios) I wanted to do this using foreach loops.
    Unfortunately I am not able to use it properly.

    To make it easier I attached a small part of my actual dataset:




    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(id year) double Act float(p_a p_b p_c p_d p_e p_f a b c d e f)
    66 2016 5120102013 1.2984524  .6801872 21.108166 33.604084  10.03312 100.91905 . . . . . .
    73 2016 5120102013         .         .         .         .         .         . . . . . . .
    73 2017 5120102013         .         .         .         .         .         . . . . . . .
    66 2017 5120102013  1.914292  .7849112  12.16106 15.510412  9.922243  29.23344 . . . . . .
    52 2016 5120201012  3.071616 2.3932686 32.960102  43.91845 18.725128  82.86536 . . . . . .
    52 2017 5120201012 2.9343715 2.0664864  21.13209 29.698446         .         . . . . . . .
    78 2016 5220102010  2.330663  .2610788 8.3273735 11.115675  5.950643  25.58993 2 2 2 2 2 2
    70 2016 5220102010 1.5020788  1.145595  9.933325  18.64282  7.041817  64.33379 2 2 2 2 2 2
    70 2017 5220102010 1.6521858  1.243019  9.578372  16.23191    10.556  72.60179 2 2 2 2 2 2
    78 2017 5220102010 2.7057986 .29020053  7.148772 12.627624  5.434725  34.07482 2 2 2 2 2 2
    60 2016 5220306010  1.363158   .620429  7.854228 10.115916  5.823207         . . . . . . .
    60 2017 5220306010 1.8390906  .7305213  9.004681 13.430566  6.947975         . . . . . . .
    63 2016 5240601010  2.725502         .         .         .         . 1.5158334 . . . . . .
    63 2017 5240601010         .         .         .         .         .         . . . . . . .
    61 2016 5330101010 1.8158855   5.40461  51.26385 37.998203 16.408314  49.22495 . . . . . .
    61 2017 5330101010  2.408882  5.718142  30.85248  27.41594  9.346869 67.951164 . . . . . .
    55 2016 5340304019    2.5977 1.8031392 18.441538  26.42728 17.828302 33.486195 . . . . . .
    55 2017 5340304019 1.5581638  .9036034 13.876739  19.71499 12.656157 17.653942 . . . . . .
    57 2016 5510101010  1.724135 2.5399055   8.11631 12.668252  7.908437 32.725983 . . . . . .
    57 2017 5510101010  .7788456  2.574076  11.53141  19.74624         .         . . . . . . .
    74 2016 5510103010  2.734703  3.719215  59.01502  60.57954         .         . . . . . . .
    74 2017 5510103010  2.879613  2.934534         .         .         .         . . . . . . .
    51 2016 5540302012 1.5938604     9.312  26.64166 35.174557  22.65542   27.7221 . . . . . .
    51 2017 5540302012 1.6104847  8.437128  60.17859  40.18389  21.06223  25.35945 . . . . . .
    62 2016 5610101010  7.737345 1.1574975         .         .  13.76379         . 2 2 . . . .
    53 2016 5610101010  9.066516 2867.9675         .         .         .         . 2 2 . . . .
    62 2017 5610101010  8.461952 1.0269839         .         .         .         . 2 2 . . . .
    53 2017 5610101010 10.828953  307.7424         .         .         .         . 2 2 . . . .
    77 2016 5620104015 2.2196906  3.411423    5.3847  7.504815  5.902096    9.7498 . . . . . .
    77 2017 5620104015 1.0993739 2.0176318         .         .  8.982515         . . . . . . .
    50 2016 5620201010  6.227028 186.26508         .         .         .         . . . . . . .
    50 2017 5620201010   12.3296 33.097652         .         .         .         . . . . . . .
    54 2016 5620201011 4.4957156 36.520832         .         .         .         . 6 5 . . . .
    79 2016 5620201011 12.312634  6.749701  20.59523 23.435795 17.748589 102.95913 6 5 . . 2 .
    71 2016 5620201011  2.438079         .         .         .         .         . 6 . . . . .
    76 2016 5620201011  1.865898 2.3851955         .         .  35.06238         . 6 5 . . 2 .
    68 2016 5620201011  2.709173    72.146         .         .         .         . 6 5 . . . .
    67 2016 5620201011  .4670446 1.1933546         .         .         .         . 6 5 . . . .
    76 2017 5620201011 1.6545606 1.4624665         .         . 8.9155445         . 5 4 . . . .
    54 2017 5620201011  2.795352 75.729546         .         .         .         . 5 4 . . . .
    67 2017 5620201011  7.305226  79.72797         .         .         .         . 5 4 . . . .
    71 2017 5620201011  3.491845  467.3678         .         .         .         . 5 4 . . . .
    79 2017 5620201011         .         .         .         .         .         . . . . . . .
    68 2017 5620201011 1.6894443         .         .         .         .         . 5 . . . . .
    72 2016 5710201010         .         .         .         .         .         . . . . . . .
    72 2017 5710201010         .         .         .         .         .         . . . . . . .
    65 2016 5710601012  .6606954  .1637355         .         . 4.5977554  25.11868 . . . . . .
    65 2017 5710601012  .7739521  .1869667  12.89933  15.75604         . 29.280806 . . . . . .
    69 2016 5720101010  3.280811    2.4627  13.34587  19.12024 24.809576         . 2 2 2 2 2 .
    56 2016 5720101010  9.606293  2.207179 12.952106 16.849419 15.863658  50.46669 2 2 2 2 2 .
    56 2017 5720101010  8.150916 2.0931437 15.802845 20.295147 14.668383  46.53449 2 2 2 2 2 .
    69 2017 5720101010  3.258684   2.43273  57.18305  60.71841  17.04262         . 2 2 2 2 2 .
    58 2016 5720102010  .9051487  3.838624         .         .  48.35808 3749561.5 . . . . . .
    58 2017 5720102010  1.377798  6.083374  97.29674 128.03157  332.5254   4479412 . . . . . .
    75 2016 5720102013 11.264032  9.646719   508.982 207.46164  178.6787         . . . . . . .
    75 2017 5720102013   7.42067  4.722989         .         . 297.33984         . . . . . . .
    80 2016 5720103016  2.774277  3.828485  22.87399  26.29795 11.739504 129.73656 . . . . . .
    80 2017 5720103016  3.935634  5.306964  32.35358  40.96696  16.82737  164.7742 . . . . . .
    59 2016 5910101011  .9965011  .6387489  9.366394 10.847375 4.6392045 20.188396 . . . . . .
    59 2017 5910101011 1.1410109  .5914634  12.04218 19.490486   8.73666  20.92999 . . . . . .
    64 2016 5910401010 1.7382427 1.0490441  6.981328  11.07251  5.228491  26.24563 . . . . . .
    64 2017 5910401010 1.6192815     .9944  9.650793 15.393454  7.760488 20.434355 . . . . . .
    end



    Variables p_a to p_g are financial ratios.
    Variables a to g are Dummy Variables used for grouping.

    Again, to make it more clear I wrote a code for variable p_a and p_b of my dataset, which is showing what the output of the final foreach loop(s) should look like:

    Code:
    egen A_group = group(Act year a) 
    rangestat A_mean=p_a, interval(A_group 0 0) by(A_group) excludeself
    replace A_mean = . if p_a ==.
    
    egen B_group = group(Act year b) 
    rangestat B_mean=p_b, interval(B_group 0 0) by(B_group) excludeself
    replace B_mean = . if p_b ==.
    For your information, I am using rangestat from SSC, because I found it helpful due to the option "excludeself". (https://www.statalist.org/forums/for...s-within-range)

    Also I am using Stata 12 SE.


    My attempt do to this with two foreach loops looks as follows:

    Code:
    foreach x of varlist  a b c d e f {
    egen group_`x' = group(Act year `x')  
    foreach y of varlist  p_a p_b p_c p_d p_e p_f{
    rangestat `y'_mean=`y', interval(group_`x' 0 0) by(group_`x') excludeself 
    }
    }
    Stata is returning
    Code:
    cannot create -p_a_mean-; variable(s) already defined
    r(110);
    
    
    end of do-file

    I can assume what the problem is but I can't figure out how to fix it.

    Any help is highly appreciated!

    Thank you all in advance,
    Anna






  • #2
    The way you set up your double loop, p_a_mean is created when `x' is a. When you move on to the second item of the outer loop, you ask rangestat to name the variable again p_a_mean.

    I think the following does what you want:

    Code:
    isid id year, sort
    
    foreach x of varlist  a b c d e f {
        egen group_`x' = group(Act year `x') 
        rangestat (mean) p_a p_b p_c p_d p_e p_f, interval(group_`x' 0 0) excludeself 
        foreach y of varlist p_a p_b p_c p_d p_e p_f {
            rename `y'_mean g`x'_`y'_mean
        }
    }

    Comment


    • #3
      Dear Robert,

      thank you very much for your fast reply and your code.
      It is working just fine, although it is producing more output, than I actually need.

      Right know it is calculating the arith. mean of every financial ratio for grouped by a, b, c, d, e and f.
      So a total of 36 new variables stating arith. means (+ another 6 for the groups).

      I would only need to calculate the arith. mean of p_a grouped by a, the arith. mean of p_b grouped by b, the arith. mean of p_c grouped by c and so on.
      So a total of 6 new variables stating arith. means. Sorry if I didn't specify that clear enough in my initial post.

      Is there a way to change your code slightly in order to do that? Otherwise I would have to drop the variables I don't need manually.

      Thank you again in advance
      Anna

      Comment


      • #4
        If Stata has done all the work you just need to drop 30 variables.


        But this is one way to write the code as it should have been.

        Code:
        foreach x of varlist  a b c d e f {    
            egen group_`x' = group(Act year `x')      
            rangestat (mean) p_`x', interval(group_`x' 0 0) excludeself  
        }

        Comment


        • #5
          Thank's Nick. Your code is working perfectly for the given snap of my dataset.

          However when applying it to the whole dataset, whenever the suffix of the financial ratio is not the same as the group dummy (for example the financial ratio is p_z and group dummy x) it of course stops working.
          I know this is my mistake/ due to my inconsistent naming of variables and I already fixed this.
          But as I am still new to Stata , I really want to learn and improve my coding, I was wondering if there is a way to rewrite your code in a way it fixes this potential problem?


          As for the dropping of variables: my full dataset is consisting of over 40 variables and 40 group dummies. So I assumed dropping over 1,500 variables is not really considered "good coding".
          But of course, I do get your point!

          And again, thank you both!

          Comment


          • #6
            I can guess that the code could be revised further but your wording doesn't allow me to do it. You appear to be asking "How do I generalise the code to instances where naming is inconsistent?" and I can't tell you that without your spelling out what is paired with what. It is like saying that couples should be placed close to one other without explaining who is in which couple. Sorry that may seem too sharp but I can't identify what you're seeking

            Comment


            • #7
              Thank's for your fast answer.
              Sorry, that was absolutely my bad, not providing you with an example.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int(id year) double Act float(p_a p_b p_c p_d p_e p_f z y x w v u)
              66 2016 5120102013 1.2984524  .6801872 21.108166 33.604084  10.03312 100.91905 . . . . . .
              73 2016 5120102013         .         .         .         .         .         . . . . . . .
              73 2017 5120102013         .         .         .         .         .         . . . . . . .
              66 2017 5120102013  1.914292  .7849112  12.16106 15.510412  9.922243  29.23344 . . . . . .
              52 2016 5120201012  3.071616 2.3932686 32.960102  43.91845 18.725128  82.86536 . . . . . .
              52 2017 5120201012 2.9343715 2.0664864  21.13209 29.698446         .         . . . . . . .
              78 2016 5220102010  2.330663  .2610788 8.3273735 11.115675  5.950643  25.58993 2 2 2 2 2 2
              70 2016 5220102010 1.5020788  1.145595  9.933325  18.64282  7.041817  64.33379 2 2 2 2 2 2
              70 2017 5220102010 1.6521858  1.243019  9.578372  16.23191    10.556  72.60179 2 2 2 2 2 2
              78 2017 5220102010 2.7057986 .29020053  7.148772 12.627624  5.434725  34.07482 2 2 2 2 2 2
              60 2016 5220306010  1.363158   .620429  7.854228 10.115916  5.823207         . . . . . . .
              60 2017 5220306010 1.8390906  .7305213  9.004681 13.430566  6.947975         . . . . . . .
              63 2016 5240601010  2.725502         .         .         .         . 1.5158334 . . . . . .
              63 2017 5240601010         .         .         .         .         .         . . . . . . .
              61 2016 5330101010 1.8158855   5.40461  51.26385 37.998203 16.408314  49.22495 . . . . . .
              61 2017 5330101010  2.408882  5.718142  30.85248  27.41594  9.346869 67.951164 . . . . . .
              55 2016 5340304019    2.5977 1.8031392 18.441538  26.42728 17.828302 33.486195 . . . . . .
              55 2017 5340304019 1.5581638  .9036034 13.876739  19.71499 12.656157 17.653942 . . . . . .
              57 2016 5510101010  1.724135 2.5399055   8.11631 12.668252  7.908437 32.725983 . . . . . .
              57 2017 5510101010  .7788456  2.574076  11.53141  19.74624         .         . . . . . . .
              74 2016 5510103010  2.734703  3.719215  59.01502  60.57954         .         . . . . . . .
              74 2017 5510103010  2.879613  2.934534         .         .         .         . . . . . . .
              51 2016 5540302012 1.5938604     9.312  26.64166 35.174557  22.65542   27.7221 . . . . . .
              51 2017 5540302012 1.6104847  8.437128  60.17859  40.18389  21.06223  25.35945 . . . . . .
              62 2016 5610101010  7.737345 1.1574975         .         .  13.76379         . 2 2 . . . .
              53 2016 5610101010  9.066516 2867.9675         .         .         .         . 2 2 . . . .
              62 2017 5610101010  8.461952 1.0269839         .         .         .         . 2 2 . . . .
              53 2017 5610101010 10.828953  307.7424         .         .         .         . 2 2 . . . .
              77 2016 5620104015 2.2196906  3.411423    5.3847  7.504815  5.902096    9.7498 . . . . . .
              77 2017 5620104015 1.0993739 2.0176318         .         .  8.982515         . . . . . . .
              50 2016 5620201010  6.227028 186.26508         .         .         .         . . . . . . .
              50 2017 5620201010   12.3296 33.097652         .         .         .         . . . . . . .
              54 2016 5620201011 4.4957156 36.520832         .         .         .         . 6 5 . . . .
              79 2016 5620201011 12.312634  6.749701  20.59523 23.435795 17.748589 102.95913 6 5 . . 2 .
              71 2016 5620201011  2.438079         .         .         .         .         . 6 . . . . .
              76 2016 5620201011  1.865898 2.3851955         .         .  35.06238         . 6 5 . . 2 .
              68 2016 5620201011  2.709173    72.146         .         .         .         . 6 5 . . . .
              67 2016 5620201011  .4670446 1.1933546         .         .         .         . 6 5 . . . .
              76 2017 5620201011 1.6545606 1.4624665         .         . 8.9155445         . 5 4 . . . .
              54 2017 5620201011  2.795352 75.729546         .         .         .         . 5 4 . . . .
              67 2017 5620201011  7.305226  79.72797         .         .         .         . 5 4 . . . .
              71 2017 5620201011  3.491845  467.3678         .         .         .         . 5 4 . . . .
              79 2017 5620201011         .         .         .         .         .         . . . . . . .
              68 2017 5620201011 1.6894443         .         .         .         .         . 5 . . . . .
              72 2016 5710201010         .         .         .         .         .         . . . . . . .
              72 2017 5710201010         .         .         .         .         .         . . . . . . .
              65 2016 5710601012  .6606954  .1637355         .         . 4.5977554  25.11868 . . . . . .
              65 2017 5710601012  .7739521  .1869667  12.89933  15.75604         . 29.280806 . . . . . .
              69 2016 5720101010  3.280811    2.4627  13.34587  19.12024 24.809576         . 2 2 2 2 2 .
              56 2016 5720101010  9.606293  2.207179 12.952106 16.849419 15.863658  50.46669 2 2 2 2 2 .
              56 2017 5720101010  8.150916 2.0931437 15.802845 20.295147 14.668383  46.53449 2 2 2 2 2 .
              69 2017 5720101010  3.258684   2.43273  57.18305  60.71841  17.04262         . 2 2 2 2 2 .
              58 2016 5720102010  .9051487  3.838624         .         .  48.35808 3749561.5 . . . . . .
              58 2017 5720102010  1.377798  6.083374  97.29674 128.03157  332.5254   4479412 . . . . . .
              75 2016 5720102013 11.264032  9.646719   508.982 207.46164  178.6787         . . . . . . .
              75 2017 5720102013   7.42067  4.722989         .         . 297.33984         . . . . . . .
              80 2016 5720103016  2.774277  3.828485  22.87399  26.29795 11.739504 129.73656 . . . . . .
              80 2017 5720103016  3.935634  5.306964  32.35358  40.96696  16.82737  164.7742 . . . . . .
              59 2016 5910101011  .9965011  .6387489  9.366394 10.847375 4.6392045 20.188396 . . . . . .
              59 2017 5910101011 1.1410109  .5914634  12.04218 19.490486   8.73666  20.92999 . . . . . .
              64 2016 5910401010 1.7382427 1.0490441  6.981328  11.07251  5.228491  26.24563 . . . . . .
              64 2017 5910401010 1.6192815     .9944  9.650793 15.393454  7.760488 20.434355 . . . . . .
              end

              Thats is the same dataset I initially posted with the industry dummies being named differently.
              Everything else remains the same: I still want to calculate the arith. mean of p_a grouped by z, the arith. mean of p_b grouped by y and so on.

              In this case, your code would not work. So I was just curious if there would be a reasonable way recoding your code from #4. Or, and I do believe that this is probably the go-to solution, if it would just be easier to rename the industry dummies
              Code:
              rename z a
              and so on....



              Comment


              • #8
                Now that's a good question. Sometimes this is called parallel for, and there's grumble on the net that it's difficult in Stata, which is exaggerated.

                Code:
                local stuff z y x w v u
                foreach y in a b c d e f {        
                     gettoken x stuff : stuff      
                     egen group_`x' = group(Act year `x')          
                     rangestat (mean) p_`y', interval(group_`x' 0 0) excludeself  
                }
                Each time we go round the loop we take off the first item of stuff and leave the rest. There aren't any checks here. The programmer has to ensure that the two lists a b c d e f and z y x w v u are of equal length (or, sometimes, that unequal length is not a problem).

                Comment


                • #9
                  Thank you for the helpful advise on that, Nick.

                  It's working perfectly fine now!

                  Comment

                  Working...
                  X