Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to manage missing variables when running foreach loop

    Hi all,

    I'm trying to run the following loop:

    foreach country in country_list {
    use "D:\...\estimation"
    keep if COUNTRY==`country'
    save "D:\...\estimation_`country', replace

    foreach code in code_list {

    use "D:\...\estimation_`country'"
    capture keep if CODE==`code'
    xtile dec_var=var, n(10)
    gen var_10=var if dec_var==1
    collapse (mean) var_10, by(CODE13) *CODE13 is another variable in the subset
    graph bar var_10, over(CODE13)
    }
    }

    However, since not every country has all the CODEs in code_list, when I run it I obtain the following error as soon as the variable in code list is not there:
    "nquantiles() must be less than or equal to number of observations plus one"

    How can I solve this issue?
    Thanks in advance


  • #2
    This code isn't plausible unless the first line is


    Code:
    foreach country in `country_list' {
    and similar comments apply to code_list.


    You don't need the choreography of multiple files.

    As I understand it, you want the mean of values in the lowest decile of one variable -- by categories of another variable, by different codes whatever they are, and by different countries.

    OK. This is an attempt at simpler without the mess of multiple new files. Naturally I can't test it.

    Code:
    use "D:\...\estimation"
    
    gen this = 0
     
    * assumes local macros country_list and code_list 
    foreach country in `country_list' {
        foreach code in `code_list' {
            replace this = country == "`country'" & code == "`code'" 
            _pctile var if this, p(10) 
            if r(r1) < . { 
                graph bar var if this & var < r(r1), by(CODE13) title("`country' `code'") 
            } 
        }
    }

    Working from the inside going outwards

    1. graph bar defaults to showing means, so the collapse isn't needed.

    2. Multiple graphs aren't much use unless you have text showing what they represent.

    3. xtile is overkill if all you want is the lowest decile.

    There is a bigger question, which is that you have a portfolio of so many countries and so many codes, so you need to save those graphs somewhere too.





    Comment


    • #3
      Originally posted by Nick Cox View Post
      This code isn't plausible unless the first line is


      Code:
      foreach country in `country_list' {
      and similar comments apply to code_list.


      You don't need the choreography of multiple files.

      As I understand it, you want the mean of values in the lowest decile of one variable -- by categories of another variable, by different codes whatever they are, and by different countries.

      OK. This is an attempt at simpler without the mess of multiple new files. Naturally I can't test it.

      Code:
      use "D:\...\estimation"
      
      gen this = 0
      
      * assumes local macros country_list and code_list
      foreach country in `country_list' {
      foreach code in `code_list' {
      replace this = country == "`country'" & code == "`code'"
      _pctile var if this, p(10)
      if r(r1) < . {
      graph bar var if this & var < r(r1), by(CODE13) title("`country' `code'")
      }
      }
      }

      Working from the inside going outwards

      1. graph bar defaults to showing means, so the collapse isn't needed.

      2. Multiple graphs aren't much use unless you have text showing what they represent.

      3. xtile is overkill if all you want is the lowest decile.

      There is a bigger question, which is that you have a portfolio of so many countries and so many codes, so you need to save those graphs somewhere too.




      I apologize, I made some foolish simplification in the code.

      I need the mean values of different quantiles (0-10%, 10%-25%, 25%-50%, 50%-75%, 75%-90%, 90%-100% ) of var by categories of another variable, by different codes and different countries.

      Then, I need a bar chart for all the quantiles of var for every country and every CODE13, where CODE13 is a subcategory of CODE (but represented in another variable).

      Can I still use _pctile or should I go back to xtile?

      Comment


      • #4
        xtile is no use here as it assumes that you want quantiles equally spaced in probability. You need some custom code. This isn't the whole of your problem but it shows some technique.


        Code:
        . sysuse auto, clear
        (1978 automobile data)
        
        . su mpg , detail
        
                                Mileage (mpg)
        -------------------------------------------------------------
              Percentiles      Smallest
         1%           12             12
         5%           14             12
        10%           14             14       Obs                  74
        25%           18             14       Sum of wgt.          74
        
        50%           20                      Mean            21.2973
                                Largest       Std. dev.      5.785503
        75%           25             34
        90%           29             35       Variance       33.47205
        95%           34             35       Skewness       .9487176
        99%           41             41       Kurtosis       3.975005
        
        . ret li
        
        scalars:
                          r(N) =  74
                      r(sum_w) =  74
                       r(mean) =  21.2972972972973
                        r(Var) =  33.47204738985561
                         r(sd) =  5.785503209735141
                   r(skewness) =  .9487175964588155
                   r(kurtosis) =  3.97500459645325
                        r(sum) =  1576
                        r(min) =  12
                        r(max) =  41
                         r(p1) =  12
                         r(p5) =  14
                        r(p10) =  14
                        r(p25) =  18
                        r(p50) =  20
                        r(p75) =  25
                        r(p90) =  29
                        r(p95) =  34
                        r(p99) =  41
        
        . gen wanted = cond(mpg <= r(p10), 1, cond(mpg <= r(p25), 2, cond(mpg <= r(p50), 3, cond(mpg <= r(p75), 4, cond(mpg <= r(p90), 5, 6))))) if mpg < .
        
        .  version 16: table mpg wanted
        
        ----------------------------------------------
        Mileage   |               wanted              
        (mpg)     |    1     2     3     4     5     6
        ----------+-----------------------------------
               12 |    2                              
               14 |    6                              
               15 |          2                        
               16 |          4                        
               17 |          4                        
               18 |          9                        
               19 |                8                  
               20 |                3                  
               21 |                      5            
               22 |                      5            
               23 |                      3            
               24 |                      4            
               25 |                      5            
               26 |                            3      
               28 |                            3      
               29 |                            1      
               30 |                                  2
               31 |                                  1
               34 |                                  1
               35 |                                  2
               41 |                                  1
        ----------------------------------------------

        Comment

        Working...
        X