  • -egen, xtile- works in Stata 11, 12, 13, but generates an error message in Stata 15... Is this supposed to be possible?

    Good afternoon,

    The -egen, xtile- command/function from the egenmore package by Nick Cox is exhibiting some concerning behaviour: it works under Stata 11, 12, and 13, but generates the following error message under Stata 15:

    Code:
    . webuse nlswork, clear
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . 
    . keep idcode year ln_wage hours
    
    . egen xtiles = xtile(ln_wage), by(idcode)
    too many values
    r(134);
    
    .
    I believe the problem is a point in the code of -_gxtile- where an outdated command, which fails under Stata 15 (but works under Stata 11, 12, 13), is used:

    Code:
    . levels idcode
    too many values
    r(134);
    
    . levelsof idcode
    ******** output omitted, but it works
    My question is: is this not supposed to be impossible? Are Stata commands and functions not supposed to be backward compatible?
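
    To confirm which copy of -_gxtile- Stata is picking up and to read its source, something like the following should work. This is only a sketch; it assumes egenmore is already installed from SSC.

    ```stata
    * Report the path and version lines of the installed file
    which _gxtile
    * Open the source in the Viewer to look for the call to -levels-
    viewsource _gxtile.ado
    ```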

  • #2
    Thanks for the report. Ulrich Kohler is the author of this function.

    What's really biting here is that levels is using tabulate.

    We'll be back with revised code when we can.

    EDIT: Not much tested, but a quick hack below.

    Code:
    *! 1.2.1 NJC 14 Jan 2019 
    *! _gxtile version 1.2 UK 08 Mai 2006
    * categorizes exp by its quantiles - byable
    
    * 1.2: Bug: Opt percentiles were treated incorrectly after implement. of option nq
    *       Allows By-Variables that are strings
    * 1.1: Bug: weights are treated incorrectly in version 1.0. -> fixed
    *     New option nquantiles() implemented             
    * 1.0: initial version
    program _gxtile, byable(onecall) sortpreserve
        version 8.2
            gettoken type 0 : 0
            gettoken h    0 : 0 
            gettoken eqs  0 : 0
    
        syntax varname(numeric) [if] [in] [, ///
          Percentiles(string) ///
          Nquantiles(string) ///
          Weights(string) ALTdef by(varlist) ]
    
        marksample touse 
        
        // Error Checks
    
        if "`altdef'" ~= "" & "`weights'" ~= "" {
            di as error "weights are not allowed with altdef"
            exit 111
        }
        
        if "`percentiles'" != "" & "`nquantiles'" != "" {
            di as error "do not specify percentiles and nquantiles"
            exit 198
        }
    
        // Default Settings etc.
    
        if "`weights'" ~= "" {
            local weight "[aw = `weights']"
        }
    
        if "`percentiles'" != "" {
            local percnum "`percentiles'"
        }
        else if "`nquantiles'" != "" {
            local perc = 100/`nquantiles'
            local first = `perc'
            local step = `perc'
            local last = 100-`perc'
            local percnum "`first'(`step')`last'"
        }
    
        if "`nquantiles'" == "" & "`percentiles'" == "" {
            local percnum 50
        }
    
        quietly {
        
            gen `type' `h' = .
    
            // Without by
    
            if "`by'"=="" {
                local i 1
                _pctile `varlist' `weight' if `touse', percentiles(`percnum') `altdef'
                foreach p of numlist `percnum' {
                    if `i' == 1 {
                        replace `h' = `i' if `varlist' <= r(r`i') & `touse'
                    }
                    replace `h' = `++i' if `varlist' > r(r`--i')  & `touse'
                    local i = `i' + 1
                }
                exit
            }
    
            // With by
            tempvar byvar
            by `touse' `by', sort: gen `byvar' = 1 if _n==1 & `touse'
            by `touse' (`by'): replace `byvar' = sum(`byvar')
            
            su `byvar', meanonly 
            forval k = 1/`r(max)' {
                local i 1
                _pctile `varlist' `weight' if `byvar' == `k' & `touse' , percentiles(`percnum') `altdef'
                foreach p of numlist `percnum' {
                    if `i' == 1 {
                        replace `h' = `i' if `varlist' <= r(r`i') & `byvar' == `k' & `touse'
                    }
                    replace `h' = `++i' if `varlist' > r(r`--i')  & `byvar' == `k' & `touse'
                    local i = `i' + 1
                }
            }
        }
    end
    exit
    Last edited by Nick Cox; 14 Jan 2019, 05:34.
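
    For readers comparing versions: the revised code appears to avoid -levels- by numbering the by-groups with a running sum and then looping from 1 to the largest group number. A stripped-down sketch of that idea on the auto data follows. The variable name grp is made up for illustration, and the touse handling of the real program is omitted.

    ```stata
    sysuse auto, clear
    * Flag the first observation of each by-group
    bysort foreign: gen long grp = 1 if _n == 1
    * A running sum of the flags numbers the groups 1, 2, ...
    replace grp = sum(grp)
    summarize grp, meanonly
    * Loop over group numbers instead of over a list of distinct values
    forvalues k = 1/`r(max)' {
        quietly count if grp == `k'
        display "group `k': " r(N) " observations"
    }
    ```

    Because the loop runs over consecutive integers, it never needs a list of the distinct values and so does not depend on tabulate.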

    • #3
      My worry is: why does something that works under Stata 11, 12, 13 not work under Stata 15? I thought that commands are backward compatible, so that one can never encounter a situation in which something works under older versions of Stata but fails under newer versions.

      Otherwise, as far as I am concerned, an update is not needed. This is easy enough to do with a loop and the standard Stata command -xtile-:

      Code:
      egen group2 = group(idcode)
      
      gen beta2 = .
      
      summ group2, meanonly
      
      qui forvalues i = 1/`r(max)' {
      
      cap xtile xtilewagetemp = ln_wage if `i' == group2
      
      cap replace beta2=  xtilewagetemp if `i' == group2
      
      cap drop xtilewagetemp
      
      }

      • #4
        As I understand it the problem arises because levels, an outdated official command, is calling tabulate. The limits on tabulate would have applied back in Stata 11 to 13.

        It's possible that changes to levelsof in Stata 15 were accompanied by changes in how calls to levels were handled.

        I don't feel agitated about a general issue here. Sure, not breaking existing code is a good idea. When that fails, we do what we can to fix it.
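
        A rough way to check whether a by-variable is likely to hit the tabulate limit is to count its distinct values without calling tabulate, and compare the result with the one-way limit listed in -help limits- (from memory, 3,000 rows for Stata/IC and 12,000 for SE and MP in Stata 15; please verify for your version). The variable name onepergroup is made up for this sketch.

        ```stata
        webuse nlswork, clear
        * Mark one observation per idcode, then count the marks
        egen byte onepergroup = tag(idcode)
        count if onepergroup
        display "distinct idcode values: " r(N)
        ```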

        • #5
          Originally posted by Joro Kolev
          My worry is why something that works under Stata 11,12,13 does not work under Stata 15?
          Are you by any chance running Stata 15 IC when Stata 11, 12, and 13 were SE (or MP) flavor?

          Best
          Daniel

          • #6
            daniel klein, do you have psychic powers, or are you Big Brother watching what I do as we speak?

            Yes indeed, the older versions of Stata I have (11, 12, 13) are SE/MP, and my copy of Stata 15, the most recent version I have, is IC...

            This was an excellent catch! It did not even cross my mind that the phenomenon might be occurring not because of the recency of the Stata version, but because of the Stata flavour (IC/MP/SE)...
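
            For anyone else wanting to rule this out, the flavour can be checked from within Stata. A small sketch using -about- and a few c-class values; exactly how c(SE) behaves under MP is worth checking in -help creturn- rather than taking from me.

            ```stata
            * Prints version, flavour/edition and licence details
            about
            * Version number, and 1/0 indicators for the larger flavours
            display c(stata_version)
            display c(SE)
            display c(MP)
            ```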







            • #7
              Yes, I think it is IC vs larger Stata. I ran that code on Stata 15.1 MP2 (Windows 7) and there were no problems.

              But it takes forever and a day to run. Much faster is:

              Code:
              webuse nlswork, clear
              keep idcode year ln_wage hours
              
              // egen xtiles = xtile(ln_wage), by(idcode)
              
              capture program drop one_xtile
              program define one_xtile
                  xtile xtiles = ln_wage
                  exit
              end
              
              runby one_xtile, by(idcode)
              -runby- is written by Robert Picard and me, and is available from SSC.

              • #8
                Indeed, Clyde Schechter, your -runby- is the fastest. I managed to beat -egen, xtile- with my own loop, but -runby- comes out fastest by far.

                These are the running times in seconds under Stata 11 SE.

                2: 162.57 / 1 = 162.5690 My solution, loop with -xtile- as in #3, really slow
                3: 35.17 / 1 = 35.1710 -egen, xtile- from egenmore, not too fast either
                4: 2.51 / 1 = 2.5080 -runby-, the fastest
                5: 18.09 / 1 = 18.0850 My solution, loop with -_pctile-, as in the code below (not too bad, not too good either):

                Code:
                egen group4 = group(idcode)
                
                gen beta5 = .
                
                summ group4, meanonly
                
                
                qui forvalues i = 1/`r(max)' {
                
                cap _pctile ln_wage if `i' == group4, p(50)
                
                cap replace beta5=  cond(ln_wage>r(r1),2,1) if `i' == group4  & !missing(ln_wage)
                
                }
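
                For reference, timings in this format look like output from -timer list-. A minimal sketch of the pattern, with a plain -xtile- call standing in for whichever method is being timed (the variable name overall is just a placeholder):

                ```stata
                webuse nlswork, clear
                keep idcode year ln_wage hours

                timer clear
                timer on 1
                xtile overall = ln_wage      // replace with the code to be timed
                timer off 1
                * Lists total seconds / number of runs = average per run
                timer list
                ```

                Using a different timer number for each method lets all results be listed side by side at the end.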

                • #9
                  Everyone who points out that egen, xtile() from egenmore (SSC) is slow is quite correct. It's just old code going back to 2006 if not earlier. As it's often mentioned and is perhaps included in several do-files or even programs, there is no point in removing it from SSC.

                  Meanwhile, I trust that a faster xtile for panels or other groups is on the StataCorp to-do list, given that people currently need to write their own loops.

                  • #10
                    Nick, I am not throwing stones into the garden of -egen, xtile-. It is a useful function that serves a useful purpose. I have used it a couple of times interactively. If I do something once (meaning no bootstrap, no simulation, no repeated sampling and recalculation) and the dataset is of the order of magnitude and complexity of the National Longitudinal Survey, I'd rather use the one line of code of -egen, xtile- than write the 6-7 lines of code in #3 and #8. Thirty seconds or a minute is not a cost worth mentioning if you do something once.

                    I would also recommend -egen, xtile- to a beginning Stata user, rather than lecture them for hours on how one writes loops and dereferences locals.

                    I was doing speed comparisons anyway when I encountered the issue mentioned in the title of this post. Clyde suggested -runby-, so I included it in the speed comparison too, and as I already had the results, I thought it would not hurt anybody if I reported them to the general public.

                    • #11
                      Joro Kolev I appreciate that. What you have said about that egen function is all correct. Also, it is key that this forum discusses strengths and weaknesses of different code. (The code in question was not mine, but I am confident that Ulrich Kohler would agree with my stance.)

                      • #12
                        Originally posted by Nick Cox
                        Joro Kolev I appreciate that. What you have said about that egen function is all correct. Also, it is key that this forum discusses strengths and weaknesses of different code. (The code in question was not mine, but I am confident that Ulrich Kohler would agree with my stance.)
                        Of course, I do.

                        I created a new version of -_gxtile.ado- based on Nick's code from yesterday. I'll send this to Kit Baum as soon as possible. However, while running some verifications, I realized that I had to make one additional change in the "Default Settings etc." section. Does anybody know when Stata's -_pctile- started to have both options, -percentiles()- and -nquantiles()-?

                        Uli

                        • #13
                          Ulrich Kohler I believe that the -nquantiles- option to -_pctile- has been present since Stata 1. See below for my method of determining such things (I am not sure whether this method is always valid).

                          The earliest Stata I have on this laptop is Stata 11.

                          And Stata 11's -_pctile- already has the mentioned -nquantiles- option, and the option is documented in the help file.

                          Beyond that, I think the option has been in existence since Stata 1, unless the following method is not valid for some reason that I am not aware of:

                          Code:
                          . version 1: _pctile price, nq(5)
                          
                          . version 1: return list
                          
                          scalars:
                                           r(r1) =  4099
                                           r(r2) =  4647
                                           r(r3) =  5705
                                           r(r4) =  7827

                          • #14
                            Also, I do not think that this option is of any particular grand use. It is just a convenience option, as

                            Code:
                             
                              _pctile price, nq(5)
                            is equivalent to

                            Code:
                              
                              _pctile price, percentiles(20 40 60 80)
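
                            One way to check this equivalence on the auto data is to store the results of each call in scalars and assert that they match. A sketch only; the scalar names q1 to q4 and p1 to p4 are arbitrary.

                            ```stata
                            sysuse auto, clear
                            * Quintile cutpoints requested via nq()
                            _pctile price, nq(5)
                            forvalues j = 1/4 {
                                scalar q`j' = r(r`j')
                            }
                            * The same cutpoints requested as explicit percentiles
                            _pctile price, percentiles(20 40 60 80)
                            forvalues j = 1/4 {
                                scalar p`j' = r(r`j')
                            }
                            * Stops with an error if any pair of results differs
                            forvalues j = 1/4 {
                                assert scalar(q`j') == scalar(p`j')
                            }
                            ```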

                            • #15
                              As far as I know, new features do still work under version control. Try out

                              . sysuse auto
                              . version 1.0
                              . npregress kernel mpg weight

                              I am pretty sure that nonparametric kernel regression was not available in Stata 1.0.
