
  • Translating an R smart-rounding function to Stata

    Quick background: I am an R user whose office is trying to move toward Stata. Another piece of modeling software that we use requires any data expressed as rates to total 100% (I know there is an associated loss of precision, but for the purposes of my project the loss is negligible). To be specific: I have data where each column needs to total 100 after rounding to the nearest tenth, and I want code that will adjust whatever value is largest in the column so that the column total is 100. I have a user-written function in R that does a very good job of this, but I have no idea how to recreate it in Stata (or whether I even can). I have included the function below. I need to be able to run it on rows or columns.

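    SmartRound <- function(x, digits = 0) {
      # Scale so that the target precision becomes whole numbers
      up <- 10 ^ digits
      x <- x * up
      # Round everything down first
      y <- floor(x)
      # Pick the rows with the largest fractional parts, as many as are
      # needed to reach the rounded total, and bump each of them by 1
      indices <- tail(order(x-y), round(sum(x)) - sum(y))
      y[indices] <- y[indices] + 1
      # Scale back to the original units
      y / up
    }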

    I have, so far, exclusively used this function within dplyr code such as:

    County1 = CountySim %>%
      group_by(`COUNTY`) %>%
      summarize(n = n()) %>%
      mutate(Rate = (n / sum(n)) * 100) %>%
      adorn_totals("row") %>%
      mutate_if(is.numeric, ~SmartRound(., 2)) %>%
      mutate(IncRate = round((1095.75 / n), 5))

    and it just seems to work whether it is on columns or rows (I think the group_by statement is controlling that).

    If anyone can help me translate this R function to Stata, it would make setting up these tables much quicker, easier, and more reproducible.
    Thanks,
    Chris

  • #2
    This Stata command implements essentially the same algorithm you defined in R.


    Code:
    quietly capture program drop SmartRound
    program SmartRound
        syntax varlist(numeric) [, digits(integer 0)]
        scalar up = 10^`digits'
        * long and double storage keep row numbers and fractional parts exact
        gen long original_order = _n
        foreach var in `varlist' {
            generate double x_`var' = `var' * up
            generate double y_`var' = floor(x_`var')
            generate double orderby_`var' = x_`var' - y_`var'
            sort orderby_`var'
            quietly sum x_`var'
            scalar x_sum = r(sum)
            quietly sum y_`var'
            scalar y_sum = r(sum)
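            * R's round() sends an exact half to the even integer; Stata's round()
            * sends a positive half up, so step back by one when that lands on an odd integer.
            scalar target = round(x_sum)
            if mod(x_sum, 1) == 0.5 & mod(target, 2) == 1 {
                scalar target = target - 1
            }
            scalar increment_count = target - y_sum
            replace y_`var' = y_`var' + 1 if _n > _N - increment_count
            replace y_`var' = y_`var' / up
            drop x_`var' orderby_`var'
            rename y_`var' sr_`var'
        }
        sort original_order
        drop original_order
    end
    The conditional statement inside the foreach loop deals with ties in the column total. R's round() follows IEC 60559 and sends an exact half to the nearest even integer (round(0.5) is 0, round(1.5) is 2), whereas Stata's round() sends a positive half up. It's a little hacky, but it forces Stata to behave like R here.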
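
    To make the tie behaviour concrete, here is a minimal sketch. The helper name round_half_even is hypothetical (it is not a built-in Stata command) and it assumes non-negative inputs; it only mimics, on top of Stata's round(), the half-to-even rule that R documents for its own round().

```stata
* Stata's round() sends positive exact halves up
display round(0.5)    // 1
display round(1.5)    // 2
display round(2.5)    // 3

* Hypothetical helper: round a non-negative number half-to-even, as R does
capture program drop round_half_even
program round_half_even, rclass
    args x
    tempname r
    scalar `r' = round(`x')
    * An exact half that was rounded up to an odd integer steps back to the even one
    if mod(`x', 1) == 0.5 & mod(`r', 2) == 1 {
        scalar `r' = `r' - 1
    }
    return scalar value = `r'
end

round_half_even 0.5
display r(value)      // 0
round_half_even 1.5
display r(value)      // 2
round_half_even 2.5
display r(value)      // 2
```

    The two rules only disagree when the scaled column total ends in exactly .5, which should be rare with real rates.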

    Rather than overwriting the existing variables, this command will create new variables with the sr_ prefix. Stata has some useful syntax for working with prefixed variables. I test with some made-up example data.

    Code:
    clear
    input int(rate1) float(rate2 rate3)
    20 20.50 20.23242
    20 20.20 20.0234
    15 15.15 15.0254
    5 5.05 5.0256
    10 10.01 10.0112
    10 10.02 10.01432
    10 10.03 10.01123
    10 10.04 10.01432
    end
    Which yields the following new variables:

    Code:
    SmartRound rate*
    list sr_rate*, noobs clean
    Code:
    . list sr_rate*, noobs clean
    
        sr_rate1   sr_rate2   sr_rate3  
              20         21         20  
              20         20         20  
              15         15         15  
               5          5          5  
              10         10         10  
              10         10         10  
              10         10         10  
              10         10         10
    Code:
    drop sr_rate*
    SmartRound rate*, digits(2)
    list sr_rate*, noobs clean
    Code:
    . list sr_rate*, noobs clean
    
        sr_rate1   sr_rate2   sr_rate3  
              20       20.5      20.23  
              20       20.2      20.02  
              15      15.15      15.03  
               5       5.05       5.03  
              10      10.01      10.01  
              10      10.02      10.01  
              10      10.03      10.01  
              10      10.04      10.02
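
    A quick way to check output like this is to total each new column through the sr_ wildcard. The sketch below re-enters the digits(2) listing above by hand, so it runs without SmartRound being defined; nothing else in it is taken from the original command.

```stata
clear
input double(sr_rate1 sr_rate2 sr_rate3)
20 20.5  20.23
20 20.2  20.02
15 15.15 15.03
5  5.05  5.03
10 10.01 10.01
10 10.02 10.01
10 10.03 10.01
10 10.04 10.02
end

* sr_rate* expands to every variable whose name starts with sr_rate
foreach v of varlist sr_rate* {
    quietly summarize `v'
    display "`v' total: " %6.2f r(sum)
}
```

    The totals come out as 100.00, 101.00 and 100.36. The made-up columns do not all total 100 to begin with, and the algorithm preserves the rounded total of the raw column rather than forcing it to 100.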
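    There may still be some small differences between the two implementations, because the platforms may order tied values differently. When several rows have the same fractional part, which of them receives the increment can differ; the column total is the same either way. If I understand your use case correctly, this shouldn't be a problem. Is that right?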
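
    If reproducible tie-breaking matters, one option is Stata's stable sort. The sketch below uses a hypothetical three-row column (the variable names id and frac are made up for illustration) in which two rows share the same fractional part and exactly one increment is needed.

```stata
clear
input double x
10.5
10.5
79
end
generate long id = _n
generate double frac = x - floor(x)

* With the stable option, tied observations keep their current relative order,
* so the result is the same on every run
sort frac, stable
list id x frac, noobs clean
* Order after sorting is id 3, 1, 2: the last row (id 2) would receive the increment
```

    R's order() also leaves unresolved ties in their original order, so tail(order(x - y), 1) picks row 2 here as well. Inside SmartRound the data have already been re-sorted by the time the second variable is processed, so sorting on orderby_`var' and then original_order would be one way to get the same tie order for every variable.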

    I've also written a complementary R script for testing purposes.

    Code:
    x1 <- c(20, 20, 15, 5, 10, 10, 10, 10)
    x2 <- c(20.50, 20.20, 15.15, 5.05, 10.01, 10.02, 10.03, 10.04)
    x3 <- c(20.23242, 20.0234, 15.0254, 5.0256, 10.0112, 10.01432, 10.01123, 10.01432)
    
    
    SmartRound <- function(x, digits = 0) {
      up <- 10 ^ digits
      x <- x * up
      y <- floor(x)
      indices <- tail(order(x-y), round(sum(x)) - sum(y))
      y[indices] <- y[indices] + 1
      y / up
    }
    
    
    print(SmartRound(x1))
    print(SmartRound(x2))
    print(SmartRound(x3))
    
    print(SmartRound(x1, 2))
    print(SmartRound(x2, 2))
    print(SmartRound(x3, 2))
    Code:
    > print(SmartRound(x1))
    [1] 20 20 15  5 10 10 10 10
    > print(SmartRound(x2))
    [1] 21 20 15  5 10 10 10 10
    > print(SmartRound(x3))
    [1] 20 20 15  5 10 10 10 10
    >
    > print(SmartRound(x1, 2))
    [1] 20 20 15  5 10 10 10 10
    > print(SmartRound(x2, 2))
    [1] 20.50 20.20 15.15  5.05 10.01 10.02 10.03 10.04
    > print(SmartRound(x3, 2))
    [1] 20.23 20.02 15.03  5.03 10.01 10.01 10.01 10.02
    Last edited by Daniel Schaefer; 05 Jan 2023, 13:09.



    • #3
      I also just want to add that I'm not necessarily sure this is the best way to solve this particular problem in Stata. There may be a simpler, more elegant, more readable implementation. My goal was to implement your algorithm with as much fidelity as possible, and to demonstrate how new commands can be created in Stata.



      • #4
        Some references at https://www.statalist.org/forums/for...lable-from-ssc are pertinent here.

        I am the other way round here. If I noticed that percents always added to 100.0 (I probably wouldn't), I would suspect chicanery, not to say unethical treatment of results. "Percents may not add exactly to 100 because of rounding errors" is a fair message I've been seeing ever since.

        Still, you're probably subject to a (misguided) policy here and not in a position to object.
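
        As a small arithmetic illustration of that message (a sketch with made-up numbers, not taken from the original data): three equal shares, each rounded to one decimal place, total 99.9 rather than 100.

```stata
* Three equal shares, each reported to one decimal place
scalar share = round(100/3, 0.1)
display %4.1f share        // 33.3
display %5.1f 3*share      // 99.9, not 100.0
```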



        • #5
          Originally posted by Daniel Schaefer:
          I also just want to add that I'm not necessarily sure this is the best way to solve this particular problem in Stata. There may be a simpler, more elegant, more readable implementation. My goal was to implement your algorithm with as much fidelity as possible, and to demonstrate how new commands can be created in Stata.
          This is a huge help! I've only just started in Stata and am still trying to figure out all of the options that are available. I've also never been much good at writing my own functions. I think this will do what I need, which will save me a ton of time rather than manually going through and fixing the values by hand. Thanks so much!



          • #6
            Great, Chris, happy to help!

            I like both R and Stata, usually for different things. I learned R first, and prefer to do data management type tasks in R, but I definitely prefer to do statistical analysis work in Stata. If you're the type who prefers to avoid writing your own functions, I think you're going to grow to really like Stata.
