Translating an R smart-rounding function to Stata

Chris Marini

Join Date: Jan 2023

Posts: 2
#1

05 Jan 2023, 09:43

Quick background: I am an R user whose office is trying to move toward Stata. Another modeling package that we use requires any data that are rates to total exactly 100% (I know there is an associated drop in precision, but for the purposes of my project the loss is negligible). To clarify, I need each column to total 100 after rounding to the nearest tenth, and I want code that adjusts the largest value in the column so that the rounded column total is 100. I have a user-made function in R that does a very good job of this, but I have no idea how to recreate it in Stata (or whether I even can). I have included the function below. I need to be able to run this on rows or columns.

SmartRound <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x-y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}
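
For readers who do not use R, the steps in the function can be sketched in Python. This is only an illustration under stated assumptions, not part of the original R or Stata code: the name smart_round and the sample rates are hypothetical. It relies on Python's built-in round(), which sends exact halves to the even integer, and on sorted() being stable, so tied fractional parts keep their original order.

```python
import math


def smart_round(values, digits=0):
    """Round values so the rounded total equals the rounded original total."""
    up = 10 ** digits
    scaled = [v * up for v in values]
    floors = [math.floor(s) for s in scaled]
    # Units lost by flooring every entry. Python's round() sends exact
    # halves to the even integer.
    shortfall = round(sum(scaled)) - sum(floors)
    # Indices ordered by fractional part, ascending. sorted() is stable,
    # so entries with equal fractions keep their original order.
    by_fraction = sorted(range(len(values)),
                         key=lambda i: scaled[i] - floors[i])
    # Give one extra unit to the entries with the largest fractional parts.
    for i in by_fraction[len(values) - shortfall:]:
        floors[i] += 1
    return [f / up for f in floors]


# Hypothetical sample rates that total 101 before rounding.
rates = [20.50, 20.20, 15.15, 5.05, 10.01, 10.02, 10.03, 10.04]
print(smart_round(rates))     # whole numbers
print(smart_round(rates, 1))  # one decimal place
```

Because the extra units always go to the entries that lost the most when floored, the rounded values total the rounded sum of the inputs rather than drifting above or below it.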

I have, so far, exclusively used this function within dplyr code such as:

County1 = CountySim %>%
  group_by(`COUNTY`) %>%
  summarize(n = n()) %>%
  mutate(Rate = (n / sum(n)) * 100) %>%
  adorn_totals("row") %>%
  mutate_if(is.numeric, ~SmartRound(., 2)) %>%
  mutate(IncRate = round((1095.75 / n), 5))

and it just seems to work whether it is on columns or rows (I think the group_by statement is controlling that).

If anyone can help me translate this R function to Stata, it would make setting up these tables much quicker, easier, and more reproducible.
Thanks,
Chris

Daniel Schaefer

Join Date: Mar 2020
Posts: 814
#2

05 Jan 2023, 13:02

This Stata command implements essentially the same algorithm you defined in R.

Code:

quietly capture program drop SmartRound
program SmartRound
    syntax varlist(numeric) [, digits(integer 0)]
    scalar up = 10^`digits'
    gen original_order = _n
    foreach var in `varlist'{
        generate double x_`var' = `var' * up
        generate double y_`var' = floor(x_`var')
        generate double orderby_`var' = x_`var' - y_`var'
        sort orderby_`var'
        quietly sum x_`var'
        scalar x_sum = r(sum)
        quietly sum y_`var'
        scalar y_sum = r(sum)
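        * R's round() sends exact halves to the even integer,
        * whereas Stata's round() sends them up.
        scalar x_floor = floor(x_sum)
        if (x_sum - x_floor == 0.5) & (mod(x_floor, 2) == 0) {
            scalar increment_count = x_floor - y_sum
        }
        else {
            scalar increment_count = round(x_sum) - y_sum
        }
        replace y_`var' = y_`var' + 1 if _n > _N - increment_count
        replace y_`var' = y_`var' / up
        drop x_`var' orderby_`var'
        rename y_`var' sr_`var'
    }
    sort original_order
    drop original_order
end

The conditional statement inside the foreach loop deals with the fact that the two platforms treat exact halves differently: R's round() rounds a value ending in .5 to the nearest even integer (0.5 becomes 0, 1.5 becomes 2, 2.5 becomes 2), whereas Stata's round() rounds it up. It's a little hacky, but I force Stata to behave like R here by rounding the total down whenever it sits exactly halfway and the integer below it is even.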
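
The two rounding conventions only disagree at exact halves. As a minimal sketch (in Python rather than R or Stata, purely for illustration), Python's built-in round() uses the same half-to-even rule that R's round() documents, while the hypothetical helper round_half_up mimics the half-up behaviour attributed to Stata above for nonnegative values.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, sending exact halves upward."""
    return math.floor(x + 0.5)


# Compare the two conventions on exact halves.
# Columns: value, half-to-even result, half-up result.
for value in [0.5, 1.5, 2.5, 3.5]:
    print(value, round(value), round_half_up(value))
```

The results differ for 0.5 and 2.5 (where the integer below is even) and agree for 1.5 and 3.5, which is why a faithful translation has to look at the parity of the integer part and cannot simply truncate every half.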

Rather than overwriting the existing variables, this command will create new variables with the sr_ prefix. Stata has some useful syntax for working with prefixed variables. I test with some made-up example data.

Code:

clear
input int(rate1) float(rate2 rate3)
20 20.50 20.23242
20 20.20 20.0234
15 15.15 15.0254
5 5.05 5.0256
10 10.01 10.0112
10 10.02 10.01432
10 10.03 10.01123
10 10.04 10.01432
end

Which yields the following new variables:

Code:

SmartRound rate*
list sr_rate*, noobs clean

Code:

. list sr_rate*, noobs clean

    sr_rate1   sr_rate2   sr_rate3  
          20         21         20  
          20         20         20  
          15         15         15  
           5          5          5  
          10         10         10  
          10         10         10  
          10         10         10  
          10         10         10

Code:

drop sr_rate*
SmartRound rate*, digits(2)
list sr_rate*, noobs clean

Code:

. list sr_rate*, noobs clean

    sr_rate1   sr_rate2   sr_rate3  
          20       20.5      20.23  
          20       20.2      20.02  
          15      15.15      15.03  
           5       5.05       5.03  
          10      10.01      10.01  
          10      10.02      10.01  
          10      10.03      10.01  
          10      10.04      10.02

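There may still be some small differences between the two implementations because of how ties are ordered: R's order() keeps tied elements in their original order, whereas Stata's sort orders tied observations arbitrarily unless a further sort key (such as original_order) or the stable option is used. Ties only affect which of several equal fractional parts receives the increment, not the column total. If I understand correctly, this shouldn't be a problem, correct?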
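
To illustrate why the ordering of ties can matter, here is a small Python sketch (again only an illustration; the function largest_remainder, its tie_key argument, and the sample values are all hypothetical). Two entries share the same fractional part, but only one extra unit is available, so the tie-breaking rule decides which entry is incremented.

```python
import math


def largest_remainder(scaled, tie_key):
    """Floor each value, then add 1 to those with the largest fractions.

    tie_key(i) decides the order among entries with equal fractions.
    """
    floors = [math.floor(s) for s in scaled]
    shortfall = round(sum(scaled)) - sum(floors)
    order = sorted(range(len(scaled)),
                   key=lambda i: (scaled[i] - floors[i], tie_key(i)))
    # The entries at the end of the ordering receive the extra units.
    for i in order[len(scaled) - shortfall:]:
        floors[i] += 1
    return floors


# The first two entries tie at a fraction of 0.5; one unit is available.
scaled = [10.5, 10.5, 10.0]
last_wins = largest_remainder(scaled, tie_key=lambda i: i)
first_wins = largest_remainder(scaled, tie_key=lambda i: -i)
print(last_wins, first_wins)
```

Breaking ties by ascending index and taking the tail, as tail(order(...)) does in R, gives the unit to the later of the tied entries. Either rule produces the same total, so only the individual tied values can differ.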

I've also written a complementary R script for testing purposes.

Code:

x1 <- c(20, 20, 15, 5, 10, 10, 10, 10)
x2 <- c(20.50, 20.20, 15.15, 5.05, 10.01, 10.02, 10.03, 10.04)
x3 <- c(20.23242, 20.0234, 15.0254, 5.0256, 10.0112, 10.01432, 10.01123, 10.01432)


SmartRound <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x-y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}


print(SmartRound(x1))
print(SmartRound(x2))
print(SmartRound(x3))

print(SmartRound(x1, 2))
print(SmartRound(x2, 2))
print(SmartRound(x3, 2))

Code:

> print(SmartRound(x1))
[1] 20 20 15  5 10 10 10 10
> print(SmartRound(x2))
[1] 21 20 15  5 10 10 10 10
> print(SmartRound(x3))
[1] 20 20 15  5 10 10 10 10
>
> print(SmartRound(x1, 2))
[1] 20 20 15  5 10 10 10 10
> print(SmartRound(x2, 2))
[1] 20.50 20.20 15.15  5.05 10.01 10.02 10.03 10.04
> print(SmartRound(x3, 2))
[1] 20.23 20.02 15.03  5.03 10.01 10.01 10.01 10.02

Last edited by Daniel Schaefer; 05 Jan 2023, 13:09.


Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#3

05 Jan 2023, 13:19

I also just want to add that I'm not necessarily sure this is the best way to solve this particular problem in Stata. There may be a simpler, more elegant, more readable implementation. My goal was to implement your algorithm with as much fidelity as possible, and to demonstrate how new commands can be created in Stata.
Nick Cox

Join Date: Mar 2014

Posts: 35664
#4

06 Jan 2023, 06:24

Some references at https://www.statalist.org/forums/for...lable-from-ssc are pertinent here.

I am the other way round here. If I noticed that percents always added to 100.0 (I probably wouldn't), I would expect chicanery, not to say unethical treatment of results. "Percents may not add exactly to 100 because of rounding errors" is a fair message, and one I've been seeing ever since.

Still, you're probably subject to a (misguided) policy here and not in a position to object.
Chris Marini

Join Date: Jan 2023

Posts: 2
#5

06 Jan 2023, 11:01

Originally posted by Daniel Schaefer

I also just want to add that I'm not necessarily sure this is the best way to solve this particular problem in Stata. There may be a simpler, more elegant, more readable implementation. My goal was to implement your algorithm with as much fidelity as possible, and to demonstrate how new commands can be created in Stata.

This is a huge help! I've only just started in Stata and am still trying to figure out all of the options that are available. I've also never been much good at writing my own functions. I think this will do what I need, which will save me a ton of time rather than manually going through and fixing the values by hand. Thanks so much!
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#6

06 Jan 2023, 12:13

Great, Chris, happy to help!

I like both R and Stata, usually for different things. I learned R first, and prefer to do data management type tasks in R, but I definitely prefer to do statistical analysis work in Stata. If you're the type who prefers to avoid writing your own functions, I think you're going to grow to really like Stata.