Improving Code Speed

Tom Schaars

Join Date: Apr 2019

Posts: 3
#1

26 Apr 2019, 09:33

Hello Statalist,

After much blood, sweat, and tears I managed to find a working solution to my problem. However, the code runs very slowly, and I think it can be improved, but I cannot work out how to do so myself.

A quick description of my goal:

I have sorted a bunch of data by 'modelname' and 'nvals'. I have a total of 70k observations but the main structure of the data is as follows:

modelname  nvals  modelrank (want)
beetle     1      1
beetle     0      1
beetle     0      1
beetle     0      1
megane     1      2
megane     0      2
megane     0      2
Z4         1      3
Z4         0      3

As you can see from the way I have sorted the data, 'nvals' is always 1 for the first instance of a modelname and 0 for the rest.
This is the case for all 70k observations and 154 different values of 'modelname'.

Now I want to create a new variable called 'modelrank' that is 1 for every instance of 'beetle', 2 for every instance of 'megane', 3 for every instance of 'Z4', etc.
It is important that this rank follows the current order of the data, because the data have also been sorted by the number of times each unique 'modelname' occurs in the dataset.

The solution I have found is as follows, but it takes about 4 minutes to classify the whole dataset:
Code:

gen modelrank = 5000
gen modelcounter = 0

forvalues i = 1/70807 {
    replace modelcounter = modelcounter + 1 if nvals[`i'] == 1
    replace modelrank = modelcounter[`i'] if _n == `i'
}

This code starts 'modelcounter' at zero and increments it by 1 every time it encounters a 1 in 'nvals', which coincides with the first instance of a new modelname.

At the end of each pass through the loop I replace 'modelrank' in observation `i' with whatever modelcounter is currently set to. The "if _n == `i'" notation is a workaround for the fact that I can't use square brackets on the left-hand side of the equals sign (at first I had this line set to "replace modelrank[`i'] = modelcounter[`i']", but that gives me the error 'weights not allowed').

Could someone point me to a better solution?
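
A note on the workaround described above: Stata reads square brackets after the variable name in replace as a weight specification, which is where 'weights not allowed' comes from. Assigning to a single observation is usually done with the in qualifier instead. The following is a minimal sketch, not a recommendation: it assumes a five-row toy dataset in place of the real 70k observations and uses a local macro with the hypothetical name counter instead of a counter variable. It should avoid re-scanning the whole dataset on every pass, but it is still an explicit loop.

```stata
* Toy stand-in for the real data (values copied from the example table).
clear
input str10 modelname byte nvals
"beetle" 1
"beetle" 0
"megane" 1
"megane" 0
"Z4"     1
end

generate modelrank = .
local counter = 0          // hypothetical name; a macro, not a variable
forvalues i = 1/`=_N' {
    * Bump the counter at the first row of each model.
    if nvals[`i'] == 1 local ++counter
    * The in qualifier restricts replace to observation i only.
    quietly replace modelrank = `counter' in `i'
}
list
```

Run this as a do-file in one go, since local macros do not persist between separately executed lines.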
Tags: categorical, forvalues, label, loop, syntax
daniel klein

Join Date: Mar 2014

Posts: 3912
#2

26 Apr 2019, 09:56

Starting from where you are now, you want

Code:

generate modelrank = sum(nvals)
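
To see why this works, here is a small sketch on the nine example rows from post #1. sum() returns the running sum down the observations in their current order, so each 1 in nvals raises the rank by one for that row and every row after it. The variable names want and modelrank2 are made up for this illustration, and the second generate is only an assumed alternative for the case where no nvals flag exists.

```stata
* Example rows from post #1; want holds the desired result for checking.
clear
input str10 modelname byte nvals byte want
"beetle" 1 1
"beetle" 0 1
"beetle" 0 1
"beetle" 0 1
"megane" 1 2
"megane" 0 2
"megane" 0 2
"Z4"     1 3
"Z4"     0 3
end

* Running sum of the first-row flag, in the current sort order.
generate modelrank = sum(nvals)

* Assumed alternative without nvals: flag each change in modelname.
* In observation 1, modelname[_n-1] is the empty string, so the flag is 1.
generate modelrank2 = sum(modelname != modelname[_n-1])

assert modelrank == want
list, sepby(modelname)
```

Both versions depend on the data already being in the intended order; neither sorts anything.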

Best
Daniel
1 like
Tom Schaars

Join Date: Apr 2019

Posts: 3
#3

26 Apr 2019, 10:06

jesus that was easy

thanks!
Michel Diatta

Join Date: Jan 2020

Posts: 6
#4

30 Jan 2020, 14:08

Hello all,

Can somebody tell me why I get a "no observations" error when tabulating variables I've just created?

Thanks in advance!
Mike Lacy

Join Date: Apr 2014

Posts: 2450
#5

30 Jan 2020, 14:18

Michel, on what I think is your first post here, you've unfortunately done several things that make it almost certain that you will not get an answer. First of all, you've posted your question as part of a thread that doesn't have anything to do with your data or question. Second, you have not shown us an example of your data, as described in the FAQ (see -dataex- in the FAQ). Third, you have not shown us your actual code and the error message that Stata gave, that is, the code that created your variables, the code you used to tabulate them, and the content of Stata's error message. Without those things, it's nearly impossible to answer a question. I'd encourage you to repost your question under a new subject, with these omissions fixed. If you do that, there's a good chance you'll get a helpful answer.
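
For anyone unfamiliar with -dataex-, a minimal sketch of its use follows. It assumes Stata's shipped auto dataset as a stand-in for your own data, and the variables and the five-row range are arbitrary choices for illustration. On Stata versions older than 15.1, dataex may first need to be installed with ssc install dataex.

```stata
* Load the example dataset that ships with Stata.
sysuse auto, clear

* Print a copy-and-paste data example: three variables, first five rows.
dataex make price mpg in 1/5
```

The output can be pasted into a post between code delimiters so that others can recreate the same data.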
2 likes
Michel Diatta

Join Date: Jan 2020

Posts: 6
#6

30 Jan 2020, 14:31

I understand, Mike. Thank you for these important remarks.
Actually, I'm still learning how to use Statalist and have to figure out how to do it more efficiently. I will be back tomorrow with the details you mentioned.

Thanks again.