  • Improving Code Speed

    Hello Statalist,

    After much blood, sweat and tears I have found a working solution to my problem. However, the code runs very slowly, and I think it can be improved, but I have not been able to work out how.

    A quick description of my goal:

    I have sorted a bunch of data by 'modelname' and 'nvals'. I have a total of 70k observations but the main structure of the data is as follows:
    modelname   nvals   modelrank (want)
    beetle      1       1
    beetle      0       1
    beetle      0       1
    beetle      0       1
    megane      1       2
    megane      0       2
    megane      0       2
    Z4          1       3
    Z4          0       3
    As you can see, because of the way I have sorted the data, 'nvals' is always 1 for the first instance of a modelname and 0 for the rest.
    This is the case for all 70k observations and 154 different values of 'modelname'.

    Now I want to create a new variable called 'modelrank' that is 1 for every instance of 'beetle', 2 for every instance of 'megane', 3 for every instance of 'Z4', and so on.
    It is important that this rank follows the sequence the data is currently in, as the data has also been sorted by the number of times each unique 'modelname' occurs in the dataset.

    The solution I have found is as follows, but it takes about 4 minutes to classify the whole dataset.
    gen modelrank = 5000
    gen modelcounter = 0

    forvalues i = 1/70807 {

    // bump the counter (in every observation) when row i starts a new modelname
    replace modelcounter = modelcounter + 1 if nvals[`i'] == 1

    // copy the current counter value into row i only
    replace modelrank = modelcounter[`i'] if _n == `i'

    }

    This code starts the 'modelcounter' at zero and increments by 1 every time it encounters a 1 in 'nvals', which coincides with the first instance of a new modelname.

    At the end of each iteration I replace 'modelrank' with whatever 'modelcounter' is currently set to. The notation is a workaround for the fact that I can't use square brackets on the left-hand side of the equals sign (at first I had this line set to "replace modelrank[`i'] = modelcounter[`i']", but this gives me the error 'weights not allowed').

    Could someone point me to a better solution?

  • #2
    Starting from where you are now, you want

    generate modelrank = sum(nvals)
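
To see why this works: in -generate-, sum() returns the running (cumulative) sum over observations in the current sort order, so the total goes up by 1 at each row where nvals is 1 and stays put otherwise. Here is a minimal sketch on toy data that mirrors the example in the question. The toy values and the variable modelrank2 are illustrative assumptions, not part of the original data.

```stata
* Toy data mirroring the example in the question (values are illustrative).
clear
input str10 modelname byte nvals
"beetle" 1
"beetle" 0
"beetle" 0
"beetle" 0
"megane" 1
"megane" 0
"megane" 0
"Z4"     1
"Z4"     0
end

* Running sum of the first-occurrence flag: steps up by 1 at each new model.
generate modelrank = sum(nvals)

* Hypothetical alternative if nvals did not exist: flag changes in modelname
* directly (assumes identical names are stored in contiguous rows).
generate modelrank2 = sum(modelname != modelname[_n-1])

list, sepby(modelname) noobs
```

Both variables should agree on data sorted like this. Because the whole variable is filled in a single pass, there is no need to loop over observations.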


    • #3
      jesus that was easy



      • #4
        Hello all,

        Can somebody tell me why I get a "no observations" error when tabulating variables I've just created?

        Thanks in advance!


        • #5
          Michel, on what I think is your first post here, you've unfortunately done several things that make it almost certain that you will not get an answer. First, you've posted your question in a thread that has nothing to do with your data or question. Second, you have not shown us an example of your data, as described in the FAQ (see -dataex- in the FAQ). Third, you have not shown us your actual code and the error message that Stata gave, that is, the code that created your variables, the code you used to tabulate them, and the content of Stata's error message. Without those things, it's nearly impossible to answer a question. I'd encourage you to repost your question under a new subject, with these omissions fixed. If you do that, there's a good chance you'll get a helpful answer.


          • #6
            I understand Mike. Thank you for these important remarks.
            Actually, I'm still learning how to use Statalist and have yet to figure out how to do it more efficiently. I will be back tomorrow with the details you mentioned.

            Thanks again.