  • stata command unrecognized r(199) error

    Hello,
    I am trying to run the Stata code below. Everything runs except that, at the very end, I get the error 'command i is unrecognized', r(199). How can I avoid this error? I am new to Stata and am not sure what is wrong. I have attached the pharmacy_small.dta file to this post so that you can run the code on your computer.

    STATA CODE:

    clear

    //import the pharmacy_small Stata dataset
    use pharmacy_small

    // change the variables store_type, area, and compliance into binary categorical variables with 0's and 1's
    generate chain = store_type == "CHAIN"
    generate north = area == "North"

    // numericize all the string categorical variables while retaining the same label
    encode county, generate(county_num)

    python:

    # install sklearn, sfi, numpy, and pandas packages first
    # make sure to install them first!
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import train_test_split
    from sklearn import metrics # import scikit-learn metrics module for accuracy calculation
    from sfi import Data
    import numpy as np
    import pandas as pd


    # Use the sfi Data class to pull data from Stata variables
    X = pd.DataFrame(Data.get("educate north county_num chain"),
    columns = ['educate', 'north', 'county_num', 'chain'])

    Y = pd.DataFrame(Data.get("compliance"), columns = ['compliance'])

    # split the pharmacy_small dataset into a training and a test set using the python commands
    # splitting data into a test and training set is much easier in Python than in Stata (takes 1 line)
    # 'test_size = 0.25' tells Python that we want to reserve 25% of our data for the test set
    # train_test_split() will automatically shuffle the data before the split
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25)

    end

    clear

    gen Alpha = .
    gen AUC = .
    local i = 0
    range alphas 0.0 1.0 20

    foreach a in alphas {

    i++

    python: a = Data.get("a")

    // predict using the best value for alpha
    python: mnb = MultinomialNB(alpha = a, class_prior = None, fit_prior = True)

    // calculate probability of each class on the test set
    // '[:, 1]' at the end extracts the probability for each pharmacy to be under compliance
    python: Y_mnb_score = mnb.fit(X_train, np.ravel(Y_train)).predict_proba(X_test)[:, 1]

    // make test_compliance python variable
    python: test_compliance = Y_test['compliance']

    // transfer the python variables Y_mnb_score and test_compliance to STATA
    python: Data.setObsTotal(len(Y_mnb_score))
    python: Data.addVarFloat('mnbScore')
    python: Data.store(var = 'mnbScore', obs = None, val = Y_mnb_score)

    python: Data.setObsTotal(len(test_compliance))
    python: Data.addVarFloat('testCompliance')
    python: Data.store(var = 'testCompliance', obs = None, val = test_compliance)

    roctab testCompliance mnbScore
    replace AUC = r(area) in `i' // at this point I am getting an error, I think
    replace Alpha = `a'

    }

    Thank you for your help!

  • #2
    Just inside the loop that starts:
    Code:
    foreach a in alphas {
    
    i++
    you have the illegal command -i++-. There is no command -i-, as Stata is telling you. Presumably the intent here is to increment the value of local macro i. The correct syntax for that is:
    Code:
    local ++i



    • #3
      The following demonstration shows how local macro incrementation can occur inline rather than as a separate command, and the difference between `++i' and `i++'.
      Code:
      . clear
      
      . set obs 5
      number of observations (_N) was 0, now 5
      
      . generate str8 text = "....."
      
      . local i 0
      
      . foreach a in dog cat frog {
        2. replace text = "`a'" in `++i'
        3. }
      (1 real change made)
      (1 real change made)
      (1 real change made)
      
      . list, clean
      
              text  
        1.     dog  
        2.     cat  
        3.    frog  
        4.   .....  
        5.   .....  
      
      . display "`i'"
      3
      
      . foreach a in wren sparrow {
        2. replace text = "`a'" in `i++'
        3. }
      (1 real change made)
      (1 real change made)
      
      . list, clean
      
                text  
        1.       dog  
        2.       cat  
        3.      wren  
        4.   sparrow  
        5.     .....  
      
      . display "`i'"
      5
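
For readers more at home in Python, the log above can be mimicked with a small toy model. This is only a sketch of the semantics shown in the demonstration, not of how Stata implements macros: the class `LocalMacro` and its method names are made up for illustration. `pre_increment` plays the role of `++i' (add 1, then use the new value) and `post_increment` plays the role of `i++' (use the current value, then add 1).

```python
class LocalMacro:
    """Toy stand-in for a Stata local macro holding an integer (illustrative only)."""

    def __init__(self, value=0):
        self.value = value

    def pre_increment(self):
        """Model of `++i': add 1 first, then expand to the new value."""
        self.value += 1
        return self.value

    def post_increment(self):
        """Model of `i++': expand to the current value, then add 1."""
        current = self.value
        self.value += 1
        return current


text = ["....."] * 5              # five observations, as in the demonstration
i = LocalMacro(0)

for a in ["dog", "cat", "frog"]:
    row = i.pre_increment()       # rows 1, 2, 3
    text[row - 1] = a             # Stata observation numbers are 1-based
value_after_first_loop = i.value

for a in ["wren", "sparrow"]:
    row = i.post_increment()      # rows 3, 4: the old value is used before the bump
    text[row - 1] = a

print(text)
print(value_after_first_loop, i.value)
```

The second loop starts writing at row 3 because the macro still holds 3 when it is first expanded, which is why "frog" is overwritten by "wren" in the listing.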



      • #4
        Note further that the loop starting

        Code:
        foreach a in alphas {
        is a loop over one term, the variable name alphas. In particular, it is not a loop over the distinct values of that variable.
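
A rough Python analogy may help if you are coming from that side. It is only a sketch: the dictionary `dataset` and its values are hypothetical, and the string split merely imitates how foreach ... in walks through the words typed after "in" rather than looking inside a variable.

```python
# Hypothetical stand-in for a dataset holding one variable named "alphas".
dataset = {"alphas": [0.0, 0.5, 1.0]}

# foreach a in alphas  sees only the words typed after "in",
# so the single item it loops over is the name itself.
typed_list = "alphas"
items_seen_by_foreach = typed_list.split()

# Looping over the values means reading them out of the variable explicitly.
values_in_variable = dataset["alphas"]

print(items_seen_by_foreach)
print(values_in_variable)
```

The loop body would therefore run once, with the macro a holding the text alphas, not a number.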



        • #5
          Expanding on Nick's answer, and thinking back to your earlier question on the range command at

          https://www.statalist.org/forums/for...missing-values

          perhaps what you want is to call the python MultinomialNB function successively with the values 0, .05, .10, ..., .95, 1.0 for the alpha= argument. In that case I think something like this might do what you want.

          Code:
          forvalues a20=0(1)20 {
              local a = `a20'/20
              python: mnb = MultinomialNB(alpha = `a', class_prior = None, fit_prior = True)
             ...
          which will in succession run the commands
          Code:
              python: mnb = MultinomialNB(alpha = 0, class_prior = None, fit_prior = True)
              python: mnb = MultinomialNB(alpha = .05, class_prior = None, fit_prior = True)
              python: mnb = MultinomialNB(alpha = .10, class_prior = None, fit_prior = True)
              ...
              python: mnb = MultinomialNB(alpha = .95, class_prior = None, fit_prior = True)
              python: mnb = MultinomialNB(alpha = 1, class_prior = None, fit_prior = True)
          Note that since the fraction 1/20 cannot be precisely represented as a floating point number, I choose to index the loop on integer values and recalculate a on each iteration, rather than accumulate an increasingly imprecise sum of 20 terms.
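
The same point can be checked on the Python side. The sketch below is illustrative only: the function names are made up, and 20 steps is assumed to match the forvalues loop above. It builds the grid 0, .05, ..., 1 once from an integer index and once by repeated addition, then reports the largest gap between the two.

```python
N_STEPS = 20  # assumption: 21 grid points, matching forvalues a20=0(1)20


def grid_by_index(n_steps):
    """Recompute each value from an integer index, as in the loop above."""
    return [k / n_steps for k in range(n_steps + 1)]


def grid_by_accumulation(n_steps):
    """Repeatedly add the step, so any rounding error carries forward."""
    step = 1 / n_steps
    values = []
    total = 0.0
    for _ in range(n_steps + 1):
        values.append(total)
        total += step
    return values


by_index = grid_by_index(N_STEPS)
by_sum = grid_by_accumulation(N_STEPS)

# Largest absolute disagreement between the two grids.
max_drift = max(abs(a - b) for a, b in zip(by_index, by_sum))

print(by_index[-1] == 1.0)   # the indexed grid ends exactly at 1
print(max_drift)             # any accumulated rounding error shows up here
```

With the index-based grid, each value is a single division, so an error in one value cannot leak into the next; with accumulation, every value inherits whatever rounding happened before it.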

          Let me add the following more general advice. Your coding suggests that perhaps you are an experienced python user new to Stata? If so, I'm sympathetic to you as a new user of Stata - it's a lot to absorb. And even worse if perhaps you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

          When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

          All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

          Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.
