Adding a new variable on Stata that has different dimensions than the variables from a dataset that is in use

Dominique Bourget

Join Date: Sep 2019

Posts: 43
#1

19 Oct 2019, 04:00

Hello,

Is there a way to add a new variable in Stata whose length differs from the number of observations in the dataset in use, without first clearing the dataset?
For example,

Code:

use pharmacy_small

// calculate probability of each class on the test set
// '[:, 1]' at the end extracts the probability for each pharmacy to be under compliance
python: Y_mnb_score = mnb.fit(X_train, np.ravel(Y_train)).predict_proba(X_test)[:, 1]

// transfer the python variables Y_mnb_score as a Stata variable
python: Data.setObsTotal(len(Y_mnb_score))
python: Data.addVarFloat('mnbScore')
python: Data.store(var = 'mnbScore', obs = None, val = Y_mnb_score) // error: 'number of observations to set exceeds the limit of observations'

The error does not appear when I first clear the pharmacy_small dataset and then run the Python lines in Stata. It appears only when I run the Python portion with pharmacy_small still in memory. I think the cause is that the length of 'mnbScore', the variable I am trying to add, does not match the number of observations in the pharmacy_small dataset.

So, is there a way to add such a variable without first clearing the dataset?

Thank you,

William Lisowski

Join Date: Dec 2014
Posts: 10150
#2

19 Oct 2019, 06:23

Let me start with a request that will make it easier to address your questions.

Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ. Section 12 begins

Help us to help you by producing self-contained questions with reproducible examples that explain your data, your code, and your problem.

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

The reproducible example below shows us that the error message is misleading. The problem arises when the number of observations requested is less than the current number of observations in the dataset. The setObsTotal() function uses Stata's set obs command, which does not allow you to reduce the number of observations in an existing dataset, and you probably do not want to do so anyway.

Omitting the setObsTotal() call, however, produces a different error, which tells us that the length of the Python list must match the number of observations.

Code:

. do "/var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//SD78107.000000"

. clear all

. set obs 3
number of observations (_N) was 0, now 3

. generate x = 3

. list, clean

       x  
  1.   3  
  2.   3  
  3.   3  

.
. python:
----------------------------------------------- python (type end to exit) ----------------------
>>> from sfi import Data
>>> Data.setObsTotal(5)
>>> Data.addVarFloat('y')
>>> Data.store(var = 'y', obs = None, val = [5,5,5,5,5])
>>> end
------------------------------------------------------------------------------------------------

.
. list, clean

       x   y  
  1.   3   5  
  2.   3   5  
  3.   3   5  
  4.   .   5  
  5.   .   5  

.
. python:
----------------------------------------------- python (type end to exit) ----------------------
>>> Data.setObsTotal(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Applications/Stata/ado/base/py/sfi.py", line 1294, in setObsTotal
    return _stp._st_setobstotal(nobs)
ValueError: number of observations to set exceeds the limit of observations
(2 lines skipped)
------------------------------------------------------------------------------------------------
r(7102);

end of do-file

r(7102);

. do "/var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//SD78107.000000"

. python:
----------------------------------------------- python (type end to exit) ----------------------
>>> from sfi import Data
>>> Data.addVarFloat('z')
>>> Data.store(var = 'z', obs = None, val = [4,4,4,4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Applications/Stata/ado/base/py/sfi.py", line 1426, in store
    raise ValueError("length of value does not match number of observations")
ValueError: length of value does not match number of observations
(0 lines skipped)
------------------------------------------------------------------------------------------------
r(7102);

end of do-file

r(7102);

.
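
Since the first failure in the log comes from asking setObsTotal() for fewer observations than the dataset already has, one possible guard is to call it only when the incoming list is longer than the dataset. What follows is a minimal sketch under that assumption, not a tested recipe: grow_if_needed is a hypothetical helper name (not part of sfi), and the setter is passed in as an argument so the logic can be tried outside Stata.

```python
def grow_if_needed(current_n, n_values, set_total):
    """Add observations only when the new values would not fit.

    current_n : current number of observations (_N)
    n_values  : length of the Python list to be stored
    set_total : callable that sets the observation count
                (inside Stata this would be sfi.Data.setObsTotal)
    Returns the number of observations after the call.
    """
    if n_values > current_n:
        # Growing the dataset is allowed; shrinking it is what raised the error.
        set_total(n_values)
        return n_values
    # Same length or shorter: leave the dataset alone.
    return current_n


# Inside Stata (assumption: the sfi module that ships with Stata 16 or later):
#   from sfi import Data
#   grow_if_needed(Data.getObsTotal(), len(Y_mnb_score), Data.setObsTotal)
```

This guard only avoids the setObsTotal() error. A list shorter than the dataset still cannot be stored with obs = None, as the second failure in the log shows.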


William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

19 Oct 2019, 19:29

I have now learned that the reason for the error "length of value does not match number of observations" in post #2 is that None was specified as the value of the obs argument to Data.store(), which I copied from the original example in post #1. The Stata documentation for Data.store() explains how to specify particular observations.
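
For what it is worth, here is a rough sketch of what passing explicit observations might look like, assuming (as I read the documentation) that obs accepts a list of zero-based observation indices. The helper name store_prefix is hypothetical, and the data object is passed in as an argument so the function can be exercised without Stata; inside Stata you would pass sfi.Data itself.

```python
def store_prefix(data, varname, values):
    """Store values in the first len(values) observations of a new variable.

    data    : sfi.Data, or any object offering the same getObsTotal(),
              addVarFloat() and store() methods (an assumption of this sketch)
    varname : name of the float variable to create
    values  : sequence of numbers no longer than the dataset
    Returns the number of observations written.
    """
    values = [float(v) for v in values]  # plain floats, e.g. from a NumPy array
    n_obs = data.getObsTotal()
    if len(values) > n_obs:
        raise ValueError("more values than observations; add observations first")
    data.addVarFloat(varname)
    # Explicit zero-based indices instead of obs=None, so the lengths need not match.
    data.store(varname, list(range(len(values))), values)
    return len(values)


# Inside Stata, with the 5-observation dataset from post #2:
#   from sfi import Data
#   store_prefix(Data, 'z', [4, 4, 4, 4])
```

Observations beyond the stored prefix are not touched, so I would expect them to keep the missing values that a newly added variable starts with.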