A Stata API for Python

Parth Dhar

Join Date: May 2019
Posts: 2

16 May 2019, 08:22

We are considering building a Stata API for Python (StataPy) and would like to know whether others would find it useful.

StataPy would serve those who prefer to do data cleaning, preprocessing, and manipulation of their data in Python (Pandas), with all the advantages of a proper scripting language, the power of Pandas' treatment of tabular data, and the advanced and elegant graphics libraries available in Python. When using StataPy, estimation results also remain in Python, so that tabular and graphical presentation materials can be scripted flexibly.

However, because Stata often has the standard or best implementations of popular econometric estimation algorithms, StataPy's role is to give access to Stata's estimation procedures on Python (Pandas) data frames.

The API runs estimates in Stata (or R) by generating the input .dta and .do files and parsing the results behind the scenes, so that they are returned to Python as Pandas DataFrames or similar objects.
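To make the "behind the scenes" step more concrete, here is a minimal sketch of how a .do file could be generated from Python arguments. The helper name `build_do_file` and its parameters are hypothetical and are not StataPy's actual API. The sketch also assumes the DataFrame has already been written to disk (for example with pandas' `DataFrame.to_stata`) and that the generated file would then be run by Stata in batch mode.

```python
def build_do_file(dta_path, command, depvar, indepvars, vce=None, beta=False,
                  log_path="results.log"):
    """Return the text of a Stata .do file for one estimation command.

    Hypothetical helper for illustration only; not part of StataPy.
    """
    # Translate Python keyword arguments into Stata options.
    options = []
    if vce:
        options.append(f"vce({vce})")
    if beta:
        options.append("beta")

    # Assemble the estimation command, e.g. "regress price length, vce(robust)".
    estimate = " ".join([command, depvar, *indepvars])
    if options:
        estimate += ", " + " ".join(options)

    lines = [
        f'use "{dta_path}", clear',               # load the exported data
        f'log using "{log_path}", text replace',  # capture output for parsing
        estimate,
        "matrix list e(b)",                       # coefficient vector
        "matrix list e(V)",                       # covariance matrix
        "log close",
    ]
    return "\n".join(lines) + "\n"


do_text = build_do_file("input.dta", "regress", "price",
                        ["length", "width", "mpg"], vce="robust", beta=True)
print(do_text)
```

Parsing the resulting log back into a DataFrame is the harder half of the job and is left out of this sketch.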

A complementary Python package will give extra help in formatting tables of estimate data in LaTeX, although Pandas already provides plenty of export options.
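As a rough illustration of the kind of table formatting meant here, the sketch below renders several sets of coefficients as a LaTeX table using only the standard library. The function name `to_latex_table` and the input layout (a dict of estimate names mapping to variable/coefficient dicts) are assumptions made for this example, and the numbers are placeholders, not real estimation results.

```python
def to_latex_table(estimates, caption):
    """Render {estimate name: {variable: coefficient}} as a LaTeX table.

    Hypothetical helper for illustration only. Variable names are not
    escaped, so names containing "_" would need extra handling.
    """
    names = list(estimates)

    # Collect variables in first-seen order across all estimates.
    variables = []
    for coefs in estimates.values():
        for var in coefs:
            if var not in variables:
                variables.append(var)

    lines = [
        r"\begin{table}",
        rf"\caption{{{caption}}}",
        r"\begin{tabular}{l" + "r" * len(names) + "}",
        r"\hline",
        " & ".join([""] + names) + r" \\",
        r"\hline",
    ]
    for var in variables:
        # Leave the cell empty when an estimate does not include the variable.
        cells = [f"{estimates[n][var]:.3f}" if var in estimates[n] else ""
                 for n in names]
        lines.append(" & ".join([var] + cells) + r" \\")
    lines += [r"\hline", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)


# Placeholder coefficients, made up purely to show the layout.
estimates = {
    "estimate1": {"length": 1.0, "mpg": -2.5},
    "estimate2": {"length": 0.5, "width": 3.0},
}
table = to_latex_table(estimates, "Table 1")
print(table)
```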

Example:

Code:

import statapy
...
# Preprocess the data
...
# Use Stata (or R) models to:
# Run an individual estimate
statapy.models.lowess(data, "mpg", "length", engine="Stata").plot()
# Run several estimates, to be arranged into a table
Table = statapy.RegressionTable("Table 1")
Table.add(statapy.models.regress("estimate1", data, "price", ["length", "width", "mpg"], engine="Stata", vce="robust", beta=True))
Table.add(statapy.models.regress("estimate2", data, "price", ["length", "width", "mpg"], engine="Stata", vce="cluster county"))
...
# Access model results
Table.estimatesDataframe()        # Return DataFrame of collected parametric estimates from Stata
Table.estimate(name="estimate1")  # Access individual estimate
Table.estimate(index=0).covarr()  # Access individual estimate's covariance matrix
...

5 Votes

Interested	50.00%	5 votes
I'd like to help	10.00%	1 vote
I'd use it	30.00%	3 votes
I have ideas	10.00%	1 vote
Not interested	0%	0 votes
Need already served	0%	0 votes

Tags: API, python, regression, stata, syntax

Parth Dhar

Join Date: May 2019
Posts: 2

11 Jul 2019, 14:31

An initial build of StataPy is out now! Clone the repo, install it, and run demo_Statapy.py.

StataPy allows you to use Stata models while keeping your data and results in Python. You can also add the results into a LaTeX file on the fly.

We will put the package up on PyPI soon. We need more testers for different operating systems and versions of Stata, and contributors to add support for more models. If you are on Windows or macOS, please test it and raise a pull request.
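For anyone testing on a new platform, a first step is checking that Python can find Stata at all. The sketch below shows one way such a check might work; it is not the implementation of `statapy.dependencyCheck()`, and the executable names are assumptions based on a typical Unix-like Stata install (Windows installs use different names).

```python
import shutil

# Assumed executable names for a Unix-like Stata install; adjust for your setup.
CANDIDATES = ("stata-mp", "stata-se", "stata")


def find_stata(candidates=CANDIDATES):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


stata_path = find_stata()
if stata_path is None:
    print("Stata not found on PATH; estimates cannot be run.")
else:
    print(f"Using Stata at {stata_path}")
```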

Here's the demo workflow:

Code:

import pandas as pd
import statapy

statapy.dependencyCheck()

# Create a new StataPy Project object
proj = statapy.Project("demo")

# Import the 'auto' dataset from Stata, with categorical variables
types = {'foreign': 'category'}
df = pd.read_csv("auto.csv", dtype=types)

# Add a new DataSet
proj.addData("testData", df)

# Add a Correlation Table
proj.addEstimate(None, "dux", "testData", "correlate",
                 ["price", "mpg", "rep78", "headroom", "trunk"], "aw=price")

# Create a regression table to hold the compiled estimate results
proj.createRegressionTable("Table")

# Add multiple parametric estimates
proj.addEstimate("Table", "foo", df, "ologit",
                 "foreign", ["rep78", "mpg", "length", "weight"],
                 vce="cluster make", level=90, weight="fw=trunk")

proj.addEstimate("Table", "bar", df, "regress",
                 "price", ["length", "weight", "mpg"],
                 vce="cluster make", level=90, beta=True, weight="aw=rep78",
                 stataOptions="plus")

proj.addEstimate("Table", "roo", df, "logit",
                 "foreign", ["rep78", "mpg", "length", "weight"],
                 vce="cluster make", level=90, weight="fw=trunk")

# Group estimates together
proj.setGroupName('Roo$^2$', ['roo'])

# Add the regression table to the LaTeX file
proj.appendRegressionTable("Table")

# Add a non-parametric plot
proj.addEstimate(None, "koo", df, "lpoly", "price", "length",
                  degree=1, level=90, weight="fw=trunk", at="length",
                  names={"x": "length", "y": "price"}
                  )

# Compile the project into a TeX and PDF file
proj.closeAndCompile()

Attached Files

demo.pdf (200.8 KB, 1 view)

Last edited by Parth Dhar; 11 Jul 2019, 15:27.
