We are considering building a Stata API for Python (StataPy) and would like to know whether others would find it useful.
StataPy would serve those who prefer to do data cleaning, preprocessing, and manipulation of their data in Python (Pandas), with all the advantages of a proper scripting language, the power of Pandas' treatment of tabular data, and the advanced and elegant graphics libraries available in Python. When using StataPy, estimation results also remain in Python, so that tabular and graphical presentation materials can be scripted flexibly.
At the same time, Stata often has the standard or best implementations of popular econometric estimation algorithms, so StataPy's role is to give access to Stata's estimation procedures on Python (Pandas) data frames.
The API allows estimates to be carried out in Stata (or R): it generates the input .dta and .do files and parses the results behind the scenes, so that they are returned to Python as Pandas DataFrames or similar objects.
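To make the behind-the-scenes step concrete, here is a minimal sketch of one way it might work. Nothing here is the actual StataPy implementation: the function names, file names, and CSV result format are all assumptions made for illustration. It writes the data with pandas' `DataFrame.to_stata`, generates a .do file that runs `regress` and writes coefficients and standard errors with Stata's `file write`, runs Stata in batch mode (assuming a Unix-style `stata -b do` executable on the PATH), and parses the result file.

```python
import csv
import io
import subprocess
from pathlib import Path


def build_do_file(dta_path, results_path, depvar, indepvars, vce=None):
    """Return the text of a .do file that runs one regression and writes
    coefficients and standard errors to a CSV file (hypothetical format)."""
    rhs = " ".join(indepvars)
    options = f", vce({vce})" if vce else ""
    terms = " ".join(list(indepvars) + ["_cons"])
    lines = [
        f'use "{dta_path}", clear',
        f"regress {depvar} {rhs}{options}",
        f'file open fh using "{results_path}", write replace',
        'file write fh "term,coef,se" _n',
        f"foreach v in {terms} {{",
        # `v' is a Stata local macro; _b[] and _se[] hold the estimates
        "    file write fh \"`v',\" (_b[`v']) \",\" (_se[`v']) _n",
        "}",
        "file close fh",
    ]
    return "\n".join(lines) + "\n"


def parse_results(csv_text):
    """Parse the CSV written by the .do file into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"term": row["term"].strip(),
         "coef": float(row["coef"]),
         "se": float(row["se"])}
        for row in reader
    ]


def run_regression(data, depvar, indepvars, vce=None,
                   workdir=".", stata_exe="stata"):
    """Write inputs, run Stata in batch mode, and return parsed results.

    `data` is expected to be a pandas DataFrame; `stata_exe` is an
    assumption about how Stata is installed on the machine.
    """
    workdir = Path(workdir)
    data.to_stata(workdir / "input.dta", write_index=False)
    do_text = build_do_file("input.dta", "results.csv", depvar, indepvars, vce)
    (workdir / "job.do").write_text(do_text)
    subprocess.run([stata_exe, "-b", "do", "job.do"], cwd=workdir, check=True)
    return parse_results((workdir / "results.csv").read_text())
```

The list of dicts returned by `parse_results` can be passed straight to `pandas.DataFrame(...)` to get the estimates as a data frame.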
A complementary Python package will give extra help in formatting tables of estimate data in LaTeX, although Pandas already provides plenty of export options.
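As a rough idea of what such formatting help could look like, the sketch below turns collected estimates into a LaTeX `tabular` with standard errors in parentheses under each coefficient. The function name `latex_table` and the input layout (a dict of estimate name to `{term: (coef, se)}`) are assumptions for illustration only, not part of any existing package.

```python
def latex_table(estimates, caption):
    """Format estimates as a LaTeX table.

    estimates: dict mapping estimate name -> {term: (coef, se)}
    (a hypothetical layout chosen for this sketch).
    """
    names = list(estimates)
    # Collect terms in first-seen order across all estimates
    terms = []
    for est in estimates.values():
        for term in est:
            if term not in terms:
                terms.append(term)

    lines = [
        r"\begin{table}",
        rf"\caption{{{caption}}}",
        r"\begin{tabular}{l" + "c" * len(names) + "}",
        r"\hline",
        " & ".join([""] + names) + r" \\",
        r"\hline",
    ]
    for term in terms:
        # Escape underscores so names like _cons are valid LaTeX
        coefs, ses = [term.replace("_", r"\_")], [""]
        for name in names:
            if term in estimates[name]:
                coef, se = estimates[name][term]
                coefs.append(f"{coef:.3f}")
                ses.append(f"({se:.3f})")
            else:
                # Leave the cell empty when a term is absent from a model
                coefs.append("")
                ses.append("")
        lines.append(" & ".join(coefs) + r" \\")
        lines.append(" & ".join(ses) + r" \\")
    lines += [r"\hline", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)
```

Each term takes two rows, one for coefficients and one for standard errors, which is the layout most regression tables in economics papers use.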
Example:
Code:
import statapy
...
# Preprocess the data
...
# Use Stata (or R) models to:

# Run an individual estimate
statapy.models.lowess(data, "mpg", "length", engine="Stata").plot()

# Run several estimates, to be arranged into a table
# (keyword arguments such as engine= must follow the positional arguments)
Table = statapy.RegressionTable("Table 1")
Table.add(statapy.models.regress("estimate1", data, "price", ["length", "width", "mpg"], engine="Stata", vce="robust", beta=True))
Table.add(statapy.models.regress("estimate2", data, "price", ["length", "width", "mpg"], engine="Stata", vce="cluster county"))
...
# Access model results
Table.estimatesDataframe()        # Return DataFrame of collected parametric estimates from Stata
Table.estimate(name="estimate1")  # Access individual estimate
Table.estimate(index=0).covarr()  # Access individual estimate's covariance matrix
...