  • A Stata API for Python

    We are considering building a Stata API for Python (StataPy) and would like to know whether others would find it useful.

    StataPy would serve those who prefer to do data cleaning, preprocessing, and manipulation of their data in Python (Pandas), with all the advantages of a proper scripting language, the power of Pandas' treatment of tabular data, and the advanced and elegant graphics libraries available in Python. When using StataPy, estimation results also remain in Python, so that tabular and graphical presentation materials can be scripted flexibly.

    However, because Stata often has the standard or best implementations of popular econometric estimation algorithms, StataPy's role is to give access to Stata's estimation procedures on Python (Pandas) data frames.

    The API runs estimates in Stata (or R) by generating the input .dta and .do files and parsing the results behind the scenes, so that they come back to Python as Pandas DataFrames or similar objects.
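To make the "behind the scenes" step concrete, here is a minimal sketch of how the input files for one regression could be generated. It is not StataPy's actual implementation: the function names `build_do_file` and `write_inputs`, the file names, and the choice of a text log as the results channel are all assumptions made for illustration.

```python
from pathlib import Path


def build_do_file(dta_path, depvar, indepvars, vce=None, log_path="results.log"):
    """Return the text of a Stata do-file that loads a dataset and runs `regress`.

    Hypothetical helper, not part of StataPy.
    """
    command = f"regress {depvar} {' '.join(indepvars)}"
    if vce:
        # "robust" becomes vce(robust); "cluster county" becomes vce(cluster county)
        command += f", vce({vce})"
    lines = [
        f'use "{dta_path}", clear',
        f'log using "{log_path}", text replace',
        command,
        "log close",
    ]
    return "\n".join(lines) + "\n"


def write_inputs(df, workdir, depvar, indepvars, vce=None):
    """Write the .dta and .do files that a Stata batch run would consume.

    `df` is assumed to be a pandas DataFrame; the file names are arbitrary.
    """
    workdir = Path(workdir)
    dta_path = workdir / "input.dta"
    do_path = workdir / "estimate.do"
    df.to_stata(dta_path, write_index=False)  # pandas' built-in Stata writer
    do_path.write_text(build_do_file(dta_path.name, depvar, indepvars, vce))
    return do_path


# Show the do-file text for the first regression in the example.
print(build_do_file("input.dta", "price", ["length", "width", "mpg"], vce="robust"))
```

The generated do-file would then be run by Stata in batch mode (on Unix-like systems, something like `stata -b do estimate.do`), and the log or saved results parsed back into Python. That parsing step is left out here because its format depends on design choices the post does not describe.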

    A complementary Python package will give extra help in formatting tables of estimate data in LaTeX, although Pandas already provides plenty of export options.
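As a rough idea of what the LaTeX helper might do, the sketch below renders a dictionary of coefficients as a LaTeX table using only the standard library. The function name `estimates_to_latex`, the input layout, and the numbers in the usage example are placeholders invented for illustration. They are not real estimates and not StataPy's API.

```python
def estimates_to_latex(estimates, caption):
    """Render {model name: {variable: coefficient}} as a LaTeX table.

    Hypothetical helper; variable names are not escaped for LaTeX.
    """
    models = list(estimates)

    # Collect variables in first-seen order across all models.
    variables = []
    for coefs in estimates.values():
        for var in coefs:
            if var not in variables:
                variables.append(var)

    # One row per variable; leave the cell empty if a model lacks it.
    rows = []
    for var in variables:
        cells = []
        for model in models:
            value = estimates[model].get(var)
            cells.append("" if value is None else f"{value:.3f}")
        rows.append(" & ".join([var] + cells) + r" \\")

    header = " & ".join([""] + models) + r" \\"
    lines = [
        r"\begin{table}",
        rf"\caption{{{caption}}}",
        r"\begin{tabular}{l" + "r" * len(models) + "}",
        r"\hline",
        header,
        r"\hline",
        *rows,
        r"\hline",
        r"\end{tabular}",
        r"\end{table}",
    ]
    return "\n".join(lines)


# Placeholder coefficients, made up purely to show the output shape.
demo = {
    "estimate1": {"length": 1.5, "mpg": -2.0},
    "estimate2": {"length": 0.25},
}
print(estimates_to_latex(demo, "Table 1"))
```

If the estimates are already in a DataFrame, Pandas' own `DataFrame.to_latex` covers the simple cases. A dedicated helper would mainly earn its keep on regression-specific layout such as standard errors under coefficients or grouped columns.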

    Example:
    Code:
    import statapy
    ...
    # Preprocess the data
    ...
    # Use Stata (or R) models to:
    # Run an individual estimate
    # (keyword arguments such as engine= must come after the positional ones)
    statapy.models.lowess(data, "mpg", "length", engine="Stata").plot()
    # Run several estimates, to be arranged into a table
    Table = statapy.RegressionTable("Table 1")
    Table.add(statapy.models.regress("estimate1", data, "price", ["length", "width", "mpg"], engine="Stata", vce="robust", beta=True))
    Table.add(statapy.models.regress("estimate2", data, "price", ["length", "width", "mpg"], engine="Stata", vce="cluster county"))
    ...
    # Access model results
    Table.estimatesDataframe()        # Return DataFrame of collected parametric estimates from Stata
    Table.estimate(name="estimate1")  # Access individual estimate
    Table.estimate(index=0).covarr()  # Access individual estimate's covariance matrix
    ...
    Poll results:
      • Interested: 50.00% (5 votes)
      • I'd like to help: 10.00% (1 vote)
      • I'd use it: 30.00% (3 votes)
      • I have ideas: 10.00% (1 vote)
      • Not interested: 0% (0 votes)
      • Need already served: 0% (0 votes)

  • #2
    An initial build of StataPy is out now! Clone the repo, install it, and run demo_Statapy.py.

    StataPy allows you to use Stata models while keeping your data and results in Python. You can also add the results into a LaTeX file on the fly.

    We will put the package up on PyPI soon. We need more testers on different operating systems and Stata versions, and contributors to add support for more models. If you are on Windows or macOS, please test it and open a pull request.

    Here's the demo workflow:

    Code:
    import pandas as pd  # needed for pd.read_csv below
    import statapy
    
    statapy.dependencyCheck()
    
    # Create a new StataPy Project object
    proj = statapy.Project("demo")
    
    # Import the 'auto' dataset from Stata, with categorical variables
    types = {'foreign': 'category'}
    df = pd.read_csv("auto.csv", dtype=types)
    
    # Add a new DataSet
    proj.addData("testData", df)
    
    # Add a Correlation Table
    proj.addEstimate(None, "dux", "testData", "correlate",
                     ["price", "mpg", "rep78", "headroom", "trunk"], "aw=price")
    
    # Create a regression table to hold the compiled estimate results
    proj.createRegressionTable("Table")
    
    # Add multiple parametric estimates
    proj.addEstimate("Table", "foo", df, "ologit",
                      "foreign", ["rep78", "mpg", "length", "weight"],
                      vce="cluster make", level=90, weight="fw=trunk"
                      )
    
    proj.addEstimate("Table", "bar", df, "regress",
                      "price", ["length", "weight", "mpg"],
                      vce="cluster make", level=90, beta=True, weight="aw=rep78",
                      stataOptions="plus"
                      )
    
    proj.addEstimate("Table", "roo", df, "logit",
                      "foreign", ["rep78", "mpg", "length", "weight"],
                      vce="cluster make", level=90, weight="fw=trunk"
                      )
    
    # Group estimates together
    proj.setGroupName('Roo$^2$', ['roo'])
    
    # Add the regression table to the LaTeX file
    proj.appendRegressionTable("Table")
    
    # Add a non-parametric plot
    proj.addEstimate(None, "koo", df, "lpoly", "price", "length",
                      degree=1, level=90, weight="fw=trunk", at="length",
                      names={"x": "length", "y": "price"}
                      )
    
    # Compile the project into a TeX and PDF file
    proj.closeAndCompile()
    Last edited by Parth Dhar; 11 Jul 2019, 15:27.
