Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Storing a database from python into a Stata Dataframe

    So I've been working on this problem for some time and have not figured a solution. I want to be able to store a dataframe created on or worked on from python directly into Stata.
    Here is my attempt at this solution

    Code:
    frame create ds
    python
    import pandas as pd
    import sfi as sfi
    from sfi import Frame, Data
    from pandas import Series, DataFrame
    data = {'school': ['UCSC', 'UCLA', 'UCD', 'UCSB', 'UCI', 'UCSF'],'year': [2000, 2001, 2002, 2001, 2002, 2003], 'num': [4000, 3987, 5000, 4321, 5000, 8200]}
    ds = pd.DataFrame(data)
    type(ds)
    now lets say I want to store year. This is what I wrote

    Code:
    Data.addVarDouble('year')
    Data.store('year',None,ds['year'])
    end
    tab year

    which produces a new column named year and gives me no errors, but unfortunately does not store the desired values in the variable. On top of that, there does not seem to be a way to do it with the whole database (unless they're all of the same time and loop over them and do this procedure which does not work)

  • #2
    Kevin Gonzalez
    The main thing causing your trouble is that you need to set the number of observations in the dataset or frame before you store the values.

    Code:
    clear
    python
    import pandas as pd
    import sfi as sfi
    from sfi import Frame, Data
    from pandas import Series, DataFrame
    data = {'school': ['UCSC', 'UCLA', 'UCD', 'UCSB', 'UCI', 'UCSF'],'year': [2000, 2001, 2002, 2001, 2002, 2003], 'num': [4000, 3987, 5000, 4321, 5000, 8200]}
    
    ds = pd.DataFrame(data)
    Data.setObsTotal(len(ds))
    Data.addVarStr('school', 1)
    Data.addVarDouble('year')
    Data.addVarDouble('num')
    Data.store(None, None, ds.values.tolist())
    end
    list
    If you wanted to actually store the data directly in a Stata frame, you could do the following.

    Code:
    frames reset
    frame create ds
    python
    import pandas as pd
    import sfi as sfi
    from sfi import Frame, Data
    from pandas import Series, DataFrame
    data = {'school': ['UCSC', 'UCLA', 'UCD', 'UCSB', 'UCI', 'UCSF'],'year': [2000, 2001, 2002, 2001, 2002, 2003], 'num': [4000, 3987, 5000, 4321, 5000, 8200]}
    
    ds = pd.DataFrame(data)
    f = Frame.connect("ds")
    f.setObsTotal(len(ds))
    f.addVarStr('school', 1)
    f.addVarDouble('year')
    f.addVarDouble('num')
    f.store(None, None, ds.values.tolist())
    end
    frame ds : list

    Comment


    • #3
      Zhao Xu (StataCorp)

      Thank you! I was able to replicate the results.
      It does seem somewhat limited when it comes to moving large databases (since one would have to generate many addVar, albeit maybe a loop could work). Do you know if there are any workarounds for this?

      Comment


      • #4
        Kevin Gonzalez
        Yes. A loop needs to be created to add the variables to Stata based on the Pandas data types. This example shown below will work for numeric and string data.

        Code:
        clear all
        python
        import pandas as pd
        from sfi import Data, SFIToolkit
        
        data = {'school': ['UCSC', 'UCLA', 'UCD', 'UCSB', 'UCI', 'UCSF'],'year': [2000, 2001, 2002, 2001, 2002, 2003], 'num': [4000, 3987, 5000, 4321, 5000, 8200]}
        
        ds = pd.DataFrame(data)
        Data.setObsTotal(len(ds))
        
        # get the column names
        colnames = ds.columns
        
        for i in range(len(colnames)):
            dtype = ds.dtypes[i].name
            # make a valid Stata variable name
            varname = SFIToolkit.makeVarName(colnames[i])
            varval = ds[colnames[i]].values.tolist()
            if dtype == "int64":
                Data.addVarInt(varname)
                Data.store(varname, None, varval)
            elif dtype == "float64":
                Data.addVarDouble(varname)
                Data.store(varname, None, varval)
            elif dtype == "bool":
                Data.addVarByte(varname)
                Data.store(varname, None, varval)
            else:
                # all other types store as a string
                Data.addVarStr(varname, 1)
                s = [str(i) for i in varval] 
                Data.store(varname, None, s)
        
        end
        list

        Comment


        • #5
          Thank you, very helpful.
          Markos

          Comment

          Working...
          X