
  • scatter matrix in python via Stata16

    Hi Statalisters,

    With the release of Stata 16 and its fantastic Python integration, I have decided to find my way around the Python world. I want to replicate Stata code that draws a separate scatterplot matrix for each group in a for loop, because Python lets me include kernel density plots on the diagonal.

    The Stata syntax I need to replicate:
    Code:
    *create data
    clear
    set seed 25656894
    set obs 200
    gen weight = (runiform()+1) *50
    gen foreign = runiformint(1, 3)
    sort foreign
    keep weight foreign
    expand 3
    egen id2 = cut(weight), group(3)
    sort id2
    bysort id2: gen id = _n
    drop id2 foreign
    sort id
    bysort id: gen grp = _n
    reshape wide weight, i(id) j(grp)
    replace weight3= (runiform()+1) *50 if weight3==.
    egen rec = cut(weight1), group(4)
    
    *graph scatter matrix
    forv i=0/3 {
        foreach w in weight {
            graph matrix `w'1 `w'2 `w'3 if rec==`i', ///
                name(`w'`i', replace) xsize(4) scheme(s1color)
        }
    }
    save "py_example.dta", replace
    The individual image is attached here:

    Now, if I load the above data and plot it with Python, I do not get a separate graph for each value of rec, as I specified above (if rec==`i'). Instead, I get a single graph colored by the categories of rec. I need help adjusting the Python part of my code to replicate what the Stata code above does:

    Code:
    *python
    python clear
    python:
    import pandas as pd
    import seaborn as sns, numpy as np
    import matplotlib.pyplot as plt
    data = pd.read_stata('py_example.dta')
    print(data)
    sns.pairplot(data, kind="scatter", diag_kind="kde", vars=["weight1", "weight2", "weight3"], hue="rec")
    plt.show()
    end
    The graph from Python is also attached. Any help will be appreciated.

    Thank you
    Last edited by Madu Abuchi; 22 Jul 2019, 22:44.

  • #2
    The problem here is that you export all your data to py_example.dta, then import everything into Python and plot it all at once, without a loop. The plot is quite nice as is, but if you want four plots, you can split your pandas DataFrame by rec before plotting:

    Code:
    for rec, data_rec in data.groupby("rec"):
        print(data_rec)
        sns.pairplot(data_rec, kind="scatter", diag_kind="kde", vars=["weight1", "weight2", "weight3"])
        plt.show()
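
To see what the loop above iterates over, here is a minimal sketch with a small made-up DataFrame (the values and the helper name `split_by` are illustrative assumptions, not from the thread). Iterating over `groupby("rec")` yields one (value, sub-DataFrame) pair per category, which plays the role of the `if rec==` condition in the Stata loop.

```python
import pandas as pd

# Made-up stand-in for py_example.dta (assumption, for illustration only).
data = pd.DataFrame({
    "weight1": [51.0, 62.5, 70.1, 88.3, 93.7, 55.2],
    "weight2": [60.4, 58.9, 91.2, 77.0, 52.3, 84.6],
    "rec": [0, 0, 1, 1, 2, 2],
})


def split_by(data, column):
    """Return a dict mapping each value of `column` to its sub-DataFrame."""
    return {key: subset for key, subset in data.groupby(column)}


groups = split_by(data, "rec")
for key, subset in groups.items():
    # Each subset holds only the rows where rec equals key.
    print(key, len(subset))
```

Each sub-DataFrame can then be passed to `sns.pairplot` on its own, as in the loop above.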



    • #3
      Hi Jean-Claude, thank you. That solved my problem.
      However, I noticed that when the first graph is shown, the others don't come up too, as they would in Stata. The 2nd only shows once I close the 1st, the 3rd only once I close the 2nd, and so on.
      Is there a way to make all the graphs appear at once, each in its own window?



      • #4
        Add 'plt.figure()' before plotting to open a new window.
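
A possible alternative, offered as a sketch: in a plain script, `plt.show()` normally blocks until the open windows are closed, so calling it inside the loop shows one figure at a time. Creating every figure first and calling `plt.show()` once afterwards should bring them all up together. The example below uses made-up data and a plain matplotlib scatter in place of `sns.pairplot` to stay small; `build_figures` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for py_example.dta (assumption, for illustration only).
data = pd.DataFrame({
    "weight1": [51.0, 62.5, 70.1, 88.3, 93.7, 55.2],
    "weight2": [60.4, 58.9, 91.2, 77.0, 52.3, 84.6],
    "rec": [0, 0, 1, 1, 2, 2],
})


def build_figures(data, group_col="rec"):
    """Create one figure per group without showing any of them yet."""
    figures = []
    for key, subset in data.groupby(group_col):
        fig, ax = plt.subplots()  # a new figure (window) for this group
        ax.scatter(subset["weight1"], subset["weight2"])
        ax.set_xlabel("weight1")
        ax.set_ylabel("weight2")
        ax.set_title(f"{group_col} == {key}")
        figures.append(fig)
    return figures


figures = build_figures(data)
print(len(figures), "figures created")
# In an interactive session, a single call here displays all of them:
# plt.show()
```

With seaborn, the same pattern would be to call `sns.pairplot(...)` for each group inside the loop (it creates its own figure) and move `plt.show()` to after the loop.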



        • #5
          Great! Thank you.

