
  • using distplot for categorical variables, pre and post matching

    Hello all,
    It was suggested that I use -distplot- to demonstrate the quality of the match, before and after matching.

    My amateur solution was initially the one below. However, I couldn't work out how to overlay these graphs onto one.


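    Code:
    * Pre-matching: count observations by treatment group and gender
    * ($treatment is assumed to be a global holding the treatment variable, e.g. _treated)
    gen no = 1
    graph bar (sum) no, over($treatment) over(gender) asyvars

    * Post-matching: scale the matching weights by 10 and truncate to integers
    * (int(_weight) on its own would turn control weights of 0.33 into 0)
    gen new1 = _weight*10
    gen newweight = int(new1)
    graph bar (sum) newweight, over($treatment) over(gender) asyvars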
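One side effect of int() is worth checking: truncation changes the relative weights (for example, a weight of 1.67 becomes 16 rather than 16.7). The minimal sketch below copies only the gender, _treated and _weight columns from the -dataex- example further down, assumes _treated is the variable that $treatment refers to, and compares the weighted share of females among controls under the exact and the truncated weights. The variable female is a hypothetical helper introduced here for illustration.

```stata
* Columns copied from the -dataex- example: gender, _treated, _weight
clear
input byte(gender _treated) double _weight
1 0 .3333333333333333
0 0 .3333333333333333
1 0 .6666666666666666
1 0 .3333333333333333
0 0 .3333333333333333
0 0 .3333333333333333
1 0 .6666666666666666
1 0 1.6666666666666665
1 0 1.3333333333333333
1 0 1
1 1 1
1 1 1
0 1 1
1 1 1
1 1 1
1 1 1
1 1 1
0 1 .
0 1 .
0 1 .
end

* Integer weights as in the post-matching code: scale by 10, then truncate
gen new1 = _weight*10
gen newweight = int(new1)

* Hypothetical indicator for the first category of gender (0 = Female)
gen byte female = gender == 0

* Weighted share of females among controls: exact vs truncated weights
summarize female [aweight = _weight] if _treated == 0
display "exact weights:     " r(mean)
summarize female [fweight = newweight] if _treated == 0
display "truncated weights: " r(mean)

* graph bar (sum) accepts non-integer values, so the raw weights can be summed directly
graph bar (sum) _weight, over(_treated) over(gender) asyvars
```

With these data the exact weights give 1/7 (about 0.143) and the truncated weights give 9/66 (about 0.136). Because graph bar (sum) does not require integers, summing _weight directly avoids the rescaling altogether.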
    However, someone suggested I use -distplot- instead.

    I got this far:

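    Code:
    * Scale the matching weights by 10 and truncate, so they can be used as frequency weights
    gen new=_weight*10
    gen new3=int(new)
    * Cumulative distribution of gender, all observations pooled
    distplot gender [fw=new3]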

    The graph doesn't make sense, perhaps because I don't understand how to use the command properly.

    1. Would you recommend using distplot, and if so, can you advise how to use it?
    2. If not, can you advise how to overlay both graphs?


    Sample data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(gender smoking infection socialdeprivation ethnicity) double _pscore byte(_treated _support) double(_weight _infection) byte(_id _n1 _n2 _n3) float _nn double _pdif
    1 0 0 7 4 .14629124373657137 0 1  .3333333333333333                 .  1 .  . . 0                    .
    0 0 0 7 1 .18353598336623161 0 1  .3333333333333333                 .  2 .  . . 0                    .
    1 0 0 6 4 .23642515629258887 0 1  .6666666666666666                 .  3 .  . . 0                    .
    1 0 1 4 1 .27681680297995453 0 1  .3333333333333333                 .  4 .  . . 0                    .
    0 0 0 6 1  .2888530710771261 0 1  .3333333333333333                 .  5 .  . . 0                    .
    0 0 0 6 2  .3595669083790513 0 1  .3333333333333333                 .  6 .  . . 0                    .
    1 0 0 3 1  .4088562831270797 0 1  .6666666666666666                 .  7 .  . . 0                    .
    1 0 0 2 1  .5554993045631528 0 1 1.6666666666666665                 .  8 .  . . 0                    .
    1 0 1 1 1  .6930733259357975 0 1 1.3333333333333333                 .  9 .  . . 0                    .
    1 0 0 2 3  .7048182691423677 0 1                  1                 . 10 .  . . 0                    .
    1 1 1 5 1 .17480944488523578 1 1                  1                 0 11 2  1 3 3  .008726538480995832
    1 1 1 5 3  .2881297448796828 1 1                  1 .3333333333333333 12 5  4 3 3 .0007233261974433081
    0 1 1 5 2  .5035916419173779 1 1                  1                 0 13 8  7 6 3  .051907662645774844
    1 1 1 2 1  .5554993045631528 1 1                  1 .3333333333333333 14 8  9 7 3                    0
    1 1 0 2 2  .6333538983172218 1 1                  1 .3333333333333333 15 9 10 8 3   .05971942761857574
    1 1 0 2 2  .6333538983172218 1 1                  1 .3333333333333333 16 9 10 8 3   .05971942761857574
    1 1 1 1 1  .6930733259357975 1 1                  1 .3333333333333333 17 9 10 8 3                    0
    0 1 1 1 1  .8866625742391517 1 0                  .                 . 18 .  . . .                    .
    0 1 1 1 1  .8866625742391517 1 0                  .                 . 19 .  . . .                    .
    0 1 1 1 3  .9372933094185172 1 0                  .                 . 20 .  . . .                    .
    end
    label values gender Gender
    label def Gender 0 "Female", modify
    label def Gender 1 "Male", modify
    label values smoking Smoking
    label def Smoking 0 "Nonsmoker", modify
    label def Smoking 1 "Smoker", modify
    label values socialdeprivation social
    label def social 1 "Most deprived", modify
    label def social 7 "Least deprived", modify
    label values ethnicity Ethnicity
    label def Ethnicity 1 "White", modify
    label def Ethnicity 2 "Asian", modify
    label def Ethnicity 3 "Black African", modify
    label def Ethnicity 4 "Mixed", modify
    label values _treated _treated
    label def _treated 0 "Untreated", modify
    label def _treated 1 "Treated", modify
    label values _support _support
    label def _support 0 "Off support", modify
    label def _support 1 "On support", modify


  • #2
    I come into this thread as an expert on distplot, which is from the Stata Journal, and was written by me. Conversely, I don't know precisely what matching means here, but I can't see that even people who know enough to have a worthwhile opinion have been told what you've done.

    For gender, which for you is 0 for female and 1 for male, distplot will show jumps in cumulative probability (1) from 0 to the probability of being female and (2) from that to the probability of being female or male, which is 1. That is not a helpful graph, but it does make sense.

    distplot will show whether two or more distributions match in terms of having similar level, spread and shape, because if so they will have similar cumulative distribution functions. That makes most sense for measured variables or counted variables.
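For a binary variable, the step function that distplot draws is just the cumulative proportion at each category, so the treated versus untreated comparison can also be tabulated directly. The minimal sketch below uses base Stata only and copies the gender, _treated and _weight columns from the example data. It assumes that "pre-matching" means every observation counts once and "post-matching" means observations are weighted by _weight (missing, and so dropped, when off support). The names post, w, total and cumprob are hypothetical, chosen for illustration.

```stata
* Columns copied from the -dataex- example: gender, _treated, _weight
clear
input byte(gender _treated) double _weight
1 0 .3333333333333333
0 0 .3333333333333333
1 0 .6666666666666666
1 0 .3333333333333333
0 0 .3333333333333333
0 0 .3333333333333333
1 0 .6666666666666666
1 0 1.6666666666666665
1 0 1.3333333333333333
1 0 1
1 1 1
1 1 1
0 1 1
1 1 1
1 1 1
1 1 1
1 1 1
0 1 .
0 1 .
0 1 .
end

* Stack two copies: post = 0 for the originals (pre), post = 1 for the duplicates (post)
expand 2, generate(post)

* Pre-matching weight is 1; post-matching weight is _weight (missing if off support)
gen double w = cond(post == 0, 1, _weight)
drop if missing(w)

* Total weight for each gender within each stage and treatment group
collapse (sum) w, by(post _treated gender)

* Cumulative probability: running sum of weight over gender values / group total
bysort post _treated: egen double total = total(w)
bysort post _treated (gender): gen double cumprob = sum(w) / total

list post _treated gender cumprob, sepby(post _treated) noobs
```

With these data the cumulative probability at gender 0 (that is, the proportion female) is 0.3 for untreated and 0.4 for treated before matching, and 1/7 in both groups after matching; at gender 1 it is 1 throughout. Matching distributions show up as equal values of cumprob at every category.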

    graph bar results can't usefully be overlaid on distplot results. This is a subtle point and not obvious, but distplot uses the machinery of twoway and isn't compatible with graph bar.
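Within graph bar itself, pre- and post-matching results can still share one graph if the data are stacked so that the matching stage becomes an over() variable. A minimal sketch, assuming _treated is the treatment indicator, that weight 1 for every observation represents the data before matching, and that _weight represents the data after matching. It uses the gender, _treated and _weight columns of the example data; post, w, female and the value labels stage and trt are hypothetical names.

```stata
* Columns copied from the -dataex- example: gender, _treated, _weight
clear
input byte(gender _treated) double _weight
1 0 .3333333333333333
0 0 .3333333333333333
1 0 .6666666666666666
1 0 .3333333333333333
0 0 .3333333333333333
0 0 .3333333333333333
1 0 .6666666666666666
1 0 1.6666666666666665
1 0 1.3333333333333333
1 0 1
1 1 1
1 1 1
0 1 1
1 1 1
1 1 1
1 1 1
1 1 1
0 1 .
0 1 .
0 1 .
end

* Stack two copies of the data: post = 0 (originals), post = 1 (duplicates)
expand 2, generate(post)
label define stage 0 "Pre-matching" 1 "Post-matching"
label values post stage
label define trt 0 "Untreated" 1 "Treated"
label values _treated trt

* Weight 1 before matching, matching weight after (missing if off support)
gen double w = cond(post == 0, 1, _weight)
gen byte female = gender == 0

* One graph: weighted proportion female by treatment group, pre and post side by side
graph bar (mean) female if !missing(w) [aweight = w], ///
    over(_treated) over(post) ytitle("Proportion female")
```

Plotting the proportion female rather than summed weights puts groups of different sizes on the same scale, and for a two-category variable that one proportion determines the whole distribution.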

    That's a long way from a complete or really helpful answer, but it's the best I can do.
    Last edited by Nick Cox; 03 Oct 2023, 09:53.



    • #3
      Dear Nick,

      Yes, this is what I'm looking for:

      'distplot will show whether two or more distributions match in terms of having similar level, spread and shape, because if so they will have similar cumulative distribution functions. That makes most sense for measured variables or counted variables'

      Perhaps the fact that graph bar results can't be overlaid is why -distplot- was recommended. To be honest, though, I'm finding it quite difficult to use, so I may stick with my amateur version.

      Would you be kind enough to show sample code that plots two categorical variables before and after matching?
      Or perhaps, as matching is not one of your specialisations (among the many you have, from what I've seen on this forum), I will wait for someone else.



      • #4
        Thanks for this. You're right that you should get a better answer from someone else on matching. But if you want to compare distributions of categorical variables, you could do worse than dot or bar charts and even chi-square tests. All that said, don't the matching commands provide handles for assessing how well they work? I have no idea!
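For the chi-square route, a minimal sketch using the gender, _treated and _weight columns of the example data, assuming _treated is the treatment indicator. The test is run only on the unmatched data, because the matching weights are not frequencies; for the matched sample a weighted proportion female is shown instead. With only 20 observations some expected counts are below 5, so Fisher's exact test (the exact option) is requested alongside. The variable female is a hypothetical helper.

```stata
* Columns copied from the -dataex- example: gender, _treated, _weight
clear
input byte(gender _treated) double _weight
1 0 .3333333333333333
0 0 .3333333333333333
1 0 .6666666666666666
1 0 .3333333333333333
0 0 .3333333333333333
0 0 .3333333333333333
1 0 .6666666666666666
1 0 1.6666666666666665
1 0 1.3333333333333333
1 0 1
1 1 1
1 1 1
0 1 1
1 1 1
1 1 1
1 1 1
1 1 1
0 1 .
0 1 .
0 1 .
end

* Pre-matching: does the gender mix differ between untreated and treated?
tabulate gender _treated, chi2 exact column

* Post-matching: weighted mean of a female indicator in each group
* (observations with missing _weight, i.e. off support, are excluded)
gen byte female = gender == 0
tabstat female [aweight = _weight], by(_treated) statistics(mean n)
```

Here the Pearson chi-square is 20/91 (about 0.22) before matching, and after matching the weighted proportion female is 1/7 in both groups.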



        • #5
          Maybe I'm just being pedantic. I have the statistics and I already have a love plot; I'm just determined to explore options with bar charts.
