  • Store results of a statsby for use in a graph

    Hi, I am trying to create a graph that depicts the average age at which a child gets their first phone, by a demographic variable (in this example, race). I am using Stata 17 and have used statsby to collect the mean age of first phone, the lower bound, the upper bound, and the number of observations in each race category.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id long race_cat float floor_age_first_sp
      1 3  .
      2 3 12
      3 3 12
      4 4  6
      5 3 12
      6 3  .
      7 1  7
      8 4 13
      9 1 11
     10 1 10
     11 1 13
     12 1  7
     13 1 12
     14 1 15
     15 3 12
     16 1 13
     17 1 12
     18 3  7
     19 3  .
     20 . 11
     21 3 11
     22 1 13
     23 3 12
     24 3  .
     25 1 12
     26 1 12
     27 1  8
     28 3  .
     29 1  9
     30 4  .
     31 3  .
     32 1 10
     33 . 12
     34 3  .
     35 3  .
     36 3  .
     37 3  9
     38 1  6
     39 1 11
     40 3 11
     41 3 12
     42 3  .
     43 1  9
     44 3  .
     45 3  .
     46 3  9
     47 1  .
     48 4  .
     49 1 13
     50 3  .
     51 3 14
     52 3 12
     53 1  4
     54 4  .
     55 1  .
     56 3 12
     57 1 12
     58 4 11
     59 . 12
     60 1  8
     61 3  .
     62 3 13
     63 1  .
     64 1  .
     65 4  9
     66 3  .
     67 1 11
     68 1 10
     69 3 12
     70 4 13
     71 3 13
     72 4 12
     73 3 11
     74 4  .
     75 3  8
     76 3 10
     77 4 10
     78 3 13
     79 4 12
     80 1 12
     81 1  .
     82 1 13
     83 3 10
     84 1 11
     85 3  .
     86 1 12
     87 1 13
     88 4 11
     89 1 15
     90 3  .
     91 1 13
     92 .  .
     93 3  .
     94 3  9
     95 3  .
     96 3  .
     97 3  .
     98 4 11
     99 4 12
    100 3 11
    end
    label values race_cat enc_race
    label def enc_race 1 "Black", modify
    label def enc_race 3 "Hispanic", modify
    label def enc_race 4 "White", modify
    
    statsby , by(race_cat) clear : ci means floor_age_first_sp
    list
    
    twoway rspike lb ub race_cat , || scatter mean race_cat ///
    ,title("Age child received first smartphone" ,size(medium)) subtitle("{it:by race}" ,size(small))  legend(off) ///
    xla(1 2 3 ,val) xsc(r(.8 3.2)) xtitle("race of child" ,size(small)) ///
    ysc(r(9 12)) ytitle("age child received first smartphone" ,size(small)) ///
    text(9.92038 1 "N = 31" ,place(s)) /// # of observations in group 1 (Black) below lowerbound
    text(10.39561 2 "N = 24" ,place(s)) ///
    text(9.55029 3 "N = 11" ,place(s))
    
    di `r(N)' //nothing happens

    I noticed that running the statsby line produces:
    (running ci on estimation sample)

    Command: ci means floor_age_first_sp
    N: r(N)
    mean: r(mean)
    se: r(se)
    lb: r(lb)
    ub: r(ub)
    level: r(level)
    By: race_cat

    I was hoping to be able to reference the lowerbounds and Ns so that I did not have to input them by hand each time. Something like this:

    Code:
    twoway rspike lb ub race_cat , || scatter mean race_cat ///
    ,title("Age child received first smartphone" ,size(medium)) subtitle("{it:by race}" ,size(small))  legend(off) ///
    xla(1 2 3 ,val) xsc(r(.8 3.2)) xtitle("race of child" ,size(small)) ///
    ysc(r(9 12)) ytitle("age child received first smartphone" ,size(small)) ///
    text(`r(lb) in 1' 1 "N = `r(N) in 1'" ,place(s)) /// # of observations in group 1 (Black) below lowerbound
    text(`r(lb) in 2' 2 "N = `r(N) in 2'" ,place(s)) ///
    text(`r(lb) in 3' 3 "N = `r(N) in 3'" ,place(s))

    Doing it by hand is not the end of the world, but it would make my life so much easier to have it done automatically! Essentially, I want the N for each group placed just below the lower bound for that group.

    This is my first post here, so I apologize for any syntax errors or faux pas! Please point them out if I have committed any.

    Thank you!
    Last edited by Addie Sutton; 05 Mar 2023, 20:39.

  • #2
    Thanks for using dataex on your first post.

    r(N) is indeed produced by ci in this case, but the problem with your idea is that the only r(N) visible after statsby is the last one calculated, and even that needs to be accessed immediately. However, the information you need is held in the variable N, which gives a route to using it.
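    To illustrate the point that the results live on as variables, here is a minimal sketch that reads them back with subscripts and builds one text() option per group, which is close to what post #1 asked for. It is not the approach recommended in this post. It assumes the auto dataset shipped with Stata so that it runs on its own: foreign stands in for race_cat and mpg for floor_age_first_sp, and the macro name textopts is made up for illustration.

```stata
* Assumption: auto data stand in for the thread's data so the block is self-contained.
sysuse auto, clear
statsby, by(foreign) clear: ci means mpg

* The results are now ordinary variables, one observation per group,
* so they can still be read after r() has been overwritten.
list foreign N mean lb ub
display "first group: N = " N[1] ", lower limit = " lb[1]

* Build one text() option per group (textopts is a hypothetical macro name).
* Each option places "N = ..." just south of that group's lower limit.
local textopts
forvalues i = 1/`=_N' {
    local textopts `textopts' text(`=lb[`i']' `=foreign[`i']' "N = `=N[`i']'", place(s))
}

twoway rspike lb ub foreign || scatter mean foreign, ///
    legend(off) xlabel(0 1, valuelabel) xscale(range(-.2 1.2)) `textopts'
```

    Taking the x position from the by-variable itself, rather than typing 1 2 3, keeps each label under the right spike even when the category codes have gaps.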

    With your data example, I would do something like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id long race_cat float floor_age_first_sp
      1 3  .
      2 3 12
      3 3 12
      4 4  6
      5 3 12
      6 3  .
      7 1  7
      8 4 13
      9 1 11
     10 1 10
     11 1 13
     12 1  7
     13 1 12
     14 1 15
     15 3 12
     16 1 13
     17 1 12
     18 3  7
     19 3  .
     20 . 11
     21 3 11
     22 1 13
     23 3 12
     24 3  .
     25 1 12
     26 1 12
     27 1  8
     28 3  .
     29 1  9
     30 4  .
     31 3  .
     32 1 10
     33 . 12
     34 3  .
     35 3  .
     36 3  .
     37 3  9
     38 1  6
     39 1 11
     40 3 11
     41 3 12
     42 3  .
     43 1  9
     44 3  .
     45 3  .
     46 3  9
     47 1  .
     48 4  .
     49 1 13
     50 3  .
     51 3 14
     52 3 12
     53 1  4
     54 4  .
     55 1  .
     56 3 12
     57 1 12
     58 4 11
     59 . 12
     60 1  8
     61 3  .
     62 3 13
     63 1  .
     64 1  .
     65 4  9
     66 3  .
     67 1 11
     68 1 10
     69 3 12
     70 4 13
     71 3 13
     72 4 12
     73 3 11
     74 4  .
     75 3  8
     76 3 10
     77 4 10
     78 3 13
     79 4 12
     80 1 12
     81 1  .
     82 1 13
     83 3 10
     84 1 11
     85 3  .
     86 1 12
     87 1 13
     88 4 11
     89 1 15
     90 3  .
     91 1 13
     92 .  .
     93 3  .
     94 3  9
     95 3  .
     96 3  .
     97 3  .
     98 4 11
     99 4 12
    100 3 11
    end
    label values race_cat enc_race
    label def enc_race 1 "Black", modify
    label def enc_race 3 "Hispanic", modify
    label def enc_race 4 "White", modify
    
    statsby , by(race_cat) clear : ci means floor_age_first_sp
    list
    
    gen where = 9.2 
    gen toshow = "{it:N} = " + strofreal(N)
    
    twoway rspike lb ub race_cat , || scatter mean race_cat ///
    ,title("Age child received first smartphone" ,size(medium)) subtitle("{it:by race}" ,size(small))  legend(off) ///
    xla(1 2 3 4,val) yla(, ang(h)) xsc(r(.8 4.2)) xtitle("race of child" ,size(small)) ///
    ytitle("age child received first smartphone" ,size(small)) ///
    || scatter where race_cat, ms(none) mla(toshow) mlabpos(0) mlabsize(medium)

    [Graph: smartphone.png]


    The non-appearance of race_cat 2 is just a side-effect of your data example. You have scope to vary the vertical position of the extra text by making it depend on lb. My wild guess is that most readers would prefer my choice!
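    As a sketch of the variation mentioned above, the label position can be tied to each group's lower limit instead of a constant. This again assumes the auto dataset shipped with Stata (foreign in place of race_cat, mpg in place of floor_age_first_sp) so that it runs on its own; placing the label at clock position 6 is one possible choice, not the only one.

```stata
* Assumption: auto data stand in for the thread's data so the block is self-contained.
sysuse auto, clear
statsby, by(foreign) clear: ci means mpg

gen where = lb                              // anchor each label at its own lower limit
gen toshow = "{it:N} = " + strofreal(N)     // label text, e.g. N = 52 with N in italics

* The third plot draws invisible markers at (foreign, where) and shows only their labels,
* positioned at 6 o'clock, i.e. directly below the end of each spike.
twoway rspike lb ub foreign || scatter mean foreign ///
    || scatter where foreign, msymbol(none) mlabel(toshow) mlabposition(6) mlabsize(medium) ///
    legend(off) xlabel(0 1, valuelabel) xscale(range(-.2 1.2)) ylabel(, angle(horizontal))
```

    With mlabposition(6) no numeric offset is needed; subtracting a small amount from lb and keeping mlabposition(0) would be an alternative.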


    • #3
      The idea of putting what you want to show in a string variable used as a marker label is also discussed in https://journals.sagepub.com/doi/pdf...6867X211063413

      Detail: I would use lower case "{it:n}" but the habits of your tribe may vary.


      • #4
        Hi Nick, thanks so much for your help, very cool solution! I was able to follow your suggestion by changing the definition of where from 9.2 to lb-.01, but I like the version you created a bit better. Thanks for linking to that article; I will give it a look!
        Thanks again!
