  • Some numbers not rounded

    I am trying to create a figure showing the slope of a line and testing whether the slope is negative or not. However, for some p-values (in the example below, for one of the three graphs), it returns many more digits than the three I specified in my code. Why could this be? Thank you!

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year float a int b
    2005 1  15
    2006 1  15
    2007 1  21
    2008 1  20
    2009 1  26
    2010 1  32
    2011 1  37
    2012 1  42
    2013 1  48
    2014 1  60
    2015 1  57
    2016 1  86
    2017 1  77
    2018 1  66
    2019 1  61
    2020 1  82
    2021 1 101
    2007 2  21
    2008 2  13
    2009 2  28
    2010 2  38
    2011 2  47
    2012 2  33
    2013 2  39
    2014 2  48
    2015 2  47
    2016 2  47
    2017 2  33
    2018 2  29
    2019 2  24
    2020 2  26
    2021 2  29
    2008 3   0
    2009 3   1
    2010 3   1
    2011 3   2
    2012 3   6
    2013 3   4
    2014 3   5
    2015 3   1
    2016 3   5
    2017 3   3
    2018 3   6
    2019 3   4
    2020 3   2
    2021 3   1
    end
    Code:
    forvalues id = 1/3 {
    sum year if a ==`id'
    local minyear`id' = r(min)
    sum b if a ==`id'
    local maxcite`id' = r(max)
    reg b year if a ==`id'
    scalar scalar_slope`id' = round(_b[year],.01)
    local slope`id' = scalar_slope`id'
    local t = _b[year]/_se[year]
    scalar scalar_p`id' = round(2*ttail(e(df_r),abs(`t')),0.001)
    local p`id' = scalar_p`id'
    twoway scatter b year if a ==`id' , msize(vsmall) ///
    || lfit b year if a ==`id' ///
    , text(0 2021 "Slope = `slope`id'', p-value = `p`id''" ///
    , size(vsmall) place(nw) just(right)) ///
    xtitle("") xlabel(`minyear`id'' 2021, labsize(small)) ///
    ylabel(0 `maxcite`id'', labsize(small)) ///
    name(scatter`id', replace) nodraw
    }
    grc1leg2 scatter1 scatter2 scatter3 ///
    , labsize(tiny ) lrows(1) symxsize(.5cm) symysize(.1cm)

  • #2
    The -round()- function works best for rounding to the nearest integer or to a multiple of a negative power of 2 (0.5, 0.25, 0.125, ...). When you try to round to a certain number of decimal places, you run into the problem that most decimal fractions have no exact finite binary representation (just as 1/3 has no finite decimal representation). So -round()- takes you to the nearest binary approximation of the nearest multiple of the rounding unit (.001 in your case), and that will generally not be a number that fits in 3 decimal places. In any case, you don't really want to change the value of the p-value: you want to change the way it's displayed. So -round()- isn't the appropriate function anyway. Try
    Code:
    local p`id': display %05.3f =2*ttail(e(df_r),abs(`t'))
    instead of the -scalar scalar_p`id'...- and subsequent -local p`id' = scalar_p`id'- commands. The same applies to the other things you tried to "round."
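
    The point can be checked directly. The lines below are a minimal sketch using a made-up value (0.0234567, not taken from the regressions above) and made-up names (demo_p, copied, shown). -display %21x- shows the binary form that Stata actually stores, and the remaining lines compare copying a "rounded" scalar into a local with formatting the number for display.
    Code:
    * 0.125 is a power of 2 and is held exactly; 0.001 is not
    display %21x 0.125
    display %21x 0.001
    * made-up value, for illustration only
    scalar demo_p = round(0.0234567, 0.001)
    * copying the scalar into a local may expose extra digits
    local copied = demo_p
    display "`copied'"
    * formatting for display gives exactly three decimal places
    local shown : display %05.3f 0.0234567
    display "`shown'"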



    • #3
      Here is a rewrite of your code. It won't get you all the way to where you want, and there may well be cosmetic or other changes you want to make, especially if the data example is only partial or fake or a bit of both.

      I build on Clyde Schechter's point about using display and a specified format: see also Section 4 of https://journals.sagepub.com/doi/pdf...867X1801800311

      One way to get variable text into different panels of a multi-panel plot would be to use value labels, as here.

      Another way is to define a variable with differing text and use that as a marker label. On that see https://journals.sagepub.com/doi/pdf...6867X211063413
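
      A minimal sketch of the marker label approach, assuming the example data (year, a, b) are in memory. The names paneltext and first are made up for illustration, and attaching the text to the first observation of each group is only one possible placement; it may well need adjusting so that the text does not collide with the data.
      Code:
      * text variable, filled in for one observation per group only
      gen str30 paneltext = ""
      bysort a (year): gen byte first = _n == 1
      forvalues id = 1/3 {
         quietly regress b year if a == `id'
         local slope : di %4.2f _b[year]
         quietly replace paneltext = "Slope = `slope'" if a == `id' & first
      }
      * third plot shows no marker, only the text held in paneltext
      twoway scatter b year, by(a, note("") yrescale legend(off)) ///
         || lfit b year ///
         || scatter b year if first, msymbol(none) mlabel(paneltext) mlabposition(12) ///
         xtitle("") ytitle("b")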

      In general, getting several linked graphs using a by() option can much improve on getting several graphs and then using graph combine implicitly or explicitly. Naturally, if the graphs to be combined are quite different, then you do need graph combine, but here I don't think you do. On this as a small strategy, see https://journals.sagepub.com/doi/pdf...36867X20976341

      On putting saved results into locals or scalars:

      1. scalar keeps a little more precision than local.

      2. It is often a little dopey to put something first into a scalar, and then also into a local, or vice versa. Choose one or the other depending on purpose.
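
      As a minimal sketch of point 1 (the names third and thirdlocal are made up for illustration), hold 1/3 both ways and compare. The scalar holds the full double-precision value; the local holds a string of decimal digits, which need not convert back to exactly the same number.
      Code:
      scalar third = 1/3
      local thirdlocal = 1/3
      * binary form of each stored value
      display %21x scalar(third)
      display %21x `thirdlocal'
      * 1 if the two agree exactly, 0 otherwise
      display scalar(third) == `thirdlocal'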



      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int year float a int b
      2005 1  15
      2006 1  15
      2007 1  21
      2008 1  20
      2009 1  26
      2010 1  32
      2011 1  37
      2012 1  42
      2013 1  48
      2014 1  60
      2015 1  57
      2016 1  86
      2017 1  77
      2018 1  66
      2019 1  61
      2020 1  82
      2021 1 101
      2007 2  21
      2008 2  13
      2009 2  28
      2010 2  38
      2011 2  47
      2012 2  33
      2013 2  39
      2014 2  48
      2015 2  47
      2016 2  47
      2017 2  33
      2018 2  29
      2019 2  24
      2020 2  26
      2021 2  29
      2008 3   0
      2009 3   1
      2010 3   1
      2011 3   2
      2012 3   6
      2013 3   4
      2014 3   5
      2015 3   1
      2016 3   5
      2017 3   3
      2018 3   6
      2019 3   4
      2020 3   2
      2021 3   1
      end
      
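      forvalues id = 1/3 {
         * fit the line for this group only
         reg b year if a ==`id'
         * format for display instead of rounding the value
         local slope: di %4.2f _b[year]
         local p : di %05.3f 2*ttail(e(df_r),abs(_b[year]/_se[year])) 
         * a P-value that displays as 0.000 is reported as an inequality
         if "`p'" == "0.000" local p "< 0.0005"
         else local p "= `p'"
         * store the text as the value label for this group, shown as the panel title
         label def a `id' "`id': Slope = `slope', {it:P} `p'", modify 
      }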
      
      label val a a 
      
      twoway scatter b year, by(a, note("") yrescale legend(off)) || lfit b year, xtitle("") ytitle("b")
      [Image: florian.png]
