  • Benford's Law on last digit

    Hi,

    I am a newbie to Stata (and statistics). I believe I understand how Benford's Law is applied, and I have found several programs that test first digits and report expected and actual frequencies. Is it possible to change any of these programs to test last digits instead? I wanted to see whether this could help me check a series of height and weight data for possible fraud.

    Thank you in advance,

    Doron

  • #2
    Naturally there is no expectation that Benford's Law applies to last digits. There might be digit preference, but unless you have grounds for thinking otherwise, the null hypothesis is a uniform distribution on the digits 0 to 9.
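    Spelled out, with n observations and last digit D, the uniform null and the usual Pearson goodness-of-fit statistic look like this. This is a sketch using standard formulas; the symbols O_d (observed count) and E_d (expected count) are notation introduced here, not taken from any Stata help file:

    ```latex
    \begin{aligned}
    H_0 &: \Pr(D = d) = \tfrac{1}{10}, \quad d = 0, 1, \ldots, 9 \\
    E_d &= \frac{n}{10} \\
    X^2 &= \sum_{d=0}^{9} \frac{(O_d - E_d)^2}{E_d}, \quad \text{approximately } \chi^2_{10-1} = \chi^2_{9} \text{ under } H_0
    \end{aligned}
    ```

    With n = 74, as in the auto data used below, every E_d is 7.4, which matches the "expected" column of the output.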

    The last digit of a variable containing non-negative integers is just mod(varname, 10), and presumably you want to feed that to something like a chi-square test.
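    For a non-negative integer x, mod(x, 10) is the remainder on division by 10. Stata documents mod(x, y) as x - y*floor(x/y); taking 4099 purely as an illustrative value:

    ```latex
    \operatorname{mod}(x, 10) = x - 10\left\lfloor \frac{x}{10} \right\rfloor,
    \qquad
    \operatorname{mod}(4099, 10) = 4099 - 10 \times 409 = 9
    ```

    Because the definition uses the floor, a negative value such as -13 gives mod(-13, 10) = 7 rather than 3. Applying mod() to abs(varname) would avoid that, although heights and weights should not be negative in the first place.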

    It's not clear that anyone needs a new program to test last digits. For example, using chitest from the tab_chi package on SSC:

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . gen last = mod(price, 10)
    
    . capture ssc inst tab_chi
    
    . chitest last, count
    
    observed frequencies of last; expected frequencies equal
    
             Pearson chi2(9) =  17.6216   Pr =  0.040
    likelihood-ratio chi2(9) =  16.2719   Pr =  0.061
    
      +--------------------------------------------------+
      | last   observed   expected   obs - exp   Pearson |
      |--------------------------------------------------|
      |    0          8      7.400       0.600     0.221 |
      |    1          3      7.400      -4.400    -1.617 |
      |    2          7      7.400      -0.400    -0.147 |
      |    3          4      7.400      -3.400    -1.250 |
      |    4          7      7.400      -0.400    -0.147 |
      |--------------------------------------------------|
      |    5         11      7.400       3.600     1.323 |
      |    6          7      7.400      -0.400    -0.147 |
      |    7          7      7.400      -0.400    -0.147 |
      |    8          4      7.400      -3.400    -1.250 |
      |    9         16      7.400       8.600     3.161 |
      +--------------------------------------------------+
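    The table can be checked by hand. With n = 74 each expected count is 7.4, the Pearson column is the residual (O_d - E_d)/sqrt(E_d), the Pearson statistic is the sum of the squared residuals, and the likelihood-ratio statistic is the usual G^2. These are standard formulas, not quoted from the chitest documentation:

    ```latex
    \begin{aligned}
    r_9 &= \frac{16 - 7.4}{\sqrt{7.4}} = \frac{8.6}{2.7203} \approx 3.161 \\
    X^2 &= \frac{1}{7.4} \sum_{d=0}^{9} (O_d - 7.4)^2 = \frac{130.4}{7.4} \approx 17.62 \\
    G^2 &= 2 \sum_{d=0}^{9} O_d \ln\frac{O_d}{7.4} \approx 16.27
    \end{aligned}
    ```

    So digit 9 alone (16 observed against 7.4 expected) contributes 8.6^2 / 7.4, about 10, of the Pearson total of 17.62.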
    The last digit of a measured variable requires a little more care, as you first need to fix how many decimal places count. For example, in the same dataset:

    Code:
    .  gen last2 = real(substr(string(gear_ratio, "%3.2f"), -1,1))
    
    . tab last2
    
          last2 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |          4        5.41        5.41
              1 |          5        6.76       12.16
              2 |          1        1.35       13.51
              3 |         21       28.38       41.89
              4 |          7        9.46       51.35
              5 |          7        9.46       60.81
              6 |          4        5.41       66.22
              7 |          9       12.16       78.38
              8 |         14       18.92       97.30
              9 |          2        2.70      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    
    . chitest last2, count
    
    observed frequencies of last2; expected frequencies equal
    
             Pearson chi2(9) =  44.6486   Pr =  0.000
    likelihood-ratio chi2(9) =  40.6277   Pr =  0.000
    
      +---------------------------------------------------+
      | last2   observed   expected   obs - exp   Pearson |
      |---------------------------------------------------|
      |     0          4      7.400      -3.400    -1.250 |
      |     1          5      7.400      -2.400    -0.882 |
      |     2          1      7.400      -6.400    -2.353 |
      |     3         21      7.400      13.600     4.999 |
      |     4          7      7.400      -0.400    -0.147 |
      |---------------------------------------------------|
      |     5          7      7.400      -0.400    -0.147 |
      |     6          4      7.400      -3.400    -1.250 |
      |     7          9      7.400       1.600     0.588 |
      |     8         14      7.400       6.600     2.426 |
      |     9          2      7.400      -5.400    -1.985 |
      +---------------------------------------------------+
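    Formatting to a fixed number of decimals before reading off the last character sidesteps floating-point representation: a value such as 3.58 is not stored exactly in binary, so 100 × gear_ratio need not be exactly 358, and mod(100 * gear_ratio, 10) need not be exactly 8. For positive values shown to two decimals, a roughly equivalent arithmetic route (a sketch, assuming the same two-decimal rounding and ignoring values that fall exactly on a rounding boundary) is to round first:

    ```latex
    \begin{aligned}
    d &= \operatorname{mod}\bigl(\operatorname{round}(100x),\ 10\bigr) \\
    x = 3.58 &\;\Rightarrow\; 100x \approx 358 \;\Rightarrow\; \operatorname{round}(100x) = 358 \;\Rightarrow\; d = 8
    \end{aligned}
    ```

    The same hand check confirms the chi-square here: the squared deviations sum to 330.4, 330.4 / 7.4 is about 44.65, and digit 3 alone contributes 13.6^2 / 7.4, about 25.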
    If you need to spell out that some digits have zero observed frequency, see chitesti from the downloaded package.
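    The reason this matters: if, say, no observation ended in 2, a test built from the tabulated values would presumably see only nine categories, and equal expected frequencies spread over those nine would not match the ten-digit null. In an illustrative case with n = 74:

    ```latex
    E_d^{(10\ \text{digits})} = \frac{74}{10} = 7.4,\ \ \nu = 9
    \qquad \text{versus} \qquad
    E_d^{(9\ \text{digits})} = \frac{74}{9} \approx 8.22,\ \ \nu = 8
    ```

    Supplying all ten counts directly, including the zero, keeps the expected frequencies and degrees of freedom consistent with the uniform null.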
    Last edited by Nick Cox; 24 Jun 2016, 08:53.



    • #3
      Thank you, Nick.
