Rayleigh test and circsummarize command in STATA

Maria Mihailovits

Join Date: Oct 2019

Posts: 2
#1

Rayleigh test and circsummarize command in STATA

01 Oct 2019, 09:53

I try to apply Rayleigh test using circular module in Stata. I try to test whether my data have a monthly peak. Specifically, I try to check whether patients die more often from a specific cause at specific days of month, specific months of year etc. I have already applied chi-square goodness-of-fit and I know that the distribution of my patients is not uniform and I also tried Edward's Seasonality test. However, I know that there is one peak only and I would like to use Rayleigh test too. My variables are months (I tried 1,2,3,4... at the beginning but it didn't work, and I suppose it is due to the fact that circular commands assume that data are recorded in degrees & then I chose to record months as 360/12=30 degrees each) and count (which is the number of patients recorded in each month).

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte count 74 73 89 86 76 57 67 58 73 45 54 75 end

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int month 0 30 60 90 120 150 180 210 240 270 300 330 end

I used the following code after installing the CIRCULAR statistics module

Code:

circsu month [fweight=count], rayleigh

All I get is varlist not allowed r(101); I am sure that there is a mistake at the command I use but I cannot find any proper example.

I would appreciate the help

Last edited by Maria Mihailovits; 01 Oct 2019, 10:27.
Tags: circsummarize, circular statistics, rayleigh
Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

01 Oct 2019, 09:56

Please post a data example that can copied and pasted, not an image (FAQ Advice #12; see also #18).
1 like
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

01 Oct 2019, 13:03

Thanks for that. I would do this.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte count
74
73
89
86
76
57
67
58
73
45
54
75
end

gen month = (_n - 1) * 30
expand count
circsummarize month , detail

  +------------------------------------------------+
  |           n   mean   strength   median   range |
  |------------------------------------------------|
  | month   827   65.8      0.096     90.0   330.0 |
  +------------------------------------------------+

  +----------------------------------------------------+
  |         lower   upper   Rayleigh   Kuiper      Rao |
  |----------------------------------------------------|
  | month    36.1    95.4     0.0005   0.0000   0.0000 |
  +----------------------------------------------------+

I don't which version you are using, but it won't be this one.

Code:

*! 3.0.0 NJC 24 March 2013
* 2.1.0 NJC 15 April 2009
* 2.0.3 NJC 2 June 2004
* 2.0.2 NJC 6 May 2004
* 2.0.1 NJC 30 March 2004
* 2.0.0 NJC 13 January 2004
* 1.3.3 NJC 10 January 1999
* 1.3.2 NJC 1 January 1999
* 1.3.1 NJC 15 December 1998
* 1.3.0 NJC 15 July 1998
* 1.2.2 NJC 29 October 1996
* mean direction, vector strength, median and circular range of circular data
* optionally: confidence interval for mean; Rayleigh, Kuiper and Rao tests

* from 3.0.0 includes rewrite of circmedian
* 1.0.2 NJC 15 April 2009
* 1.0.1 NJC 31 March 2004
* 1.0.0 NJC 20 January 2004

* from 3.0.0 includes rewrite of circrao
* 2.0.3 NJC 14 April 2009
* 2.0.2 NJC 6 May 2004
* 2.0.1 NJC 30 March 2004
* 2.0.0 NJC 15 January 2004
* 1.0.0 NJC 7 August 2001

program circsummarize
        version 9.2
        syntax varlist(numeric) [if] [in] ///
    [ , BY(varname) Level(int $S_level) Detail MISSing variablelabels *]

    quietly {
        count `if' `in'
        if r(N) == 0 error 2000

        local nvars : word count `varlist'
    
            if `nvars' > 1 & "`by'" != "" {
                    di as err "too many variables specified"
                    exit 103
            }

        if `nvars' > _N {
            preserve
            set obs `nvars'
        }

        tempvar what
        gen `what' = ""
        if "`by'" != "" {
            local which : var label `by'
            if `"`which'"' == "" local which "`by'"
            char `what'[varname] `"`which'"'
        }
        else char `what'[varname] "  "

        foreach dname in nobs vecmean vecstr median range {
            tempvar `dname'  
            gen ``dname'' = .
            local dnames `dnames' ``dname''  
        }

        char `nobs'[varname] "n"
        char `vecmean'[varname] "mean"
        char `vecstr'[varname] "strength"
        char `median'[varname] "median"
        char `range'[varname] "range"

                format `nobs'   %1.0f  
        format `vecmean' `median' `range' %2.1f                                          
                format `vecstr'   %4.3f                                        

        if "`detail'" != "" {
            foreach iname in ll ul ray kui rao {
                tempvar `iname'
                gen ``iname'' = .
                local inames `inames' ``iname''
            }

            format `ll' `ul' %2.1f
            format `ray' `kui' `rao' %5.4f

            char `ll'[varname] "lower"
            char `ul'[varname] "upper"
            char `ray'[varname] "Rayleigh"
            char `kui'[varname] "Kuiper"
            char `rao'[varname] "Rao"
        }
        
        local i = 0
            foreach v of local varlist {
                tempvar use
                    mark `use' `if' `in'
                    markout `use' `v'
            if "`by'" != "" {
                tempvar group
                egen `group' = group(`by') if `use', label `missing'
            }
            else local group `use'

/// don't want to -summarize- here, as it would give r-class results
mata: st_local("max", strofreal(max(st_data(., "`group'"))))  

                    forval j = 1/`max' {
                local ++i          
                tempvar thisuse
                gen byte `thisuse' = `group' == `j'
            
                if "`by'" != "" {
                                    local this : label (`group') `j'
                                }
                            else {
                    if "`variablelabels'" != "" {
                        local this : var label `v'
                    }
                    if `"`this'"' == "" local this "`v'"
                }  
                replace `what' = `"`this'"' in `i'
                local this

mata: summarize("`v'", "`thisuse'", "`dnames'", `i')
            
                if "`detail'" != "" {
mata: infer("`v'", "`thisuse'", "`inames'", `i', `level')
                }
                
                drop `thisuse'
            }

            drop `use'
            capture drop `group'
            }
    }

    list `what' `dnames' in 1/`i', noobs subvarname abbrev(16)

    if "`detail'" != "" {
        list `what' `inames' in 1/`i', noobs subvarname abbrev(16)
    }
end

mata:

// NJC 23 March 2013
void summarize(string scalar varname,
               string scalar usename,
           string scalar dnames,
               real scalar obsno)
{
    real colvector work, gaps, sumdev, diff  
    real scalar nobs, xsum, ysum, vecmean, vectstr, range, i  

    work = st_data(., varname, usename)
    nobs = length(work)

    xsum = sum(sin((work * pi()) / 180))
    ysum = sum(cos((work * pi()) / 180))
    vecmean = myatan2(xsum, ysum)

    vecstr = sqrt(xsum^2 + ysum^2) / nobs

    _sort(work, 1)  
    gaps = work[2..nobs] - work[1..nobs-1]
    gaps = gaps \ (work[1] - work[nobs] + 360)
    range = 360 - max(gaps)

    sumdev = J(nobs, 1, .)
    
    for(i = 1; i <= nobs; i++) {
        diff = abs(work :- work[i])
        diff = rowmin((diff, 360 :- diff))
        sumdev[i] = mean(diff)
    }

    work = select(work, sumdev :== min(sumdev))
    xsum = sum(sin((work * pi()) / 180))
    ysum = sum(cos((work * pi()) / 180))
    median = myatan2(xsum, ysum)

    st_store(obsno, tokens(dnames), (nobs, vecmean, vecstr, median, range))
    st_numscalar("r(N)", nobs)  
    st_numscalar("r(vecmean)", vecmean)  
    st_numscalar("r(vecstr)", vecstr)  
    st_numscalar("r(median)", median)  
    st_numscalar("r(mean_dev)", min(sumdev))  
    st_numscalar("r(range)", range)  
}

// NJC 23 March 2013
void infer(string scalar varname,
           string scalar usename,
       string scalar inames,
           real scalar obsno,
       real scalar level)
{
    real colvector work, cos2, sumdev, diff, rank
    real scalar nobs, xsum, ysum, vecmean, vecstr, disp, sem, mult, ul, ll,
        Z, factor, P_Ray, V_n, V, P_Kui, U, U_pc, P_Rao

    work = st_data(., varname, usename)
    nobs = length(work)

    xsum = sum(sin((work * pi()) / 180))
    ysum = sum(cos((work * pi()) / 180))
    vecmean = myatan2(xsum, ysum)

    vecstr = sqrt(xsum^2 + ysum^2) / nobs

    cos2 = cos((work :- vecmean) * pi() / 90)
    disp = (1 - mean(cos2)) / (2 * vecstr^2)
    sem = sqrt(disp / nobs)
    mult = invnormal((100 + level) / 200)
    ul = mod(vecmean + (180 / pi()) * asin(mult * sem), 360)
    ll = mod(vecmean - (180 / pi()) * asin(mult * sem), 360)

    Z = nobs * vecstr^2
    factor = 1 + (2 * Z - Z^2) / (4 * nobs) -
    (24 * Z - 132 * Z^2 + 76 * Z^3 - 9 * Z^4) / (288 * nobs^2)
    P_Ray = exp(-Z) * factor

    _sort(work, 1)
     gaps = work[2..nobs] - work[1..nobs-1]
    gaps = gaps \ (work[1] - work[nobs] + 360)
        gaps = gaps :- 360 / nobs    
    U = sum(select(gaps, (gaps :> 0)))
    U_pc = 100 * U / 360
    mean = 360 / exp(1)
    sd = 360 * sqrt(2 * exp(-1) - 5 * exp(-2)) / sqrt(nobs)
    if (U > mean) P_Rao = normal(-(U - mean) / sd)  
    else P_Rao = 1 - normal((U - mean) / sd)
 
    work = work / 360
    rank = (1::nobs)            
    V_n = max((rank / nobs) :- work) + max(work :- ((rank :- 1) / nobs))
    V  = V_n * (sqrt(nobs) + 0.155 + 0.24 / sqrt(nobs))
    // Stephens 1970 p.118: preferable to tabulated levels in Fisher
    // 3 dp accuracy for P < 0.447 (V > 1.26)
    P_Kui = (8 * V^2 - 2) * exp(-2 * V^2)

    st_store(obsno, tokens(inames), (ll, ul, P_Ray, P_Kui, P_Rao))
    st_numscalar("r(vm_ll)", ll)  
    st_numscalar("r(vm_ul)", ul)  
    st_numscalar("r(P_Ray)", P_Ray)  
    st_numscalar("r(V_n)", V_n)  
    st_numscalar("r(V)", V)  
    st_numscalar("r(P_Kui)", P_Kui)  
    st_numscalar("r(U)", U)
    st_numscalar("r(U_pc)", U_pc)
    st_numscalar("r(P_Rao)", P_Rao)  
}

// NJC 23 March 2013 NB: result in degrees
real scalar myatan2(real scalar sine, real scalar cosine) {
    real scalar s1, s2, work  

    s1 = sign(sine); s2 = sign(cosine)
    work = (180 / pi()) * atan(sine/cosine)

    if ((s1 == 1 & s2 == 1) | (s1 == 0 & s2 == 1)) return(work)
    else if (s1 == 1 & s2 == 0)                    return(90)
    else if (s1 == -1 & s2 == 0)                   return(270)
    else if (s2 == -1)                             return(180 + work)  
    else if (s1 == -1 & s2 == 1)                   return(360 + work)
    else if (s1 == 0 & s2 == 0)                    return(.)  
}

end

Comment

Maria Mihailovits

Join Date: Oct 2019

Posts: 2
#4

01 Oct 2019, 17:20

Thank you very much. Your response was very helpful.
Comment
Raul Pardo

Join Date: Oct 2021

Posts: 3
#5

07 Oct 2021, 21:51

I am using circsummarize to describe time of the day data (24 h clock) with the statistics mean direction and confidence intervals for variables regarding times of mosquitoes biting activity reported by housedholders in a Colombian malaria endemic region. I am no sure how the data should be recorded (degrees? or radians?), before to apply circsummarize.
In the help for circsummarize in Stata the description says that the statistics for circular variables with scales between 0 and 360 degrees. In contrast, in the help for circular I found: "Stata expects angles to be in radians (pi radians = 180°)" an the factors _pi / 180 and 180 / _pi are mentioned for conversion between angles and radians.
I apply circsummarize to my data recorded in both degrees and radians (degrees converted to radians) and althought the statistic mean direction an confidence intervals (converted to hours) are very similar, they are no equal. I would appreciate very much your help.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#6

08 Oct 2021, 02:43

"Stata expects" means that its trigonometric functions expect arguments in radians. This is in my experience just about universal across software, such that if a program offers a function to take degrees it should be flagged as such.

circsummarize expects arguments, meaning data as variables, in degrees. It doesn't employ any intelligence to guess whether you're using anything else, and, some extreme cases aside, nothing else will work correctly.

I have thought of allowing other arguments, but for conversions from time of day it is a matter of simple multiplication and division, so I reckon users should be up to that. Given units of hours, multiply by 15 first and divide results by 15 afterwards. This includes hours and minutes so a time of 10:30 should be fed as 10.5 hours, and so forth.
Comment
Raul Pardo

Join Date: Oct 2021

Posts: 3
#7

11 Oct 2021, 22:26

Dr Cox. So, the input data are in degrees (previous conversion of time, hh:mm:ss, to hours and its fraction). Thanks a lot. It has been very kind of you.
Comment

Announcement

Rayleigh test and circsummarize command in STATA

Comment

Comment

Comment

Comment

Comment

Comment