  • New version of xcontract on SSC

    Thanks as always to Kit Baum, a new version of the xcontract package is now available for download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have an old version of xcontract.
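
    As a minimal sketch of the two routes mentioned above (choose whichever applies to your installation):

    ```stata
    * First-time installation from SSC
    ssc install xcontract

    * Or, if an older version of xcontract is already installed
    adoupdate xcontract, update
    ```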

    The xcontract package is described as below on my website. The new version is updated to Stata Version 16, and has a new frame() option, allowing the user to save the output dataset (or resultsset) in a data frame.
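
    A minimal sketch of the frame() option follows. It assumes Stata 16 or later and uses the auto example dataset shipped with Stata. The frame name freqs is only an illustration, and the replace suboption is assumed to overwrite any existing frame of that name.

    ```stata
    * Load an example dataset into the current frame
    sysuse auto, clear

    * Send the frequencies and percents to a frame called freqs (illustrative name)
    xcontract foreign rep78, frame(freqs, replace)

    * Inspect the output dataset (resultsset) without leaving the current frame
    frame freqs: list

    * The original data should still be in the current frame
    describe, short
    ```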

    Users of older versions of Stata can still download the versions of xcontract written for Stata 10 or Stata 8 from my website by typing, in Stata,

    net from http://www.rogernewsonresources.org.uk/

    and selecting the Stata version, at or below their own, for which to download xcontract.

    Best wishes

    Roger

    ------------------------------------------------------------------------------------------
    package xcontract from http://www.rogernewsonresources.org.uk/stata16
    ------------------------------------------------------------------------------------------

    TITLE
    xcontract: Create dataset of variable combinations with frequencies and percents

    DESCRIPTION/AUTHOR(S)
    xcontract is an extended version of contract. It creates an output
    data set with 1 observation per combination of values of the
    variables in varlist and data on the frequencies and percents of
    those combinations of values in the existing data set, and,
    optionally, the cumulative frequencies and percents of those
    combinations. If the by() option is used, then the output data set
    has one observation per combination of values of the varlist
    variables per by-group, and percents are calculated within each
    by-group. The output data set created by xcontract may be listed to
    the Stata log, or saved to a data frame, or saved to a disk file, or
    written to the memory (overwriting any pre-existing data set).

    Author: Roger Newson
    Distribution-Date: 26december2019
    Stata-Version: 16

    INSTALLATION FILES (click here to install)
    xcontract.ado
    xcontract.sthlp
    ------------------------------------------------------------------------------------------
    (click here to return to the previous screen)
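
    The by() behaviour and the optional cumulative statistics described above might be tried along the following lines. This is a sketch only: it assumes that xcontract accepts cfreq() and cpercent() options in the same way as contract does, and the variable names _cfreq and _cpercent and the frame name byfreqs are illustrative choices.

    ```stata
    * Example dataset shipped with Stata
    sysuse auto, clear

    * One observation per value of rep78 within each level of foreign,
    * with percents calculated within each by-group;
    * cfreq() and cpercent() are assumed to request the cumulative versions
    xcontract rep78, by(foreign) cfreq(_cfreq) cpercent(_cpercent) frame(byfreqs, replace)

    * List the output dataset, separated by by-group
    frame byfreqs: list, sepby(foreign)
    ```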


  • #2
    Roger: Thanks very much for this. I use -contract- quite often and feel that -xcontract- will be very valuable in many applications.

    Perhaps you (or someone else) can give me some guidance on one particular type of application. I sometimes need to loop over multiple "use...contract" command sequences. With a large master dataset, both the -use- and the -contract- commands can be "slow." My initial instinct with -xcontract- was that things might speed up if I kept the master dataset unchanged in the default frame and sent the contract-ed data into a different frame. But this doesn't seem to be the case, at least in the way I conceptualized the procedure. So I'm wondering if you might have suggestions as to a reconceptualization.

    For instance, this code attempts to mimic the idea of 3 loops over a "use...contract" command sequence using four different approaches, two with -contract- and two with -xcontract-:
    Code:
    cap drop _all
    set seed 2345
    set obs 1000000
    
    forval j=1(1)8 {
     qui gen x`j'=runiform()>.5
    }
    tempfile tdata
    save `tdata'
    
    timer clear
    
    loc reps=3
    
    * Approach 1: reload the master dataset from disk before each -contract-
    timer on 1
    forval j=1(1)`reps' {
     use `tdata'
     contract x*, zero
     drop _all
    }
    timer off 1
    
    * Approach 2: load once, then -preserve- and -restore- around each -contract-
    timer on 2
    use `tdata'
    forval j=1(1)`reps' {
     preserve
     contract x*, zero
     restore
    }
    timer off 2
    
    * Approach 3: load once, -xcontract- into a frame that is replaced each time
    timer on 3
    use `tdata'
    forval j=1(1)`reps' {
     xcontract x*, zero frame(fcontract, replace)
    }
    timer off 3
    
    frame drop fcontract
    
    * Approach 4: load once, -xcontract- into a new frame that is dropped each time
    timer on 4
    use `tdata'
    forval j=1(1)`reps' {
     xcontract x*, zero frame(fcontract)
     frame drop fcontract
    }
    timer off 4
    
    timer list
    
    drop _all
    The timer report is
    Code:
    . timer list
       1:      6.68 /        1 =       6.6790
       2:      6.08 /        1 =       6.0760
       3:     13.17 /        1 =      13.1670
       4:     12.00 /        1 =      12.0040
    My (evidently naive) thinking was that keeping the master dataset in the default frame without repeatedly opening or restoring it would speed things up. Any insights are appreciated (I'm relatively new to frames, so perhaps that's where I'm creating problems for myself).
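
    A further variant, not included in the timings above and offered only as an untested sketch, would be to use the built-in frame commands with -contract- itself: copy the master dataset to a working frame and contract the copy. It assumes the master dataset (with the x* variables) is already in the frame named default; the frame name work is illustrative, and no claim is made here about how its speed compares.

    ```stata
    * Assumes the master dataset is already in memory in the frame named default
    forvalues j = 1/3 {
        * Copy the master data to a working frame, overwriting any earlier copy
        frame copy default work, replace

        * Contract the copy; the default frame is left untouched
        frame work: contract x*, zero
    }

    * Clean up the working frame
    frame drop work
    ```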

    In any event, thanks again Roger for your contribution.
