  • Repeated random sampling without replacement from panel data

    I have annual stock returns for a number of firms over about 20 years, roughly 15k firm-year observations in total. I want to draw 100k random samples, each containing, say, 20% of the observations in each year. I am using a forvalues loop that draws a random sample by year, computes portfolio means by year, adds a column identifying the simulation index, and saves the result. My concern is that this takes too much time, and the sleep command is necessary to avoid a read-only error while saving.

    I was wondering whether there is a faster way, such as first using expand to create 100k replicas and then computing returns by simulation index and year. I am okay with a large file if it reduces the runtime.
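
    A minimal sketch of the expand idea, under some assumptions: the variables are fid, ayr and ret as in the listing in this post, fid and ayr together identify an observation, and the batch size B and the output file name eq_ret_batch.dta are made up for illustration. Expanding 15k observations 100k times would give about 1.5 billion rows, so the sketch handles one batch of replicas at a time.

    ```stata
    * Sketch only: B and eq_ret_batch.dta are hypothetical names and values.
    local B = 100                        // replicas per batch: 15k obs x 100 = 1.5 million rows
    set seed 1234
    use yr_ret.dta, clear
    expand `B'                           // B copies of every firm-year
    bysort fid ayr: gen long indx = _n   // replica index 1..B within each firm-year
    gen double u = runiform()
    sort indx ayr u                      // random order within each replica-year
    by indx ayr: keep if _n <= round(0.2 * _N)   // first 20% = draw without replacement
    collapse (mean) eqret = ret (count) n = ret, by(indx ayr)
    save eq_ret_batch.dta, replace
    ```

    To reach 100k samples, the batch could be repeated with an offset added to indx and the batch files appended afterwards. The rounding of the 20% cutoff here may not match what sample does in small years.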

    The data set looks like this:
    fid ayr ret
    abc 2001 0.012
    abc 2002 0.014
    abc .....
    abc 2020 0.032
    xyz 2005 0.265
    xyz 2006 0.023
    .....


    Code I am using right now:

    save yr_ret.dta, replace

    local flag = 1
    set seed 1234
    forvalues i=1/100000 {
        display "starting sample `i'"
        use yr_ret.dta, clear       // reload the full panel
        sample 20, by(ayr)          // keep 20% of the observations within each year
        collapse (mean) eqret=ret (count) n=ret, by(ayr)
        gen long indx=`i'           // simulation index
        if `flag'!=1 {
            append using eq_ret_ranpf.dta
        }
        save eq_ret_ranpf.dta, replace
        sleep 500                   // pause to avoid the read-only error on save
        local flag = 0
    }

    I would appreciate it if someone could help.

  • #2
    Welcome to Statalist. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    You seem to be doing this in a very complicated and difficult way. I would look at using preserve and restore instead of repeatedly reading the data set from disk. I would also look at another way to generate the means rather than collapsing the entire data set.

    In addition, if saving is causing trouble you could just write the results to a variable in the main data set.

    I might think of doing something like:
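    * result columns; the data set needs at least as many rows as results stored
    g mret=.
    g estyr=.
    g iter=.
    local a=1

    forvalues i=1/10 {
        g s=runiform()
        * a few years for illustration; use the full range of ayr
        forvalues year=2001/2003 {
            * mean of ret over a random subset (about 20%) of this year's observations
            su ret if ayr==`year' & s<.2, meanonly
            replace mret=r(mean) in `a'
            replace estyr=`year' in `a'
            replace iter=`i' in `a'
            local ++a
        }
        drop s
    }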
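
    Another way to combine preserve/restore with not saving on every pass is postfile, which writes one result row at a time to a separate file. A minimal sketch, assuming ayr is numeric and reusing the file names from the original post; the handle name results and the reduced loop count are only for illustration. It should also avoid the append and save of an ever-growing file on each iteration, which is probably where most of the time goes.

    ```stata
    * Sketch only: the handle name and loop count are illustrative.
    set seed 1234
    use yr_ret.dta, clear
    tempname results
    postfile `results' long indx int ayr double eqret long n using eq_ret_ranpf.dta, replace

    forvalues i = 1/100 {                // raise to 100000 once it works
        preserve
        sample 20, by(ayr)               // 20% per year, without replacement
        collapse (mean) eqret = ret (count) n = ret, by(ayr)
        forvalues j = 1/`=_N' {          // one result row per year
            post `results' (`i') (ayr[`j']) (eqret[`j']) (n[`j'])
        }
        restore                          // back to the full panel
    }
    postclose `results'
    ```

    After postclose, use eq_ret_ranpf.dta should give one row per simulation index and year, with no sleep needed because the results file is opened once and closed once.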
