
  • Random assignment to run falsification test

    Hi!

    I need to match the adno values of id to id_master, taking year into account and keeping only cases where distance is more than 500 km. The matching should be random. I need to repeat it 100 times, so that we generate 100 series of pseudo adno values to run the falsification test.

    Below is an extract of the data, produced with dataex.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(id_master id) int year double distance float adno_gd
    1013 1004 1990   543.813854021527        .7
    1013 1004 1991   543.813854021527      -.05
    1013 1004 1992   543.813854021527      -.05
    1013 1004 1993   543.813854021527     -2.05
    1013 1004 1994   543.813854021527  .3666667
    1013 1004 1995   543.813854021527  .3666666
    1013 1004 1996   543.813854021527  2.616667
    1013 1004 1997   543.813854021527     -1.55
    1013 1004 1998   543.813854021527 -.6333333
    1013 1004 1999   543.813854021527       .95
    1013 1004 2000   543.813854021527 -.6333333
    1013 1004 2001   543.813854021527       .45
    1013 1004 2002   543.813854021527  1.783333
    1013 1004 2003   543.813854021527        .7
    1013 1004 2004   543.813854021527       -.8
    1013 1004 2005   543.813854021527      -.05
    1013 1004 2006   543.813854021527      -1.3
    1013 1004 2007   543.813854021527 -.1333334
    1013 1004 2008   543.813854021527 -1.216667
    1013 1004 2009   543.813854021527     -2.05
    1013 1004 2010   543.813854021527     -1.05
    1013 1004 2011   543.813854021527 -2.133333
    1013 1009 1990 1038.3085512009636 -3.916667
    1013 1009 1991 1038.3085512009636 -4.333333
    1013 1009 1992 1038.3085512009636 -5.583333
    1013 1009 1993 1038.3085512009636 -6.333333
    1013 1009 1994 1038.3085512009636 -3.916667
    1013 1011 1990    1600.1835282544 -10.60667
    1013 1011 1991    1600.1835282544     -9.44
    1013 1011 1992    1600.1835282544    -10.94
    1013 1011 1993    1600.1835282544 -10.35667
    1013 1011 1994    1600.1835282544    -10.44
    1013 1013 1990                  0 -2.127083
    1013 1013 1991                  0  -4.54375
    1013 1013 1992                  0 -4.210417
    1013 1013 1993                  0  -4.29375
    1013 1013 1994                  0 -4.210417
    1013 1013 1995                  0 -4.627083
    1013 1013 1996                  0 -4.763636
    1013 1013 1997                  0 -4.627083
    1013 1013 1998                  0 -1.127083
    1013 1013 1999                  0   -.04375
    1013 1013 2000                  0  -2.54375
    1013 1013 2001                  0 -1.460417
    1013 1013 2002                  0  -3.79375
    1013 1013 2003                  0 -2.710417
    1013 1013 2004                  0 -5.877083
    1013 1013 2005                  0 -2.627083
    1013 1013 2006                  0   5.95625
    1013 1013 2007                  0  4.122917
    1013 1013 2008                  0  3.372917
    1013 1013 2009                  0  4.622917
    1013 1013 2010                  0   5.45625
    1013 1017 1990 1574.1372222136574 -.4004329
    1013 1017 1991 1574.1372222136574 -.1504329
    1013 1017 1992 1574.1372222136574 -.7337663
    1013 1017 1993 1574.1372222136574 -.0670995
    1013 1017 1994 1574.1372222136574 -.4004329
    1013 1021 1990   1651.35791739037 -3.324722
    1013 1021 1991   1651.35791739037 -1.074722
    1013 1021 1992   1651.35791739037 -2.991389
    1013 1021 1993   1651.35791739037 -1.741389
    1013 1021 1994   1651.35791739037 -2.408056
    1013 1021 1995   1651.35791739037 -1.158056
    1013 1021 1996   1651.35791739037     -2.68
    1013 1021 1997   1651.35791739037 -3.241389
    1013 1021 1998   1651.35791739037 -3.408056
    1013 1028 1990 1612.2428661768436 -2.622083
    1013 1034 1990 1615.1810133671506 -2.622083
    1013 1034 1991 1615.1810133671506 -1.622083
    1013 1034 1992 1615.1810133671506  -2.28875
    1013 1034 1993 1615.1810133671506  -3.78875
    1013 1034 1994 1615.1810133671506 -2.622083
    1013 1034 1995 1615.1810133671506 -1.955417
    1013 1034 1996 1615.1810133671506 -3.258182
    1013 1034 1997 1615.1810133671506 -1.955417
    1013 1034 1998 1615.1810133671506 -1.455417
    1013 1034 1999 1615.1810133671506     -2.35
    1013 1034 2000 1615.1810133671506  -2.28875
    1013 1034 2001 1615.1810133671506  1.163636
    1013 1034 2002 1615.1810133671506  3.011111
    1013 1034 2003 1615.1810133671506 -.1555555
    1013 1034 2004 1615.1810133671506 -2.238889
    1013 1034 2005 1615.1810133671506 -1.263636
    1013 1034 2006 1615.1810133671506 -1.519444
    1013 1034 2007 1615.1810133671506 -1.936111
    1013 1036 1990 1512.4867598248045       .21
    1013 1036 1991 1512.4867598248045 -1.123333
    1013 1036 1992 1512.4867598248045       .96
    1013 1036 1993 1512.4867598248045 -.2066666
    1013 1036 1994 1512.4867598248045 -.6233333
    1013 1036 1995 1512.4867598248045  .2933333
    1013 1036 1996 1512.4867598248045 -1.206667
    1013 1036 1997 1512.4867598248045 -1.206667
    1013 1036 1998 1512.4867598248045 -.1233334
    1013 1036 1999 1512.4867598248045  1.543333
    1013 1036 2000 1512.4867598248045 -.3733333
    1013 1038 1990  611.1062148626588  -7.70894
    1013 1038 1991  611.1062148626588 -8.146389
    1013 1038 1992  611.1062148626588 -9.063056
    end
    ------------------ copy up to and including the previous line ------------------

  • #2
    I'm not sure I understand what you mean, since your terminology is not familiar to me. First of all, "matching" would ordinarily describe something that is done with two or more observations, but you refer to matching "values." I'm also not sure what you mean by "100 series of pseudo adno values," and I'm unclear about the difference between id and id_master. Perhaps others on the list will better understand what you intend, but if my advice turns out to be off base, you might want to get some help from a colleague to describe your situation and goal differently.

    However, I will show you something that might help you get started. In doing so, I'm assuming you mean: "I want to match pairs of observations on id and year, retaining only those for which distance is greater than 500. For any given id, among all pairs that match it on id and year, I want to randomly select one for each repetition." The code below does that for one repetition, but I do not implement your request to do this 100 times, since I can't really understand what you want as a final result. The -joinby- command is the key here, as is the separation of your data into two files. You might also find the user-contributed program -rangejoin- of use. I would note that your data example is nicely prepared, but it may not be the best example for us to help you with, since there is only one id_master value in it.


    Code:
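    // Create separate "using file" keyed on id and year
    preserve
    drop id_master
    rename adno_gd adno_gd_from_using
    tempfile using
    save `using'
    restore
    // Master file: id_master becomes the key that is matched to id
    drop id
    rename id_master id
    // Form all pairwise combinations within id and year
    // (distance exists in both files; -joinby- keeps the master copy by default)
    joinby id year using `using'
    keep if distance > 500
    // Randomly pick one of the matched pairs for each id
    set seed 12345   // arbitrary seed, only so the draw is reproducible
    gen double rand = runiform()
    // sort on rand within id; a separate -sort rand- would be undone by -bysort id-
    bysort id (rand): keep if _n == 1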
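
    If the goal does turn out to be 100 independent random picks stacked into one dataset, a minimal sketch might look like the following. It is a guess at the intent, not a tested solution: it assumes the joined and distance-filtered data are in memory, so it would replace the random-pick step above (from the -gen- line onward). The seed value, the variable name rep, and the tempfile names pairs and results are all arbitrary choices made for illustration.

```stata
// Sketch only: repeat the random pick 100 times and stack the results.
// Assumes the data in memory are the output of -joinby- after
// -keep if distance > 500-, before any random selection has been made.
set seed 12345                      // arbitrary; makes the draws reproducible
tempfile pairs results
save `pairs'                        // all candidate pairs
clear
save `results', emptyok             // empty file to accumulate repetitions

forvalues r = 1/100 {
    use `pairs', clear
    gen double rand = runiform()
    bysort id (rand): keep if _n == 1   // one random pair per id
    gen int rep = `r'                   // hypothetical repetition index
    drop rand
    append using `results'
    save `results', replace
}

use `results', clear
sort rep id
```

    Each value of rep then identifies one pseudo series, so a falsification regression could be run with -by rep:- or inside a loop over rep.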



    • #3
      Thanks a lot !!!
