Matching observations in a quasi-random process

Steffen Pluetzke

#1

02 Mar 2019, 04:23

Dear Statalisters,

I want to create a data set that will be used later to test different models. Let's say I have three variables, y, x1 and x2, with 1,000 observations each. The observations should be matched in a quasi-random process that follows a given dependency structure between x1/x2 and y, and this matching should have a stochastic component.
For example, I divide the observations into quintiles based on their values and create a matching matrix with a negative dependency:
matrix match = (0.025, 0.05, 0.1,  0.2,  0.625 \ ///
                0.05,  0.1,  0.2,  0.45, 0.2   \ ///
                0.1,   0.2,  0.4,  0.2,  0.1   \ ///
                0.2,   0.45, 0.2,  0.1,  0.05  \ ///
                0.625, 0.2,  0.1,  0.05, 0.025)

As a result of this process, I would have a data set in which 62.5% of the observations in quintile 1 of y are "correctly" matched with quintile 5 of x1, 20% are "falsely" matched with quintile 4, and so on. For quintile 3, the correct match occurs only 40% of the time.
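One possible way to implement this kind of matrix-based matching in Stata is sketched below. It is an illustration under stated assumptions, not the poster's method: y and x1 are hypothetical stand-ins (rnormal() and rchi2(3)) for the poster's variables, there are no ties so each quintile holds exactly 200 of the 1,000 observations, and the block is run as one do-file so the tempfile local survives. Because every row and every column of match sums to 1 and 200 times each entry is a whole number, the cell counts can be allocated exactly (row 1 gets 5, 10, 20, 40 and 125 observations), so every x1 value is used exactly once. The stochastic component comes from the random ordering inside each y quintile and the random pairing inside each x1 quintile.

```stata
* Sketch: stochastic quintile-to-quintile matching of y and x1.
* ASSUMPTIONS: y and x1 are hypothetical stand-ins for the real variables;
* no ties, so every quintile has 200 obs; run as one do-file.
clear all
set seed 12345
set obs 1000
generate double y  = rnormal()
generate double x1 = rchi2(3)

* Rows = quintile of y, columns = quintile of x1 (negative dependency)
matrix match = (0.025, 0.05, 0.1,  0.2,  0.625 \ ///
                0.05,  0.1,  0.2,  0.45, 0.2   \ ///
                0.1,   0.2,  0.4,  0.2,  0.1   \ ///
                0.2,   0.45, 0.2,  0.1,  0.05  \ ///
                0.625, 0.2,  0.1,  0.05, 0.025)

* Step 1: randomly order observations within each y quintile and give
* each one a target x1 quintile, using the cumulative row shares of match
xtile qy = y, nquantiles(5)
generate double u = runiform()
sort qy u
by qy: generate long rank = _n
by qy: generate long n_q = _N
generate byte target = .
forvalues i = 1/5 {
    local cum = 0
    forvalues j = 1/5 {
        local lo = `cum'
        local cum = `cum' + el(match, `i', `j')
        quietly replace target = `j' if qy == `i' & ///
            rank > round(`lo' * n_q) & rank <= round(`cum' * n_q)
    }
}
assert !missing(target)

* Step 2: shuffle x1 within its own quintiles and set it aside
preserve
    keep x1
    * the x1 quintile is named target so it can serve as a merge key
    xtile target = x1, nquantiles(5)
    generate double v = runiform()
    sort target v
    by target: generate long slot = _n
    keep target slot x1
    tempfile x1pool
    save `x1pool'
restore

* Step 3: give each y observation a random slot inside its target
* quintile and attach the shuffled x1 value stored in that slot
drop x1
generate double w = runiform()
sort target w
by target: generate long slot = _n
merge 1:1 target slot using `x1pool', assert(match) nogenerate
drop u rank n_q w slot

* Row percentages should reproduce the entries of match
tabulate qy target, row nofreq
```

The final tabulate should show row percentages equal to the rows of match (2.5, 5, 10, 20 and 62.5 for quintile 1 of y). The same steps could be repeated for x2 with its own matrix; whether x1 and x2 also need a joint structure with each other is not specified in the question, and this sketch does not address it.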

Is such a matching process possible, or are there other ways to accomplish this matching?
Before this step, I generated y, which follows a mixed distribution, and x1/x2, which are not normally distributed.

Kind regards
Steffen
Phil Bromiley

#2

04 Mar 2019, 11:40

You'll increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also, simplify your problem as much as possible; it is hard to follow your explanation.

One way to sample in Stata is to generate a random number, sort on that random number, and then flag the first n observations (however many you need) as the selected observations. I wonder whether you could do this repeatedly to get your desired outcome.
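A minimal Stata sketch of this random-sort idiom follows. It assumes a hypothetical 1,000-observation dataset and a sample of 200; the second part shows one reading of "do this repeatedly", namely repeating the draw within a hypothetical grouping variable group (40 draws per group).

```stata
* Sketch of the random-sort sampling idiom.
* ASSUMPTIONS: hypothetical data, sample size 200, variable group invented.
clear
set seed 2019
set obs 1000
generate long id = _n

* Draw a random number, sort on it, flag the first 200 observations
generate double u = runiform()
sort u
generate byte selected = (_n <= 200)

* The same idea within groups: sort by group and random number,
* then flag the first 40 observations of each group
generate byte group = mod(id, 5) + 1
sort group u
by group: generate byte selected_g = (_n <= 40)

* Return to the original order
sort id
```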