Good evening,
For an N-dimensional vector, I want to form an N x N matrix of indicators showing whether two elements of the vector share the same value. A small example with a 4-element vector v and the desired matrix R is given in the first code block below.
In that example, the first element of v (value 1) is unique, so it shares a value only with itself; the same holds for the second element. The third and fourth elements share the same value with themselves and with each other, which gives the 2 x 2 block of 1s in the bottom-right (south-east) corner of R.
I managed to do this in Mata using a double loop.
Is there some clever vectorised way to do this that computes faster than the double loop I am using below?
The two versions of the double loop follow. One is a direct double loop from 1 to N in both indices; in the second version I tried to be clever and cut the loop in half, since the R matrix is symmetric. However, this does not seem to save much execution time.
In terms of the auto data, I am looking for the result shown in the last code block below (computed on the first 13 observations), but hopefully obtained in some vectorised way.
Code:
. mat v = (1,2,4,4)

. mat list v

v[1,4]
    c1  c2  c3  c4
r1   1   2   4   4

. mat R = (1,0,0,0 \ 0, 1 , 0, 0 \ 0 , 0, 1, 1 \ 0, 0, 1, 1)

. matlist R

             |        c1         c2         c3         c4
-------------+-------------------------------------------
          r1 |         1
          r2 |         0          1
          r3 |         0          0          1
          r4 |         0          0          1          1
Code:
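sysuse auto
expand 400
replace rep = 6 if missing(rep)
putmata N = rep

// Version 1: fill only the strict lower triangle, then mirror it
timer on 1
mata:
obs = rows(N)
R = I(obs)
for (i=1; i<=obs; i++) {
        for (j=1; j<i; j++) {
                R[i,j] = N[i]==N[j]
        }
}
// R + R' counts the unit diagonal twice, so subtract I(obs) once
R = R + R' - I(obs)
end
timer off 1

// Version 2: direct double loop over all (i,j) pairs
timer on 2
mata:
obs = rows(N)
Rcorrect = I(obs)
for (i=1; i<=obs; i++) {
        for (j=1; j<=obs; j++) {
                Rcorrect[i,j] = N[i]==N[j]
        }
}
end
timer off 2

// Should display 0 if the two versions agree
mata: mreldif(Rcorrect, R)
timer list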
Code:
. sysuse auto
(1978 automobile data)
.
.
. keep in 1/13
(61 observations deleted)
.
. replace rep = 6 if missing(rep)
(2 real changes made)
.
. putmata N = rep
(1 vector posted)
.
.
. timer on 2
. mata:
------------------------------------------------- mata (type end to exit) ------------------------------------------------------------------------------------------------------
: obs = rows(N)
: Rcorrect = I(obs)
: for (i=1; i<=obs; i++) {
>         for (j=1; j<=obs; j++) {
>                 Rcorrect[i,j] = N[i]==N[j]
>         }
> }

: N
       1
    +-----+
  1 |  3  |
  2 |  3  |
  3 |  6  |
  4 |  3  |
  5 |  4  |
  6 |  3  |
  7 |  6  |
  8 |  3  |
  9 |  3  |
 10 |  3  |
 11 |  3  |
 12 |  2  |
 13 |  3  |
    +-----+

: Rcorrect
[symmetric]
         1    2    3    4    5    6    7    8    9   10   11   12   13
    +------------------------------------------------------------------+
  1 |    1                                                             |
  2 |    1    1                                                        |
  3 |    0    0    1                                                   |
  4 |    1    1    0    1                                              |
  5 |    0    0    0    0    1                                         |
  6 |    1    1    0    1    0    1                                    |
  7 |    0    0    1    0    0    0    1                               |
  8 |    1    1    0    1    0    1    0    1                          |
  9 |    1    1    0    1    0    1    0    1    1                     |
 10 |    1    1    0    1    0    1    0    1    1    1                |
 11 |    1    1    0    1    0    1    0    1    1    1    1           |
 12 |    0    0    0    0    0    0    0    0    0    0    0    1      |
 13 |    1    1    0    1    0    1    0    1    1    1    1    0    1 |
    +------------------------------------------------------------------+

: end
------------------
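For concreteness, here is a minimal sketch of the kind of vectorised one-liner I have in mind. It has not been timed, and it assumes that Mata's colon operator `:==` expands a column vector compared against a row vector into a full matrix, which is how I read the conformability rules for colon operators. The names v, Rsmall and Rvec are only illustrative; N and Rcorrect are assumed to exist already from the double-loop code above.

```mata
mata:
// Small check on the 4-element example; v and Rsmall are illustrative names
v = (1, 2, 4, 4)
Rsmall = (v' :== v)        // 4 x 1 compared against 1 x 4, expanded to 4 x 4
Rsmall

// Same idea for the auto data; assumes N (obs x 1) and Rcorrect already exist
Rvec = (N :== N')
mreldif(Rvec, Rcorrect)    // should be 0 if the sketch matches the double loop
end
```

If this works as assumed, the comparison runs inside a single built-in operator instead of an interpreted double loop, although the result is still a full obs x obs matrix, so memory use would be unchanged.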
