Obviously, a 300,000 × 300,000 matrix exceeds memory limits (at 8 bytes per entry, that's roughly 720 GB), so the full matrix multiplication in #15 isn't feasible. However, we can set it up as a blocked matrix multiplication, which only ever holds one N × blocksize slice of the product in memory (thanks for the tip, ChatGPT).
Now, the time complexity is still \(O\left(n^2\right)\). But, on my machine, the blocked code below handles the example data in #12 instantly, as opposed to the 2 seconds the original code in #12 takes. Expanding from 1,000 to 5,000 observations, the blocked matrix multiplication still runs in less than a second, whereas the slightly revised code in #13 takes almost 10 seconds, and the original code takes more than half a minute. For 50,000 observations, I timed the blocked matrix multiplication at somewhere between 1.5 and 4 minutes, depending on the block size (I didn't time any of the other approaches on that dataset). I assume the 300k problem will still take much longer, but at least it might be feasible. Otherwise, someone should look into parallelization.
Code:
putmata X = (ins*)
mata:
N = rows(X)
n_competitors = J(N, 1, 0)
blocksize = 1000
Xt = X'
for (from=1; from<=N; from=from+blocksize) {
    to = min((from+blocksize-1, N))
    xt = Xt[., from..to]                       // one block of columns (note the "." row subscript)
    // count positive dot products per column, minus 1 for the observation itself
    n_competitors[from::to] = (colsum((X*xt) :> 0) :- 1)'
}
n_competitors = rowmax((n_competitors, J(N, 1, 0)))   // clip negatives (all-zero rows)
end
getmata n_competitors = n_competitors
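For readers outside Stata, here is a sketch of the same blocking idea in Python/NumPy. It is a translation, not the original code: the array `X`, the block size, and the random test data are my own illustrative choices; only the counting rule (positive dot product with another observation) mirrors the Mata code above.

```python
import numpy as np

# Illustrative data standing in for the real 300k-row problem
rng = np.random.default_rng(0)
N, k = 500, 3
X = rng.normal(size=(N, k))

blocksize = 100
n_competitors = np.zeros(N, dtype=np.int64)
for start in range(0, N, blocksize):
    stop = min(start + blocksize, N)
    # N x blocksize slice of the Gram matrix; the full N x N product
    # is never materialized, so memory stays at O(N * blocksize)
    block = X @ X[start:stop].T
    # count positive dot products per column, minus 1 for the row itself
    n_competitors[start:stop] = (block > 0).sum(axis=0) - 1
n_competitors = np.maximum(n_competitors, 0)  # clip negatives (all-zero rows)
```

The design point is the same as in the Mata version: each loop iteration computes one vertical slice of the Gram matrix, reduces it to a count, and discards it, so the peak memory is N × blocksize instead of N × N.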