I was wondering how sample selection with "use using (dataset) if (condition)" differs from "use (dataset)" followed by "keep if (condition)".
For example, in the following code:
Code:
clear
set obs 10
gen var1 = _n
gen var2 = _n * 2
gen var3 = _n * 3
gen var4 = _n * 4
gen var5 = _n * 5
tempfile test_data
save `test_data', replace
How does the implementation differ between
Code:
use `test_data', clear
keep if var1>=5
and
Code:
use using `test_data' if var1>=5, clear
Is there any efficiency (speed or memory) advantage, or is the latter essentially doing the former with a more compact syntax? It obviously doesn't matter for such a small dataset, but with a couple of million observations a quicker selection would be a big help.
Full code for easy copying:
Code:
* Build a small test dataset
clear
set obs 10
gen var1 = _n
gen var2 = _n * 2
gen var3 = _n * 3
gen var4 = _n * 4
gen var5 = _n * 5
tempfile test_data
save `test_data', replace
* Approach 1: load everything, then keep the subset
use `test_data', clear
keep if var1>=5
* Approach 2: apply the condition while loading
use using `test_data' if var1>=5, clear
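One rough way to check this on data closer to the real size is to time both approaches with Stata's `timer`. This is only a sketch under stated assumptions, not a definitive benchmark: the 5 million observations, the random `var1`, the seed, and the tempfile name `big_data` are all made up for illustration, and the filtered load uses the documented `use [if] using filename` ordering.

```stata
* Hypothetical timing sketch: sizes, seed, and names are illustrative assumptions
clear
set seed 12345
set obs 5000000
gen var1 = runiform()
forvalues j = 2/5 {
    gen var`j' = var1 * `j'
}
tempfile big_data
save `big_data', replace

timer clear

* Approach 1: load all observations, then drop the unwanted ones
timer on 1
use `big_data', clear
keep if var1 >= 0.5
timer off 1
local n1 = _N

* Approach 2: apply the condition while loading
timer on 2
use if var1 >= 0.5 using `big_data', clear
timer off 2
local n2 = _N

* Both approaches should retain the same number of observations
assert `n1' == `n2'

* Report elapsed seconds for timers 1 and 2
timer list
```

Because the operating system may cache the file after the first `use`, the second timer can look faster for reasons unrelated to the command itself; swapping the order of the two approaches or repeating each one several times gives a fairer comparison.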