It depends. If all you are doing is a regression involving variables that were brought in (or updated) in the -merge-, the unmatched observations will have missing values on regression variables and will be excluded from the regression anyway. So you will get the same regression results whether you keep them or drop them. If your data set is large and you are getting close to hitting memory limits, then surely you would want to drop the unmatched observations to make more room.
But if you will need information from the unmatched observations for something after the regression, then evidently you need to keep them.
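To make the point above concrete, here is a minimal sketch meant to be run as a single do-file. The data and the names id, y, x1 and x2 are invented for illustration. It assumes y and x1 exist only in the master data and x2 only in the using data, so every unmatched observation is missing at least one regression variable.

```stata
* Hypothetical using data: x2 is the variable brought in by the merge
clear
input id x2
1 10
2 12
3 9
5 14
6 11
7 13
end
tempfile extra
save `extra'

* Hypothetical master data: id 4 has no match in the using data,
* and id 7 in the using data has no match here
clear
input id y x1
1 2.0 1
2 3.5 2
3 3.0 3
4 5.0 4
5 6.5 5
6 6.0 6
end

merge 1:1 id using `extra'

* Unmatched observations have missing y, x1, or x2, so -regress-
* leaves them out of the estimation sample on its own
regress y x1 x2

* Restricting to matched observations should give identical results
regress y x1 x2 if _merge == 3

* Only if memory is tight and the unmatched rows are not needed later:
keep if _merge == 3
drop _merge
```

If the unmatched observations might matter for a later step, skip the last two commands and rely on -if _merge == 3- where needed.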
I "grew up" in the days when memory was very limited and expensive, so I learned miserly programming habits. I'm probably one of a handful of people who still make a point of specifying byte storage type for my indicator (dummy) variables! And I always get rid of unneeded variables or observations, just out of habit. But in the modern world memory is cheap and abundant and there is less need to do that. I think there are still two good reasons for keeping data sets free of unnecessary observations and variables. The first is speed: the more observations you have, the longer it takes Stata to read through the whole data set whenever it performs a calculation. If your data set has a large number of unneeded observations this can noticeably slow down all your calculations, and the same applies to variables. The second is convenience: when you are working with your data set writing code, it's a lot easier to find your variables' exact names in the Variables window if you don't have to scroll through a long list of other variables that you aren't ever going to use!
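The habits described above might look something like this in practice. This is only a sketch using the auto data set that ships with Stata; the indicator name expensive and the 6000 cutoff are made up for illustration.

```stata
sysuse auto, clear

* Store the indicator as byte rather than the default float;
* the -if- guard keeps it missing wherever price is missing
generate byte expensive = (price > 6000) if !missing(price)
describe expensive

* Keep only the variables actually needed for the analysis
keep price mpg weight foreign expensive

* Let Stata shrink the remaining variables to the smallest
* storage type that holds their values without loss
compress
```

-compress- covers much of what hand-specified storage types used to do, so specifying byte at creation is more a matter of habit than necessity on modern machines.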
Clyde Schechter, I quite appreciate the detailed explanation. Running regressions back then sounds like a complete nightmare. I'll keep all your points in mind. Cheers!
George Ford, I was thinking the same. While I don't like messy data, I'm also aware that I might run a few regressions where the unmatched observations come in handy. Thank you for shedding light on that!