  • Keep unmatched observations after merging datasets?

    Hello,

    Is it better to keep unmatched observations or to discard them, just to tidy things up? Will dropping them spoil the results of my regressions?

    Cheers
    [Attachment: Screenshot 2023-10-31 at 15.37.10.png]

  • #2
    It depends. If all you are doing is a regression involving variables that were brought in (or updated) in the -merge-, the unmatched observations will have missing values on regression variables and will be excluded from the regression anyway. So you will get the same regression results whether you keep them or drop them. If your data set is large and you are getting close to hitting memory limits, then surely you would want to drop the unmatched observations to make more room.

    But if you will need information from the unmatched observations for something after the regression, then evidently you need to keep them.
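A minimal sketch of the regression point above, written in plain Python rather than Stata so it runs anywhere; the data, the names, and the `full_outer_merge` and `ols` helpers are all hypothetical. It mimics a merge that keeps unmatched observations (as Stata's `merge` does by default) and flags each row much like Stata's `_merge` variable (matched, master only, using only). It then fits a simple regression with listwise deletion, which is how Stata estimation commands treat observations with missing values in model variables. Fitting on all rows and on the matched rows only gives identical estimates.

```python
# Hypothetical data: a "master" set holding the outcome y and a "using" set holding x.
master = {1: 2.0, 2: 4.1, 3: 6.2, 4: 7.9, 5: 10.1}  # id -> y
using = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 6: 5.0}    # id -> x


def full_outer_merge(master, using):
    """Mimic a 1:1 merge that keeps every observation, with a _merge-style flag."""
    rows = []
    for key in sorted(set(master) | set(using)):
        y = master.get(key)  # None plays the role of a missing value
        x = using.get(key)
        if y is not None and x is not None:
            status = "matched"
        elif y is not None:
            status = "master only"
        else:
            status = "using only"
        rows.append({"id": key, "y": y, "x": x, "_merge": status})
    return rows


def ols(rows):
    """OLS of y on x, silently skipping rows where y or x is missing."""
    pairs = [(r["x"], r["y"]) for r in rows if r["x"] is not None and r["y"] is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return n, mean_y - slope * mean_x, slope  # (N used, intercept, slope)


merged = full_outer_merge(master, using)
kept = ols(merged)                                             # unmatched rows kept
dropped = ols([r for r in merged if r["_merge"] == "matched"])  # unmatched rows dropped
print(kept)
print(dropped)
print(kept == dropped)  # True
```

The equality relies on every unmatched row missing at least one regression variable; an analysis that uses only variables the unmatched rows do have would change if they were dropped.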

    I "grew up" in the days when memory was very limited and expensive, so I learned miserly programming habits. I'm probably one of a handful of people who still make a point of specifying byte storage type for my indicator (dummy) variables! And I always get rid of unneeded variables or observations, just out of habit. But in the modern world memory is cheap and abundant, and there is less need to do that.

    I think there are still two good reasons for keeping data sets free of unnecessary observations and variables. The first is speed: the more observations you have, the longer it takes Stata to read through the whole data set whenever it performs a calculation, so a large number of unneeded observations can noticeably slow down all your calculations. The same applies to variables. The second is convenience: when you are writing code, it's a lot easier to find your variables' exact names in the Variables window if you don't have to scroll through a long list of other variables that you are never going to use!


    • #3
      If they are unmatched, keeping them won't affect the estimates.

      Whether to keep them depends on whether you (or someone using your data) want to analyze what is missing or do imputation.
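A rough sketch of the kind of check that keeping the unmatched observations makes possible, again in plain Python with hypothetical data and labels: count the rows by merge status (similar in spirit to tabulating `_merge` in Stata) and compare the outcome among master-only rows with the matched rows. This is only a descriptive first look, not an imputation method.

```python
from collections import Counter
from statistics import mean

# Hypothetical merged rows; None marks a missing value and "_merge" mimics Stata's merge flag.
merged = [
    {"id": 1, "y": 2.0, "x": 1.0, "_merge": "matched"},
    {"id": 2, "y": 4.1, "x": 2.0, "_merge": "matched"},
    {"id": 3, "y": 6.2, "x": 3.0, "_merge": "matched"},
    {"id": 4, "y": 7.9, "x": 4.0, "_merge": "matched"},
    {"id": 5, "y": 10.1, "x": None, "_merge": "master only"},
    {"id": 6, "y": None, "x": 5.0, "_merge": "using only"},
]

# How many observations fall in each merge category?
counts = Counter(r["_merge"] for r in merged)
print(counts)


def mean_y(rows, status):
    """Mean of y for rows with the given merge status, or None if no y is observed."""
    values = [r["y"] for r in rows if r["_merge"] == status and r["y"] is not None]
    return mean(values) if values else None


# Do the unmatched master rows look different from the matched ones on the outcome?
print("matched:", mean_y(merged, "matched"))
print("master only:", mean_y(merged, "master only"))
```

If the observations were dropped right after the merge, a comparison like this would no longer be possible.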


      • #4
        Clyde Schechter, I really appreciate the detailed explanation. Running regressions back then sounds like a complete nightmare. I'll keep all your points in mind. Cheers!


        • #5
          George Ford, I was thinking the same thing. While I don't like messy data, I'm also aware that I might run a few regressions where the unmatched observations could come in handy. Thank you for shedding light on that!
