  • Automatized report of changes in observations?

    Dear list members,

    Out of curiosity: does anyone know of a user-written command, or at any rate a simple method (i.e., one that does not involve scattering count throughout the do-file), to generate a log/report of all the commands that changed the number of observations in memory during the execution of a range of commands? Such a command, which could be turned on and off like timer, would make it easier to keep track of changes in sample size, which can become laborious when data management is long and complex. My impression is that the habit of reporting the sequence of steps that determines the analysis sample, well established in some disciplines but not in others, is currently spreading, so such a command might be helpful. It is easy to imagine optional information and visualizations for it.
    I'm using StataNow/MP 18.5

  • #2
    No, I suspect that the output would be far too noisy to be useful: every single replace or generate command would end up in that log.

    Instead, I would just pick the blocks of code after which you want to know the number of nonmissing observations and use count there.
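
    A minimal sketch of this manual approach, assuming the auto dataset shipped with Stata and run as a single do-file so that the local macros survive. The macro names n_before and n_after and the two commands inside the block are made up for illustration:

    Code:
    sysuse auto, clear
    * Store the number of observations before the block
    count
    local n_before = r(N)
    * ... block of data-management commands (illustrative) ...
    drop if missing(rep78)
    keep if foreign == 0
    * ... end of block ...
    * Store the number of observations after the block
    count
    local n_after = r(N)
    display "Obs. was: `n_before'   Obs. now: `n_after'"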
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Well, a count of observations (not cells) would not be affected by changes in the number or content of variables. It would respond only to things such as keep, drop, dropmiss, obs, tsfill, append, and so on, so I don't think it would be too noisy. At any rate, it is potentially important information. In theory, assuming your initial data have some kind of connection with a well-defined concept of population, anything you do subsequently that excludes observations can in principle introduce selectivity in estimators used afterwards.
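
      A small sketch of this point, assuming the auto dataset shipped with Stata; the variable name lprice is made up for illustration. The first two display commands should show the same number, and only the third should differ:

      Code:
      sysuse auto, clear
      display _N
      * Creating or changing variables leaves the number of observations alone
      generate lprice = ln(price)
      replace lprice = . if foreign
      display _N
      * Dropping rows is what changes it
      drop if foreign
      display _N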

      • #4
        A couple of years ago, I wrote trackobs (available from SSC), which does something similar to what you want. According to the help file, the approach does not handle preserve and restore very well. I imagine that merge, append, and the like could cause problems, too.

        From a conceptual perspective, I find it hard, if not impossible, to describe in detail the flow that creates the analysis sample. In real-life scenarios, observations tend to be discarded for non-exclusive reasons. Say you want to exclude observations for which (x == 42) or (y == 73). For some observations, both conditions will be true. Thus, the number of observations discarded because of (x == 42) will depend on whether or not you have already excluded observations with (y == 73).
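
        A quick sketch of the order dependence, assuming the auto dataset shipped with Stata, with foreign == 1 and rep78 >= 4 standing in for the two hypothetical conditions above. The "(... observations deleted)" notes differ between the two orders whenever the overlap reported by count is nonzero, although the final sample is the same:

        Code:
        sysuse auto, clear
        * Observations that satisfy both exclusion criteria
        count if foreign == 1 & rep78 >= 4 & !missing(rep78)

        * Order A: exclude foreign cars first
        drop if foreign == 1
        drop if rep78 >= 4 & !missing(rep78)

        * Order B: exclude on rep78 first
        sysuse auto, clear
        drop if rep78 >= 4 & !missing(rep78)
        drop if foreign == 1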

        • #5
          daniel klein Hah! I knew someone else must have had the idea. The non-exclusivity issue can indeed be a problem, but analysts may well have a theoretical basis on which to choose the correct sequence. There might be a justified hierarchy of exclusion criteria. In my experience, this can definitely be the case. Alternatively (and perhaps for good reason), sometimes you don't care about the "uniqueness" of the partial counts involved. You can even visualize the flow in a way that conveys the issue to the audience.

          I'll have a look at your command. Thanks for pointing it out.
          Last edited by Matteo Pinna Pintor; 04 Jul 2024, 09:00.

          • #6
            The version on SSC seems to be outdated. The most recent version of trackobs can be downloaded from within Stata via:

            Code:
            net install trackobs , from(https://raw.githubusercontent.com/kleindaniel81/trackobs/master)

            • #7
              Ah okay, thanks. Do you see ways to overcome the limitations mentioned above?

              • #8
                I introduced some bugs in the revised version that I have now (hopefully) fixed. I have also implemented the possibility of defining observations via variables so that a logical observation is no longer restricted to one row. The latest version remains available from GitHub.

                As for the limitations: merge and append seem to work pretty well. Things may break if you combine datasets that both have trackobs characteristics. This could be improved but it takes some thought and time.

                I don't think trackobs could ever work with preserve and restore, because trackobs creates a new environment (or name space), and preserve and restore would always work within this environment instead of the caller's environment. I am open to any suggestions, though.

                • #9
                  I have just uploaded another update that sort of fixes the remaining issues with merge, append, etc. The fix discards trackobs characteristics in the using datasets and retains those from the master dataset. The fix is incompatible with older versions of trackobs.

                  Also, here is a simple example of the added functionality for defining observations via variables:

                  Code:
                  . trackobs set
                  
                  trackobs counter : 0
                  trackobs group   : _n
                  
                  . trackobs : webuse nlswork
                  (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                  
                  . trackobs set group idcode
                  
                  trackobs counter : 1
                  trackobs group   : idcode
                  
                  . trackobs : keep if collgrad
                  (23,739 observations deleted)
                  
                  . trackobs report
                  
                    +----------------------------------------+
                    |          Command   Obs. was   Obs. now |
                    |----------------------------------------|
                    |   webuse nlswork          0      28534 |
                    | keep if collgrad       4711        971 |
                    +----------------------------------------+
                  
                  .
                  Note that there are 28,534 observations, i.e., rows in nlswork.dta. However, there are only 4,711 observations identified by idcode. Of those, only 971 have graduated from college.
                  Last edited by daniel klein; 08 Jul 2024, 04:56.

                  • #10
                    This looks pretty cool Daniel, I will have a deeper look into it soon. Do you think the same limitations of preserve/restore also apply to frames?

                    • #11
                      Originally posted by Matteo Pinna Pintor:
                      This looks pretty cool Daniel, I will have a deeper look into it soon. Do you think the same limitations of preserve/restore also apply to frames?
                      Not exactly. The problem I see with frames is a bit less technical and more on a conceptual level. Suppose your current frame is called default (which it is if you have not renamed it). Suppose further you have a second frame, say foo. Now you type something like

                      Code:
                      trackobs : frames foo : command
                      What do you expect here? You probably want to record the number of observations in frame foo before and after command. And, perhaps you even expect these numbers to be stored as characteristics in frame foo. What actually happens is this: trackobs counts the number of observations in frame default, executes command in frame foo, then counts the number of observations in frame default again, and stores the information in characteristics in frame default. Maybe this is not surprising but I think there is potential for confusion and I do not see any use in it either.
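
                      For comparison, a sketch of counting by hand in the target frame, which is a plain-Stata workaround and not a feature of trackobs. It assumes Stata 16 or later, that no frame named foo exists yet, and the auto dataset shipped with Stata:

                      Code:
                      frame create foo
                      frame foo: sysuse auto
                      * Count in frame foo before and after the command
                      frame foo: count
                      frame foo: drop if foreign
                      frame foo: count
                      * The current (default) frame is not affected
                      count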
