
  • Pulling data forward

    Hi all,

    I have long-format data in which respondents have varying numbers of observations. I am trying to pull data forward, such that if someone is missing on a variable in wave 7, it becomes equal to their wave 6 value. I am using the following code:
    Code:
    foreach v of varlist *_copy {
        by hhidpn: replace `v' = `v'[_n-1] if missing(`v')
    }
    I have made a copy of each variable in my dataset in order to keep the original variable. With this code, many cases work as expected. However, variables with extended missing values (e.g., .d) seem to be misbehaving. Take the following example: "rdiab" is .d in wave 11 and 1 in wave 12. After I create my copy variable and run the previous code, the .d at wave 11 becomes "." What is happening? There is no _n - 1 observation in this case. Why is the observation not retaining its extended missing value?

    Code:
           +--------------------------------------------------------------------+
           |   hhidpn   wave   riwstat   rdiab   rdiab_~y   mean_e~c   dementia |
           |--------------------------------------------------------------------|
    46629. | 85001040     11         1      .d          .         12          1 |
    46630. | 85001040     12         1       1          1         12          1 |
           +--------------------------------------------------------------------+

  • #2
    See https://www.stata.com/support/faqs/d...issing-values/

    You can convert all extended missing values to be system missing before doing the replacement or define a condition for the extended missing values.


    Code:
    ... if missing(`v') | `v' >.
    as extended missing values are greater than system missing.
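
The ordering of missing values mentioned above can be checked directly. This is a minimal sketch using only display and missing(); it assumes nothing beyond Stata's documented ordering, in which all nonmissing numbers < . < .a < ... < .z. Note that missing() is already true for extended missing values such as .d.

```stata
* Each expression evaluates to 1 (true)
* Extended missing values sort above system missing
display .d > .
* missing() is true for extended missing values as well as for .
display missing(.d)
* So `v' >= . catches system and extended missing values alike
display .d >= .
* Any nonmissing number is below system missing
display 5 < .
```

So, for a numeric variable, missing(`v') and `v' >= . select the same observations.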


    Last edited by Andrew Musau; 24 May 2023, 11:05.



    • #3
      That is the link I used, but I was still confused about the extended missing values. The following syntax seems to work:
      Code:
      foreach v of varlist *_copy {
          by hhidpn: replace `v' = `v'[_n-1] if `v' >=. & `v'[_n-1] <.
      }



      • #4
        If you consider observation 1 then the previous observation is, or would be, observation 0; and it's legal to refer to that even though it doesn't exist. But the result is always default missing -- system missing . or empty string "" -- because Stata doesn't know any more than you do what it should be.

        Consider

        Code:
        . sysuse auto, clear
        (1978 automobile data)
        
        . l mpg in 1
        
             +-----+
             | mpg |
             |-----|
          1. |  22 |
             +-----+
        
        . replace mpg = .d in 1
        (1 real change made, 1 to missing)
        
        . l mpg in  1
        
             +-----+
             | mpg |
             |-----|
          1. |  .d |
             +-----+
        
        . replace mpg = mpg[_n-1] if missing(mpg) in 1
        (1 real change made, 1 to missing)
        
        . l mpg in 1
        
             +-----+
             | mpg |
             |-----|
          1. |   . |
             +-----+
        The larger point is that there is never any gain in changing observation 1 using the unknown observation 0, because there is no information to do that usefully, and in this particular case you've lost some detail.

        In the context of by: your code needs to be

        Code:
         by hhidpn: replace `v' = `v'[_n-1] if missing(`v') & _n > 1
        EDIT: A problem with the syntax in #3 is that it will fail with string variables.
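
For a self-contained check, the corrected line can be placed back in the loop from #1 and run on a toy dataset. This is only a sketch: the data values below are invented for illustration (only the variable names hhidpn, wave, and rdiab come from the thread), and clonevar is assumed as the way the copy was made.

```stata
* Toy data (hypothetical values) in the thread's long layout
clear
input long hhidpn byte wave byte rdiab
85001040 11 .d
85001040 12 1
85001040 13 .
85001041 11 0
85001041 12 .d
end

* Keep the original and work on a copy, as in #1
clonevar rdiab_copy = rdiab

sort hhidpn wave
foreach v of varlist *_copy {
    * _n > 1 skips each respondent's first observation, so a leading
    * extended missing such as .d is not overwritten by `v'[0], which is .
    by hhidpn: replace `v' = `v'[_n-1] if missing(`v') & _n > 1
}

list, sepby(hhidpn)
```

With these made-up values, the .d in the first observation of 85001040 is left alone, while later missing values (system or extended) take the previous wave's value. Because missing() also accepts string arguments, the same condition should carry over to string variables.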

        EDIT 2: The history here should make sense. Long before time series operators, Stata allowed subscripts to indicate particular values of variables, such as mpg[7]. And subscripts could be expressions. Often you want to refer to the previous value of a variable (or the value 2 before, and so on), so [_n-1] and similar are natural idioms for that, given that _n is the current observation number. But should a reference to [0] be illegal? Stata's developers decided No; the way just about any statistical person should want that to work is for Stata to say "I don't know what that should be" in its own language and for a missing value to be returned.
        Last edited by Nick Cox; 24 May 2023, 11:37.

