
  • Identifying missing values between identical values

    Hi everyone,
    I have run into a problem that has taken me a lot of time, and I could not figure it out.
    I have a large dataset of people and the firms they worked for over the years. Since the data are sampled, some individuals have missing firm information in some years. I would like to fill in the firm in the years where it is missing and mark those observations with a dummy variable for future use.
    To clarify, in a simple case, I would like to identify the observations marked in red.
    If it would be helpful, I can share my unsuccessful attempts.
    Thanks in advance.

    [Attached image: image.png]



  • #2
    Mehdi:
    I do hope that what follows can be helpful:
    Code:
    . set obs 6
    Number of observations (_N) was 0, now 6.
    
    . g wanted=1 in 1/3
    
    . replace wanted=2 in 5/6
    
    . bysort wanted: gen check_missing=1 if wanted==.
    
    . replace check_missing=0 if wanted!=.
    
    
    . list
    
         +-------------------+
         | wanted   check_~g |
         |-------------------|
      1. |      1          0 |
      2. |      1          0 |
      3. |      1          0 |
      4. |      2          0 |
      5. |      2          0 |
         |-------------------|
      6. |      .          1 |
         +-------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Please note from https://www.statalist.org/forums/help#stata that images of data are not as helpful as you hope. Please use dataex to provide data examples.

      This works for your example.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float group
      5
      5
      5
      .
      5
      5
      .
      .
      8
      .
      .
      8
      end
      
      gen long obsno = _n 
      
      ipolate group obsno, gen(group_i)
      
      gen wanted = missing(group) & group_i == group_i[_n-1]
      
      list, sepby(wanted)

      Code:
           +----------------------------------+
           | group   obsno   group_i   wanted |
           |----------------------------------|
        1. |     5       1         5        0 |
        2. |     5       2         5        0 |
        3. |     5       3         5        0 |
           |----------------------------------|
        4. |     .       4         5        1 |
           |----------------------------------|
        5. |     5       5         5        0 |
        6. |     5       6         5        0 |
        7. |     .       7         6        0 |
        8. |     .       8         7        0 |
        9. |     8       9         8        0 |
           |----------------------------------|
       10. |     .      10         8        1 |
       11. |     .      11         8        1 |
           |----------------------------------|
       12. |     8      12         8        0 |
           +----------------------------------+

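      The idea is that interpolation within, say, 5 . 5 or 8 . . 8 copies the constant found on either side, so each interpolated value equals the one before it. Between different values, such as 5 . . 8, the interpolated values change from one observation to the next, so the comparison fails and the gap is not marked.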
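
      As a cross-check on that logic, here is a minimal sketch of a different technique that avoids interpolation: carry the last known value forward, carry the next known value backward, and flag a gap only when the two agree. The variable names prev, next, and wanted2 are made up for illustration, and the data are the same twelve observations as above.

      ```stata
      * Sketch only: prev, next, and wanted2 are hypothetical names
      clear
      input float group
      5
      5
      5
      .
      5
      5
      .
      .
      8
      .
      .
      8
      end

      gen long obsno = _n

      * Carry the last known value forward (replace works observation by observation)
      gen prev = group
      replace prev = prev[_n-1] if missing(prev)

      * Reverse the order and carry the next known value backward
      gsort -obsno
      gen next = group
      replace next = next[_n-1] if missing(next)
      sort obsno

      * Flag a gap only when it has the same known value on both sides
      gen byte wanted2 = missing(group) & prev == next & !missing(prev)

      list, sepby(wanted2)
      ```

      On this example the flag is 1 for observations 4, 10, and 11, the same observations marked by the interpolation approach. The !missing(prev) condition is there because Stata treats two missing values as equal, so a run of missings with no known value on either side would otherwise be flagged.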

      In your full problem ipolate should be specified with a by() option.
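
      A minimal sketch of what that might look like for panel data follows. It assumes hypothetical variables id (person), year, and firm (a numeric firm code; ipolate needs numeric variables), none of which are taken from the original dataset, and it uses the by prefix to run the interpolation separately for each person.

      ```stata
      * Sketch only: id, year, and firm are hypothetical variable names
      clear
      input float(id year firm)
      1 2001 5
      1 2002 .
      1 2003 5
      1 2004 .
      1 2005 8
      2 2001 8
      2 2002 .
      2 2003 .
      2 2004 8
      end

      * Interpolate within each person so values never carry across people
      bysort id (year): ipolate firm year, gen(firm_i)

      * Flag gaps whose interpolated value equals the previous one;
      * !missing(firm_i) guards leading and trailing gaps, because . == . is true in Stata
      bysort id (year): gen byte filled = missing(firm) & firm_i == firm_i[_n-1] & !missing(firm_i)

      * Fill only the flagged gaps and keep the dummy for later use
      replace firm = firm_i if filled

      list, sepby(id)
      ```

      Here the gap for person 1 in 2002 and the gaps for person 2 in 2002 and 2003 are filled and flagged, while person 1's gap in 2004 (between firms 5 and 8) is left missing.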
      Last edited by Nick Cox; 20 Feb 2023, 05:53.



      • #4
        Thanks for the quick reply.
        I think I did not explain my problem clearly. I would like to mark only the missing values that lie between two identical values, and no others.



        • #5
          Please see now revised #3.
