Shift non-missing values to the left in an observation

Anna Mejldal

Join Date: Nov 2016

Posts: 17
#1

Shift non-missing values to the left in an observation

08 Mar 2017, 05:49

Hi,
I cant figure out on how to shift data to the left so that missings are removed. My problem is that the number places that have to be shiftet are individual per observation.
In my real data, I have 70 days where observations are missing at the beginning, and for each day I have 4 variables which need to be shiftet, but I hope the solution for the simple case can be generalized.

A simple example data is this:

Code:

clear input id day1 day2 day3 day4 id day1 day2 day3 day4 1 5 4 . 2 2 . 1 2 5 3 . . . 4 4 . . 3 4 end

In this example, for id 2, I need to shift data by one, while for id 3 it has to be shiftet by 3 and sid 4 by two, so that I end up with this:

Code:

clear input id day1 day2 day3 day4 id day1 day2 day3 day4 1 5 4 . 2 2 1 2 5 . 3 4 . . . 4 3 4 . . end

I have tried to work with both loops and reshaping, but I cant get the individual number of shifts right.
Does anyone have a solution on this?
I am relatively new at STATA, so please bear over with me if this is a very easy problem :-)

Thanks!
-Anna
Tags: None

Mike Lacy

Join Date: Apr 2014
Posts: 2449

08 Mar 2017, 07:22

I'm thinking that you actually intend your first line to be

Code:

1 5 4 2 .

On that assumption, here's a brute force approach that seems to work. There's doubtless a more efficient way to do this, but this should be fast enough.

Code:

// Create bigger bank of test data than you gave.
clear
local N = 20
local nday = 10
set obs `N'
forval i = 1/`nday' {
  gen byte day`i' = ceil(runiform() * 5) if (runiform() > 0.4)
}  
list
// Real stuff starts here.
qui gen byte imiss = .
forval  j = 1/`nday'  {  
    local ndaym1 = `nday' - 1
    forval i = 1/`ndaym1' {
       qui replace imiss = missing(day`i')
       local next = `i' + 1
       qui replace day`i' = day`next' if imiss
       qui replace day`next' = . if imiss
    }
}    
list

Comment

Nick Cox

Join Date: Mar 2014
Posts: 36057

08 Mar 2017, 10:12

Here is another approach based on reshape. I am making the same assumption as Mike about observation #1.

You could reshape back. But the reason the problem arises in the first place is that this is an awkward data layout. For most Stata purposes, you're better off long.

Code:

clear
input id day1 day2 day3 day4
id day1 day2 day3 day4
1 5 4 . 2
2 . 1 2 5
3 . . . 4
4 . . 3 4
end

reshape long day , i(id) j(newday) 
rename day whatever 
bysort id (newday) : gen day = sum(whatever < .) if whatever < . 
bysort id (day) : replace day = _n 
drop newday  

list, sepby(id) 

     +---------------------+
     | id   whatever   day |
     |---------------------|
  1. |  1          5     1 |
  2. |  1          4     2 |
  3. |  1          2     3 |
  4. |  1          .     4 |
     |---------------------|
  5. |  2          1     1 |
  6. |  2          2     2 |
  7. |  2          5     3 |
  8. |  2          .     4 |
     |---------------------|
  9. |  3          4     1 |
 10. |  3          .     2 |
 11. |  3          .     3 |
 12. |  3          .     4 |
     |---------------------|
 13. |  4          3     1 |
 14. |  4          4     2 |
 15. |  4          .     3 |
 16. |  4          .     4 |
     +---------------------+

If you are determined to have your wide layout, you can get it with

Code:

reshape wide whatever, i(id) j(day)
rename (whatever*) (day=)

Comment

Anna Mejldal

Join Date: Nov 2016

Posts: 17
#4

09 Mar 2017, 00:21

Originally posted by Mike Lacy View Post

I'm thinking that you actually intend your first line to be

Code:

1 5 4 2 .

On that assumption, here's a brute force approach that seems to work. There's doubtless a more efficient way to do this, but this should be fast enough.

Actually, I want to keep missings in the "middle" of the data, only the missings to the left must be removed. But I think I somehow can figure out how to work around that.

Thanks a lot for both solutions! I will go with the first one since I'm more familiar with loops.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 36057

09 Mar 2017, 02:11

Here's my code adapted for that different problem:

Code:

clear
input id day1 day2 day3 day4
1 5 4 . 2
2 . 1 2 5
3 . . . 4
4 . . 3 4
end

reshape long day , i(id) j(newday) 
rename day whatever 
bysort id (newday) : gen day = sum(whatever < .) 

list 

replace day = . if day == 0 
bysort id (day newday) : replace day = _n 
drop newday  

list, sepby(id) 

     +---------------------+
     | id   whatever   day |
     |---------------------|
  1. |  1          5     1 |
  2. |  1          4     2 |
  3. |  1          .     3 |
  4. |  1          2     4 |
     |---------------------|
  5. |  2          1     1 |
  6. |  2          2     2 |
  7. |  2          5     3 |
  8. |  2          .     4 |
     |---------------------|
  9. |  3          4     1 |
 10. |  3          .     2 |
 11. |  3          .     3 |
 12. |  3          .     4 |
     |---------------------|
 13. |  4          3     1 |
 14. |  4          4     2 |
 15. |  4          .     3 |
 16. |  4          .     4 |
     +---------------------+

The reshape back to wide (not recommended) is the same code as before.

Last edited by Nick Cox; 09 Mar 2017, 02:18.

Announcement