Shifting observations across rows

Toly Siopsis

Join Date: Mar 2021

Posts: 3
#1

Shifting observations across rows

29 Mar 2021, 17:31

Hello,
I am fairly new to Stata and am having trouble even starting on this problem. I am working with the National Longitudinal Survey of Youth 1997 from the Bureau of Labor Statistics dataset. I am having some trouble with the household roster section. Individuals who belong to a household are recorded each year and given an identifying number (i.e. 1 for mother, 2 for father, 3 for sister, etc.). I was interested in isolating the siblings. Doing this, though, has left me with many missing values and sibling variables that go up to 40 because the number of household members changes from year to year.

I found some code on another forum that does this.

https://stackoverflow.com/questions/...tions-in-stata

However, the code here changes my numbers from 13-26 to 1-8. Additionally, it increased the number of variables I had. Originally, I had 30 sibling in 1997 and after running the code I had 37. How do I fix the code so these do not happen?
Tags: None
Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

29 Mar 2021, 20:01

Hello and welcome!

Could you do the followings to improve the question's friendliness so that it'd be more likely to get answered?
Check the FAQ (https://www.statalist.org/forums/help#question) and especially please address items 11 and 12.

Instead of describing how the chosen solution didn't work, spare more focus on describing the structure of the original data set and the structure of the desired data set.
2 likes
Comment

Toly Siopsis

Join Date: Mar 2021
Posts: 3

30 Mar 2021, 14:16

Thank you Ken Chui for the welcome. I read the FAQ, but clearly I missed some details. First off, I am using Stata 16.

The issue I am having is that I want to reduce how wide my dataset is within certain variables. Currently my data looks like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(sibling_1_1997 sibling_2_1997 sibling_3_1997 sibling_4_1997 sibling_5_1997 sibling_6_1997 sibling_7_1997 sibling_8_1997 sibling_9_1997 sibling_10_1997 sibling_11_1997 sibling_12_1997 sibling_13_1997 sibling_14_1997 sibling_15_1997)
 0  .  . 13 14 14 . . . . . . . . .
 0  .  . 13  .  . . . . . . . . . .
 0  .  .  .  .  . . . . . . . . . .
 0  .  .  .  .  . . . . . . . . . .
 0  .  . 14  .  . . . . . . . . . .
 0 14  . 13 13  . . . . . . . . . .
13  0  . 13 13  . . . . . . . . . .
 0 14 14  .  .  . . . . . . . . . .
13  0 14  .  .  . . . . . . . . . .
13 14  0  .  .  . . . . . . . . . .
end

I want to shift the values over if there is a missing value before it. The following code is what I found and modified.

egen sib_1997_compressed = concat(sibling_1_1997-sibling_15_1997)
replace sib_1997_compressed = subinstr(sib_1997_compressed, ".", "", .)

count if sib_1997_compressed != ""
local j = 1
quietly while r(N) > 0 {
gen sib_`j'_1997 = real( substr(sib_1997_compressed, 1, 1) )
replace sib_1997_compressed = substr(sib_1997_compressed, 30, .)
local ++j
count if sib_1997_compressed != ""
}

Running the following code yields this.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(sib_1_1997 sib_2_1997 sib_3_1997 sib_4_1997 sib_5_1997 sib_6_1997 sib_7_1997 sib_8_1997 sib_9_1997 sib_10_1997 sib_11_1997 sib_12_1997 sib_13_1997 sib_14_1997 sib_15_1997)
0 1 3 1 4 1 4 . . . . . . . .
0 1 3 . . . . . . . . . . . .
1 9 0 . . . . . . . . . . . .
1 5 1 5 0 . . . . . . . . . .
0 1 4 . . . . . . . . . . . .
0 1 4 1 3 1 3 . . . . . . . .
1 3 0 1 3 1 3 . . . . . . . .
0 1 4 1 4 . . . . . . . . . .
1 3 0 1 4 . . . . . . . . . .
1 3 1 4 0 . . . . . . . . . .
end

So the values are all shifting, but they are also changing incorrectly. I assume this is because the code I am using was for a dataset which only contained values 1-4. How could I fix this?

I hope this is a more comprehensive question.

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

30 Mar 2021, 15:24

Thanks for making your question more comprehensible.

I wrote the code cited in #1 which is customised to the question asked on Stack Overflow. There the values in each row were all single-digit integers. Yours aren't all single-digit integers, so the code needs more care in concatenation and then in extracting the individual elements.

Hence for example

Code:

substr(sib_1997_compressed, 1, 1)

is wrong because it extracts one character at a time

-- while where 30 comes from in your change to

Code:

replace sib_1997_compressed = substr(sib_1997_compressed, 30, .)

is a mystery to me.

This is the previous code generalized with the result:

Code:

clear
input byte(sibling_1_1997 sibling_2_1997 sibling_3_1997 sibling_4_1997 sibling_5_1997 sibling_6_1997 sibling_7_1997 sibling_8_1997 sibling_9_1997 sibling_10_1997 sibling_11_1997 sibling_12_1997 sibling_13_1997 sibling_14_1997 sibling_15_1997)
 0  .  . 13 14 14 . . . . . . . . .
 0  .  . 13  .  . . . . . . . . . .
 0  .  .  .  .  . . . . . . . . . .
 0  .  .  .  .  . . . . . . . . . .
 0  .  . 14  .  . . . . . . . . . .
 0 14  . 13 13  . . . . . . . . . .
13  0  . 13 13  . . . . . . . . . .
 0 14 14  .  .  . . . . . . . . . .
13  0 14  .  .  . . . . . . . . . .
13 14  0  .  .  . . . . . . . . . .
end

egen sib_1997_compressed = concat(sibling_1_1997-sibling_15_1997), punct(" ")
replace sib_1997_compressed = trim(subinstr(sib_1997_compressed, ".", "", .))

count if sib_1997_compressed != ""

local j = 1
quietly while r(N) > 0 {
    gen sib_`j'_1997 = word(sib_1997_compressed, 1)
    replace sib_1997_compressed = trim(subinstr(sib_1997_compressed, sib_`j'_1997, "", 1))
    local ++j
    count if sib_1997_compressed != ""
}

destring sib_*_1997, replace


l sib_*_1997

     +-------------------------------------------+
     | sib_1_~7   sib_2_~7   sib_3_~7   sib_4_~7 |
     |-------------------------------------------|
  1. |        0         13         14         14 |
  2. |        0         13          .          . |
  3. |        0          .          .          . |
  4. |        0          .          .          . |
  5. |        0         14          .          . |
     |-------------------------------------------|
  6. |        0         14         13         13 |
  7. |       13          0         13         13 |
  8. |        0         14         14          . |
  9. |       13          0         14          . |
 10. |       13         14          0          . |
     +-------------------------------------------+

That said, the original poster on Stack Overflow was adamant that they didn't want a solution using reshape but without evidence to the contrary that would be a better solution.

Let's start again.

Code:

clear
input byte(sibling_1_1997 sibling_2_1997 sibling_3_1997 sibling_4_1997 sibling_5_1997 sibling_6_1997 sibling_7_1997 sibling_8_1997 sibling_9_1997 sibling_10_1997 sibling_11_1997 sibling_12_1997 sibling_13_1997 sibling_14_1997 sibling_15_1997)
 0  .  . 13 14 14 . . . . . . . . .
 0  .  . 13  .  . . . . . . . . . .
 0  .  .  .  .  . . . . . . . . . .
 0  .  .  .  .  . . . . . . . . . .
 0  .  . 14  .  . . . . . . . . . .
 0 14  . 13 13  . . . . . . . . . .
13  0  . 13 13  . . . . . . . . . .
 0 14 14  .  .  . . . . . . . . . .
13  0 14  .  .  . . . . . . . . . .
13 14  0  .  .  . . . . . . . . . .
end

gen long id = _n

reshape long sibling_@_1997, i(id) j(which)

drop if sibling == .

bysort id (which) : replace which = _n

rename sibling__1997 sibling
reshape wide sibling, i(id) j(which)

list

     +------------------------------------------------+
     | id   sibling1   sibling2   sibling3   sibling4 |
     |------------------------------------------------|
  1. |  1          0         13         14         14 |
  2. |  2          0         13          .          . |
  3. |  3          0          .          .          . |
  4. |  4          0          .          .          . |
  5. |  5          0         14          .          . |
     |------------------------------------------------|
  6. |  6          0         14         13         13 |
  7. |  7         13          0         13         13 |
  8. |  8          0         14         14          . |
  9. |  9         13          0         14          . |
 10. | 10         13         14          0          . |
     +------------------------------------------------+

Better yet would be this:

Code:

egen sib = concat(sibling_*), p(" ")
replace sib = subinstr(sib, ".", "", .)
split sib, destring 

     +---------------------------+
     | sib1   sib2   sib3   sib4 |
     |---------------------------|
  1. |    0     13     14     14 |
  2. |    0     13      .      . |
  3. |    0      .      .      . |
  4. |    0      .      .      . |
  5. |    0     14      .      . |
     |---------------------------|
  6. |    0     14     13     13 |
  7. |   13      0     13     13 |
  8. |    0     14     14      . |
  9. |   13      0     14      . |
 10. |   13     14      0      . |
     +---------------------------+

Last edited by Nick Cox; 30 Mar 2021, 15:34.

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

30 Mar 2021, 15:38

Duplicate post.
Comment
Toly Siopsis

Join Date: Mar 2021

Posts: 3
#6

31 Mar 2021, 19:05

Thank you so much for the help!

I have been browsing the forum for a short time, but every forum post I read was answered by you. You have been a great help in this issue directly as well as countless other of my issues indirectly.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#7

01 Apr 2021, 02:57

Very good to hear, both specifically and generally. Thanks for the thanks! It's a team effort, however.
Comment

Announcement