Deleting the missing values (not entire observations) of a variable

Eric Bell

Join Date: Nov 2021

Posts: 6
#1

Deleting the missing values (not entire observations) of a variable

14 Nov 2021, 12:43

Suppose I have the following table:
ID Preferences

1 2

1 .

1 .

2 .

2 3

3 .

3 .

3 4

4 1

I have other columns before 'ID' and after 'Preferences'. What I want to get is two columns of the following type on the right of the above table:
ID Preferences

1 2

2 3

3 4

4 1

In other words, I want to delete the missing values of the variable 'Preferences'. I don't want to delete observations. I only want to record 'Preferences' with non-missing values. So

Code:

drop Preferences if Preferences != .

would not work in this case. This would delete observations with missing values for Preferences, which is not what I want.

How can I do this?

Last edited by Eric Bell; 14 Nov 2021, 12:48.
Tags: data, interaction, Suggestion, syntax

Andrew Musau

Join Date: Oct 2014
Posts: 10197

14 Nov 2021, 13:38

Stata is not a spreadsheet, so you cannot think like that. An observation defines entries of the same entity. For example, in the auto dataset

Code:

. sysuse auto
(1978 Automobile Data)

. l in 1

     +----------------------------------------------------------------------------------------------------------------+
     | make          price   mpg   rep78   headroom   trunk   weight   length   turn   displa~t   gear_r~o    foreign |
     |----------------------------------------------------------------------------------------------------------------|
  1. | AMC Concord   4,099    22       3        2.5      11    2,930      186     40        121       3.58   Domestic |
     +----------------------------------------------------------------------------------------------------------------+

the first observation belongs to "AMC Concord". I cannot just delete one entry of this observation and move up entries of other cars to align with it.

Comment

Eric Bell

Join Date: Nov 2021

Posts: 6
#3

14 Nov 2021, 14:17

Interesting thing to know.

My problem is that I have some columns before the above table. So my actual table looks something like:

ID1 Num_Females ID Preferences

1 10 1 2

2 15 1 .

3 20 1 .

4 10 2 .

5 3 2 3

6 6 3 .

7 17 3 .

8 13 3 4

9 12 4 1

So you see that ID1 does not have repetitions. You can think of ID1 as a household and ID as an individual member of a household.

Ultimately, I want a data that would look like this:

ID1 Num_Females ID Preferences

1 10 1 2

2 15 2 3

3 20 3 4

4 10 4 1

5 3 5 [a number depending on the original data]

6 6 6 [ditto]

7 17 7 [ditto]

8 13 8 [ditto]

9 12 9 [ditto]

The first two columns here are left untouched. The third column does not have repetitions, so 'ID' would be household, the same as ID1.

Would it be possible to get the last two columns in Stata?
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10197

14 Nov 2021, 15:52

Assuming no duplicates (or preferences are constant within id):

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id1 num_females id preferences)
1 10 1 2
2 15 1 .
3 20 1 .
4 10 2 .
5  3 2 3
6  6 3 .
7 17 3 .
8 13 3 4
9 12 4 1
end

frame put id preferences, into(id)
drop id preferences
frame id{
    rename id id1
    bys id1 (preferences): keep if _n== 1
}
frlink 1:1 id1, frame(id)
frget preferences, from(id)
frame drop id

Res.:

Code:

. l, sep(0)

     +--------------------------------+
     | id1   num_fe~s   id   prefer~s |
     |--------------------------------|
  1. |   1         10    1          2 |
  2. |   2         15    2          3 |
  3. |   3         20    3          4 |
  4. |   4         10    4          1 |
  5. |   5          3    .          . |
  6. |   6          6    .          . |
  7. |   7         17    .          . |
  8. |   8         13    .          . |
  9. |   9         12    .          . |
     +--------------------------------+

Comment

William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

14 Nov 2021, 15:54

This perhaps builds on your previous discussion at

https://www.statalist.org/forums/for...for-each-group

I have followed up on that post.
Comment

ID	Preferences
1	2
1	.
1	.
2	.
2	3
3	.
3	.
3	4
4	1

ID	Preferences
1	2
2	3
3	4
4	1

ID1	Num_Females	ID	Preferences
1	10	1	2
2	15	1	.
3	20	1	.
4	10	2	.
5	3	2	3
6	6	3	.
7	17	3	.
8	13	3	4
9	12	4	1

ID1	Num_Females	ID	Preferences
1	10	1	2
2	15	2	3
3	20	3	4
4	10	4	1
5	3	5	[a number depending on the original data]
6	6	6	[ditto]
7	17	7	[ditto]
8	13	8	[ditto]
9	12	9	[ditto]

Announcement

Deleting the missing values (not entire observations) of a variable

Comment

Comment

Comment

Comment