Identifying completely missing (empty) rows

Krishanu Karmakar

Join Date: Jun 2014

Posts: 20
#1

Identifying completely missing (empty) rows

22 Aug 2014, 11:08

Hi All,

I have a data set comprising only off string values but with many missing values and sometimes completely missing rows. A similar small data set can be inputted as follows
(operating system windows 7)

Code:

version 12.1 clear all set obs 6 input str1 var1 str1 var2 str3 var3 a b c a c "" "" d "" "" "" "" k "" "" "" "" end

Note that the fourth and the sixth rows/observations are completely missing or empty. What I want to do is delete these two rows only. The actual dataset has 29 variables and 1284 rows (and there are many such datasets) hence eyeballing and deleting the missing rows is slightly problematic. So if anyone knows how to do it with any particular set of commands for one dataset, it would greatly help me. All variables are string variables, no numeric variables.

Last edited by Krishanu Karmakar; 22 Aug 2014, 11:18.
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

22 Aug 2014, 11:23

try the "rowmiss" option to the -egen- command; e.g., egen countmiis=rowmiss(var1 var2 var3)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#3

22 Aug 2014, 11:24

For a convenience command, find and install dropmiss (SJ).

Code:

search dropmiss dropmiss, obs

For a principled approach,

Code:

drop if missing(var1) & missing(var2) & missing(var3)

will work for your example. For more complicated examples, you could use the rowmiss() function of egen

Code:

egen nmiss = rowmiss(var*) drop if nmiss == 29
1 like
Comment
Krishanu Karmakar

Join Date: Jun 2014

Posts: 20
#4

22 Aug 2014, 11:32

Thank you both.
Comment
ben earnhart

Join Date: May 2014

Posts: 1027
#5

22 Aug 2014, 13:36

Usual caveat when working with strings: "" is missing, while " " (blank space) is not. So if you are fully 100% confident in your data, then the suggested solutions will work. But if you have any doubt, might run a loop to run trim() on them first, which isn't too bad, given only 29 variables. Something like:

Code:

foreach var of varlist var1-var3 { replace `var'=trim(`var') }

Could be a bit messy if you have a mix of strings and numeric variables, but again, with only 29 of them, not too bad, and better safe than sorry.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#6

22 Aug 2014, 13:50

Ben makes a good point. dropmiss (see #3) also has an option to catch multiple spaces when the user wants to regard such values as missing too.
Comment
Krishanu Karmakar

Join Date: Jun 2014

Posts: 20
#7

23 Aug 2014, 01:36

Yes that is a concern. Thank you for pointing it out. Till now I did not face the problem, but there is no assurance that I will not face the problem mentioned by Ben with my other datasets. I have been using the trim() function at a latter step, but I think I should use it earlier (in reference to my workflow) and using a loop like Ben's.
Comment
Ritika Khurana

Join Date: May 2017

Posts: 81
#8

06 Jun 2017, 00:59

I want to delete rows with missing observations across 110 variables. My dataset has noth string and numeric values.
First I tried what Nick has suggested
egen nmiss = rowmiss(var*) drop if nmiss == 110 It says var* not found. Then I realised my data has both string and numeric values. Next, I tried what Ben has suggested i.e. loop. foreach var of varlist hhid-gc { replace `var'=trim(`var') } Here, hhid and gc are my first and last variable respectively. But there is an error, type mismatch. I know this error is because there string as well as numeric data. I don't understand why m i getting this error ? From Ben's comment I gathered -trim- command allows string " " (singlr blank spaces).
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#9

06 Jun 2017, 01:17

Copying code carries an obligation to think about whether it makes sense for your situation.

The syntax var* makes sense if and only if your variable names have names like that.

The most that trim() -- which is a function, not a command -- can do for you is remove leading and trailing spaces from string variables, and it won't ignore numeric variables.

The help explains:

trim(s)
Domain: strings
Range: strings without leading or trailing blanks
Description: returns s without leading and trailing blanks; equivalent to ltrim(rtrim(s)). trim(" this ") = "this"

If spaces in string variables are to you not informative, you can do this, identifying the string variables and then looping over those alone:

Code:

ds, has(type string) foreach var in `r(varlist)' { replace `var'=trim(`var') }

You can now install missings from the Stata Journal and then go

Code:

missings dropobs, force

See http://www.statalist.org/forums/foru...aging-missings

and in Stata

Code:

search missings

to find the latest download location.
Comment

Announcement

Identifying completely missing (empty) rows

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment