Dropping variables that are not present in (almost) every country

Adam Sadi

Join Date: Jul 2022
Posts: 68

29 Aug 2022, 06:03

Hello Statalisters!

I have a large cross-country dataset with many variables. Some are available for every country; others are available only for a handful of countries. To get a useful, readable dataset that fits the needs of my study, I'd like to keep only the variables that are available for every country, or at least for 80% of the countries in my sample. Is there a quick way to do this? Here's an example of my data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str32 country_name float var1 double(var2 var3)
"Afghanistan"                    0 .    .
"Albania"                        0 .    .
"Algeria"                        0 .    .
"Angola"                         0 .    .
"Argentina"                      0 .    .
"Australia"              1.2258065 .    .
"Austria"                        0 .    .
"Azerbaijan"                     0 .    .
"Bahrain"                        0 .    .
"Bangladesh"                     0 .    .
"Barbados"                       0 .    .
"Belarus"                        0 .    .
"Belgium"                        0 .    .
"Benin"                          0 .    .
"Bhutan"                         0 .    .
"Bolivia"                        0 .    .
"Bosnia and Herzegovina"         0 .    .
"Botswana"                       0 .    .
"Brazil"                         0 . .678
"Bulgaria"                       0 .    .
"Burkina Faso"                   0 .    .
"Burma/Myanmar"                  0 .    .
"Burundi"                        0 .    .
"Cambodia"                .1612903 .    .
"Cameroon"                       0 .    .
end

For instance, in this example I would like to keep only var1, because even though it is mostly 0s, it still has a value for every country in the sample. By contrast, var2 has data only for a few countries that are not in this example, and var3 has data only for Brazil. I'd like to ask Stata to drop var2 and var3 and, more generally, every variable that is missing for most countries.

Thank you for the help!

Regards,

Adam

Last edited by Adam Sadi; 29 Aug 2022, 06:08.

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#2

29 Aug 2022, 06:13

You can use the -missings- command available from SSC. For instance,

Code:

missings report, min(20)
drop `r(varlist)'

This will report all the variables that have at least 20 missing values, and then drop them.
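
If installing a community-contributed package is not possible, the same idea can be sketched in base Stata with a loop. This is only a sketch: it assumes one observation per country, as in the example data, it uses the 80% cutoff from the original question, and the local macro name minshare is purely illustrative.

```stata
* Sketch: keep only variables observed in at least 80% of observations.
* Assumes one observation per country; minshare is an illustrative name.
local minshare = 0.80
foreach v of varlist _all {
    * missing() works for both numeric and string variables
    quietly count if !missing(`v')
    if r(N) / _N < `minshare' {
        drop `v'
    }
}
```

With the 25 countries in the example, an 80% cutoff means a variable is dropped once it has more than 5 missing values, so var2 and var3 would go while country_name and var1 would stay.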
Adam Sadi

Join Date: Jul 2022

Posts: 68
#3

29 Aug 2022, 08:31

Hemanshu:

-missings- should work in theory. Thank you so much for your helpful comment.

However, when I try to install the package, I get the following message:

Code:

. ssc install missings
remote connection failed
http://fmwww.bc.edu/repec/bocode/m/ either
  1) is not a valid URL, or
  2) could not be contacted, or
  3) is not a Stata download site (has no stata.toc file).
r(677);
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#4

29 Aug 2022, 08:44

I was just able to use that command successfully, though the first time it gave me an error as well. Maybe there's a server issue.

That said, you might want to try this instead; it seems to install a more recent version:

Code:

net install dm0085_2.pkg
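
Note that net install generally needs to be told where the package lives, unless a location was set earlier with net from. A hedged sketch of the full command follows; the URL is an assumption based on the Stata Journal's usual software directory layout for issue 20(4), not something stated in this thread.

```stata
* Assumed location: Stata Journal software archive for SJ 20(4)
net install dm0085_2, from(http://www.stata-journal.com/software/sj20-4)
```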

Nick Cox

Join Date: Mar 2014
Posts: 36054
#5

29 Aug 2022, 08:50

missings is best referenced to and downloaded from the Stata Journal site. An earlier version is also available at SSC.

Perhaps Adam tried at a bad time, or there is some other issue with his internet access.

An otherwise unpredictable search term is dm0085:

Code:

. search dm0085, entry

Search of official help files, FAQs, Examples, and Stata Journals

SJ-20-4 dm0085_2  . . . . . . . . . . . . . . . . Software update for missings
        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
        Q4/20   SJ 20(4):1028--1030
        sorting has been extended for missings report

SJ-17-3 dm0085_1  . . . . . . . . . . . . . . . . Software update for missings
        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
        Q3/17   SJ 17(3):779
        identify() and sort options have been added

SJ-15-4 dm0085  Speaking Stata: A set of utilities for managing missing values
        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
        Q4/15   SJ 15(4):1174--1185
        provides command, missings, as a replacement for, and extension
        of, previous commands nmissing and dropmiss

(end of search)

. ssc desc missings

--------------------------------------------------------------------------------------------------------------
package missings from http://fmwww.bc.edu/repec/bocode/m
--------------------------------------------------------------------------------------------------------------

TITLE
      'MISSINGS': module to manage missing values

DESCRIPTION/AUTHOR(S)
      
       missings includes utility commands for managing variables  that
      (may) have missing values, which variously report, list  and
      tabulate missing values; generate a variable containing  numbers
      of missing values; and drop variables and/or observations that
      are all missing.   missings is intended to unite and supersede
      the author's previous commands nmissing and dropmiss.
      
      KW: missings
      KW: drop
      KW: data management
      KW: missing values
      
      Requires: Stata version 9
      
      Distribution-Date: 20170511
      
      Author: Nicholas J. Cox, Durham University
      Support: email [email protected]
      

INSTALLATION FILES                             (type net install missings)
      missings.ado
      missings.sthlp
--------------------------------------------------------------------------------------------------------------
(type ssc install missings to install)

Adam Sadi

Join Date: Jul 2022

Posts: 68
#6

29 Aug 2022, 10:49

I was able to download the package. Thank you again for your help! -missings- was very useful.