  • Dropping variables that are not present in (almost) every country

    Hello Statalisters!

    I have a huge cross-country dataset with a lot of variables. Some of them are available for every country, while others are only available for a handful of countries. To keep the dataset useful and readable for my study, I'd like to keep only the variables that are available for every country, or at least for 80% of the countries in my sample. Is there a quick way to do this? Here's an example of my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str32 country_name float var1 double(var2 var3)
    "Afghanistan"                    0 .    .
    "Albania"                        0 .    .
    "Algeria"                        0 .    .
    "Angola"                         0 .    .
    "Argentina"                      0 .    .
    "Australia"              1.2258065 .    .
    "Austria"                        0 .    .
    "Azerbaijan"                     0 .    .
    "Bahrain"                        0 .    .
    "Bangladesh"                     0 .    .
    "Barbados"                       0 .    .
    "Belarus"                        0 .    .
    "Belgium"                        0 .    .
    "Benin"                          0 .    .
    "Bhutan"                         0 .    .
    "Bolivia"                        0 .    .
    "Bosnia and Herzegovina"         0 .    .
    "Botswana"                       0 .    .
    "Brazil"                         0 . .678
    "Bulgaria"                       0 .    .
    "Burkina Faso"                   0 .    .
    "Burma/Myanmar"                  0 .    .
    "Burundi"                        0 .    .
    "Cambodia"                .1612903 .    .
    "Cameroon"                       0 .    .
    end
    For instance, in this example I would like to keep only var1: even though it is full of 0s, it still carries information for every country in this sample. However, var2 only has data for a few countries that are not in this example, and var3 only has data for Brazil. I'd like to ask Stata to drop var2 and var3, and more generally every variable that is missing for a majority of countries.

    Thank you for the help!

    Regards,

    Adam

    Last edited by Adam Sadi; 29 Aug 2022, 06:08.

  • #2
    You can use the -missings- command available from SSC. For instance,

    Code:
    missings report, min(20)
    drop `r(varlist)'
    will report all variables that have at least 20 missing values, and then drop them.
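
    If the cutoff should follow the 80% rule from the question rather than a fixed count, the threshold can be computed from the number of observations. This is a minimal sketch, assuming one observation per country and that -missings- is installed; the local macro names maxmiss and cutoff are purely illustrative.

    Code:
    * tolerate missing values in at most 20% of the observations
    local maxmiss = floor(_N / 5)
    local cutoff = `maxmiss' + 1
    * report the variables with at least that many missing values
    missings report, min(`cutoff')
    * drop them, but only if any were reported
    if "`r(varlist)'" != "" drop `r(varlist)'
    With the 25 observations in the example data, cutoff is 6, so var2 (25 missing values) and var3 (24 missing values) would be dropped and var1 kept.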



    • #3
      Hemanshu:

      -missings- should work in theory. Thank you so much for your helpful comment.

      However, when trying to install the package I get the following message:

      Code:
      . ssc install missings
      remote connection failed
      http://fmwww.bc.edu/repec/bocode/m/ either
        1)  is not a valid URL, or
        2)  could not be contacted, or
        3)  is not a Stata download site (has no stata.toc file).
      r(677);
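
      While the download is unavailable, the same rule can be approximated with official Stata commands only. This is a rough sketch, not what -missings- does internally; it assumes one observation per country, and the macro name maxmiss is illustrative.

      Code:
      * tolerate missing values in at most 20% of the observations
      local maxmiss = floor(_N / 5)
      foreach v of varlist _all {
          * missing() handles both numeric and string variables
          quietly count if missing(`v')
          if r(N) > `maxmiss' {
              drop `v'
          }
      }
      The variable list is expanded once before the loop starts, so dropping variables inside the loop is safe.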



      • #4
        I was just able to use that command successfully, though the first time it gave me an error as well. Maybe there's a server issue.

        That said, you might want to try this instead. It seems to have a more updated version:

        Code:
        net install dm0085_2, from(http://www.stata-journal.com/software/sj20-4)



        • #5
          missings is best cited to, and downloaded from, the Stata Journal site. An earlier version is also available from SSC.

          Perhaps Adam tried at a bad time, or there is some other issue with his internet access.

          An otherwise unpredictable search term is dm0085:


          Code:
          . search dm0085, entry
          
          Search of official help files, FAQs, Examples, and Stata Journals
          
          SJ-20-4 dm0085_2  . . . . . . . . . . . . . . . . Software update for missings
                  (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                  Q4/20   SJ 20(4):1028--1030
                  sorting has been extended for missings report
          
          SJ-17-3 dm0085_1  . . . . . . . . . . . . . . . . Software update for missings
                  (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                  Q3/17   SJ 17(3):779
                  identify() and sort options have been added
          
          SJ-15-4 dm0085  Speaking Stata: A set of utilities for managing missing values
                  (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                  Q4/15   SJ 15(4):1174--1185
                  provides command, missings, as a replacement for, and extension
                  of, previous commands nmissing and dropmiss
          
          (end of search)
          
          . ssc desc missings
          
          --------------------------------------------------------------------------------------------------------------
          package missings from http://fmwww.bc.edu/repec/bocode/m
          --------------------------------------------------------------------------------------------------------------
          
          TITLE
                'MISSINGS': module to manage missing values
          
          DESCRIPTION/AUTHOR(S)
                
                 missings includes utility commands for managing variables  that
                (may) have missing values, which variously report, list  and
                tabulate missing values; generate a variable containing  numbers
                of missing values; and drop variables and/or observations that
                are all missing.   missings is intended to unite and supersede
                the author's previous commands nmissing and dropmiss.
                
                KW: missings
                KW: drop
                KW: data management
                KW: missing values
                
                Requires: Stata version 9
                
                Distribution-Date: 20170511
                
                Author: Nicholas J. Cox, Durham University
                Support: email [email protected]
                
          
          INSTALLATION FILES                             (type net install missings)
                missings.ado
                missings.sthlp
          --------------------------------------------------------------------------------------------------------------
          (type ssc install missings to install)
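
          Since the SSC and Stata Journal versions differ, it may help to check which copy Stata actually finds on the ado-path. This is a minimal sketch using the official -which- command; the output depends on the installation, so none is shown here.

          Code:
          * show the location and version line of the installed missings.ado
          which missings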



          • #6
            I could download the package. Thank you again for your help! -missings- was very useful.
