Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • delete variable if observation is missing

    Hi everyone,

    I'm trying to import a batch of 1000 csv files which are downloaded from a specific database. However, the database, for some reason includes additional empty columns at the end of the file which poses some problems in the following data treatment procedures (requires reshapping). Those files are imported without names in the variables meaning that stata assignes v1...vn names to the variables.

    I know which columns should be deleted it the value of the 4th (and on, but only need the 4th one) is missing. Is there a way going through each 4th observation of each variable and delete that variables if it returns missing? I know the command should use the foreach loop, but I'm unable to delete the variable. I've tried the following without any luck

    foreach v of varlist v*{
    drop `v' if missing(`v'[4])
    }

  • #2
    It seems you need a quite simple command.

    Unfortunately, I cannot test it now, but you may try something like creating an ordered variable, say, "id", the likes of - gen id = _n - then - drop v* if id == 4 & missing(v*)

    Hope that helps.
    Last edited by Marcos Almeida; 26 Mar 2017, 08:28.
    Best regards,

    Marcos

    Comment


    • #3
      Some confusion here. The help for drop explains that you can drop variables or drop observations but not both at the same time.

      There is no need to loop explicitly here. Using findname (SJ)


      Code:
      findname, all(missing(@[4]))
      drop `r(varlist)' 
      will drop all variables which contain missings in the 4th observation.

      Comment


      • #4
        Another fairly simple solution, from first principles, is close to what you tried originally. While I don't think this is a better solution than what Nick proposes in #3, I'm showing it because I think it raises an important point about Stata that is likely to trip you up again in the future if it isn't pointed out now.

        Code:
        foreach v of varlist v* {
            if missing(`v'[4]) {
                drop `v'
            }
        }
        Note the difference between the -if- command (-help ifcmd-) used here, and the -if- condition (or qualifier) that you attempted to use in #1. It takes a while for people to learn the difference between them and when to use which one. The basic principles are this:

        The -if- qualifier (e.g. gen x = y if z > 0-) tells Stata to go through the data in set observation by observation and apply the command only to those observations where the condition following -if- is satisfied. It tells Stata which observations in the data set the command should be applied to.

        By contrast, the -if- command does not tell Stata to select only certain observations. Instead, it tells Stata to determine whether the condition specified is true or not, and if it is true, to proceed with the command(s) contained within the curly braces. If the condition is not true, then the commands are skipped altogether. They are either applied or not; there is no selection of a subset of the data to apply them to (unless the commands explicitly contain their own -if- or -in- qualifiers.)

        In your particular situation, -drop `v' if missing(`v'[4])- cannot work because you cannot -drop- a variable in only certain observations--the very idea of it makes no sense. Moreover even using the condition -missing(`v'[4])- makes little sense in an -if- qualifier, because it does not distinguish a subset of observations anyway.

        What you want to do is determine whether the condition missing(`v'[4]) is true, and if so you want to drop the variable `v' (as a whole). So there is no selection of a subset of observations to apply the command to. So this requires the -if- command, not the -if- qualifier.
        Last edited by Clyde Schechter; 26 Mar 2017, 10:28.

        Comment


        • #5
          Dear all,

          thank you so much for the feedback! I'll go with solution presented by Clyde. It seems more natural to me as it fits what I was trying to do first.

          Comment

          Working...
          X