  • Selecting multiple variables (with * or ?)

    Dear all,

    May I ask a question about selecting multiple variables using the asterisk and/or question mark (or perhaps another option I have not considered yet)?

    I am programming a missing data report (see the table at the bottom of the post to get an idea of what I would like to achieve). For each variable (see variable _02fudb_visdat as an example) I have created a new variable with prefix Q_, to which I will assign the value 99 (data not required), 1 (data present) or 2 (data missing). In this example _02fudb_visdat (the visit date of the 2nd patient follow-up visit) is only counted as missing if _02fudb_visyn == 1 (meaning that the visit took place).

    So I have initially written this (which works):

    replace Q__02fudb_visdat = 99 /* Set to Not Required by default */
    replace Q__02fudb_visdat = 2 if _02fudb_visyn == 1 & missing(_02fudb_visdat) /* Set to Missing */
    replace Q__02fudb_visdat = 1 if !missing(_02fudb_visdat) /* Set to Present */

    The problem is that I have 13 visits (and each visit has many variables like the example variable _02fudb_visdat), so I would like to rewrite this code using foreach to save myself a lot of work. The variables I want to refer to differ only in the prefix (01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13), so I thought the use of * and ? would help me out here (I have already consulted the manual on this).

    The first line of code I have successfully replaced with the following:

    foreach x of varlist Q_*fudb_fup_date {
    replace `x' = 99 /* Set to Not Required by default */
    }

    But I cannot get the second line of code to work (specifically the missing() argument, which was shown in red in the original post):

    Idea 1:
    foreach x of varlist Q_*fudb_fup_date {
    replace `x' = 2 if _02fudb_visdat == 1 & missing(_??fudb_visdat) /* missing */
    }

    --> this code gives an error: _??fudb_fup_date invalid name


    Based on the posts I found on the forum and on the Stata documentation (picture below), I thought this would work. Can someone explain why Stata considers this invalid?

    [Image: Abbreviating_variables.PNG]


    Idea 2:
    foreach x of varlist Q_*fudb_fup_date {
    replace `x' = 2 if _02fudb_visdat == 1 & missing(_*fudb_visdat) /* missing */
    }

    --> this also gives an error: _ ambiguous abbreviation

    Can someone explain why this is not allowed?

    After some thinking, I realized this probably would not have worked anyway, because both _02fudb_visdat and Q__02fudb_visdat can be abbreviated with _*fudb_visdat, while inside missing() I only want to refer to _02fudb_visdat (and the fudb_visdat for the other follow-up visits).

    Idea 3:
    One option is to generate variables that do not start with Q_ but rather end with _Q, so I don't have two variables (_02 and Q__02) that can be abbreviated in the same way. Unfortunately, I have already written the code for the missing data report for most of my non-repeating forms, so it would be quite some work to go back and change all this.

    I hope you have an idea how I can refer to _01fudb_visdat-_13fudb_visdat with an abbreviation without this abbreviation applying to Q__01fudb_visdat-Q__13fudb_visdat as well.

    Or perhaps you have an entirely different suggestion how to tackle this?

    Thank you for your ideas!

    Kind regards,

    Moniek


    Example table of what I would like to achieve:
    subjectid   _02fudb_visdat   Q__02fudb_visdat   _02fudb_visyn
    001                          2                  1
    002                          99
    003         18Jan2019        1                  1

  • #2
    Your ideas #1 and #2 can both be explained with reference to #1 alone.

    Code:
    replace `x' = 2 if _02fudb_visdat == 1 & missing(_??fudb_visdat)
    A wildcard won't work here with missing() if only because if it evaluates to several variable names, then those variable names must be separated by commas. This follows from the help for missing() rather than documentation of wildcards.

    In fact, missing() won't accept a wildcard even if the wildcard points to a single variable. As for why that doesn't work even though perhaps a user might think it should, the answer is probably just that no one at StataCorp thought of allowing it. In fact, it's pretty standard that Stata functions don't accept wildcards, mostly because they wouldn't make sense in general and also because whenever you know you need a single variable you wouldn't or shouldn't want to specify it using a wildcard. At least that's my guess. I just use this code; I didn't write it, manifestly. Tell me if I am wrong.

    This should work:

    Code:
    unab myvars : _??fudb_visdat
    local myvars : subinstr local myvars " " "," , all
    replace `x' = 2 if _02fudb_visdat == 1 & missing(`myvars')
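
One point that may help when reading the code above: with several arguments, missing() returns 1 as soon as any one of them is missing. The sketch below illustrates this on made-up toy data with only two visits; the values and the variable anymiss are hypothetical and only serve the illustration.

```stata
* Toy data (hypothetical): two visit dates, three patients
clear
set obs 3
generate _01fudb_visdat = td(18jan2019) in 1/2
generate _02fudb_visdat = td(20feb2019) in 1

* missing(a, b) is 1 when at least one of a and b is missing
generate byte anymiss = missing(_01fudb_visdat, _02fudb_visdat)
list
```

So a condition built from all 13 visit dates asks whether any visit date is missing, which is probably not the same as checking the one visit that belongs to `x'.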

    On your idea #3:

    _01fudb_visdat-_13fudb_visdat sounds like _??fudb_visdat

    I don't understand the question here, as you used this earlier.



    • #3
      Thank you for your reply Nick,

      Would you mind if I ask you some more questions about the suggested code? My apologies if I am asking very basic questions, I have searched for a while first to try and figure it out myself, but did not get there.

      Line 1 I understand: unab expands and unabbreviates the specified variables, in this case _??fudb_visdat and saves these variables in a local called myvars.
      --> As a sanity check I wanted to double check whether this worked with keep `myvars'. It worked as expected, only keeping the 13 variables that can be abbreviated to _??fudb_visdat (_01fudb_visdat, _02fudb_visdat, _03fudb_visdat etc.). The only weird thing is that the code works if I run both lines at the same time, but if I run both lines separate from each other, I receive an error. Do you have an idea why this is the case?

      Code:
      unab myvars: _??fudb_fup_date
      keep `myvars'
      --> works
      
      --> when I first ran: unab myvars: _??fudb_fup_visdat
      --> and then: keep `myvars'
      I received an error: varlist, if exp, or in range required

      Regarding the second line of code: I am a bit lost as to what this does and how it helps with my problem... I know that subinstr replaces an old string with a new one, so if it had a format like "visdat", "visdate", I understand it can change all my variable names that have "visdat" in them to have "visdate" instead. But I have never seen use of subinstr with 3 pairs of quotation marks before the comma.

      Regarding the third line of code:
      It gives an error: varlist required

      What I have written initially is [see below], but I am wondering how I can write this in as few lines of code as possible instead of in 13 lines (as this is only the code for one variable, there are many more variables I need to do this for in my dataset!).

      Code:
      replace Q__01fudb_fup_visdat = 2 if _01fudb_visyn == 1 & missing(_01fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__02fudb_fup_visdat = 2 if _02fudb_visyn == 1 & missing(_02fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__03fudb_fup_visdat = 2 if _03fudb_visyn == 1 & missing(_03fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__04fudb_fup_visdat = 2 if _04fudb_visyn == 1 & missing(_04fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__05fudb_fup_visdat = 2 if _05fudb_visyn == 1 & missing(_05fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__06fudb_fup_visdat = 2 if _06fudb_visyn == 1 & missing(_06fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__07fudb_fup_visdat = 2 if _07fudb_visyn == 1 & missing(_07fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__08fudb_fup_visdat = 2 if _08fudb_visyn == 1 & missing(_08fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__09fudb_fup_visdat = 2 if _09fudb_visyn == 1 & missing(_09fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__10fudb_fup_visdat = 2 if _10fudb_visyn == 1 & missing(_10fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__11fudb_fup_visdat = 2 if _11fudb_visyn == 1 & missing(_11fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__12fudb_fup_visdat = 2 if _12fudb_visyn == 1 & missing(_12fudb_fup_visdat) /* Set Q_ variable to missing */
      replace Q__13fudb_fup_visdat = 2 if _13fudb_visyn == 1 & missing(_13fudb_fup_visdat) /* Set Q_ variable to missing */
      Thank you for your help and patience!

      Best wishes,

      Moniek



      • #4
        Originally posted by Moniek Bresser
        The only weird thing is that the code works if I run both lines at the same time, but if I run both lines separate from each other, I receive an error. Do you have an idea why this is the case?
        Actually, this is not weird but what you should expect; it is what local means. When you run a selection of lines from a do-file, local macros defined in that selection are local to it, i.e., they do not exist outside of that run. Hence, when you run only the second line, local myvars no longer exists.
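
A minimal illustration of that scoping rule, using the auto dataset shipped with Stata as a stand-in (an assumption; any dataset would do). Selected and executed together from the do-file editor, the lines below share one local macro.

```stata
sysuse auto, clear

* Select these two lines and run them in one go: the local survives between them
unab myvars : m*
display "`myvars'"

* If the display line is run on its own afterwards, `myvars' is empty,
* because the local macro ended with the earlier run.
```

An empty `myvars' is also what keep received when it was run separately, which is consistent with the error reported in the question.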

        Originally posted by Moniek Bresser
        I know that subinstr replaces an old string with a new, so if it would have format like "visdat", "visdate", I understand it can change all my variable names that have "visdat" in them, to have "visdate" instead. But I have never seen use of subinstr with 3 pairs of quotation marks before the comma.
        Type

        Code:
        help extended functions
        to learn more about this syntax (and many other useful tools).

        Originally posted by Moniek Bresser
        Regarding the third line of code:
        It gives an error: varlist required
        You probably do not define local x in the part of the code that you execute; see your first question.

        Originally posted by Moniek Bresser
        I am wondering how I can write this in as few lines of code as possible instead of in 13 lines (as this is only the code for one variable, there are many more variables I need to do this for in my
        I might go with

        Code:
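        forvalues i = 1/13 {
            * two-digit visit number with leading zero: 01, 02, ..., 13
            local number : display %02.0f `i'
            local basename  `number'fudb_fup_visdat
            local shortname `number'fudb_visyn
            * variable names cannot start with a digit, hence the leading underscores
            replace Q__`basename' = 2 if _`shortname' == 1 & missing(_`basename')
        }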
        If you want fewer lines, you could put everything inside the loop into one line; I would not do that.
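
An alternative sketch, in case looping over the Q_ variables themselves is preferred: the wildcard stays in a varlist, where Stata accepts it, and the matching names are derived with substr(). The toy data are made up, and the names assume the pattern Q__NNfudb_fup_visdat, _NNfudb_fup_visdat and _NNfudb_visyn (with a leading underscore).

```stata
* Toy data (hypothetical): two visits, three patients
clear
set obs 3
generate byte _01fudb_visyn = 1 in 1/2
generate _01fudb_fup_visdat = td(18jan2019) in 1
generate byte _02fudb_visyn = 1 in 3
generate _02fudb_fup_visdat = .
generate byte Q__01fudb_fup_visdat = 99
generate byte Q__02fudb_fup_visdat = 99

foreach q of varlist Q__??fudb_fup_visdat {
    * drop the "Q_" prefix to get the source variable, e.g. _01fudb_fup_visdat
    local src = substr("`q'", 3, .)
    * two-digit visit number, e.g. 01
    local num = substr("`q'", 4, 2)
    replace `q' = 2 if _`num'fudb_visyn == 1 & missing(`src')
    replace `q' = 1 if !missing(`src')
}
list
```

The character positions passed to substr() depend on the naming pattern, so they would need adjusting for variables that follow a different scheme.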

        Edit:

        There seems to be a bit more to your second question. Nick suggested the code

        Code:
        local myvars : subinstr local myvars " " "," , all
        He suggested this so you could feed a comma separated list of variable names to missing(). After

        Code:
        unab myvars : _??fudb_visdat
        `myvars' looks something like this

        Code:
        _01fudb_visdat _02fudb_visdat ...
        You cannot feed this to missing() because missing() expects to see something like this

        Code:
        _01fudb_visdat, _02fudb_visdat, ...
        well, more literally, there will be no spaces left

        Code:
        _01fudb_visdat,_02fudb_visdat,...
        (anyway, note the commas).

        Nick's suggested code puts commas where there used to be spaces, thus, creating the type of input that missing() expects to see.
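
To see the two steps at work, here is a small sketch on made-up toy data (two visit dates plus one Q_ variable, all hypothetical). It displays the macro before and after the substitution, and it also suggests that _??fudb_visdat does not pick up the Q__ variable, since that name starts with Q rather than an underscore.

```stata
* Toy data (hypothetical)
clear
set obs 1
generate _01fudb_visdat = .
generate _02fudb_visdat = .
generate Q__01fudb_visdat = 99

* Step 1: expand the wildcard into a space-separated list of names
unab myvars : _??fudb_visdat
display "`myvars'"

* Step 2: replace every space with a comma
local myvars : subinstr local myvars " " ",", all
display "`myvars'"
```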

        Best
        Daniel
        Last edited by daniel klein; Yesterday, 04:57. Reason: notation missing -> missing()
