
  • Generate variable with names from macro and drop specific variables from dataset

    Hello, I almost always find an answer in this forum, but this time I didn't, which is why I'm asking here. I searched for similar problems but couldn't find any.

    The problem is as follows. I have a local containing several variable names with the same prefix, say var1, ..., var100.
    For each variable I want to count the observations where it exceeds some benchmark, because I only want to keep the variables that do so often enough.

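    Code:
    unab vars: var*
    local i = 0
    foreach var in `vars' {
        local i = `i' + 1
        * flag observations where this variable exceeds the benchmark
        * (missing values compare as larger than any number, so exclude them)
        gen temp1 = 0
        replace temp1 = 1 if `var' > benchmark & !missing(`var')
        * count the flagged observations and store the count in a local
        egen temp2 = total(temp1)
        local sumtotal`i' = temp2
        drop temp*
    }
    * make sure there are at least as many observations as variables
    capture set obs `i'
    gen sumtotal = .
    forvalues j = 1/`i' {
        replace sumtotal = `sumtotal`j'' in `j'
    }

    gen best = 0
    replace best = 1 if sumtotal > 5 & sumtotal != .

    save "${stata}\temp.dta", replace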

    Problem 1: I would like a new variable containing the names stored in the local vars (var1, var2, ..., var100), one name per row in the first hundred rows. Is this possible?
    The following didn't work:

    Code:
    use "${stata}\temp.dta", clear
    gen varname=.
    forvalues k=1/100{
    replace varname=`vars' in `k'
    }

    Problem 2: I only want to keep the variables that are "best" for subsequent calculations.
    Example: best=1 for var36 and var62, and best=0 for the other 98 variables.
    Is there a command to keep only var36 and var62 and drop the others?

    I tried:
    Code:
    unab varlist : _all
    unab exclude : var*
    local best="var36 var62"
    local varlist : list varlist - exclude + best
    keep `varlist'
    Of course I could type something like the following, but that seems tedious:
    Code:
    drop var1 var2 ... var35 var37... var61 var63 ... var100
    Is there an easier way to pick out the best variables?
    I have the dummy variable best, which is 1 in only two rows. Maybe there is an easier way to keep the variables by using only the rows where the dummy equals 1.

    Many thanks in advance for your help,

    Patrick

  • #2
    For problem 1:


    Code:
    gen varname = ""
    forvalues i = 1/100 {
        local j : word `i' of `vars'
        replace varname = "`j'" in `i'
    }

    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1
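A slightly more general sketch of the same loop, assuming `vars' was created with -unab vars: var*- as in the question: it takes the number of names from the -word count- macro function instead of hard-coding 100, and adds empty observations when the dataset has fewer rows than names. The fake data (100 variables, 50 observations) are hypothetical and only make the block runnable on its own.

```stata
* hypothetical fake data: 100 candidate variables but only 50 observations
clear
set obs 50
forvalues i = 1/100 {
    gen var`i' = runiform()
}

unab vars : var*
local n : word count `vars'

* add empty observations if there are fewer rows than variable names
if `n' > _N {
    set obs `n'
}

* store the i-th name from `vars' in row i of a string variable
gen varname = ""
forvalues i = 1/`n' {
    local name : word `i' of `vars'
    quietly replace varname = "`name'" in `i'
}
list varname in 1/5
```

Run -unab- before creating varname; otherwise var* would also match varname itself.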



    • #3
      If I misunderstood your problem and you want varname to contain the full content of `vars' in each of the observations 1/100:

      Code:
      gen varname = ""
      replace varname = "`vars'" in 1/100


      • #4
        Regarding problem 2, I'm not sure how the variable best identifies variables rather than observations, so I can't offer a better method. Here's a fix for your code:

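        Code:
        unab varlist: _all
        unab exclude: var*
        local best "var36 var62"
        * candidates to drop: all var* variables except the best ones
        local new_exclude: list exclude - best
        * keep every variable except those candidates
        local keep: list varlist - new_exclude
        keep `keep'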


        • #5
          Hello Carole, thanks a lot for your help.

          To explain my problem in more detail: I want to compare different definitions. I have already programmed the different methods and saved each one as a new variable (what I wrote as var1, var2, ...).
          Then I want to find the best ones and work only with those definitions later on.
          The benchmark is a kind of "true" value, and I check how close each definition is to that value. Only the x closest are useful for subsequent work.
          To do that, I used a loop to generate a variable "deviation" that holds the deviation of definition i from its true value in row i (for i = 1, ..., 100), so rows 101 through the end are missing.
          Then I generated a new dummy variable "best" that is 1 if the deviation is small enough and 0 otherwise.
          So I have a column whose first 100 rows are either 1 or 0, and the rest are missing.
          I thought about comparing these first 100 rows with the local vars containing all definitions, and generating a new local containing only those variables that have a 1 in the best column.
          The problem is that I do not want to pick out the best variables by hand, as in the example above with var36 and var62, because I want to continue with various definitions. So I am looking for a method that identifies the best definitions and keeps only those variables.
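The idea described above (reading the first 100 rows of best and mapping row i to the i-th name in `vars') can be sketched like this. It assumes row i of best corresponds to the i-th variable returned by -unab vars: var*-, which only holds if the data have not been re-sorted and no other variable name starts with "var". The fake data and the local name keepvars are hypothetical.

```stata
* hypothetical fake data: row i of best refers to the i-th name in `vars'
clear
set obs 200
forvalues i = 1/100 {
    gen var`i' = runiform()
}
gen best = .
replace best = 0 in 1/100
replace best = 1 in 36
replace best = 1 in 62

* candidate names in dataset order (assumes no other variable matches var*)
unab vars : var*
local n : word count `vars'

* collect the i-th name whenever row i of best equals 1
local keepvars
forvalues i = 1/`n' {
    if best[`i'] == 1 {
        local name : word `i' of `vars'
        local keepvars `keepvars' `name'
    }
}
macro list _keepvars
```

The resulting local can then be passed to -keep- or, combined with a list difference, to -drop-.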




          • #6
            I think I understand what you are doing, and the following will get you there faster. My concern with storing your info in the first 100 obs is that if you do something that changes the sort order of the data, those values could end up scattered through the dataset.

            This creates a local macro best that contains the variables that are greater than benchmark in more than x cases, where x is 5 in your example (and 100 in mine), so you'll want to change the 100 in the -if- line.

            Code:
            * create some fake data
            clear
            set obs 200
            gen benchmark = runiform()
            forvalues i = 1/100 {
                gen var`i' = runiform()
            }

            * make a local macro containing the "best" vars
            * (start empty so a previous run cannot leave stale names)
            local best
            forvalues i = 1/100 {
                quietly count if var`i' > benchmark & !missing(var`i')
                if `r(N)' > 100 local best `best' var`i'
            }
            macro list _best
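One way around the sort-order concern (a sketch, not the approach in the reply above) is to write one row per candidate variable to a separate results dataset with -postfile-, so the names and counts never sit in the first observations of the main data. The variable names varname and nabove, the threshold of 100, and the fake data are assumptions for illustration.

```stata
* hypothetical fake data, as in the example above
clear
set obs 200
set seed 2016
gen benchmark = runiform()
forvalues i = 1/100 {
    gen var`i' = runiform()
}
unab vars : var*

* write one row per candidate: its name and how often it exceeds benchmark
tempname h
tempfile results
postfile `h' str32 varname long nabove using `results'
foreach v of local vars {
    quietly count if `v' > benchmark & !missing(`v')
    post `h' ("`v'") (r(N))
}
postclose `h'

* inspect the results without touching the main dataset
preserve
use `results', clear
gen byte best = nabove > 100
list varname nabove if best, noobs
restore
```

Because the results live in their own file, re-sorting the main data cannot separate a name from its count.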



            • #7
              Hi Carole,

              thank you for the code and the advice.

              Is there a command to keep only the variables in the macro?
              For example, I tried the following, unsuccessfully, after running your code:

              keep list _best
              keep _best
              keep `best'
              keep if varlist==best



              • #8
                Code:
                keep `best'
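Note that -keep `best'- drops every variable not listed in `best', including benchmark. If other variables should survive, a minimal sketch is to drop only the candidates that are not in `best'. It assumes all candidate variables match var* and that no other variable does; the variable id and the hand-made `best' are hypothetical.

```stata
* hypothetical fake data and a hand-made `best' for illustration
clear
set obs 10
gen benchmark = runiform()
gen id = _n
forvalues i = 1/100 {
    gen var`i' = runiform()
}
local best var36 var62

* candidates = all var* variables; drop those not listed in `best'
unab candidates : var*
local notbest : list candidates - best
if "`notbest'" != "" {
    drop `notbest'
}
describe, short
```

The -if- guard avoids an error from -drop- with an empty varlist when every candidate is in `best'.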
