  • A problem of Stata loop with if...else command

    Hello all:

    I wrote a loop to calculate, for several variables, the cumulative percentage of each category in each year. I put
    if !missing(`var')
    to tell Stata to skip the year if the variable is missing (see the simplified code below; I removed most of the calculations). This loop works well for some variables but not for others. In the example below it works for DE_will but not for DE_income. The error message is
    r(111); __000001 not found
    Using -set trace- I found that -if !missing(`var')- does not recognize that DE_income is missing in 2000, so the following -egen- command fails. I checked these variables but could not find any reason. Can you figure it out? (The example data file seems to be too large to upload. If you want to test the code, I can send you the data.)

    Code:
    use example, clear
    
    foreach var of varlist DE_will  DE_income  {
     
    foreach n of numlist 2000 2003 2004 2005 2006 2007 2009 2011 2012 2013 {
    preserve
    keep if !missing(`var') & year==`n'
    
    if !missing(`var') {
    di `n'
    bysort `var': egen c_`var'=count(`var')
    
    restore
    }
    else {
    restore
    }
    
    }    
    }

  • #2
    I think you are confusing the if-qualifier with the if-command. See help ifcmd vs help if.

    I have to admit I don't even know exactly what -if !missing(`var') {- does, because the if command (-if exp {-) expects an expression that evaluates to a single value, e.g. -if 4 == 1-, whereas the if qualifier -if !missing(`var')- is evaluated separately for each observation and is true wherever that variable is not missing.
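
    A minimal sketch of the difference, assuming Stata's bundled auto.dta rather than the data from this thread: the if qualifier is checked for every observation, while the if command evaluates its expression once, and a bare variable name there is taken to mean its value in the first observation.

    Code:
    sysuse auto, clear
    
    * if qualifier: the condition is checked row by row
    count if !missing(rep78)
    
    * if command: evaluated once; rep78 here means rep78[1]
    if !missing(rep78) {
        display "rep78 is not missing in observation 1"
    }
    else {
        display "rep78 is missing in observation 1"
    }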


    • #3
      Xiaochi,

      Your code seems to have several issues that add up. To help with your problem, you should write down what you want to achieve and give us a data example using -dataex- (see section 12 of the FAQ for advice on how to give a good-to-use example).

      Some issues that can easily be identified are:
      1. You use the if command instead of the if qualifier. Jesse already pointed out that this is probably not what you want here. The if command (see -help ifcmd-), when its expression contains a variable, looks only at the first observation in the dataset; it does not go through the dataset and act on each observation.
      2. You calculate variables using -egen- and afterwards restore the original data. All variables created in the meantime are lost when the original data are restored.
      Please give us a data example and write down what you want to achieve so that we can give more constructive advice.
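
      Point 2 can be sketched as follows, again assuming auto.dta instead of the thread's data (mean_price and the local m are illustrative names): a variable created between -preserve- and -restore- is gone after -restore-, so anything worth keeping has to be copied into a macro or scalar first.

      Code:
      sysuse auto, clear
      preserve
      keep if foreign == 1
      egen mean_price = mean(price)
      local m = mean_price[1]          // copy the result out before restoring
      restore
      
      capture confirm variable mean_price
      display _rc                      // nonzero: mean_price no longer exists
      display `m'                      // the local survives the restore

      Run this as one do-file; a local defined in one interactive run is not visible in the next.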

      Regards
      Bela


      • #4
        I am sorry for the confusion I caused. What I want to do here is to evaluate the loss of information in 60 recoded ordinal variables, calculated as the dispersion of the recoded variable (`var') divided by the dispersion of the old variable (`var'_o). The dispersion is the sum of the squared differences between the cumulative proportion of each category and 1/2 (Blair and Lacy, 2000). Since the variables are longitudinal, I need to calculate this statistic for each year. Finally, I save the results in global macros. (This code is cumbersome. I thought about using `r()', for example -tab- with `r()', but it seems Stata does not store this kind of information. I would appreciate it if you have a better solution.)

        Here is the code:

        Code:
        clear
        input int DE_income float year double DE_will float(DE_income_o DE_will_o)
        . 2000 2 . 2
        . 2000 2 . 2
        . 2000 2 . 2
        . 2000 2 . 2
        . 2000 1 . 1
        . 2000 1 . 1
        . 2000 2 . 2
        . 2000 1 . 1
        . 2000 1 . 1
        . 2000 1 . 1
        2 2006 . 2 .
        2 2006 . 2 .
        2 2006 . 2 .
        2 2006 . 2 .
        1 2006 . 1 .
        2 2006 . 2 .
        9 2006 . 9 .
        3 2006 . 3 .
        1 2006 . 1 .
        2 2006 . 2 .
        end
        
        foreach var of varlist DE_income DE_will   {
        foreach n of numlist 2000 2013 {
        preserve
        keep if !missing(`var') & year==`n'
        
        if !missing(`var') {
        
        // Recoded variable //
        bysort `var': egen c_`var'=count(`var')         // Frequency of each category
        bysort `var': gen p_`var'=c_`var'/_N if _n==1   // Proportion of each category 
        gen cp_`var'=sum(p_`var') if !mi(p_`var')       // Cumulative proportion   
        drop if cp_`var'==1                             // Just need k-1, k is the total number of categories 
        egen l_1 =sum((cp_`var'-1/2)^2)                 // Dispersion 
        
        // Old variable //
        bysort `var'_o: egen c_`var'_o = count(`var'_o)
        bysort `var'_o: gen p_`var'_o = c_`var'_o/_N if _n==1
        gen cp_`var'_o = sum(p_`var'_o) if !mi(p_`var'_o)
        drop if cp_`var'_o == 1
        egen l_2 =sum((cp_`var'_o-1/2)^2)
        
        gen k = l_1/l_2
        sum k
        
        global `var'`n' `r(mean)'
        restore
        }
        else {
        restore
        }
        
        }    
        }
        The problem is, as I said at the start of the thread, that -if !missing()- does not recognize some variables when they are missing, which leads to an error in the following -egen- command. Since I use -egen- to generate the cumulative percentage, I have to add some commands that tell Stata to skip the year if the data are missing.


        • #5
          the if !missing() does not recognize some variables when they are missing
          As others in this thread have already pointed out, you are misunderstanding what the if command -if !missing(`var') {- does. Carefully re-read #2 and #3 in this thread.

          Also read -help if- and -help ifcmd- and the corresponding manual sections: the two uses of the word "if" do very different things and you are using the wrong one.


          • #6
            I appreciate all your suggestions. I read this http://www.stata.com/support/faqs/pr...-if-qualifier/, but I still don't get it.
            Code:
             keep if !missing(`var') & year==`n'
            means that Stata drops every observation in which the variable is missing or the year differs from the one specified, so it does not matter that -ifcmd- looks only at the first observation. As you can see in the example, DE_income is missing in 2000 and DE_will is missing in 2006. This loop works on DE_income but not on DE_will.

            (The years should be 2000 and 2006 in the loop)
            Last edited by Xiaochi; 10 Sep 2016, 12:57.


            • #7
              OK. Fair enough. But that also means that the -if- command does nothing at all, so it has no reason to be there.

              I've played with your data and code a bit, and I believe the problem is that your program is breaking because there are combinations of `var' and `n' for which there are no observations. For example, in the data as posted there are no observations where DE_income is non-missing and year is either 2000 or 2013: DE_income is only non-missing when year == 2006. Apparently some of the -egen- functions throw errors when you try to apply them to an empty data set. The following revision of your code runs without error messages in your example data:

              Code:
              clear
              input int DE_income float year double DE_will float(DE_income_o DE_will_o)
              . 2000 2 . 2
              . 2000 2 . 2
              . 2000 2 . 2
              . 2000 2 . 2
              . 2000 1 . 1
              . 2000 1 . 1
              . 2000 2 . 2
              . 2000 1 . 1
              . 2000 1 . 1
              . 2000 1 . 1
              2 2006 . 2 .
              2 2006 . 2 .
              2 2006 . 2 .
              2 2006 . 2 .
              1 2006 . 1 .
              2 2006 . 2 .
              9 2006 . 9 .
              3 2006 . 3 .
              1 2006 . 1 .
              2 2006 . 2 .
              end
              
              foreach var of varlist DE_income DE_will   {
                  foreach n of numlist 2000 2013 {
                      preserve
                      keep if !missing(`var') & year==`n'
                      quietly count
                      if r(N) > 0 {
              
                          // Recoded variable //
                          bysort `var': egen c_`var'=count(`var')         // Frequency of each category
                          bysort `var': gen p_`var'=c_`var'/_N if _n==1   // Proportion of each category 
                          gen cp_`var'=sum(p_`var') if !mi(p_`var')       // Cumulative proportion   
                          drop if cp_`var'==1                             // Just need k-1, k is the total number of categories 
                          egen l_1 =sum((cp_`var'-1/2)^2)                 // Dispersion 
              
                          // Old variable //
                          bysort `var'_o: egen c_`var'_o = count(`var'_o)
                          bysort `var'_o: gen p_`var'_o = c_`var'_o/_N if _n==1
                          gen cp_`var'_o = sum(p_`var'_o) if !mi(p_`var'_o)
                          drop if cp_`var'_o == 1
                          egen l_2 =sum((cp_`var'_o-1/2)^2)
              
                          gen k = l_1/l_2
                          sum k
              
                          local `var'`n' `r(mean)'
                      }
                      restore
                  }    
              }
              However, with the data as posted it produces no output for DE_income, which is non-missing only in 2006 (a year the loop never visits), and it produces DE_will output only for 2000, because there are no observations where DE_will is non-missing and year == 2013.

              The key to getting this to run is, after you -keep-, counting whether any observations are left to process and skipping the computations if there are none. The key change is the -quietly count- followed by -if r(N) > 0 {-, which replaces your -if !missing(`var') {-. I have also reformatted the code for improved readability, and I have changed your global macros to locals, because global macros are inherently unsafe and using them when there is any alternative is poor programming practice (though it probably wasn't doing any harm in this particular situation).
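
              One way to inspect what the loop stored, as a sketch that assumes it sits in the same do-file directly after the loop above (local macros disappear when the do-file ends):

              Code:
              foreach var of varlist DE_income DE_will {
                  foreach n of numlist 2000 2013 {
                      // an undefined local expands to nothing, so skipped combinations print as empty
                      display "`var' `n': ``var'`n''"
                  }
              }

              The nested quotes first build the macro name from `var' and `n' (for example DE_will2000) and then expand that macro.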
