  • sort may not be combined with by

    Hi,
    Using the code below, I can generate my desired output: the 95% credible interval for my variables of interest, HLE_, ULE_, TLE_, and PLE_. I would like to run this code separately by two additional variables (Age_Group and sort). I tried adding the "by" prefix but got an error: sort may not be combined with by.
    Thanks
    Nader

    *my current code
    // keep the middle 950 values for each variable
    local N_low = 26
    local N_high =975

    // keep track of the original order of the data
    generate seq = _n

    foreach v of varlist HLE_ ULE_ TLE_ PLE_ {
    sort `v'
    generate trim_`v' = `v'
    replace trim_`v' = . if ! inrange(_n,`N_low',`N_high')
    }

    *what I tried
    // keep the middle 950 values for each variable
    local N_low = 26
    local N_high =975

    // keep track of the original order of the data
    bysort Age_Group sort: generate seq = _n

    foreach v of varlist HLE_ ULE_ TLE_ PLE_ {
    bysort Age_Group sort: sort `v'
    bysort Age_Group sort: generate trim_`v' = `v'
    bysort Age_Group sort: replace trim_`v' = . if ! inrange(_n,`N_low',`N_high')
    }



  • #2
    I don't understand the relevance of what you did to calculating credible intervals (something I don't work with). However, I can comment on the syntax errors.

    Perhaps you intended:
    Code:
    sort Age_Group sort `v'  // sort by `v' among observations sorted by Age_Group and the variable sort.
    rather than
    Code:
    bysort Age_Group sort: sort `v'
    Stata's error message here means what it says: The sort command is not allowed after a by command. Quite a few Stata commands are not permitted with the by command.

    It also appears that no "by" or "bysort" is relevant to this line:
    Code:
    bysort Age_Group sort: generate trim_`v' = `v'
    Nothing in this depends on Stata operating on only a subset of observations. This makes me think that you don't completely understand what by and bysort are supposed to do, so reading the documentation on them would likely help you here and more generally.

    Finally: Even though it is not causing a problem here, I would strongly recommend against naming a variable "sort." Stata will permit this, but giving a variable the name of an existing command could well lead to mistakes by human beings.
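
    To illustrate what the by prefix changes, here is a minimal sketch on made-up data (the variables grp, x, n_all, and n_grp are hypothetical and not from the original post). It shows that _n counts over the whole dataset without by, and restarts within each group with by:

    ```stata
    // Toy data (hypothetical): two groups of three observations each
    clear
    input byte grp float x
    1 3
    1 1
    1 2
    2 9
    2 7
    2 8
    end

    // Without by: _n runs from 1 to 6 over the whole dataset
    sort grp x
    generate n_all = _n

    // With by: _n restarts at 1 within each group. The variable in
    // parentheses orders observations within groups without defining them.
    bysort grp (x): generate n_grp = _n

    list, sepby(grp)
    ```

    A command that does not refer to _n, _N, or a by-group statistic (such as a plain copy of a variable) gives the same result with or without the prefix.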



    • #3
      Thanks Mike. Helpful advice. I renamed sort to sample_new. I tried the code below and got an error: invalid syntax. I would appreciate your advice on this.

      Also, since I would like to replace the values of trim_`v' with missing if they do not fall within my desired range, defined by

      replace trim_`v' = . if ! inrange(_n,`N_low',`N_high')
      I think by or bysort is relevant in the above line of code, because I would like to do this separately by my variables of interest (Age_Group, sample_new). What do you think?


      *What I tried
      // keep the middle 950 values for each variable
      local N_low = 26
      local N_high =975

      bysort Age_Group sample_new: generate seq = _n


      foreach v of varlist HLE_ ULE_ TLE_ PLE_ {
      sort Age_Group sample_new `v'
      generate trim_`v' = `v'
      replace trim_`v' = . if ! inrange(_n,`N_low',`N_high')
      }



      • #4
        The code below worked for me. The point is that the local definitions and the loop have to be run together, not separately.

        // keep the middle 950 values for each variable
        local N_low = 26
        local N_high =975

        // keep track of the original order of the data
        bysort Age_Group sample_new: generate seq = _n


        foreach v of varlist HLE_ ULE_ TLE_ PLE_ {
        sort Age_Group sample_new `v'
        generate trim_`v' = `v'
        bysort Age_Group sample_new: replace trim_`v' = . if ! inrange(_n,`N_low',`N_high')

        }
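
        A likely explanation, assuming the local definitions and the loop were executed as separate selections from the Do-file Editor: local macros exist only for the run in which they are defined. The sketch below uses the auto dataset shipped with Stata and bounds chosen only for illustration (26 and 60 are not the values from this thread's data):

        ```stata
        // Run all of these lines in ONE selection (or as one do-file)
        sysuse auto, clear
        local N_low = 26
        local N_high = 60
        display "bounds: `N_low' to `N_high'"
        count if inrange(_n, `N_low', `N_high')

        // If the two -local- lines are run first and the remaining lines are
        // run as a separate selection, `N_low' and `N_high' expand to nothing.
        // The last command then reads: count if inrange(_n, , )
        // which Stata rejects with a syntax error.
        ```

        Running the whole block at once keeps the macros defined for every line that uses them.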




        • #5
          I think your approach is confused here and I echo Mike's comments that you should read up generally on the programming chapter available in the PDF documentation included with your installation of Stata.

          You have written:

          Code:
          // keep the middle 950 values for each variable
          local N_low = 26
          local N_high =975
          
          bysort Age_Group sample_new: generate seq = _n  // you do not appear to use or need seq
          
          foreach v of varlist HLE_ ULE_ TLE_ PLE_ {
          sort Age_Group sample_new `v'  // now you sort by age, sample new and `v'
          generate trim_`v' = `v'
          replace trim_`v' = . if ! inrange(_n,`N_low',`N_high')  // I don't see the relevance of sorting for this step.
          }
          But this all appears to be legal Stata syntax and it should not generate an error.

          Since you have not provided a reproducible data example, or shown exactly what code you ran and where exactly it failed (along with Stata's error message), I can only guess at what you are trying to do. Are you simply trying to extract the percentile values which make up your confidence limits?

          Something like this may work and is more direct

          Code:
          foreach v of varlist HLE_ ULE_ TLE_ PLE_ {
          bysort Age_Group sample_new (`v'): gen ci_`v' = `v'[_n] if inlist(_n, `N_low', `N_high')
          }
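          This will pull out the observations at the 2.5th and 97.5th percentiles (as I infer from your choice of low and high) within each group and store them in a new variable with the prefix -ci_-. From here you can sort on each new variable in turn and access the values programmatically, say by using -ci_HLE_[1]- and -ci_HLE_[2]- for the lower and upper limits respectively. Note that with several groups, each group contributes its own pair of values, so those two subscripts only cover a single group.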
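
          As an alternative sketch, not the approach posted above, the within-group limits can also be requested directly with egen's pctile() function. The data below are simulated and the variable names lo_HLE_, hi_HLE_, and tag are hypothetical. Because pctile() uses Stata's own percentile definition, its results may differ slightly from picking the 26th and 975th ordered values:

          ```stata
          // Toy data (hypothetical): 2 groups x 1,000 simulated draws
          clear
          set seed 12345
          set obs 2000
          generate byte Age_Group = 1 + (_n > 1000)
          generate double HLE_ = rnormal(20 + 5*Age_Group, 2)

          // Limits of the central 95% interval within each group
          bysort Age_Group: egen double lo_HLE_ = pctile(HLE_), p(2.5)
          bysort Age_Group: egen double hi_HLE_ = pctile(HLE_), p(97.5)

          // Show one row per group with its two limits
          egen byte tag = tag(Age_Group)
          list Age_Group lo_HLE_ hi_HLE_ if tag
          ```

          The limits are repeated on every observation of a group, so no further sorting is needed to read them off.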
