
  • Replacing and recoding values of variables

    Hi all!

    I have a set of variables that contain both numeric and string values. I have converted all the variables to numeric, but I can't seem to get the code right to classify the values into groups.

    *making list
    global mycomean afb1_mean afb2_mean afm1_mean afg1_mean afg2_mean ///
    fb1_mean ota_mean cit_mean dhcit_mean don_mean donglca_mean zen_mean
    describe $mycomean

    *keeping the original string vars of mean mycotoxin levels
    foreach var of global mycomean {
    gen o_`var' = `var'
    }

    *string to numeric
    foreach var of global mycomean {
    encode `var', generate(n_`var')
    }

    *replace values
    foreach i of varlist n_zen_mean n_donglca_mean n_don_mean n_dhcit_mean ///
    n_cit_mean n_ota_mean n_fb1_mean n_afg2_mean n_afg1_mean n_afm1_mean ///
    n_afb2_mean n_afb1_mean {
    replace `i' = 0 if `i' == .
    replace `i' = 1 if `i' == >LOD
    replace `i' = 2 if `i' = > 0
    }

    The >LOD part remains even after destringing the vars.

    Can anyone have a look and tell me where I'm getting the code wrong?

  • #2
    -replace `i' = 1 if `i' == >LOD- is not legal Stata syntax. Neither is -replace `i' = 2 if `i' = > 0-. If this is actually the code you ran, it will stop after -replace `i' = 0 if `i' == .- which is your last syntactically legal command.

    I'm not sure what your data really look like: when asking for code you should always show an example of your data, and you should always use the -dataex- command to do that. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
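As a rough illustration of what posting a data example involves, here is a minimal sketch. The toy values, the choice of two variables, and the `in 1/3` range are assumptions made for the example, not the poster's actual data.

```stata
* Only needed on installations that do not already ship -dataex-:
* ssc install dataex

* Toy stand-in for the real dataset (made-up values)
clear
input str8 afb1_mean str8 zen_mean
"1.5"  ">LOD"
">LOD" "0.3"
""     "25"
end

* Print a copy-and-paste-ready example of the first three observations
dataex afb1_mean zen_mean in 1/3
```

The output is a block of -input- commands that can be pasted into a post, so storage types and exact string contents are visible to anyone replying.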

    I'm trying to mentally reconstruct what your data might be like given the code you have written, but I can't really find any data that makes sense with this code, even after you correct the syntax errors pointed out. My best guess is that your original variables are string variables that contain mostly string images of numeric values, things like "7.2" or "25", etc. But some of them exceed the limit of detection of the assay and so contain ">LOD" instead of the string image of a number. If I have that right, -encode- will just create garbage from that, because it assigns integer codes 1, 2, 3, ... to the distinct strings in alphabetical order rather than reading them as numbers. You need instead
    Code:
    destring `var', generate(n_`var') ignore(">LOD")
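To see the difference between the two commands, here is a small sketch on made-up values (the data and the `e_` prefix are assumptions for illustration only). Note that -ignore()- strips the listed characters one by one, so ">LOD" becomes an empty string and ends up as numeric missing.

```stata
* Toy string variable mixing numbers, ">LOD" and an empty value
clear
input str8 zen_mean
"7.2"
"25"
">LOD"
""
"0"
end

* -encode- numbers the distinct strings alphabetically: not the measured values
encode zen_mean, generate(e_zen_mean)

* -destring- reads the numbers; ">LOD" and "" both become missing
destring zen_mean, generate(n_zen_mean) ignore(">LOD")

* Compare the underlying numeric codes side by side
list zen_mean e_zen_mean n_zen_mean, nolabel
```

Because ">LOD" and originally empty values are both missing after -destring-, a copy of the original string variable is still needed to tell them apart.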
    It appears that you then want to change the resulting quasi-continuous variables into a three-way classification: 0 for an originally missing value, 1 for an original ">LOD", and 2 for an original value > 0.

    So that would work as follows:
    Code:
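    foreach m of global mycomean {
        * classify measured positive values first, while ">LOD" is still missing
        replace n_`m' = 2 if n_`m' > 0 & !missing(n_`m')
        * then mark ">LOD", so these 1s are not overwritten by the > 0 rule
        replace n_`m' = 1 if o_`m' == ">LOD"
        * whatever is still missing was missing in the original data
        replace n_`m' = 0 if missing(n_`m')
    }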
    Now, as I say, I'm imagining your data and so this code may not actually work if the data are not really as I imagine. But the code will perhaps point you in the right direction.
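An alternative that leaves the numeric variable untouched is to put the classification in a separate variable. The following is only a sketch on made-up values: the name `cat_zen_mean`, the value label `mycocat`, and the decision to leave an exact 0 unclassified are all assumptions, not something stated in the thread.

```stata
* Toy data: a number, ">LOD", an empty value, and an exact zero
clear
input str8 zen_mean
"7.2"
">LOD"
""
"0"
end

gen o_zen_mean = zen_mean                      // keep the original string
destring zen_mean, generate(n_zen_mean) ignore(">LOD")

* The three conditions are mutually exclusive, so their order does not matter
gen byte cat_zen_mean = 0 if missing(n_zen_mean) & o_zen_mean != ">LOD"
replace cat_zen_mean = 1 if o_zen_mean == ">LOD"
replace cat_zen_mean = 2 if n_zen_mean > 0 & !missing(n_zen_mean)

* Hypothetical value labels for readability
label define mycocat 0 "missing" 1 ">LOD" 2 "detected > 0"
label values cat_zen_mean mycocat

tabulate o_zen_mean cat_zen_mean, missing
```

With this layout `n_zen_mean` stays available in continuous form, and the cross-tabulation gives a quick check that each original value landed in the intended group.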

    Being in a merciful mood at the moment, I will spare you my rants about why mycomean should be a local and not a global, and about why turning continuous variables into categories is almost always a terrible idea.
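For reference, the local-macro version of the variable list might look like the sketch below. The two-variable toy dataset is an assumption for illustration; the real list would hold all twelve variables. A local exists only within the do-file or interactive session that defines it, so the definition and the loop have to be run together.

```stata
* Toy data with two of the variables (made-up values)
clear
input str8 afb1_mean str8 zen_mean
"1.5"  ">LOD"
">LOD" "0.3"
end

* Variable list held in a local instead of a global
local mycomean afb1_mean zen_mean

foreach var of local mycomean {
    gen o_`var' = `var'                                   // keep the original string
    destring `var', generate(n_`var') ignore(">LOD")      // numeric copy
}

describe o_* n_*
```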



    • #3
      Hi Clyde!

      Thanks so much for replying! Next time I'll definitely use the dataex command!

      Destring helped, but I still had to replace the >LOD with a number to get at what I wanted. Had to create 2 other vars for that, but finally got there!

      P.S. The plan is to use the variables in their continuous format; this is for exploratory purposes, since the data are quite messy.
