
  • value label for an encoded numeric variable

    Hi everyone,

    I have run into this problem quite a few times now and would appreciate your guidance and help with it, please.

    I have a variable in my data set called BMI (body mass index). It is supposed to be numeric, but I believe that, because of several data-entry mistakes in the Excel sheet where the data are collected, Stata read it as a string variable rather than a numeric variable.

    Because I have around 2000 individuals with a BMI value, and it is hard to correct all of the BMI values manually, I used the following syntax:


    encode BMI , gen (BMI_numric)
    order BMI_numric, after (BMI)
    label dir
    codebook BMI BMI_numric


    The problem is that there is a "value label" attached to the newly generated numeric variable "BMI_numric", and this value label holds the values I want my new variable to have.
    What is happening is that my numeric variable takes integer values (1, 2, 3, 4, ... 2000) assigned in ascending order, instead of taking the value shown by the value label.

    How can I correct this, please? And is my syntax correct to use or not?

    Here are 10 observations from my data. I ran the Stata syntax, typed the output into Excel, and then copied and pasted it here, as I did not know how to post the output with a neat display:

    First, with the value label shown for my newly generated numeric variable; this is how I want it to look:


    list BMI BMI_numric in 1/10

    BMI BMI_num
    34.69 34.69
    29 29
    30.98 30.98
    25.98 25.98
    18 18
    48.42 48.42
    26.18 26.18
    27.6 27.6
    21.3 21.3
    25.15 25.15

    Second, here are the actual values of the newly generated numeric variable, without the value label, which is not what I want or need:





    list BMI BMI_num in 1/10 , nolabel

    BMI BMI_num
    34.69 509
    29 329
    30.98 407
    25.98 216
    18 15
    48.42 673
    26.18 223
    27.6 277
    21.3 67
    25.15 189



    Thank you; your guidance is highly appreciated.

    Best Regards
    Last edited by rena jk; 02 Apr 2017, 14:14.

  • #2
    Hello Rena,

    You will need to - destring - the variable. Sometimes letters, words, or typos in the data will prevent the conversion. In that case, use the - force - option. With - force -, those cells become missing values.

    Another way to tackle this: try to spot the typos. After the forced conversion, they will show up as missing values. You can then compare the new missing values with the original data in the same rows and edit the typos accordingly.
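
    A minimal sketch of the two steps above, assuming the string variable is still called BMI as in the original post. The name BMI_num2 is a hypothetical new variable chosen for illustration only.

    ```stata
    * Convert the string BMI to a numeric variable (BMI_num2 is a hypothetical name).
    * With force, any cell that cannot be read as a number becomes missing.
    destring BMI, generate(BMI_num2) force

    * Show the rows where the conversion failed although the original cell was
    * not empty; these are the likely typos to check against the source data.
    list BMI BMI_num2 if missing(BMI_num2) & !missing(BMI)
    ```

    Unlike - encode -, which assigns integer codes 1, 2, 3, ... to the distinct strings and keeps the original text only as a value label, - destring - stores the numbers themselves.
    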

    Last but not least, should you know the "cause" of the problem, you may use the - ignore - option.
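
    As a hedged illustration of - ignore -, suppose the cause turned out to be stray spaces typed inside some cells. That cause is an assumption, not something shown in the posted data, and BMI_num3 is a hypothetical variable name.

    ```stata
    * Assumption: the non-numeric characters are blank spaces inside the cells.
    * ignore() removes the listed characters before converting to numeric.
    destring BMI, generate(BMI_num3) ignore(" ")

    * Check the result against the original strings
    list BMI BMI_num3 in 1/10
    ```

    Be careful about which characters you list: ignoring a character that carries meaning (for example a comma used as a decimal separator) would silently change the values.
    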

    Below, the excerpt from the Stata Manual. You should take a look at it.

    Hope that helps.
    Last edited by Marcos Almeida; 02 Apr 2017, 14:33.
    Best regards,

    Marcos



    • #3
      Thank you so much, Marcos Almeida. Your advice helped, and your quick response is highly appreciated.
      Thank you.
