  • Removing Blue Character Values

    Hi,

    I am struggling at the moment with two -probably basic- issues.

    1. I am working on the Italian dataset from the PIAAC questionnaire (an OECD survey). Most of the variables appear red in the browser because they are strings. After converting the variables from string to numeric with the command encode, I obtained blue data, which are the value labels of the variables.
    Can I work with blue data (run regressions and do further analyses), or does it have to be black? How can I turn blue data into black?

    2. The blue data still has some values which are characters: V, N, U etc. See image below. These actually represent missing values and I would like to remove them. How can I do this?

  • #2
    The non-numeric characters are there because you, or someone, has defined value labels for this variable; the name of the value label is shown under "properties" on the far right, or you can find it by typing:
    Code:
    d b_q01bE
    see
    Code:
    help label
    to change or drop these
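
    As a rough sketch of those steps, using the variable name from the command above and assuming it is the numeric variable created by -encode-, something like the following shows what sits behind the blue text. Detaching the label only changes what is displayed; the stored numbers stay the same.

    ```stata
    * Show the storage type and the name of the attached value label
    describe b_q01bE

    * Tabulate the stored numeric codes instead of their labels
    tabulate b_q01bE, nolabel

    * Show each variable's values together with their labels
    codebook b_q01bE

    * Detach the value label from the variable (values then display in black)
    label values b_q01bE .
    ```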


    • #3
      In addition to Rich's comments, I'd add the following (which I was just about to post when his note appeared.)

      Your use of -encode- was the source of the problem. You wanted -destring-. Look at the -help- for these two commands and compare their purposes. -encode- numbers the distinct string values 1, 2, 3, ... in sorted order and attaches the original strings as value labels, so using it where -destring- was needed leaves you with numeric values that are very wrong, i.e., "garbage." Numeric variables showing up in red as strings suggest that the data set you loaded contains non-numeric characters in some of the raw observations for those variables, which forced Stata to treat *all* the observations for those variables as strings. (If Stata sees *any* non-numeric characters in any observation for a variable, it treats the whole variable as a string.) There was likely some kind of mistake in importing or preparing the raw data. Errors in preparation are possible but not likely in data generated by a statistical agency. You may need to show us how you (or whoever) imported this data, as there is likely an error in that process.
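
      A minimal sketch of the -destring- route, assuming the original string variable is still in the data set (-encode- requires -generate()-, so it normally is). The names b_q01b and b_q01b_num are hypothetical; substitute your own. It is worth running the first command before converting, because the force option silently turns every non-numeric value into missing.

      ```stata
      * Hypothetical names: b_q01b is the original string variable,
      * b_q01b_num is the new numeric variable created below.

      * List the values that cannot be read as numbers (e.g. V, N, U)
      tabulate b_q01b if missing(real(b_q01b)), missing

      * Convert to numeric; force sets every non-numeric value to missing (.)
      destring b_q01b, generate(b_q01b_num) force

      * Count the non-empty strings that became missing in the conversion
      count if missing(b_q01b_num) & !missing(b_q01b)
      ```

      If the count matches the number of V, N, U, etc. entries from the tabulation, only the intended codes were dropped.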

      "Blue data" is just the default color for value labels of variables when displayed in the data editor, and has no influence on statistical calculations. The -encode- command automatically supplied those labels. However, what matters here is fixing the presence of non-numeric characters for numeric variables in the data set.
