  • How to Replace a string variable with a missing numeric value

    I know there are ways to replace a numeric value with a missing value (dot) using mvdecode, and that there are various ways to replace a specific string with a blank string, but I am unaware of a way to go from a specific string to a missing numeric value.

    In particular, I have a data set with a few categorical variables. Some rows have values such as gender or age bucket marked as missing with a "?", and I'd like to change that to a missing numeric value so that I can run a regression.

    Thanks

  • #2
    If you want to run a regression you need a numeric predictor anyway. Changing awkward categories to "." won't achieve that, and in any case Stata doesn't regard "." as string missing; it regards the empty string "" as string missing.
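
    To illustrate the point about empty strings, here is a minimal sketch. The variable name gender and its values are hypothetical, not taken from the original data set.

    ```stata
    clear
    input str6 gender
    "male"
    "female"
    "?"
    end

    * recode the placeholder "?" to the empty string (string missing in Stata)
    replace gender = "" if gender == "?"

    * missing() is true for empty strings, so this counts the recoded row
    count if missing(gender)

    * encode leaves empty strings as numeric missing (.)
    encode gender, generate(gender_num)
    list, nolabel
    ```

    The numeric variable gender_num could then be used as i.gender_num, with the former "?" rows dropped from estimation as missing.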

    In general you can ignore whatever categories you want, including the numeric equivalent of the string value "?".

    Concretely, if your categories were "cat", "dog" and "?" and you only care about "cat" and "dog", then a string variable like this

    Code:
    clear
    input str3 whatever
    "cat"
    "dog"
    "?"
    end
    can be processed like this

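    Code:
    * define labels only for the categories of interest
    label define whatever 1 "cat" 2 "dog"
    * encode adds any other string values (here "?") as extra codes
    encode whatever, label(whatever) gen(wanted)
    * set everything other than cat and dog to numeric missing
    replace wanted = . if !inlist(wanted, 1, 2)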
    and then i.wanted can be used in regression-like exercises.
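
    If only a two-valued indicator is needed, one possible alternative to encode is to build the dummy directly from the string variable. This is only a sketch under assumed names: gender, its values, and female are hypothetical.

    ```stata
    clear
    input str6 gender
    "male"
    "female"
    "?"
    "female"
    end

    * 1 if female, 0 if male, numeric missing (.) where gender is "?"
    generate byte female = (gender == "female") if gender != "?"

    * confirm that only 0, 1 and missing occur
    tabulate female, missing
    ```

    Rows where female is missing are left out of a regression automatically, so the indicator has two possible values rather than three.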
    Last edited by Nick Cox; 17 Oct 2018, 10:58.



    • #3
      Jeremy:
      welcome to this forum.
      Surely there are more efficient ways to do what you're after; anyway, you may want to try something along the following lines:
      Code:
      . set obs 1
      number of observations (_N) was 0, now 1
      
      . g A="?"
      
      . replace A="."
      (1 real change made)
      
      . destring A, replace
      A: all characters numeric; replaced as byte
      (1 missing value generated)
      
      . list A
      
           +---+
           | A |
           |---|
        1. | . |
           +---+
      PS: As said above, there are more efficient ways... Thanks, Nick, for showing them!
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Thanks Carlo and Nick,

        My numeric predictor is the price each customer paid for an item, which is included in the data set. I was just trying to make sure my dummy variables actually had 2 possible values and not 3.

        Thanks for the help!!
