  • Dealing with a very unusual value in a variable

    Hello,

    I have been cleaning a dataset to practice Stata. One variable in the dataset holds land area. I was able to clean every entry in that variable except for two values that are written with fraction characters (½ and 2½). I have never seen this in a dataset before and am not sure how to convert them to decimal format (I would like to convert ½ to 0.5, and so forth). The output is shown below (the rows with the erroneous entries are highlighted in red):

    tab 'varname'

           Area |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |        628       25.41       25.41
              2 |        521       21.08       46.50
              3 |        257       10.40       56.90
              4 |        203        8.22       65.12
              5 |        146        5.91       71.02
             10 |         85        3.44       74.46
            0.5 |         81        3.28       77.74
              6 |         72        2.91       80.66
              7 |         44        1.78       82.44
            1.5 |         41        1.66       84.10
             20 |         41        1.66       85.75
             50 |         37        1.50       87.25
              8 |         37        1.50       88.75
             15 |         33        1.34       90.08
            100 |         30        1.21       91.30
             12 |         21        0.85       92.15
             30 |         16        0.65       92.80
              9 |         15        0.61       93.40
            2.5 |         13        0.53       93.93
             25 |         11        0.45       94.37
             .5 |         10        0.40       94.78
             18 |          9        0.36       95.14
             40 |          9        0.36       95.51
           0.25 |          7        0.28       95.79
             35 |          6        0.24       96.03
             45 |          6        0.24       96.28
              ½ |          6        0.24       96.52
            .25 |          5        0.20       96.72
             13 |          5        0.20       96.92
             60 |          5        0.20       97.13
           0.75 |          4        0.16       97.29
           1.50 |          4        0.16       97.45
             14 |          4        0.16       97.61
            200 |          4        0.16       97.77
           0.50 |          3        0.12       97.90
           10.5 |          3        0.12       98.02
             11 |          3        0.12       98.14
             16 |          3        0.12       98.26
              . |          2        0.08       98.34
            150 |          2        0.08       98.42
             17 |          2        0.08       98.50
           2.50 |          2        0.08       98.58
             34 |          2        0.08       98.66
            500 |          2        0.08       98.75
             53 |          2        0.08       98.83
             55 |          2        0.08       98.91
             70 |          2        0.08       98.99
            700 |          2        0.08       99.07
             .8 |          1        0.04       99.11
            0.7 |          1        0.04       99.15
          10000 |          1        0.04       99.19
            120 |          1        0.04       99.23
           1600 |          1        0.04       99.27
             19 |          1        0.04       99.31
           2000 |          1        0.04       99.35
             21 |          1        0.04       99.39
             22 |          1        0.04       99.43
             23 |          1        0.04       99.47
             28 |          1        0.04       99.51
             2½ |          1        0.04       99.55
            3.5 |          1        0.04       99.60
             32 |          1        0.04       99.64
             42 |          1        0.04       99.68
            450 |          1        0.04       99.72
           4900 |          1        0.04       99.76
             63 |          1        0.04       99.80
             65 |          1        0.04       99.84
             67 |          1        0.04       99.88
             75 |          1        0.04       99.92
             80 |          1        0.04       99.96
           8100 |          1        0.04      100.00
    ------------+-----------------------------------
          Total |      2,471      100.00

    Thanks for your help!


  • #2
    For future posts, please familiarize yourself with the dataex command for presenting data examples (see FAQ Advice #12 for details). You will have to replace the fraction characters one at a time, with one subinstr() call per character, and then convert the cleaned strings to numeric with destring.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str7 Area
    "2¾"
    "3.2"
    "5½"
    "4"
    "7½"
    "4¾" 
    end
    
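    * Work on a copy so the original strings stay available for checking
    gen wanted=Area
    * Swap each fraction character for its decimal part
    replace wanted= subinstr(wanted, "½", ".5", .)
    replace wanted= subinstr(wanted, "¾", ".75", .)
    * Convert the cleaned strings to a numeric variable
    destring wanted, replace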
    Result:

    Code:
    . l, sep(0)
    
         +---------------+
         | Area   wanted |
         |---------------|
      1. |   2¾     2.75 |
      2. |  3.2      3.2 |
      3. |   5½      5.5 |
      4. |    4        4 |
      5. |   7½      7.5 |
      6. |   4¾     4.75 |
         +---------------+
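
    If more fraction characters turn up, the same replacement can be written as a loop. The following is only a sketch: the example data, the macro names fracs and decs, and the variable name wanted2 are made up for illustration, and it assumes a Stata version with Unicode support (14 or later). Before calling destring, it also lists any entries that would still fail numeric conversion.

    ```stata
    * Sketch only: illustrative data, not the original dataset
    clear
    input str7 Area
    "2¾"
    "3.2"
    "½"
    "4"
    "7½"
    ".25"
    end

    * Hypothetical lookup lists: fraction characters and their decimal parts,
    * matched by position
    local fracs "½ ¼ ¾"
    local decs  ".5 .25 .75"

    gen wanted2 = Area
    forvalues i = 1/3 {
        local f : word `i' of `fracs'
        local d : word `i' of `decs'
        replace wanted2 = subinstr(wanted2, "`f'", "`d'", .)
    }

    * List entries that are neither numeric nor empty/"." after the replacements
    list Area if missing(real(wanted2)) & !inlist(trim(wanted2), "", ".")

    * Convert the cleaned strings to a numeric variable
    destring wanted2, replace
    ```

    If the list step shows nothing, every remaining entry is either a valid number or a missing value, so destring should go through. Converting to numeric also merges spellings such as .5, 0.5 and 0.50 into a single value.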
