  • Data Cleaning

    Hi,

    My variable region looks like this, and I can't find the reason each category repeats itself. How would you suggest I take this forward?

    [Screenshot: Screenshot 2023-03-17 223900.png]





  • #2
    It looks like the label repeats itself, but the value of the variable does not. If I were you, I might start by returning to the codebook or other data documentation and trying to figure out what the labels should be for each value. You can see the value (instead of the label) with the following line:

    Code:
    tab region, nolabel
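If region is numeric with value labels, as #2 suggests, one possible explanation is that different codes carry labels that differ only by a stray space. Below is a minimal sketch on made-up data. The codes 1 to 4, the label name regionlbl, and the new variables region_str and region_clean are all hypothetical. It shows each code next to its label, then copies the labels into a string, trims them, and re-encodes them.

```stata
* Hypothetical toy data: four codes whose labels differ only by a leading space
clear
input byte region
1
2
3
4
end
label define regionlbl 1 "Central" 2 "Eastern" 3 " Central" 4 " Eastern"
label values region regionlbl

* Show the label definitions and the underlying codes
label list regionlbl
tab region, nolabel

* Copy the labels into a string, strip leading/trailing blanks, and re-encode
decode region, generate(region_str)
replace region_str = strtrim(region_str)
encode region_str, generate(region_clean)
tab region_clean
```

Note that encode numbers the cleaned categories in alphabetical order, so the codes in region_clean will not in general match the original codes. Check the documentation first in case two codes really are meant to be distinct.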



    • #3
      The -fre- command written by Ben Jann will do the job in #2 better.
      Code:
      ssc install fre
      fre region



      • #4
        Thank you, Daniel Schaefer and Chen Samulsion.



        • #5
          [Screenshot: Screenshot 2023-03-18 113204.png]


          How should I proceed after this?



          • #6
            I think Daniel Schaefer was on target here. The answer lies in how these data were produced. But there is an easier route to follow. A leading space in some cases but not others would account for your results.

            Code:
            . clear
            
            . set obs 4
            Number of observations (_N) was 0, now 4.
            
            . gen foo = word("Central Eastern Central Eastern", _n)
            
            . tab foo
            
                    foo |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                Central |          2       50.00       50.00
                Eastern |          2       50.00      100.00
            ------------+-----------------------------------
                  Total |          4      100.00
            
            . replace foo = " " + foo in 3/4
            variable foo was str7 now str8
            (2 real changes made)
            
            . tab foo
            
                    foo |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                Central |          1       25.00       25.00
                Eastern |          1       25.00       50.00
                Central |          1       25.00       75.00
                Eastern |          1       25.00      100.00
            ------------+-----------------------------------
                  Total |          4      100.00
            
            . fre foo
            
            foo
            -------------------------------------------------------------
                            |      Freq.    Percent      Valid       Cum.
            ----------------+--------------------------------------------
            Valid   Central |          1      25.00      25.00      25.00
                    Eastern |          1      25.00      25.00      50.00
                    Central |          1      25.00      25.00      75.00
                    Eastern |          1      25.00      25.00     100.00
                    Total   |          4     100.00     100.00          
            -------------------------------------------------------------

            In my example, fre is showing this space, and so was tabulate. It is just hard to see.

            For leading space, read also any space-like characters. If
            Code:
            replace region = trim(region)
            does not solve the problem, then find out which characters are present using chartab from SSC.
            Last edited by Nick Cox; 18 Mar 2023, 05:04.
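For a string variable, the diagnosis and fix above can be sketched on the same toy data as #6. The helper variables padded and shown are hypothetical names introduced only for illustration. Wrapping each value in brackets makes a leading space visible. Because ustrtrim() also strips leading and trailing Unicode whitespace, it may catch some space-like characters that trim() misses. If it does not, chartab will show what is actually there.

```stata
* Toy data as in #6: a string variable where some values carry a leading space
clear
set obs 4
gen region = word("Central Eastern Central Eastern", _n)
replace region = " " + region in 3/4

* Flag values that change when leading/trailing blanks are removed,
* and wrap each value in brackets so stray spaces become visible
gen byte padded = region != strtrim(region)
gen shown = "[" + region + "]"
list shown padded, noobs

* Remove leading/trailing blanks and Unicode whitespace, then re-tabulate
replace region = ustrtrim(region)
tab region
```

After the replace, tab region should show two categories, each with a frequency of 2.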
