Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Check that string variable equals label of encoded variable

    Hi,

    I am merging several files and I would like to verify that I have done it properly. If I made an error, some encoded variable labels mismatch the original string variable there were initially converted from. How can I flag observations for which the string does not match the label of the encoded variable? Thanks.

    I enclose the data that I want to check ("stock" and "Code" should match, but sometimes do not) but unfortunately "dataex" does not replicate the labels I see in the screen. I enclose a screenshot to complement the explanation
    Click image for larger version

Name:	Capture.JPG
Views:	1
Size:	90.0 KB
ID:	1563647
    .

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float mydate double stock str6 Code
    564 1 "130088"
    565 1 "130088"
    566 1 "130088"
    567 1 "130088"
    568 1 "130088"
    569 1 "130088"
    570 1 "130088"
    571 1 "130088"
    572 1 "130088"
    573 1 "130088"
    574 1 "130088"
    575 1 "130088"
    576 1 "130088"
    577 1 "130088"
    578 1 "130088"
    579 1 "130088"
    580 1 "130088"
    581 1 "130088"
    582 1 "130088"
    583 1 "130088"
    584 1 "130088"
    585 1 "130088"
    586 1 "130088"
    587 1 "130088"
    588 1 "130088"
    589 1 "130088"
    590 1 "130088"
    591 1 "130088"
    592 1 "130088"
    593 1 "130088"
    594 1 "130088"
    595 1 "130088"
    596 1 "130088"
    597 1 "130088"
    598 1 "130088"
    599 1 "130088"
    600 1 "130088"
    601 1 "130088"
    602 1 "130088"
    603 1 "130088"
    604 1 "130088"
    605 1 "130088"
    606 1 "130088"
    607 1 "130088"
    608 1 "130088"
    609 1 "130088"
    610 1 "130088"
    611 1 "130088"
    612 1 "130088"
    613 1 "130088"
    614 1 "130088"
    615 1 "130088"
    616 1 "130088"
    617 1 "130088"
    618 1 "130088"
    619 1 "130088"
    620 1 "130088"
    621 1 "130088"
    622 1 "130088"
    623 1 "130088"
    624 1 "130088"
    625 1 "130088"
    626 1 "130088"
    627 1 "130088"
    628 1 "130088"
    629 1 "130088"
    630 1 "130088"
    631 1 "130088"
    632 1 "130088"
    633 1 "130088"
    634 1 "130088"
    635 1 "130088"
    636 1 "130088"
    637 1 "130088"
    638 1 "130088"
    639 1 "130088"
    640 1 "130088"
    641 1 "130088"
    642 1 "130088"
    643 1 "130088"
    644 1 "130088"
    645 1 "130088"
    646 1 "130088"
    647 1 "130088"
    648 1 "130088"
    649 1 "130088"
    650 1 "130088"
    651 1 "130088"
    652 1 "130088"
    653 1 "130088"
    654 1 "130088"
    655 1 "130088"
    656 1 "130088"
    657 1 "130088"
    658 1 "130088"
    659 1 "130088"
    660 1 "130088"
    661 1 "130088"
    662 1 "130088"
    663 1 "130088"
    end
    format %tm mydate
    label values stock stock
    label def stock 1 "130088", modify



  • #2
    unfortunately "dataex" does not replicate the labels I see in the screen
    Of course not. It displays the data as it is stored in the dataset, so that the reader can recreated your data, and then, following the output of the example observations, gives commands to create and apply the labels assigned to the values in the observations. From your sample, we see that the label assigned to a value of 1 for the variable stock is "130088". This is analogous to the way the mydate variable is displayed as a SIF monthly date value, but the %tm format is given to display that SIF in a recognizable HRF.

    Your dataex output would have been more instructive had you used the if and in qualifiers to select the observations you showed in your screen shot.

    My guess is that you have applied the encode command to the variable Code to create the variable stock separately in each of the datasets before you did your mergeing. This is a recipe for disaster, because encode starts with a code of 1 and works its way up separately in each dataset, so the encoding is not consistent between datasets.

    The correct approach is to merge your datasets and then apply encode to the string variable in the fully-merged datasets.

    To answer your specific question,
    Code:
    // use a short sample of data from dataex, including the value label definitions
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float mydate double stock str6 Code
    564 1 "130088"
    565 1 "130088"
    end
    format %tm mydate
    label values stock stock
    label def stock 1 "130088", modify
    // now convert the ebcoded values back into strings
    decode stock, generate(Code2)
    list, clean
    Code:
    . list, clean
    
           mydate    stock     Code    Code2  
      1.   2007m1   130088   130088   130088  
      2.   2007m2   130088   130088   130088

    Comment


    • #3
      I am aware of the danger of encode, but thanks for the reminder! My dataset is highly dynamic (changes every day with live data...) and I need to check that all Excel spreadsheets have updated properly.

      That said, the solutions you provided is great, it really helps. Thanks!

      Comment


      • #4
        Francois Durant, how did you merge those several files?

        Comment

        Working...
        X