  • Merging two files, id variable long format in one dataset and str8 format in the other!

    Hi everyone,
    I am trying to merge two files on an id variable which has a long format in one dataset and str8 format in the other. The id variable contains nonnumeric characters, so I tried to convert the id variable in the first dataset from numeric to string using the tostring command, but I got values of 1, 2, 3, etc.!
    Any help is appreciated.
    Thanks

  • #2
    Can you show us a snippet of data? What you describe sounds like you went about it correctly, but somehow something odd happened. Use the -list- command, and please paste into code tags (hit the "A" above and to the right of where you're typing a post, then the "#" sign to get code tags).


    • #3
      Code:
      list cusip8 strcusip8 in 1/10
      
           +---------------------+
           |   cusip8   strcus~8 |
           |---------------------|
        1. | 00002130          1 |
        2. | 00002130          1 |
        3. | 00002130          1 |
        4. | 00002130          1 |
        5. | 00002130          1 |
           |---------------------|
        6. | 00002130          1 |
        7. | 00002130          1 |
        8. | 00020910          2 |
        9. | 00020910          2 |
       10. | 00036020          3 |
           +---------------------+
      Thanks Ben.


      • #4
        Does cusip8 have value labels attached?

        Note that long and str8 are storage types, not (display) formats.


        • #5
          Code:
          describe cusip8
          
                        storage   display    value
          variable name   type    format     label      variable label
          -------------------------------------------------------------------------------------------------------------
          cusip8          long    %8.0g      cusip8s    cusip8
          
          Yes, cusip8 has a value label, as shown. Does it matter?
          Thanks.


          • #6
            Now I'm really confused. If they're already in the same file, why merge? I was expecting two snippets of data, one from each file. They seem to have a 1:1 correspondence: all 00002130 correspond to 1, all 00020910 correspond to 2, etc. Hmm. Why are we merging?
            Last edited by ben earnhart; 05 Nov 2014, 10:01.


            • #7
              Sorry I confused you. Yes, I'm merging two files, but I posted just one snippet because I thought that if I could convert cusip8 in the first dataset to a string variable, I'd have no trouble merging with the second file (which already contains cusip8 in string format).


              • #8
                On #5: Yes, it matters. If the real information is in the value labels, you need decode, not tostring. That's the reason you got results like 1, 2, etc.

                (Once again, string here connotes storage type, not display format. Learning the right terminology will help you understand Stata better, and help others to understand your questions.)
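
                A minimal sketch of the difference, using toy data that mimics the listing in #3. The label name cusip8s comes from #5; the values typed in and the new variable names are assumptions for illustration only.

                ```stata
                * Hypothetical toy data: integers 1, 2, 3 carrying the real ids as value labels
                clear
                input long cusip8
                1
                1
                2
                3
                end
                label define cusip8s 1 "00002130" 2 "00020910" 3 "00036020"
                label values cusip8 cusip8s

                * tostring converts the underlying integers, giving "1", "2", "3"
                tostring cusip8, generate(from_tostring)

                * decode copies the value-label text, giving "00002130", "00020910", ...
                decode cusip8, generate(strcusip8)

                list
                ```

                Before merging, the decoded variable would presumably need the same name as the key in the other file, for example by dropping the labeled numeric variable and renaming strcusip8 to cusip8.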


                • #9
                  Thanks for correcting me, Prof. Cox. And yes, I found another post of yours on the same subject in which you recommended decode instead of tostring. It worked. Thanks a lot.


                  • #10
                    Code:
                    describe gvkey
                    
                                  storage   display    value
                    variable name   type    format     label      variable label
                    -------------------------------------------------------------------------------------------------------------
                    gvkey           str6    %9s                   gvkey
                    Sorry, I'm having a similar issue now. I have a string variable, gvkey, whose contents are in fact numeric. It doesn't have a value label, so I tried to destring it. But in the new numeric variable the leading zeros are dropped! Example: gvkey is 023052, but the numeric gvkey generated is 23052.
                    Thanks for your help.


                    • #11
                      Not enough information here for any advice but "Don't do that then". If you wish to match keys with leading zeros, they must both be string.


                      • #12
                        Originally posted by Nick Cox
                        Not enough information here for any advice but "Don't do that then". If you wish to match keys with leading zeros, they must both be string.
                        Thanks, Prof. Cox, but the main issue is that I need to destring the gvkey variable (which does not have a value label) to a numeric variable while preserving all the numeric characters in it. So, is there an option in the destring command to preserve all the characters? Or is there another command? Thanks.


                        • #13
                          I don't understand the premise here that you need to destring. You're assuming a solution without explaining the problem.

                          It seems that you have two files and gvkey is an identifier in one, perhaps both. What was said in my last post is true. If you wish to match keys with leading zeros, they must both be string.

                          We need much more information on your variables in two datasets to be able to help you.
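
                          As a rough sketch of the "both string" route: if gvkey were numeric in one file, a zero-padded string version could be built there rather than destringing the other. The values below and the name gvkey_s are hypothetical, and the six-character width is an assumption based on the str6 storage type shown in #10.

                          ```stata
                          * Hypothetical example: a numeric gvkey that has lost its leading zeros
                          clear
                          input long gvkey
                          23052
                          1004
                          end

                          * Rebuild a six-character string key, padding with leading zeros
                          generate str6 gvkey_s = string(gvkey, "%06.0f")

                          list
                          ```

                          The string key could then replace the numeric one (drop gvkey, rename gvkey_s gvkey) before merging on gvkey.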

