Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate/replace string variables

    I am using StataIC 14 and have a fairly large data set (6000 observations) for which I want to include latitude/longitude ('x' and 'y') based upon other geographic identifiers (house number). I have these data coupled in another, smaller dataset in microsoft excel and so am trying to use a formula as opposed to manually going through and inserting 'x' or 'y' such as :

    replace x=-43.027524 if house_num==700
    replace y=120.045524 if house_num==700

    However I consistently get the error:

    type mismatch
    r(109);

    My 'x' and 'y' variables are currently type 'strL' as is my house_num. I have tried to change the string type to 'str10' for x, or 'str3' for house_num but has made no difference.

    Any help would be appreciated,

    Cheers.

  • #2
    It's not what I would recommend for the coordinates, but if you insist on keeping them as strings, then you'll either need to enclose the numerals within double quotation marks or else use the string() function.
    Code:
    replace x = "-43.027524" if house_num == "700"
    replace y = string(120.045524,"%10.6f") if house_num == string(700, "%3.0f")
    There are good reasons for keeping an identifier variable as a string, but you might want to consider using the destring command for the coordinates and working with them as numeric.

    Comment


    • #3
      Thankyou for the code and the advice Joseph. All is working now.

      Cheers.

      Comment


      • #4
        Originally posted by Joseph Coveney View Post
        It's not what I would recommend for the coordinates, but if you insist on keeping them as strings, then you'll either need to enclose the numerals within double quotation marks or else use the string() function.
        Code:
        replace x = "-43.027524" if house_num == "700"
        replace y = string(120.045524,"%10.6f") if house_num == string(700, "%3.0f")
        There are good reasons for keeping an identifier variable as a string, but you might want to consider using the destring command for the coordinates and working with them as numeric.

        Hi, I have string variable bostonid, and I want to simplify them into containing either "boston, vir, or slc." Because of the large numbers (1980 observations), I sorted bostonid, and tried to replace them all with "boston" for in range of 1 to 56. Can you please see what went wrong?
        Attached Files

        Comment


        • #5
          Please see William's post (#4 in http://www.statalist.org/forums/foru...-combine-merge) alerting you to good practice here on asking good questions, advice that you are repeatedly asked to read before posting.

          Your screenshots here are unreadable by me and I suspect by many other people on the list.

          Comment


          • #6
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str16 bostonid
            ""               
            ""               
            ""               
            "db-bos-1066-001"
            "db-bos-1067-001"
            "db-bos-1097-001"
            "db-bos-1098-001"
            "db-bos-1099-001"
            "db-bos-1101-001"
            "db-bos-1113-001"
            "db-bos-1118-001"
            "db-bos-1120-001"
            "db-bos-1145-001"
            "db-bos-1172-001"
            "db-bos-1251-001"
            "db-bos-1251-001"
            "db-bos-1275-001"
            "db-bos-1480-001"
            "db-bos-1490-001"
            "db-bos-1500-001"
            "db-bos-1514-001"
            "db-bos-2022-001"
            "db-bos-2030-001"
            "db-bos-3472-001"
            "db-bos-3477-001"
            "db-bos-3489-001"
            "db-bos-3498-001"
            "db-bos-3499-001"
            "db-bos-3502-001"
            "db-bos-3507-001"
            "db-bos-3508-001"
            "db-bos-3511-001"
            "db-bos-3515-001"
            "db-bos-3518-001"
            "db-bos-3524-001"
            "db-bos-3528-001"
            "db-bos-3534-001"
            "db-bos-3537-001"
            "db-bos-3543-001"
            "db-bos-4002-001"
            "db-bos-4003-001"
            "db-bos-4008-001"
            "db-bos-4009-001"
            "db-bos-4010-001"
            "db-bos-4012-001"
            "db-bos-4013-001"
            "db-bos-4016-001"
            "db-bos-4019-001"
            "db-bos-4024-001"
            "db-bos-4036-001"
            "db-bos-4038-001"
            "db-bos-4042-001"
            "db-bos-4053-001"
            "db-bos-4059-001"
            "db-bos-4060-001"
            "db-bos-4062-001"
            "db-bos-4066-001"
            "db-bos-4069-001"
            "db-bos-4074-001"
            "db-slc-3550-001"
            "db-slc-3551-001"
            "db-slc-3552-001"
            "db-slc-3553-001"
            "db-slc-3554-001"
            "db-slc-3555-001"
            "db-slc-3557-001"
            "db-slc-3560-001"
            "db-slc-3561-001"
            "db-slc-3562-001"
            "db-slc-3563-001"
            "db-slc-3564-001"
            "db-slc-3565-001"
            "db-vir-3660-001"
            "db-vir-3664-001"
            "db-vir-3668-001"
            "db-vir-3670-001"
            "db-vir-3680-001"
            "db-vir-3682-001"
            "db-vir-3734-001"
            "db-vir-3735-001"
            end
            ------------------ copy up to and including the previous line ------------------

            Hi, above is how my variable bostonid looks like.
            I tried Joseph's recommendation but it doesn't work. I want to replace
            "db-slc-3560-001"
            "db-slc-3561-001"
            "db-slc-3562-001"
            "db-slc-3563-001"
            "db-slc-3564-001" to "slc"


            db-bos-3477-001"
            "db-bos-3489-001"
            "db-bos-3498-001" to "boston"

            I entered command:

            . replace bostonid = string(boston,"%16s") if bostonid in 1/56
            boston ambiguous abbreviation
            r(111);

            . replace bostonid = string(1,"%16s") if bostonid in 1/56
            type mismatch
            r(109);

            Any help will be much appreciated!

            Comment


            • #7
              Please don't post the same question in so many threads.

              From your examples, I guess that you want something like

              Code:
              replace bostonid = "boston" if substr(bostonid, 4, 3) == "bos"
              and

              Code:
              replace bostonid = "slc" if substr(bostonid, 4, 3) == "slc"
              Your code attempts are not close. Don't blame Joseph's recommendation: his code suggestions, for a different problem, don't resemble your attempts and are in fact closer to what you need.

              Among various problems

              Code:
               
              if bostonid
              is meaningless and thus illegal for a string variable: only numeric values can be true or false and thus the argument of an if qualifier.

              Code:
              string(1)
              would just be the string "1": I can't see why you think that would work. The string format "%16s" is, however, illegal as an argument to string() as only a numeric display format would be legal there.

              Code:
               string(boston)


              has Stata looking for a variable called boston, but with the consequence stated. Stata doesn't get as far as telling you about the same format problem.

              Comment


              • #8
                See also my further recommendations at this other post.

                Comment


                • #9
                  thank you very much!

                  Comment


                  • #10
                    I want to generate a new string variable for state abbreviations based upon state names. I have tried several alternatives, most recently based upon Nick Cox's post here:

                    replace cdnew = "AK" if substr(cd) == "Alaska"
                    invalid syntax
                    r(198);

                    I have generated cdnew as string and cd is in the data set as string with state names

                    What am I doing wrong?

                    Comment


                    • #11
                      What you are doing wrong is

                      1. What Stata is objecting to: substr(cd) == "Alaska" is an illegal use of substr(). That function requires 3 arguments, which also include the beginning position of the substring and how long it is.

                      2. Not showing us an example of cd so that we can make a better suggestion.

                      My guess is that you want AK as result whenever cd includes Alaska as a substring (which could mean the entirety of the string). If so, then one of various ways to do it is

                      Code:
                      replace cdnew = "AK" if strpos(cd, "Alaska")
                      which in turn is equivalent to

                      Code:
                      replace cdnew = "AK" if strpos(cd, "Alaska") > 0

                      Comment


                      • #12
                        Your use of the -substr()- function is the source of the syntax error. -substr()- requires three arguments. The first is the name of the string variable from which you want to extract the substring, the second is the position of the start of the substring in the string variable, and the third is the length of the substring. So you might want -...if substr(cd, 1, 6) == "Alaska"-. But I'm not sure -substr()- is even the right approach here. Without seeing example data, I can't know what is in the variable cd. But if cd contains the actual fullnames of states, then you would just want -if cd == "insert_name_of_state_here"-, because you would want to test for complete equality. Another possibility is that cd contains the name of the state somewhere within it, but not necessarily at the beginning, and there may be extraneous material after the state name as well. In that case, what you really need is the -strpos()- function: -replace cdnew = "AK" if strpos(cd, "Alaska")-.

                        Added: Crossed with #11.

                        Comment


                        • #13
                          I have tried:
                          replace cdnew = "AK" if strpos(cd, "Alaska") > 0
                          (0 real changes made)

                          . replace cdnew = "AK" if strpos(cd, "Alaska")
                          (0 real changes made)

                          and
                          replace cdnew = "AK" if strpos(cd, 1, 6) == "Alaska"
                          invalid syntax
                          r(198);
                          and:
                          . replace cdnew = "AK" if strpos(cd, "Alaska")
                          (0 real changes made)

                          The data set is attached.
                          Attached Files

                          Comment


                          • #14
                            You say the dataset is attached, but

                            1. People are asked not to post .dta attachments: please read https://www.statalist.org/forums/help#stata

                            2. There are no variables cd* in that dataset, so it doesn't directly illuminate the problem. This works fine

                            Code:
                             list if strpos(district, "Alaska")
                            so your problem appears to be different. One reason the code might fail would be that the text in question is, say, "ALASKA".

                            3. The dataset attached is malformed for Stata purposes as metadata appear in the first observation and as a side-effect all variables are string. It is probably simplest for you to repeat import excel or import delimited and flag that the first row of the data file is to be treated as indicating variable names.

                            Comment

                            Working...
                            X