Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • String Variables Not Visible?

    Hi,

    I am relatively new to using STATA so I apologize if this is something obvious.

    I have a variable that has values like "ypN0" "cM0" "4A" - so I coded this as a string variable.

    When I run the script, the columns for these variables come up entirely blank.

    Just to make sure there was actual data in those columns, I switched out "str" for "long" - and got a very long list of errors saying, for example, "cM0 cannot be read as a number".
    So I know the values are there, they're just not showing up.

    What other data type could it be? It's not numeric, and string isn't working out. I did try "str5" as well since the max amount of characters for this data is 5, but still blank.

    Thanks!

  • #2
    What "script"? Without knowing what you are doing, it is difficult to provide any insights.

    Comment


    • #3
      Originally posted by Jeph Herrin View Post
      What "script"? Without knowing what you are doing, it is difficult to provide any insights.
      Sorry, it's a script for pulling coded information from a database, so I have to create/identify all the variables their types, label them, etc. So I have a long list of variables, but there's about 4 variables (all the same type, they describe cancer staging) that should be strings that just aren't appearing at all.

      Example of a part of the list:

      str CASE_ID 1- 37 ///
      str FACILITY_ID 38- 47 ///
      byte FACILITY_TYPE_CD 48- 48 ///
      byte FACILITY_LOCATION_CD 49- 49 ///
      int AGE 50- 52 ///
      byte SEX 53- 53 ///

      And so on.

      But these cancer staging ones, ex: "str CANCER_STAGE #- # ///" just are not populating into STATA. I know the values are there because when I try to use float/int/long/byte, I get the error that it cannot be read as a number.

      I hope this is more helpful!

      Comment


      • #4
        It may just be the display format. Have you tried listing them?, e.g.,

        Code:
        list CANCER_STAGE* in 1/10

        Comment


        • #5
          Originally posted by Andrew Musau View Post
          It may just be the display format. Have you tried listing them?, e.g.,

          Code:
          list CANCER_STAGE* in 1/10
          I tried listing as well, but the results are still blank. I also tried some other simple commands like tabulating to see what would come up, but it just returns with "no observations".

          I do know that the data is there because I can see every entry when I try to run it as a byte (or other numeric variable type) - it will go through each entry and return with "cannot be read as a number" for each one.

          But when I have it listed as a string variable, no errors come up, the columns are just empty.

          Comment


          • #6
            I also tried some other simple commands like tabulating to see what would come up, but it just returns with "no observations"
            This suggests that the variable names were imported, but not their entries. You have to go back to the import and figure out what went wrong.

            I do know that the data is there because I can see every entry when I try to run it as a byte (or other numeric variable type) - it will go through each entry and return with "cannot be read as a number" for each one.
            Not sure what you mean here. Is this when importing the variables? Can you show the exact commands that you ran?
            Last edited by Andrew Musau; 30 Jun 2021, 11:46.

            Comment


            • #7
              Originally posted by Andrew Musau View Post

              This suggests that the variable names were imported, but not their entries. You have to go back to the import and figure out what went wrong.



              Not sure what you mean here. Is this when importing the variables? Can you show the exact commands that you ran?
              Yes sorry, when importing the variables.
              All I theoretically should have to do to import the data for this variable is:

              infix str CANCER_STAGE 1- 37 ///
              using "C:\Users\me\datafile.DAT"

              label var CANCER_STAGE "Cancer Stage"


              When that didn't work out, I went back and edited it as:

              byte CANCER_STAGE 1- 37 ///

              Just to see what errors/entries would appear as a byte instead of a string.

              When I run the script, I get at least a few thousand lines of:
              'ypT1' cannot be read as a number for CANCER_STAGE[333294]
              'ypT3' cannot be read as a number for CANCER_STAGE[333440]
              'ypT1' cannot be read as a number for CANCER_STAGE[333452]
              'ypT3' cannot be read as a number for CANCER_STAGE[333456]
              'ypT0' cannot be read as a number for CANCER_STAGE[333457]
              'ypT3' cannot be read as a number for CANCER_STAGE[333458]
              for every entry.

              I want to be able to see those "ypT#" values, that's exactly what I'm looking for - but they're just not there as a string, I only know they exist in the data set because of this error message.

              I hope that makes sense. Since I'm new, my lingo isn't great.




              Comment


              • #8
                That is strange. Is it a regular text file? If so, what do you get running the following?

                Code:
                import delimited using "C:\Users\me\datafile.DAT", clear rowrange(333294:333458)
                Last edited by Andrew Musau; 30 Jun 2021, 12:43.

                Comment


                • #9
                  In case you are working with data from a SQL database, you can directly use ODBC based data loading.

                  I had used similar approach fro loading data from MySQL on a Win10 OS laptop,

                  Code:
                  version 12.1
                  clear
                  capture log close
                  log using DataLoadSDI.smcl, replace
                  
                  // List of DBs
                  local dblist " india1 india3  india4 india6 india7  india8  india9  " // NAMES OF MYSQL Databases
                   
                  foreach d in   `dblist' {
                      cd "Data"
                      odbc load, ///
                        exec("SELECT * FROM `d'.sdi") ///
                        conn("DRIVER={MySQL ODBC 5.3x64Unicode};SERVER=localhost;PORT=3308;DATABASE=`d';UID=root;PWD=root;") clear  sqlshow // MySQL ODBC 5.3x64Unicode is the name of the mySQL ODBC driver installed in the windows 10 laptop. The driver is called by its name
                        save sdi_`d'.dta, replace
                       count    
                  }
                  
                  // Create Shell / Empty DTA File
                  use sdi_india1.dta, clear
                  drop in 1/-1
                  save "india_sdi_merged.dta", replace
                  count
                  
                  // Append each database
                  foreach d in   `dblist'    {
                      use sdi_`d'.dta, clear
                      count
                      append using india_sdi_merged.dta
                      save india_sdi_merged.dta, replace
                  }
                  compress
                  count
                  
                  // DATA MANAGEMENT
                  rename FirstName FirstName_Respondent
                  rename MI MI_Respondent
                  rename LastName LastName_Respondent
                  rename PatientNUmber PatientNumber_Respondent
                  
                  // DEFINE Value Labels in another DO file and execute it
                  do "labels_demo.do"
                  
                  // APPLY value Labels
                  label value area lbl_Area
                  label value sex lbl_Sex
                  label value edu lbl_edu
                  
                  // FINAL SAVE
                  save india_sdi_merged.dta, replace
                  Also if there are multiple tables that require JOINs then rather than pass a complex query to ODBC LOAD, I had created MySQL Views in the database and then called the view directly using a simple Select * from ViewName query as shown above.

                  You can consult "help odbc" for import options that may be helpful or more applicable.


                  Hopefully this is helpful
                  Vivek
                  Stata 15.1 (MP 2 core)
                  https://www.epidemiology.tech/category/stata/
                  Google Scholar Profile

                  Comment


                  • #10
                    Also the issue may be leading or training spaces ?
                    Have you tried to trim the data
                    can you share output of

                    Code:
                    des CANCER_STAGE
                    codebook CANCER_STAGE
                    gen CANCER_STAGE_2 = ustrtrim(stritrim(CANCER_STAGE))

                    Stata 15.1 (MP 2 core)
                    https://www.epidemiology.tech/category/stata/
                    Google Scholar Profile

                    Comment


                    • #11
                      Originally posted by Vivek Gupta View Post
                      Also the issue may be leading or training spaces ?
                      Have you tried to trim the data
                      can you share output of

                      Code:
                      des CANCER_STAGE
                      codebook CANCER_STAGE
                      gen CANCER_STAGE_2 = ustrtrim(stritrim(CANCER_STAGE))
                      variable name type format label variable label
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                      CANCER_STAGE str15 %15s Cancer Stage

                      .
                      . codebook CANCER_STAGE

                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                      CANCER_STAGE Cancer Stage
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                      type: string (str15), but longest is str0

                      unique values: 0 missing "": 7,514/7,514

                      tabulation: Freq. Value
                      7,514 ""

                      .
                      . gen CANCER_STAGE_2 = ustrtrim(stritrim(CANCER_STAGE))
                      (7,514 missing values generated)



                      --------------

                      The values generated were still blank -
                      So it seems like the issue is in "but longest is str0"? Is there a workaround for these seeming to be "str0"?
                      I'm not sure exactly what that means or how to fix it.

                      Comment


                      • #12
                        Just to update, I may be out of luck if it's genuinely a "str0" variable, which is what it seems like. I can't find any info on a workaround or way to recode/edit the values or variable?

                        Comment


                        • #13
                          if there isn't, there should be so I would write to tech-support; please report their response back here if you do write

                          Comment


                          • #14
                            I think Stata is having a problem reading your dataset; perhaps it is not pure text. The Stata hexdump command might help us figure out the problem. I'd like to see the output of the following two command when applied to your datafile.DAT; it will show us exactly what the first 100 bytes are.
                            Code:
                            . ls dumpexample.txt
                            
                            -rw-r--r--@ 1 lisowskiw  staff  41 Jul  1 11:34 dumpexample.txt
                            
                            . hexdump dumpexample.txt, to(100)
                                             |                                         |    character
                                             |           hex representation            |  representation
                                     address |  0 1  2 3  4 5  6 7  8 9  a b  c d  e f | 0123456789abcdef
                            -----------------+-----------------------------------------+-----------------
                                           0 | 3132 3334 3536 3738 395f 5468 6973 2069 | 123456789_This i 
                                          10 | 7320 615f 7465 7374 3536 3738 395f 3132 | s a_test56789_12 
                                          20 | 3334 3536 3437 3839 5f                  | 34564789_        
                            
                            .

                            Comment


                            • #15
                              Update! It is "fixed".
                              Thank you for all the help. All of the commands helped me pinpoint the problem and were very useful to learn as I get used to utilizing Stata.

                              I'm actually unsure at which point the problem was solved, because originally I was unable to see any values for the variable, but I also made a lot of edits in one go.

                              After that, "a few" values (or so I thought) populated after making various edits, so I was convinced I was still doing something wrong that could somehow be impeding all of the data getting imported.
                              Apparently it was a new variable introduced into the database, so even though I'm working with over 340,000 rows of data, there were only 8,000 values for this variable so far. A lot is just genuinely missing.


                              Comment

                              Working...
                              X