Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Import space-delimited file - how to adjust variable names one column to the left?

    I'm importing space-delimited txt files into stata. This is how the data looks like in a text editor.



    When I import the data using the following code:
    Code:
    import delimited "file.txt", delimiter(space, collapse) varnames(2) rowrange(5)
    The variables are mislabeled, due to the lack of a variable name for the 'hour' column.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 date str5(sp10 dn10) str4 sg10 str5(at10 t25) float v7
    "01/01/2001" "24:00" "7.8" "183." "9.4"  "19.9" -99
    "02/01/2001" "24:00" "7.6" "185." "10.7" "19.4" -99
    "03/01/2001" "24:00" "7.3" "189." "8.6"  "19.5" -99
    "04/01/2001" "24:00" "6.2" "169." "12.0" "20.2" -99
    end
    Is there a way to correct this? I have attempted to adapt a solution using macros based on this https://www.statalist.org/forums/for...he-next-column but have failed to do so. Would it involve reordering the columns to call the other column?

    Appreciate any advice. Thanks!
    Last edited by Al Ko; 14 Mar 2022, 02:45.

  • #2
    This is what I attempted to do:
    Code:
    ds
    local vbles `r(varlist)'
     
    tokenize `vbles'
    
    drop `2'
    
    while "`1'" != "" {
        rename `3' `2'
        macro shift 1
    }
    I get a syntax error on the rename, but it does do the renaming. Would it be possible to add the third row as a variable label?
    Last edited by Al Ko; 14 Mar 2022, 03:22.

    Comment


    • #3
      Code:
      ds date v7, not
      rename (*) (date hour `r(varlist)')

      Comment


      • #4
        Dear Stata members,

        Sorry for revisiting this again, I have a large text data file and I want to import it,


        File specs:

        - Text Document (.txt)
        - Size: 17GB
        - Variables : 20 variable
        - observations : 20 million

        Comp specs:
        - Core i5
        - 16.0 GB RAM
        - Stata/SE 17.0

        I tried to import this file using import delimited command , it takes about 15 mins and at the end stata crashed and the following messages appeared





        Note: Unmatched quote while processing row 36851985; this can be due to a formatting problem in the file or because a quoted data
        element spans multiple lines. You should carefully inspect your data after importing. Consider using option bindquote(strict)
        if quoted data spans multiple lines or option bindquote(nobind) if quotes are not used for binding data.


        insufficient memory
        Stata could not obtain sufficient memory either because set max_memory is set too low or because the operating system said
        no. Stata needed more memory to process character strings, probably strLs.

        To contact Technical Services, see https://www.stata.com/tech-support/contact/
        r(9377);





        Could anyone help me to solve this issue?

        Comment


        • #5
          You have two issues. The suggestion in the error message to use "bindquote(nobind)" is probably on point. If set then the end of line will terminate any unmatched quotes. That may give you missing data for that observation, but reading can continue with the remaining observations.

          For the space problem, you have to sacrifice some of the observations using the "rowrange" option or some of the variables using the "colrange" option to the -import delimited- command. -encode- and -compress- might help with this. Note that if your text file is 17GB, it isn't surprising that you run out of space on a 16GB computer.

          Comment


          • #6
            thanks for your reply and your help,

            Could you please give more explanation about the second point as I use the "Menu" bar , how to sacrifice some observations by using -encode- and -compress-? can you provide the code or help me by an example

            Comment

            Working...
            X