  • Importing files into a column

    Hi fellow Stata users,

    I am trying to write code to import around 100 text files containing unformatted data into a single column. Can someone please help me with this code? I have tried ChatGPT and a few other sources, but nothing has worked. Ideally, I would like to get all the files into the column at once. Also, could you please indicate roughly how long it will take to run once I have entered the code?

    Here's the code I am trying:

    local folder "C:\Users\Documents\Stata" // Specify the folder where the files are located
    local filelist : dir "`folder'" files "*.txt" // List all files in the folder with the extension .txt

    gen myvar = "" // Create an empty variable to store the file contents

    foreach file of local filelist {
    file open myfile using "`folder'\`file'", read // Open each file for reading
    file read myfile mydata // Read a line from the file
    file close myfile // Close the file

    replace myvar = "`myvar' `mydata'" // Append the file contents to the myvar variable
    }


    Look forward to hearing from the community.

  • #2
    Things to think about:
    1. Do you want the whole file? Your code seems interested in only the first line. Don't you want to loop over the lines in each file? See: https://stats.oarc.ucla.edu/stata/fa...nal-text-file/
    2. myvar is going to be a single very wide observation. Is that what you want? That isn't what I would think of as all the data in a column.


    • #3
      Dear Daniel,

      Thank you for replying and sharing the link. I want the whole file. I have generated three other variables (ID, Company name, and Name of the file) plus Contracts, the column into which I want to import the text from all the .txt files. There are a hundred files, so I thought the code would generate 100 observations under myvar. Would that happen with the code I have? Sorry, I have mainly been a GUI user for quantitative studies, but this is a text analysis project, so I am struggling with the code.
      Last edited by Moulik Zaveri; 24 Jun 2023, 07:07.


      • #4
        You would need
        Code:
        set obs 100
        local n=0
        before you start reading from the file. That way you have some way to put data from each company into a separate observation in the workspace. The -file- command doesn't keep track of observation numbers; you have to do that yourself. Then put

        Code:
        local n=n+1
        in the loop over companies and

        Code:
        local contacts `contacts' " " `myvar'
        in the loop over lines allows you to accumulate multiple lines into one line of the local macro "contacts" and
        Code:
        replace contacts[n]=`contacts'
        lets you put the long string into the workspace as observation n.

        Try building up the program one step at a time. Once you have it working you will be a Stata expert.

        Code:
        set trace on
        will help.


        • #5
          Hi Daniel

          Thanks a lot for the tips and guidance. It is much appreciated.

          I think I am not doing something right. Would it be too much to ask you to send me the exact code, from scratch, for what I'm trying to achieve? I have been trying different combinations and options but not getting anywhere. I would keep trying if I didn't have a deadline. I have attached my original .do file from before your advice. When I plug in the code you recommended, I get errors, maybe because I am not doing it right. For instance, when I put local n=0 and local n=n+1, I get the error "n not found". Plus, I am still stuck on how to get Stata to import the files into the column and read them. Sigh!! Please help!
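
          The "n not found" error quoted above is what Stata reports when a local macro is named without its quotes: in local n=n+1, the bare n on the right-hand side is looked up as a variable rather than as the macro. Likewise, replace does not accept a subscript on its left-hand side; the target observation is given with in. A minimal sketch of the counter pattern follows, meant to be run from a do-file. The three observations and the placeholder text are assumptions made purely for illustration, and the variable name contacts simply follows #4.

          ```stata
          * Sketch only: a counter kept in a local macro, used to pick the observation to fill
          clear
          set obs 3                     // three observations, purely for illustration
          gen str contacts = ""         // string variable to fill (name taken from #4)

          local n = 0
          forvalues i = 1/3 {
              local n = `n' + 1         // dereference the macro as `n'; a bare n is read as a variable name
              * local ++n               // equivalent shorthand for the line above
              replace contacts = "text for observation `n'" in `n'
          }
          list
          ```

          The same pattern would apply inside a loop over files: increment the counter once per file and use in `n' to write that file's text into its own observation.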

          • #6
            Moulik Zaveri, could you show one of your text files as an example, and specify in more general terms what you are trying to achieve? Reading the data into one column may or may not be what you actually want to do.


            • #7
              Hemanshu Kumar, thank you for reaching out. We are doing a project in which we are analyzing contracts, which are text files that contain unformatted text data such as headings, subheadings, and paragraph after paragraph. Think of them as user agreement statements/contracts. I can't share the text files due to a confidentiality agreement, but they are not delimited or fixed-format files. We have 100 of these contracts from 100 firms. My PI has set up a Stata file which has ID (number), Firm name, File name, and Contract as variables. All three except ID are string variables. I am required to import the text files into the Contracts column.


              • #8
                You might want to do something like this:

                Code:
                cd "C:\Users\mzaveri\Contract Text Files"
                local filelist: dir "." files "*.txt"
                local numfiles: word count `filelist'
                
                //Create empty datafile and variables
                set obs `numfiles'
                gen `c(obs_t)' id = _n
                gen str firmname = ""
                gen str filename = ""
                gen strL contract = ""
                
                //Import contracts to variable contract
                
                tempname myfile
                local fn = 1
                foreach file of local filelist {
                    replace filename = `"`file'"' in `fn'
                    file open `myfile' using "`file'", read text
                    file read `myfile' mydata
                    while r(eof) == 0 {
                        replace contract = contract + `" `mydata'"' in `fn'
                        file read `myfile' mydata
                    }
                    file close `myfile'
                    local ++fn
                }
                Last edited by Hemanshu Kumar; 26 Jun 2023, 01:54.


                • #9
                  Glancing over this, also see

                  Code:
                  help fileread()


                  • #10
                    daniel klein's suggestion in #9 makes the code even simpler:

                    Code:
                    cd "C:\Users\mzaveri\Contract Text Files"
                    local filelist: dir "." files "*.txt"
                    local numfiles: word count `filelist'
                    
                    //Create empty datafile and variables
                    set obs `numfiles'
                    gen `c(obs_t)' id = _n
                    gen str firmname = ""
                    gen str filename = ""
                    gen strL contract = ""
                    
                    //Import contracts to variable contract 
                    forval fn = 1/`numfiles' {
                        local file: word `fn' of `filelist'
                        replace filename = `"`file'"' in `fn'
                        replace contract = fileread(`"`file'"') in `fn'
                    }
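
                    Since #7 mentions that a Stata file with ID, Firm name, File name, and Contract already exists, the same fileread() idea could perhaps be applied to that file directly rather than to a newly built one. The following is only a sketch under stated assumptions: the dataset name contracts.dta, its location in the same folder as the text files, and the variable names id, filename, and contract are all hypothetical and would need to match the real file.

                    ```stata
                    * Sketch only: fill an existing dataset instead of building a new one.
                    * Assumptions (not from the thread): the dataset is saved as contracts.dta
                    * next to the text files, with variables named id, filename, and contract.
                    cd "C:\Users\mzaveri\Contract Text Files"
                    use "contracts.dta", clear

                    recast strL contract          // make sure the variable can hold long text
                    replace contract = fileread(filename) if fileexists(filename)

                    * fileread() returns a string starting with "fileread() error" when it fails,
                    * so listing those rows (and rows whose file was not found) flags problems
                    list id filename if !fileexists(filename) | strpos(contract, "fileread() error") == 1
                    ```

                    If the final list command shows no observations, every file named in filename was found and read without a reported error.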


                    • #11
                      Daniel Feenberg, daniel klein, and Hemanshu Kumar, thank you so much for your input and advice. It is much appreciated.

                      Hemanshu Kumar, that final simplified code worked like magic. Thanks a ton, you have saved me a lot of time, hassle, and trouble. Thank you!!!
