  • Import delimited not importing all observations

    My colleagues and I are having trouble importing a number of pipe (“|”) delimited .txt files, each 5-10 GB in size. The data are stored on a remote server, and we use a remote desktop connection to access a network copy of Stata/IC 15.1 (64-bit) installed on a different server in the same virtual network. The OS is Windows Server 2016 Standard, configured as an RDS VM in Azure.

    When we import the datasets, Stata occasionally fails to read in all observations. The problem is sporadic: sometimes the import returns the correct number of observations, and sometimes it returns fewer. We have not been able to identify any common factor in the cases where observations go missing.

    As an example, the output below shows two consecutive imports of the same file, which we know contains 101,809,217 observations:
    . import delimited using "`filename'", clear delimiters("|")
    (10 vars, 101,809,217 obs)
    . import delimited using "`filename'", clear delimiters("|")
    (10 vars, 39,894,187 obs)
    Does anyone know why this is occurring?


  • #2
    My guess is that this is a timing-related problem with data transfer across the network, one that affects how Stata interacts with the server's read services. I'm not an expert here, but I would guess that the server uses various file-buffering practices, especially with files this large, that could lead to problems. You might be able to get around it by using the rowrange() option of -import delimited- to read the file in smaller chunks, save each chunk as a Stata dataset (possibly as a tempfile), and then append the chunks into one file. There's an existing program called -chunky- (see -ssc describe chunky-) that predates -import delimited- but handles part of the housekeeping for this sort of thing.
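    For what it's worth, the chunked approach might look roughly like the do-file sketch below. It is untested against your setup and rests on several assumptions: the path and the chunk size are made up, row 1 of the file is assumed to hold the variable names, and rowrange() is assumed to count rows of the raw file including that header row. The expected observation count is the one you quoted.

```stata
* Sketch only: path, chunk size, and header layout are assumptions.
* Hypothetical path to the pipe-delimited file
local filename "bigfile.txt"
* Expected number of observations (figure quoted in the original post)
local nobs 101809217
* Rows to read per chunk (arbitrary choice)
local chunk 10000000
* Last row of the raw file, assuming row 1 holds the variable names
local lastrow = `nobs' + 1

* Read the file in pieces and save each piece to its own tempfile
local i 0
forvalues start = 2(`chunk')`lastrow' {
    local ++i
    local end = min(`start' + `chunk' - 1, `lastrow')
    tempfile part`i'
    import delimited using "`filename'", clear delimiters("|") ///
        varnames(1) rowrange(`start':`end')
    save `part`i''
}

* Stack the pieces back together in their original order
use `part1', clear
forvalues j = 2/`i' {
    append using `part`j''
}

* Stop with an error if any observations went missing
assert _N == `nobs'
```

    The final -assert- is there so that a short read halts the do-file instead of passing unnoticed. One caveat: if a column is read as string in one chunk and as numeric in another, -append- will refuse to combine them; importing everything with stringcols(_all) and running -destring- afterwards is one possible workaround.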
    However, I'd say that a diagnosis of your problem is going to require help from Stata's tech support people.