  • .dta file corrupt. The file unexpectedly ended before it should have. r(612);

    Hello everyone,
    I have a large dataset, and after some cleaning I saved it to my working directory. The save appears to work normally, but when I try to use the file again I get the error below:

    . use employee_national
    .dta file corrupt
    The file unexpectedly ended before it should have.
    r(612);

    I tried adding a space and a comment, as I have seen others do, but with no success. Yesterday the code ran perfectly and I was able to use the dataset; I do not know what changed today. I also tried restarting my computer. Note that I am working in Harvard's Research Computing environment.
    I do not know if this helps, but my full code is below:

    Code:
    *Equal Opportunity Index
    *GOSI analysis
    *Employment by Industry- By gender and nationality
    *February 19th, 2018
    *Chaza Abou Daher
    clear
    cd "/nfs/home/C/cha022/GOSI-Index"
    use "/nfs/home/C/cha022/shared_space/ci3_nali/GOSI_2016.dta"
    *opening the log
    log using "Employee.txt", text replace
    *remove missing employers and industries
    gen missemployer = (owner_id_700==.)
    tabulate missemployer
    drop if missemployer==1
    *95500 observations were dropped, for having no company name
    gen missindustry = (activitysubgroup==.)
    tabulate missindustry
    drop if missindustry==1
    *212 more observations were dropped, for having no industry name
    duplicates drop
    *4,693,171 observations deleted, for being duplicates across all variables
    decode activitysubgroup, gen(subgroupname)
    tostring owner_id_700, gen(owner_id_str) format(%17.0g)
    gen emp_dura = end_date - start_date if !missing(end_date)
    gen ongoing = (emp_dura==.)
    *generate dummy variable for Saudi/non-Saudi nationality
    gen saudinational = 0
    replace saudinational = 1 if nationality==1
    *salary growth by nationality by employer
    gen saudisalarygrowth = (saudis_salary_2016 - saudis_salary_2009) / saudis_salary_2009
    gen nonsaudisalarygrowth = (nonsaudis_salary_2016 - nonsaudis_salary_2009) / nonsaudis_salary_2009
    *salary growth by employee
    gen salarygrowth = (salary_2016 - salary_2009) / salary_2009
    *employee growth by nationality by employer
    gen saudigrowth = (saudis_2016 - saudis_2009) / saudis_2009
    gen nonsaudigrowth = (nonsaudis_2016 - nonsaudis_2009) / nonsaudis_2009
    *identify one-time employees, employees that changed employment within the same company and employees that changed employment across companies
    duplicates tag id, generate(dup_employee)
    duplicates tag owner_id_700 id, generate(dup_employee_employer)
    duplicates tag owner_id_700 id occupation, generate(dup_employee_employer_posit)
    *delete all observations with employment duration less than a month
    drop if emp_dura < 30
    *delete all observations with salary_2016 or salary_2012 less than or equal to 1000 SAR
    drop if salary_2016 <= 1000
    drop if salary_2012 <= 1000
    *add current employees by nationality
    gen ongoing_saudi = ongoing * saudinational
    *salary by gender in 2016 (gender==1 male, gender==2 female); copy the salary, do not multiply it by the gender code
    gen salary_fem_2016 = salary_2016 if gender==2
    gen salary_mal_2016 = salary_2016 if gender==1
    save employee_national, replace
    *closing the log
    log close
    *end of dofile
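    One way to catch the problem at the moment it happens is to read the file back straight after -save-. The sketch below uses the shipped auto data and a -tempfile- as stand-ins for the real dataset (both are assumptions for illustration, not part of the original code); -describe using- reads the file's descriptive information without loading the data, and the full -use- is wrapped in -capture noisily- so the do-file reports the failure instead of stopping.

    ```stata
    * Sketch: save, then immediately read the file back to confirm it is intact.
    * auto.dta and the tempfile stand in for employee_national (assumption).
    sysuse auto, clear
    local n_before = _N
    tempfile check
    save "`check'", replace

    * Reads the file's descriptive information without loading the data
    describe using "`check'"

    * Full read of the saved file; capture keeps the do-file running on error
    capture noisily use "`check'", clear
    if _rc {
        display as error "read-back failed with r(" _rc ")"
    }
    else {
        assert _N == `n_before'
        display "read-back OK: " _N " observations"
    }
    ```

    With the real data, the equivalent would be -use employee_national, clear- placed directly after -save employee_national, replace-. If that succeeds in the same session but fails the next day, the file most likely changed on disk after Stata wrote it.
    
    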

  • #2
    Administrative bump, post was flagged as spam.



    • #3
      There is nothing obviously wrong with your code. And even if there were, Stata should be able to read a data set it has saved, even if the data cleaning produced incorrect data. I think you need help from technical support. The question is whether this is a Stata problem (in which case you need help from Stata technical support) or a hardware problem (in which case you need help from your institution's technical support). Are you encountering difficulty opening any other files? Is the problem only with Stata, or are other applications experiencing difficulties as well?

      Have you updated Stata? Have you tried reinstalling Stata? (You may not be able to do these things yourself in a network environment; you might have to ask your network technical support to try those things for you.)

      If you just list the file from your directory using your operating system's file management system, do the dates for created and last accessed correspond to when you created the file and last accessed it yourself?
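      The file size can also be recorded from inside Stata with -checksum-, which stores the file length in r(filelen). A minimal sketch, again using auto.dta saved to a tempfile as a stand-in (assumption): note the length right after saving, and rerun -checksum- later; a smaller value than the one recorded would point to the file being truncated after Stata wrote it.

      ```stata
      * Sketch: record the length of a freshly saved file so it can be
      * compared with what the operating system (or a later session) reports.
      sysuse auto, clear
      tempfile check
      save "`check'", replace

      checksum "`check'"
      display "bytes on disk: " r(filelen)
      local saved_len = r(filelen)

      * Later, rerun -checksum- on the same file and compare lengths.
      * On Linux, -shell ls -l filename- also shows size and modification time.
      checksum "`check'"
      assert r(filelen) == `saved_len'
      ```
      
      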

      Try -hexdump-ing the file. First -hexdump- a file that Stata has no problem reading, perhaps auto.dta, so you can see what an uncorrupted .dta file looks like. Then try -hexdump employee_national.dta- to see if you can spot a problem yourself.
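      A sketch of that comparison, limited to the first 128 bytes with -hexdump-'s from() and to() options. -findfile- is used to locate the auto.dta that ships with Stata; the suspect filename is taken from the original post, and the second call is wrapped in -capture noisily- so the sketch still runs where that file does not exist. In Stata 13 and later, a healthy file should begin with the text <stata_dta><header>.

      ```stata
      * Known-good file: locate the shipped auto.dta and dump its first bytes
      findfile auto.dta
      local goodfile "`r(fn)'"
      hexdump "`goodfile'", from(0) to(127)

      * Suspect file (name from the original post); compare the two dumps
      capture noisily hexdump "employee_national.dta", from(0) to(127)
      ```
      
      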

      What I've proposed here are some scattershot efforts to pin down the source of the problem. If it is a Stata problem that doesn't go away with an update or fresh installation, it's going to be something deep in the Stata code and you will need to consult StataCorp's technical support. I suspect it's more likely to be a hardware problem or a network problem.





      • #4
        Clyde's list is very full, but here's another:

        If I -type- a .dta file, the first few bytes usually confirm that I have what I want to have.

        Code:
        . type auto.dta
        <stata_dta><header><release>117</release><byteorder>LSF</byteorder><K>..</K><N>J...</N>
        > <label>.1978 Automobile Data</label><timestamp>.13 Apr 2016 17:45</timestamp></header
        It does happen, however, that someone gets confused and thinks (e.g.) that saving with a .dta extension is enough white magic, when all along the file is something quite different.

        So, have a look at the file as typed in Stata to save yourself some embarrassment.

        Equivalently, have a look at the file in your favourite text editor.
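        The same check can be scripted with Stata's -file- command: read the first 11 bytes in binary mode and compare them with the <stata_dta> marker visible in the -type- output above. This is a sketch, assuming a format 117 or later file (older .dta formats do not start with that text); it uses a freshly saved copy of auto.dta in a tempfile so it runs anywhere.

        ```stata
        * Sketch: check that a .dta file starts with the <stata_dta> marker.
        sysuse auto, clear
        tempfile check
        save "`check'", replace

        * Read the first 11 bytes into the local macro header
        tempname fh
        file open `fh' using "`check'", read binary
        file read `fh' %11s header
        file close `fh'

        display `"first bytes: `header'"'
        if `"`header'"' != "<stata_dta>" {
            display as error "no <stata_dta> marker: older format, not a .dta file, or damaged"
        }
        ```

        Pointing -file open- at employee_national.dta instead would show whether the suspect file at least begins like a .dta file; a truncated file can still pass this test, so it complements rather than replaces reading the file back.
        
        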



        • #5
          Thank you both for your great advice! Much appreciated.
