  • Values of variable changed after saving and reopening .dta file

    Dear Stata Users,

    I am encountering an issue I have not been able to explain and would appreciate your insights (details and an example below).

    I am working with a large dataset (~20 million observations) that includes a variable d10, of type byte (%8.0g), which contains some missing values. After I save the dataset and open it again, d10 takes a value (0) that was not there before, and the counts of the other values, including missing, have changed.

    Has anyone experienced something similar or can suggest a possible explanation?

    Many thanks in advance,
    Francesco

    PS: If I repeat the save-and-reopen cycle, the distribution changes again.


    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    tab d10, m /*only frequencies*/

            d10 |      Freq.
    ------------+-----------
              1 |  1,831,686
              2 |  1,769,036
              3 |  1,934,389
              4 |  1,914,070
              5 |  1,894,011
              6 |  2,044,012
              7 |  2,180,404
              8 |  2,192,355
              9 |  2,047,698
             10 |  1,734,772
              . |     37,855
    ------------+-----------
          Total | 19,580,288

    save "dataset0_test.dta", replace


    use "dataset0_test.dta", clear
    tab d10, m

            d10 |      Freq.
    ------------+-----------
              0 |    409,325
              1 |  1,793,906
              2 |  1,733,184
              3 |  1,892,326
              4 |  1,874,283
              5 |  1,853,590
              6 |  2,002,021
              7 |  2,137,323
              8 |  2,148,649
              9 |  2,002,380
             10 |  1,696,193
              . |     37,108
    ------------+-----------
          Total | 19,580,288

  • #2
    Can you try:

    Code:
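    use "dataset0_test.dta", clear
    cf d10 using dataset0_test.dta      // compare d10 in memory with d10 in the file on disk
    save "dataset0_test.dta", replace
    use "dataset0_test.dta", clear
    cf d10 using dataset0_test.dta      // repeat the comparison after one save-and-reopen cycle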
    Last edited by Hua Peng (StataCorp); 04 Aug 2025, 14:06.

    • #3
      Thanks for your suggestion, and apologies for the delayed response.

      This issue has also affected a few colleagues recently, but the good news is that we can no longer reproduce it. It appears the problem was linked to recent changes in the university's IT systems, specifically the migration of data from the old infrastructure to the new one.

      Now that the migration is complete, the issue seems to have resolved itself.

      Thanks again,
      Francesco