  • Values of variable changed after saving and reopening .dta file

    Dear Stata Users,

    I am encountering an issue I have not been able to explain and would appreciate your insights (details/example below).

    I am working with a large dataset (~20 million observations) that includes a variable d10 of type byte (format %8.0g), which contains some missing values. After I save and reopen the dataset, d10 contains a value that was not there before (0), and the frequencies of the existing values and of missing have changed.

    Has anyone experienced something similar or can suggest a possible explanation?

    Many thanks in advance,
    Francesco

    PS: If I repeat the save and reopen steps, the distribution changes again.


    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    tab d10, m /*only frequencies*/

            d10 |      Freq.
    ------------+-----------
              1 |  1,831,686
              2 |  1,769,036
              3 |  1,934,389
              4 |  1,914,070
              5 |  1,894,011
              6 |  2,044,012
              7 |  2,180,404
              8 |  2,192,355
              9 |  2,047,698
             10 |  1,734,772
              . |     37,855
    ------------+-----------
          Total | 19,580,288

    save "dataset0_test.dta", replace


    use "dataset0_test.dta", clear
    tab d10, m

            d10 |      Freq.
    ------------+-----------
              0 |    409,325
              1 |  1,793,906
              2 |  1,733,184
              3 |  1,892,326
              4 |  1,874,283
              5 |  1,853,590
              6 |  2,002,021
              7 |  2,137,323
              8 |  2,148,649
              9 |  2,002,380
             10 |  1,696,193
              . |     37,108
    ------------+-----------
          Total | 19,580,288
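
    One possible way to check whether the data survive the save/reopen round trip is to compare data signatures before saving and after reloading. This is only a minimal sketch, assuming the original data are already in memory and that nothing else writes to the file in between; the macro names sig_before and sig_after are made up for illustration.

    ```stata
    * Signature of the data currently in memory (before saving)
    datasignature
    local sig_before "`r(datasignature)'"

    * Round trip: write to disk, then read back
    save "dataset0_test.dta", replace
    use "dataset0_test.dta", clear

    * Signature of the data as read back from disk
    datasignature
    local sig_after "`r(datasignature)'"

    display "before: `sig_before'"
    display "after:  `sig_after'"

    * Stops with an error if the two signatures differ
    assert "`sig_before'" == "`sig_after'"
    ```

    If the assertion fails, the values changed somewhere between save and use. The signature covers all variables, so a mismatch does not by itself show that d10 is the only variable affected.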

  • #2
    Can you try:

    Code:
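    * Compare d10 in memory with the copy on disk right after loading
    use "dataset0_test.dta", clear
    cf d10 using dataset0_test.dta
    * Save, reload, and compare again
    save "dataset0_test.dta", replace
    use "dataset0_test.dta", clear
    cf d10 using dataset0_test.dta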
    Last edited by Hua Peng (StataCorp); 04 Aug 2025, 14:06.
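
    If the commands in #2 are run from a do-file, note that cf exits with return code 9 when it finds a difference, which would stop the do-file at the first mismatch. A hedged variant of the same steps, wrapping each comparison in capture noisily so that both comparisons run and their return codes are shown:

    ```stata
    use "dataset0_test.dta", clear

    * First comparison: data just loaded vs. the same file on disk
    capture noisily cf d10 using "dataset0_test.dta"
    display "return code after first cf:  " _rc

    * Save over the file, reload, and compare again
    save "dataset0_test.dta", replace
    use "dataset0_test.dta", clear
    capture noisily cf d10 using "dataset0_test.dta"
    display "return code after second cf: " _rc
    ```

    A return code of 0 means cf found no differences in d10; 9 means at least one observation differs between memory and disk.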
