Dear Stata Users,
I am encountering an issue I have not been able to explain and would appreciate your insights (details/example below).
I am working with a large dataset (~20 million observations) that includes a variable d10 of storage type byte (display format %8.0g), which contains some missing values. After saving and reopening the dataset, d10 contains a value that was not there before (0, with 409,325 observations), and the frequencies of all other values, including missing, have changed.
Has anyone experienced something similar or can suggest a possible explanation?
Many thanks in advance,
Francesco
PS: If I repeat the save and reload, the distribution changes again.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tab d10, m /*only frequencies*/

        d10 |      Freq.
------------+-----------
          1 |  1,831,686
          2 |  1,769,036
          3 |  1,934,389
          4 |  1,914,070
          5 |  1,894,011
          6 |  2,044,012
          7 |  2,180,404
          8 |  2,192,355
          9 |  2,047,698
         10 |  1,734,772
          . |     37,855
------------+-----------
      Total | 19,580,288
save "dataset0_test.dta", replace
use "dataset0_test.dta", clear
tab d10, m

        d10 |      Freq.
------------+-----------
          0 |    409,325
          1 |  1,793,906
          2 |  1,733,184
          3 |  1,892,326
          4 |  1,874,283
          5 |  1,853,590
          6 |  2,002,021
          7 |  2,137,323
          8 |  2,148,649
          9 |  2,002,380
         10 |  1,696,193
          . |     37,108
------------+-----------
      Total | 19,580,288
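
A minimal sketch of a check that might help narrow this down, assuming the original data are in memory before the save and that dataset0_test.dta is the file from the example above. It records a data signature before saving, compares the data in memory against the file just written, and then compares signatures after reloading. The local macro name sig_before is only illustrative.

```stata
* Sketch only: check whether the file on disk matches the data in memory.
datasignature                          // signature of the data currently in memory
local sig_before "`r(datasignature)'"

save "dataset0_test.dta", replace

* cf exits with an error on any mismatch, so capture it to let the do-file continue
capture noisily cf _all using "dataset0_test.dta", verbose

use "dataset0_test.dta", clear
datasignature
display "before save: `sig_before'"
display "after  load: `r(datasignature)'"

count if d10 == 0                      // 0 was not a value of d10 before saving
```

If cf already reports mismatches before the reload, the difference would seem to arise when the file is written rather than when it is read back.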