Something has gone awry with the Forum software in the above-captioned thread: it is apparently not possible to make additional posts to it. So I am resuming Emily's thread here, with a response to her #8 at https://www.statalist.org/forums/for...-panel-dataset.
Emily has in fact identified the problem herself. It is a precision problem. The school ID variables she shows in her new (made-up) data example are 11 digits long. That is way too long to fit into either a float or a long. These variables need to be doubles. The "random" changes in the low-order digits she is getting arise from the fact that the code Nick and I have offered creates intermediate variables containing school IDs, and those variables are created as floats (the default, since no storage type was specified). Since these numbers are too large to fit into a float, each is instead stored as the nearest value that a float can represent. The solution is to make sure that all of the variables are created as doubles (which can handle up to 16 decimal digits).
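Specifying the double storage type in the egen and gen commands, as in the Code section that closes this post, will resolve this problem with my code. Similar changes to Nick's code will resolve the problem with his.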
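The rounding described above is easy to reproduce. Here is a minimal sketch, assuming Stata's default of `set type float`; the 11-digit ID and the variable names `id_double` and `id_float` are made up purely for illustration and do not come from Emily's data.

```stata
clear
set obs 1

* A made-up 11-digit ID, stored exactly because the type is double
gen double id_double = 12345678901

* No storage type given, so (under the default settings) this is a float
gen id_float = id_double

format id_double id_float %12.0f
list

* The float copy has been rounded to the nearest value a float can hold
assert id_float != id_double
assert id_float == float(12345678901)
assert id_float == 12345678848
```

At this magnitude consecutive floats are 1,024 apart, so the last few digits of the ID cannot survive, which is why the changes look random.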
I understand the problem with showing example data from confidential data sources. In general, the exact values of the variables are not needed for most of the problems encountered on Statalist. But the structure of the data is critically important. And data storage types are part of that. When this thread began, O.P. exhibited a data tableau showing 5-digit school IDs. Admittedly, when thinking about the particular question posed, it would not immediately jump to my mind that the difference between 5-digit and 11-digit IDs would be salient, let alone crucial. But it turned out to be so. So the general moral of the story is: if you have to make up data when posting an example, do your best to make the made-up data "look like" the real data.
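Code:
// Modal school ID, stored as a double
egen double mode_school = mode(school_id)
// Carry the most recent nonmissing school ID forward within id, in wave order
by id (wave), sort: gen double last_school = school_id if _n == 1
by id (wave): replace last_school = cond(missing(school_id), last_school[_n-1], ///
    school_id) if _n > 1
// Where the mode is missing, fall back on the last known school ID
by id (wave): replace mode_school = last_school[_N] if missing(mode_school)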
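One way to check that the fix has taken hold is to confirm the storage types after running the code. The sketch below applies the same carry-forward pattern as the code above to a tiny toy dataset; the data values are invented, and only the variable names follow the thread.

```stata
clear
input long id byte wave double school_id
1 1 12345678901
1 2 .
1 3 12345678901
end

* Same carry-forward pattern as above, applied to the toy data
by id (wave), sort: gen double last_school = school_id if _n == 1
by id (wave): replace last_school = cond(missing(school_id), last_school[_n-1], ///
    school_id) if _n > 1

* confirm exits with an error if a listed variable is not stored as a double
confirm double variable school_id last_school

* Every carried-forward value should be an ID that really occurs in the data
assert last_school == 12345678901

format school_id last_school %12.0f
list, sepby(id)
```

If `confirm` or `assert` fails on real data, the likely culprit is a variable that was created without an explicit `double`.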