After following the recent interesting thread about hashing an ID string, I wanted to open a discussion about encryption in Stata. I'd be interested to hear what others do about encryption, and to get some comment on whether a user-written module for encryption would be useful.
I've met my own past needs for encryption by sorting the data file into random order, assigning a sequential id from that order, removing the true id, saving a linkage file containing the actual and random ids, and encrypting that linkage file with some non-Stata utility program. Once, I worked with some U.S. federal government educational data in which we were required to store some of the data on separate media, in encrypted form, and only use it with a "decrypt on the fly" utility.
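To make the first workflow concrete, here is a minimal sketch in Python (not Stata, and not the code I actually used). It assumes the data are a list of records with an identifier field; the function name `pseudonymize` and the field names `true_id` and `random_id` are made up for illustration. It shuffles the records, assigns a sequential id in the shuffled order, drops the true id, and returns the linkage table separately so that it can be encrypted with an outside utility.

```python
import secrets


def pseudonymize(records, id_field="id"):
    """Return (deidentified_records, linkage) for a list of dicts.

    Hypothetical helper: the names and the record layout are assumptions.
    """
    shuffled = list(records)
    # Put the records in random order using the OS random source.
    secrets.SystemRandom().shuffle(shuffled)

    deidentified, linkage = [], []
    for new_id, rec in enumerate(shuffled, start=1):
        # The linkage table holds both ids; it is the part to encrypt.
        linkage.append({"true_id": rec[id_field], "random_id": new_id})
        # The analysis file keeps everything except the true id.
        row = {k: v for k, v in rec.items() if k != id_field}
        row["random_id"] = new_id
        deidentified.append(row)
    return deidentified, linkage


records = [
    {"id": "A17", "score": 3},
    {"id": "B42", "score": 5},
    {"id": "C08", "score": 4},
]
deidentified, linkage = pseudonymize(records)
```

The de-identified records can then be shared or analyzed, while the linkage table is stored separately in encrypted form.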
Despite the existence of these solutions, I've had the thought that some kind of encryption capacity as a command within Stata would be convenient. For my own amusement, I wrote a small, simple program in Mata that encrypts an entire data set using a pad file of random bytes as long as the file to be encrypted (a one-time-pad style approach). It was fast and presumably quite hard to break. A similar program could presumably be used to encrypt a single variable. I'd also think it would be possible to have Stata call some external program (e.g., a compression utility with AES-256 encryption capacity), but I haven't looked at that seriously. I'd also suppose one could use some existing Python code to do the encryption, giving a platform-independent feature from within Stata, but I'm not knowledgeable about that.
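For anyone curious what the pad-file idea looks like, here is a rough sketch in Python rather than Mata. It is not my actual program: it assumes the pad is combined with the data by a byte-wise XOR, and the function names `make_pad` and `xor_with_pad` are invented for illustration. The usual caveats for this kind of scheme apply: the pad has to come from a good random source, be kept secret, and never be reused for a second file.

```python
import secrets


def make_pad(n_bytes):
    """Generate a pad of n_bytes random bytes from the OS random source."""
    return secrets.token_bytes(n_bytes)


def xor_with_pad(data, pad):
    """XOR each data byte with the matching pad byte.

    The same call both encrypts and decrypts, because XOR-ing twice
    with the same pad returns the original bytes.
    """
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


# Stand-in for the bytes of a data file; a real file would be read in binary mode.
plaintext = b"id,score\nA17,3\nB42,5\n"
pad = make_pad(len(plaintext))
ciphertext = xor_with_pad(plaintext, pad)
recovered = xor_with_pad(ciphertext, pad)
```

The obvious drawback is that the pad is as large as the data and has to be stored and protected separately, which is part of why calling an established AES tool or library might be the more practical route.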
So, I'm posting to hear what people think, and possibly to stimulate someone with expertise to think about creating a user-written module, if that would be relevant.