Hi,
For a project, I would like to create a program with a syntax along these lines:
Code:
name_command varlist file_to_recode recode_instructions matching_table
What the program would do is recode the values of the variables listed in varlist, which belong to the dataset file_to_recode, according to the instructions stored in a second dataset, recode_instructions. That dataset would be defined at the value level, with one observation per value of each variable. A third dataset, matching_table, would contain the old names of the variables to be recoded and their new names.
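A minimal Stata sketch of how the table-driven recoding described above could work. It assumes (hypothetically) that recode_instructions holds a string variable varname plus numeric old_value and new_value, and that matching_table holds string variables old_name and new_name. The three toy datasets are invented for illustration and kept in tempfiles; with real data you would point the use commands at your own .dta files. Values are recoded through an m:1 merge on each variable, and variables are then renamed from the matching table. This is one possible design, not an existing command.

```stata
* Sketch only: toy versions of the three datasets, built in memory.
* The names varname, old_value, new_value, old_name, new_name are hypothetical.
clear
input str32 varname old_value new_value
"sex"  1 0
"sex"  2 1
"educ" 1 1
"educ" 2 1
"educ" 3 2
end
tempfile recode_instructions
save `recode_instructions'

clear
input str32 old_name str32 new_name
"sex"  "female"
"educ" "educ_std"
end
tempfile matching_table
save `matching_table'

clear
input sex educ
1 1
2 3
2 9
end
tempfile file_to_recode
save `file_to_recode'

* --- Recode values using the instructions table ---
use `file_to_recode', clear
generate long _obs = _n                 // remember the original sort order
local varlist sex educ

foreach v of local varlist {
    * Keep only the instructions for this variable, keyed on its old values
    preserve
    use `recode_instructions', clear
    keep if varname == "`v'"
    keep old_value new_value
    rename old_value `v'
    tempfile map
    save `map'
    restore

    * m:1 stops with an error if one old value is listed twice for a variable
    merge m:1 `v' using `map', keep(master match)
    quietly count if _merge == 1 & !missing(`v')
    if r(N) > 0 {
        display as error "`v': `r(N)' observation(s) have no recode instruction"
    }
    replace `v' = new_value if _merge == 3
    drop new_value _merge
}

* --- Rename variables using the matching table ---
preserve
use `matching_table', clear
local n = _N
forvalues i = 1/`n' {
    local old`i' = old_name[`i']
    local new`i' = new_name[`i']
}
restore
forvalues i = 1/`n' {
    capture confirm variable `old`i'', exact
    if !_rc rename `old`i'' `new`i''
}

sort _obs
drop _obs
list
```

Because each observation is looked up by its original value in a single merge, swapping codes (1 to 2 and 2 to 1) is safe, which is not true of a naive sequence of replace statements. The sketch handles only numeric codes; string variables or value labels would need extra handling.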
The reason I want to do this is that I have a large number of datasets measuring the same things, but under different variable names, measurement units, and category codings. Rather than recoding line by line, which is error-prone, I want to create one large table, recode_instructions, that lists the standardized value and name for every old variable value. Building that table is still manual work, but calling a command that applies an annex dataset seems safer and clearer to me.
I do not have the data yet, but I wanted to discuss this plan with you because experienced users may anticipate obstacles (perhaps unavoidable ones) that will arise. First, before going any further: does any community-contributed command already do this kind of table-based recoding? If not, I think data cleaners would find such a command useful. Does this seem like a hard job?
I am only asking for broad insights and remarks, not necessarily code, since this is the very first step of my project. Any comment would be greatly appreciated.