I have a dataset with 134 variables and 4,640 observations. The observations are identified by more than one variable. How can I find out which variables identify each observation in the dataset?
My main issue is that there is no information about the structure of the dataset. I only know that several variables together identify the observations.
My plan is that, once I know which combination of variables identifies the observations, I will reduce the identifying variables to the two most important ones by collapsing the data. Then I will "xtset" the dataset using those two variables and run some panel regressions.
So far, to find these variables, I have been running a command such as "duplicates report x1 x2 x3", where x1, x2, x3, etc. are the variables that I guess identify the dataset, and repeating it with different guesses until the output shows no duplicates.
I also used "isid x1 x2 x3" to check whether a selected combination of variables identifies the dataset.
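To make the check concrete, here is a minimal sketch of what I run for one guess. The toy data and the names x1, x2, x3 are placeholders I made up for illustration, not my real variables; the point is only to show the two commands side by side.

```stata
* Toy data standing in for my real dataset (hypothetical values)
clear
input x1 x2 x3
1 1 1
1 1 2
1 2 1
2 1 1
end

* Tabulate how many observations share the same values of the guessed variables
duplicates report x1 x2 x3

* isid exits with an error when the variables do not uniquely identify
* the observations, so capture it and inspect the return code
capture isid x1 x2 x3
if _rc == 0 {
    display "x1 x2 x3 uniquely identify the observations"
}
else {
    display "x1 x2 x3 do not uniquely identify the observations"
}
```

With my real data I repeat this by hand, swapping in a different guess for x1 x2 x3 each time.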
The problem with this trial and error is that I have many variables (134 in total). It is therefore very cumbersome, and I cannot easily find the right combination of variables that identifies the observations in the dataset.
Please let me know if you have an easy way of solving this problem.
Thank you very much. I look forward to your suggestions.