Hi all!
I am working with individual-level data on first-generation immigrants in the US who declared their birthplaces as "historical" countries in the 1940, 1950, and 1960 samples. My task is to transform these birthplaces/"historical" countries into "modern" ones. To do this, I need to create a transition matrix for each sample. These old samples comprise between 75 and 120 birthplace/"historical" country records. Here are the steps that I am planning to follow:
1. I initially need to create a transition matrix of zeros, with the responses given by these immigrants to the birthplace question as rows and all modern countries as columns. Each cell in this matrix needs to give the probability that a birthplace/"historical" country corresponds to a given "modern" country. As such, all cells need to contain values in [0,1], and each row needs to sum to 1.
2. If a "historical" country and a "modern" one are the same (e.g., France), the entry in the corresponding cell needs to switch from 0 to 1. On the other hand, if a "historical" country (e.g., USSR) now spans several "modern" countries (e.g., Ukraine, Russia, etc.), then the probabilities need to be proportional to the population of each "modern" country.
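To make the rule in step 2 concrete, this is how I would write it down. The notation is my own: P_ij is the cell for "historical" country i and "modern" country j, M(i) is the set of "modern" countries that i maps to, and pop_j is the population of j.

```latex
P_{ij} =
\begin{cases}
  \dfrac{\mathrm{pop}_j}{\sum_{k \in M(i)} \mathrm{pop}_k} & \text{if } j \in M(i), \\[1ex]
  0 & \text{otherwise.}
\end{cases}
```

With M(France) = {France} this gives a single 1 in the France row, and in every row the nonzero cells sum to 1 by construction.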
I have tried several scenarios using the reshape and xttrans commands, but to no avail. I am also aware that Mata could probably be more suitable for this task, but then again, I'm not so familiar with it. I seem to be facing two main issues:
(i) It's my understanding that each "historical" sample should consist of 1 variable ("country_old"). However, the "modern" sample should contain 2 variables ("country_modern" and "population" for the probability weights). If I try to merge them, they won't merge, since there is no common country id. I don't think generating a country id in each dataset would help with the merge, since the ids would be unique within each dataset and would not link the two.
(ii) If I were to reshape them, population as a second variable would float around, which is problematic.
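For concreteness, here is a rough sketch of the structure I think I am after. I am not sure it is the right approach. It assumes a hand-built crosswalk with one row per ("country_old", "country_modern") pair, which would act as the link that is missing in (i). The three rows and the population numbers below are made-up placeholders, not real data, and "pop_total" and "prob" are names I invented for the illustration.

```stata
clear
* Hypothetical crosswalk: one row per (historical, modern) pair.
* Population values are made-up placeholders, not real figures.
input str10 country_old str10 country_modern double population
"France" "France"  60
"USSR"   "Russia"  150
"USSR"   "Ukraine" 50
end

* Total population of the modern countries within each historical country
bysort country_old: egen double pop_total = total(population)

* Probability proportional to population; sums to 1 within country_old
generate double prob = population / pop_total
list country_old country_modern prob, sepby(country_old)

* Wide (matrix-like) layout: drop the population variables first so that
* nothing varies within country_old except prob
drop population pop_total
reshape wide prob, i(country_old) j(country_modern) string

* Pairs that do not exist in the crosswalk become missing; set them to 0
foreach v of varlist prob* {
    replace `v' = 0 if missing(`v')
}
list
```

In the long layout, each individual record could presumably be linked to its candidate "modern" countries through "country_old" (for example with joinby), so the wide matrix may not even be necessary. Note that the reshape with the string option only works here because the toy country names are valid as parts of variable names; names with spaces would need a numeric id or cleaned names instead.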
I'm very confused and lost at the moment, and I apologize if I'm being sloppy in my problem identification. Please, let me know if I should clarify further.
All help/comments/suggestions are highly appreciated!
Wolfgang.