Dear Statalisters,
I hope my request will be clear. I have, in one folder, multiple datasets for different countries and waves, all belonging to the same standardized survey. Some countries have only one wave and others have several; there is no particular rule. The problem with classifying by wave is that the timing of each wave depends on the country. For instance, wave 3 in country A was not necessarily conducted at the same time as wave 3 in country B. It just depends on when the previous waves were conducted in each country.
So rather than keeping countries that were surveyed in a given wave, I'd like to keep countries that were surveyed within a given date range, regardless of the wave. All datasets have standardized variables month and year. How would you proceed to do such a classification? For a demo, let's say my folder contains the following .dta datasets (the date of the survey is in parentheses):
country1_W1 (July 2018)
country1_W2 (January 2019)
country2_W1 (September 2018)
country3_W1 (October 2018)
country3_W2 (February 2019)
country3_W3 (April 2019)
As you can see, belonging to the same wave doesn't mean that the surveys were conducted in the same month across countries. How should I proceed if I want to 1) identify the datasets that fall within a given timespan, say from September 2018 to March 2019, and 2) append those datasets together?
The variable month gives the month number and the variable year gives the year number, so only a few combinations of these two variables should be kept (9-2018, ..., 3-2019).
One thing to be mindful of: I assume some waves will straddle two months, fielded at the end of one and the beginning of the next. Say, for instance, that country3_W3 was conducted partly at the end of March 2019 and partly at the beginning of April 2019. If I ask for datasets from September 2018 to March 2019, as above, I would like Stata not to drop the country3_W3 observations collected in April 2019, that is, to be a little flexible at the boundaries.
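To make the request concrete, here is a rough, untested sketch of the kind of loop I have in mind. Several things in it are assumptions: the folder path is hypothetical, every .dta file in the folder is assumed to contain numeric month and year variables, and the variable names mdate and source are made up for illustration. The boundary rule is also only one possible choice: a file is kept whole as soon as at least one of its observations falls inside the window.

```stata
* Sketch only: path, variable names and the keep rule are assumptions.
local folder "C:/surveys"              // hypothetical folder holding the .dta files
local lo = ym(2018, 9)                 // window start: September 2018
local hi = ym(2019, 3)                 // window end:   March 2019

* List every .dta file in the folder
local files : dir "`folder'" files "*.dta"

* Start from an empty dataset that the selected files are appended to
clear
tempfile combined
save `combined', emptyok

foreach f of local files {
    use "`folder'/`f'", clear

    * Combine year and month into a Stata monthly date
    generate mdate = ym(year, month)
    format mdate %tm

    * Keep the whole file if any observation lies inside the window,
    * so a wave straddling March/April 2019 is not cut in half
    count if inrange(mdate, `lo', `hi')
    if r(N) > 0 {
        generate source = "`f'"        // remember which file each row came from
        append using `combined'
        save `combined', replace
    }
}

use `combined', clear
tabulate source
```

With the demo files above, this rule would keep country1_W2, country2_W1, country3_W1 and country3_W2, and would keep country3_W3 only if some of its interviews were recorded in March 2019. A stricter or looser rule (for example, widening the window by one month) would only change the condition inside count.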
Short of automating this, I see no option other than going case by case: looking at the date of each wave for each country and deleting the file if it doesn't match my timespan. If someone could save me countless hours of this repetitive work, I would be forever grateful.
I hope I was clear. I can explain in other words if I wasn't.
Thank you in advance for your time,
Hugo