I have a respondent variable for years from 2010 to 2019.
I wanted to know how many new respondents that did not appear in the previous year appear each year. Also, how many of the respondents from the previous year disappeared this year?
----------------------- copy starting from the next line -----------------------
------------------ copy up to and including the previous line ------------------
Listed 100 out of 55857 observations
Use the count() option to list more
. distinct Respondents
| Observations
| total distinct
-------------+----------------------
Respondents | 51155 93
. sum Respondents
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
Respondents | 51,155 128.5497 54.66533 2 191
.
When I distinct respondents, I see that I have a total of 191 distinct respondents.
Each year new respondents may be recorded, and existing (previous) respondents may not appear. The number of new respondents that appear and the number of previous respondents that do not appear each year is not necessarily equal.
I am trying to find each year how many respondents are new. In each year, how many of the existing respondents are disappearing?
I would appreciate your tips on how I have to get around this.
As the first step of my trial, I tried to generate a variable that indicates the total number of years each respondent appear using the following STATA command.
egen tag_year_Respondents = tag ( year Respondents )
replace tag_year_Respondents =. if Respondents ==.
bysort Respondents: egen countYearforRespondents = total(tag_year_Respondents) if Respondents !=.
Therefore, with “countYearforRespondents” variable, I can see the number of years that each respondent appear.
With a cross-tabulation of “countYearforRespondents” and “year” for observation with tag_year_Respondents ==1, using the following command (tab countYearforRespondents year if tag_year_Respondents==1 ), I managed to see how many respondents appear in all the year( in all the 10 years) by year.
For example, I can see, out of the 54 respondents in the year 2011, 14 appear in all 10 years, 3, appear in a total of 8 years, none appear for 7 years, and 4 respondents appear in a total of 6 years. Below is the cross-tabulation result
However, this cross-tabulation result doesn’t give me information on whether the respondents in each year are new or not.
How many of the existing respondents left each year?
For example, in 2012 I can see that from the cross-tabulation above, I have 43 respondents. I would like to know how many of the 43 are new respondents that appear neither in 2010 nor in 2011. If possible I would also like to know how many of the 43 respondents in 2012 disappear in 2013. The most important thing for me is the former, how many of the 43 in 2012 are new respondents that appear neither in 2010 nor in 2011?
I hope my issue is clear. I would appreciate it if you have any tips for that. Thank you!!
I wanted to know how many new respondents that did not appear in the previous year appear each year. Also, how many of the respondents from the previous year disappeared this year?
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int year float Respondents 2012 2 2013 2 2013 2 2012 2 2012 2 2012 2 2012 2 2011 2 2012 2 2012 2 2011 2 2012 2 2013 2 2013 2 2011 2 2012 2 2010 2 2012 2 2012 2 2012 2 2011 2 2012 2 2012 2 2011 2 2011 2 2014 2 2011 2 2011 2 2012 2 2012 2 2011 2 2012 2 2012 2 2013 2 2012 2 2012 2 2012 2 2010 2 2011 2 2010 2 2013 2 2013 2 2013 2 2013 2 2013 2 2013 2 2012 2 2013 2 2013 2 2010 2 2010 2 2013 2 2012 2 2012 2 2012 2 2010 2 2010 2 2012 2 2011 2 2012 2 2012 2 2012 2 2012 2 2011 2 2011 2 2012 2 2011 2 2010 2 2013 2 2013 2 2010 2 2011 2 2010 2 2010 2 2012 2 2012 2 2013 2 2011 2 2010 2 2010 2 2010 2 2010 2 2013 2 2012 2 2012 2 2013 2 2011 2 2013 2 2013 2 2011 2 2014 2 2011 2 2014 2 2013 2 2012 2 2011 2 2011 2 2010 2 2011 2 2011 2 end
Listed 100 out of 55857 observations
Use the count() option to list more
. distinct Respondents
| Observations
| total distinct
-------------+----------------------
Respondents | 51155 93
. sum Respondents
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
Respondents | 51,155 128.5497 54.66533 2 191
.
When I distinct respondents, I see that I have a total of 191 distinct respondents.
Each year new respondents may be recorded, and existing (previous) respondents may not appear. The number of new respondents that appear and the number of previous respondents that do not appear each year is not necessarily equal.
I am trying to find each year how many respondents are new. In each year, how many of the existing respondents are disappearing?
I would appreciate your tips on how I have to get around this.
As the first step of my trial, I tried to generate a variable that indicates the total number of years each respondent appear using the following STATA command.
egen tag_year_Respondents = tag ( year Respondents )
replace tag_year_Respondents =. if Respondents ==.
bysort Respondents: egen countYearforRespondents = total(tag_year_Respondents) if Respondents !=.
Therefore, with “countYearforRespondents” variable, I can see the number of years that each respondent appear.
With a cross-tabulation of “countYearforRespondents” and “year” for observation with tag_year_Respondents ==1, using the following command (tab countYearforRespondents year if tag_year_Respondents==1 ), I managed to see how many respondents appear in all the year( in all the 10 years) by year.
For example, I can see, out of the 54 respondents in the year 2011, 14 appear in all 10 years, 3, appear in a total of 8 years, none appear for 7 years, and 4 respondents appear in a total of 6 years. Below is the cross-tabulation result
countYearforRespondents | year | ||||||||||
2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | Total | |
1 | 9 | 1 | 2 | 2 | 1 | 3 | 5 | 1 | 1 | 5 | 30 |
2 | 5 | 6 | 2 | 0 | 1 | 3 | 2 | 3 | 3 | 3 | 28 |
3 | 7 | 7 | 6 | 2 | 1 | 1 | 2 | 1 | 0 | 0 | 27 |
4 | 5 | 3 | 5 | 5 | 2 | 2 | 2 | 2 | 2 | 0 | 28 |
5 | 7 | 7 | 7 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 35 |
6 | 4 | 4 | 5 | 3 | 4 | 5 | 3 | 5 | 4 | 5 | 42 |
7 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 14 |
8 | 3 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 2 | 2 | 24 |
10 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 140 |
Total | 54 | 44 | 43 | 37 | 34 | 33 | 33 | 31 | 28 | 31 | 368 |
However, this cross-tabulation result doesn’t give me information on whether the respondents in each year are new or not.
How many of the existing respondents left each year?
For example, in 2012 I can see that from the cross-tabulation above, I have 43 respondents. I would like to know how many of the 43 are new respondents that appear neither in 2010 nor in 2011. If possible I would also like to know how many of the 43 respondents in 2012 disappear in 2013. The most important thing for me is the former, how many of the 43 in 2012 are new respondents that appear neither in 2010 nor in 2011?
I hope my issue is clear. I would appreciate it if you have any tips for that. Thank you!!
Comment