Dear Statalist users,
I have a problem with my panel data that should be fairly easy to solve, but I have been unable to solve it. I found similar threads but no solution that fits.
(e.g. https://www.stata.com/statalist/arch.../msg00159.html)
My data is structured as follows:
personID  year  country  income
1         1     1        1
2         1     2        2
3         1     3        3
1         2     1        4
2         2     2        5
3         2     3        6
I want to analyze various variables (e.g. income) grouped by other variables (e.g. country), for example by calculating mean income per country. To improve the analysis, I want to focus on countries with more than 30 observations (personID) per year. My attempt was to drop countries with fewer than 30 observations per year. Is there a way to drop the values of a variable whose frequency is below a threshold, in this case 30? Or is dropping not the right approach, and is there a better solution to my problem? I guess an if condition could work as well, but I would still need to create a variable tagging countries with more or fewer than 30 observations per year.
The code I tried was accepted by Stata but didn't give the results I wanted (some values of the country variable still had fewer than thirty observations in several years). Since I expected the commands to work, I don't know what they actually did, so I would also be grateful if someone could explain what is wrong with my attempts:
First try:
bysort country (year): egen countrytotal= total(country)
drop if countrytotal<30
*Some observations were deleted but apparently not enough?
Second try:
foreach num of numlist 1/3 {
egen countrytag`num'= tag(country) if year==`zahl'
egen countrycount`num'= _N if countrytag`zahl'==1
drop if countrycount`zahl'<50
}
* No observations at all were deleted
As I read in the other thread, the assert command is a way to check whether a command worked as intended. I used it, but maybe I used it incorrectly as well? Stata reported a number of contradictions in my observations.
bysort country (year): gen countryfreq= _N
assert countryfreq >30
As you can see, I'm rather confused after some hours of coding without achieving what I aimed for. I hope I have explained clearly what I am trying to do. If not, please let me know and I'll try to clarify my goals. Any help would be highly appreciated. Thank you all in advance and have a nice day!
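For orientation, here is a minimal sketch of one way the goal described above might be reached. It assumes one row per person-year and the variable names from the example data. The names cellsize, bigcell and meaninc are made up for illustration, and the threshold (more than 30) should be adjusted to whatever is actually wanted. Two details of the attempts above seem relevant: total(country) adds up the country codes rather than counting rows, and bysort country (year) forms groups by country only, with year used just for sorting, so _N there is the count per country over all years.

```stata
* Sketch only: assumes one row per person-year and the variables
* personID, year, country, income from the example data.
* cellsize, bigcell and meaninc are hypothetical names.

* Number of rows in each country-year cell (not in each country overall)
bysort country year: gen cellsize = _N

* Tag the cells that pass the threshold; the tag can be used in if conditions
gen byte bigcell = cellsize > 30

* Mean income per country and year, computed only for the large cells
bysort country year: egen meaninc = mean(income) if bigcell

* To discard the small cells permanently instead, uncomment:
* drop if cellsize <= 30

* After dropping, this check should pass for every remaining row:
* assert cellsize > 30
```

Keeping the tag and using if bigcell leaves the full data set intact, which may be preferable to drop when other analyses still need the smaller countries.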