Hi, I'm trying to generate a per capita income variable from the income data of each member of each household. The database I am using has the per capita income variable miscalculated in some cases, which is why I want to calculate it again.
In the extract of the database that I have copied, the variable names are in Spanish. They are: codusu (dwelling), nro_hogar (household), componente (member), p47t (income), ipcf (family per capita income) and ingpcf (family per capita income, generated by me). The ipcf variable already existed in the database but has been miscalculated. For example, the first three rows are the observations of three members of the same dwelling and household. Only one member has an income, of 16000, so the family per capita income is 16000/3 = 5,333.33, but ipcf appears as 533333. The entire database is full of errors like this in ipcf, so I decided to generate my own version, which I have called ingpcf.
However, if you look at the fourth and fifth rows, p47t for the same dwelling and household is -9 for member 1 and 70000 for member 2. The National Institute of Statistics and Censuses, in its Permanent Household Survey, treats the code -9 as a missing observation, which is why ipcf is zero in the fourth and fifth observations. The problem with my generated variable is that it does not account for this situation.
To generate my ingpcf variable I used the command bys codusu nro_hogar: egen ingpcf = mean(p47t) if p47t != -9
But that command is not right, because it generates missing values only in the rows where p47t is -9. I need missing values for all the observations that belong to the same dwelling and household.
My English is not very good; I'm sorry if anything is unclear.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str29 codusu byte(nro_hogar componente) long(p47t ipcf) float ingpcf
"TQRMNOPPQHJMKNCDEOHCH00628107" 1 1     0 533333 5333.333
"TQRMNOPPQHJMKNCDEOHCH00628107" 1 2 16000 533333 5333.333
"TQRMNOPPQHJMKNCDEOHCH00628107" 1 3     0 533333 5333.333
"TQRMNOPPQHJMKPCDEIHJF00626473" 1 1    -9      0        .
"TQRMNOPPQHJMKPCDEIHJF00626473" 1 2 70000      0    70000
"TQRMNOPPQHJMLLCDEFMDB00623532" 1 1 25000   6750     6750
"TQRMNOPPQHJMLLCDEFMDB00623532" 1 2  2000   6750     6750
"TQRMNOPPQHJMLLCDEFMDB00623532" 1 3     0   6750     6750
"TQRMNOPPQHJMLLCDEFMDB00623532" 1 4     0   6750     6750
"TQRMNOPPQHJMLPCDEIMBF00627918" 1 1  9000  14500    14500
"TQRMNOPPQHJMLPCDEIMBF00627918" 1 2 20000  14500    14500
"TQRMNOPPQHJMLQCDEIJAH00627184" 1 1    -9      0        .
"TQRMNOPPQHJMLQCDEIJAH00627184" 1 2    -9      0        .
"TQRMNOPPQHJMLUCDEFIAH00622993" 1 1    -9      0        .
"TQRMNOPPQHJMLUCDEFIAH00622993" 1 2 15000      0     5000
"TQRMNOPPQHJMLUCDEFIAH00622993" 1 3     0      0     5000
"TQRMNOPPQHJMLUCDEFIAH00622993" 1 4     0      0     5000
end
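To make the goal concrete, here is a minimal sketch of one way the desired result might be obtained with the example data above. It assumes that a single -9 anywhere in a household should make the per capita income missing for every member of that household, and that -9 is the only nonresponse code in p47t. The names any_nr and ingpcf2 are hypothetical, chosen only so that the existing ingpcf is not overwritten.

```stata
* Flag every member of a household in which at least one member
* has the -9 (nonresponse) code; the flag is 1 for the whole household
bysort codusu nro_hogar: egen byte any_nr = max(p47t == -9)

* Household mean of p47t (total income divided by number of members),
* computed only for households without any -9; flagged households stay missing
bysort codusu nro_hogar: egen double ingpcf2 = mean(p47t) if !any_nr

* Compare the new variable with the existing ones, household by household
list codusu nro_hogar componente p47t ipcf ingpcf2, sepby(codusu nro_hogar)
```

With the example data, this should give 5333.33 for the first household, 6750 for the third and 14500 for the fourth, and missing values for all members of the three households that contain a -9. The flag is built first because an if qualifier on the egen call itself only excludes the individual rows where p47t is -9, not the rest of the household.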