  • How to generate a per capita income variable from the income data of each member of each household?

    Hi, I'm trying to generate a per capita income variable from the income data of each member of each household. The database I am using has the per capita income variable miscalculated in some cases, which is why I want to calculate it again.
    In the extract of the database that I have copied, the variable names are in Spanish: codusu (dwelling), nro_hogar (household), componente (member), p47t (income), ipcf (family per capita income) and ingpcf (family per capita income, generated by me). The ipcf variable already existed in the database but has been miscalculated. For example, the first three rows are the observations of three members of the same dwelling and household. Only one member has income, 16000, so the family per capita income is 16000/3 = 5,333.33, but ipcf appears as 533333. The entire database is full of these errors in ipcf, so I decided to generate my own version, which I have called ingpcf.
    However, in the fourth and fifth rows, p47t for the same dwelling and household is -9 for member 1 and 70000 for member 2. The National Institute of Statistics and Censuses, in its Permanent Household Survey, treats the code -9 as a missing observation, which is why ipcf is zero in the fourth and fifth observations. The problem with my generated variable is that it does not handle this situation.

    To generate my ingpcf variable I used: bys codusu nro_hogar: egen ingpcf = mean(p47t) if p47t != -9

    But that syntax does not do what I need, because it generates a missing value only in the rows where p47t is -9. I need missing values for all the observations that belong to the same dwelling and household.

    My English is not very good; I'm sorry if this is hard to understand.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str29 codusu byte(nro_hogar componente) long(p47t ipcf) float ingpcf
    "TQRMNOPPQHJMKNCDEOHCH00628107" 1 1     0 533333 5333.333
    "TQRMNOPPQHJMKNCDEOHCH00628107" 1 2 16000 533333 5333.333
    "TQRMNOPPQHJMKNCDEOHCH00628107" 1 3     0 533333 5333.333
    "TQRMNOPPQHJMKPCDEIHJF00626473" 1 1    -9      0        .
    "TQRMNOPPQHJMKPCDEIHJF00626473" 1 2 70000      0    70000
    "TQRMNOPPQHJMLLCDEFMDB00623532" 1 1 25000   6750     6750
    "TQRMNOPPQHJMLLCDEFMDB00623532" 1 2  2000   6750     6750
    "TQRMNOPPQHJMLLCDEFMDB00623532" 1 3     0   6750     6750
    "TQRMNOPPQHJMLLCDEFMDB00623532" 1 4     0   6750     6750
    "TQRMNOPPQHJMLPCDEIMBF00627918" 1 1  9000  14500    14500
    "TQRMNOPPQHJMLPCDEIMBF00627918" 1 2 20000  14500    14500
    "TQRMNOPPQHJMLQCDEIJAH00627184" 1 1    -9      0        .
    "TQRMNOPPQHJMLQCDEIJAH00627184" 1 2    -9      0        .
    "TQRMNOPPQHJMLUCDEFIAH00622993" 1 1    -9      0        .
    "TQRMNOPPQHJMLUCDEFIAH00622993" 1 2 15000      0     5000
    "TQRMNOPPQHJMLUCDEFIAH00622993" 1 3     0      0     5000
    "TQRMNOPPQHJMLUCDEFIAH00622993" 1 4     0      0     5000
    end
    Last edited by Ignacio Ibarra; 31 Jul 2019, 16:04.

  • #2
    In some other statistics packages it is common practice to use magic numbers like -9 to encode missing values, and then you tell commands to ignore those numbers. But that is a recipe for trouble in Stata, as you have seen. The solution is not to code around it but to replace them with Stata missing values.

    Code:
    mvdecode p47t, mv(-9)
    by codusu nro_hogar, sort: egen per_capita_income = mean(p47t)
    Note: the last line of code assumes that a household is defined by the combination of codusu and nro_hogar. If that is not correct, replace codusu nro_hogar by whatever variable or combination of variables identifies households.
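
    One caveat worth checking: egen's mean() averages over nonmissing values only, so after -mvdecode- a household with incomes of missing and 70000 would get 70000 in both rows. The original question asks for a missing value in every row of such a household. A minimal sketch of one way to get that, using the first five rows of the posted example; the names n_missing and ingpcf_strict are hypothetical and not part of the survey data:

```stata
* Sketch only: n_missing and ingpcf_strict are hypothetical variable names.
clear
input str29 codusu byte(nro_hogar componente) long p47t
"TQRMNOPPQHJMKNCDEOHCH00628107" 1 1     0
"TQRMNOPPQHJMKNCDEOHCH00628107" 1 2 16000
"TQRMNOPPQHJMKNCDEOHCH00628107" 1 3     0
"TQRMNOPPQHJMKPCDEIHJF00626473" 1 1    -9
"TQRMNOPPQHJMKPCDEIHJF00626473" 1 2 70000
end

* Recode the -9 survey code to Stata's system missing value
mvdecode p47t, mv(-9)

* Count the members with missing income in each household
bysort codusu nro_hogar: egen n_missing = total(missing(p47t))

* Household mean over the nonmissing incomes
bysort codusu nro_hogar: egen ingpcf_strict = mean(p47t)

* Set the whole household to missing when any member's income is unknown
replace ingpcf_strict = . if n_missing > 0

list codusu nro_hogar componente p47t ingpcf_strict, sepby(codusu nro_hogar)
```

    With these five rows, the first household gets 16000/3 in all three rows and the second household gets a missing value in both rows. Whether a partly missing household should be dropped this way or averaged over the known incomes is a substantive choice that depends on the analysis.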

    By the way, before you get into more trouble with this kind of data, you should go through the entire data set and -mvdecode- any other variables that have missing values recorded as numbers.
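
    A rough sketch of how that pass over the data set might look, assuming the data are already in memory, that -9 is the only numeric missing code, and that -9 is never a legitimate value in any numeric variable. Both assumptions should be checked against the survey documentation; the local macro name numvars is arbitrary:

```stata
* Sketch only: assumes -9 is never a legitimate value in any numeric variable.
* Collect the numeric variables. r(varlist) is overwritten by later commands,
* so it is copied into a local macro first.
ds, has(type numeric)
local numvars `r(varlist)'

* Report which variables actually contain the -9 code before changing anything
foreach v of local numvars {
    quietly count if `v' == -9
    if r(N) > 0 {
        display "`v': " r(N) " observation(s) coded -9"
    }
}

* After reviewing the list, recode -9 to system missing in those variables
mvdecode `numvars', mv(-9)
```

    Listing the affected variables first makes it easier to spot a variable where -9 could be a real value, in which case it should be left out of the -mvdecode- call.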

    • #3
      Thanks a lot. Your advice was very useful.
