  • Graphing binary response variable with extreme independent variable

    My question is relatively simple.

    I'm trying to study whether primary school completion (a binary yes/no outcome) varies with income for 3 racial groups. The problem with my data is that, for some groups, a large proportion of people are bunched at zero or very low income. I also have some extreme outliers with very high incomes. I essentially just want a way to visualize the relationship between the probability of having completed primary school and income (the elasticity of primary school completion with respect to income). Income is measured in local currency. Since I am using survey data with sample weights, I want to use the weights appropriately. The data are at the individual level, and the survey question is essentially "Have you completed primary school?" I also have several other controls for gender, age, etc.

    I want to know whether the income elasticity is different for the 3 groups.

    What would be the best way to do that graphically and econometrically?

  • #2
    For graphing purposes, you should consider using the log of income in place of income. It would be unusual for respondents to report a negative income, and for those with strictly zero income, you will want to change the value to a positive number very close to 0, e.g., 0.0001, before taking logs. To simulate your data, I generate a variable from a fat-tailed distribution (specifically the standard Cauchy) and set all negative and near-zero values (below 0.00001) to 0.0001.

    Code:
    . set obs 1000
    number of observations (_N) was 0, now 1,000
    
    . gen n1= rnormal(0,1)
    
    . gen n2= rnormal(0,1)
    
    . *A STANDARD CAUCHY RANDOM VARIABLE IS THE RATIO OF TWO INDEPENDENT STANDARD NORMAL VARIABLES
    
    . gen income = n1/n2
    
    . replace income= .0001 if income<0.00001
    (498 real changes made)
    
    . *SET ODD-NUMBERED OBSERVATIONS AS COMPLETED (1), 0 OTHERWISE
    
    . gen completed=mod(_n,2)
    
    . *DEFINE LABELS
    
    . label define Completed 0 "Not Completed" 1 "Completed"
    
    . label values completed Completed
    
    . histogram income, by(completed)

    Because of the concentration of values at zero (or near zero), you will not see the other observations if you use the raw income values.


    [Figure: income.png (histogram of income, by completed)]

    However, using log income lets you see the observations that are not near zero:

    Code:
    . gen log_income = log(income)
    
    . histogram log_income, by(completed)


    [Figure: log_income.png (histogram of log_income, by completed)]
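
    The histograms cover only the graphing side and ignore the weights and the three groups. For the econometric side of the question, one possible sketch (not the only reasonable approach) is a survey-weighted logit in which log income is interacted with group, followed by margins and marginsplot. The variables group and wt below are hypothetical and simulated purely for illustration, and the seed is arbitrary. In real data you would substitute your own group and weight variables, declare any strata and PSUs in svyset, and add the controls (gender, age, etc.) to the logit.

    Code:
    clear
    set seed 12345
    set obs 1000

    * Simulated income as above: Cauchy draws, values below 0.00001 set to 0.0001
    gen income = rnormal(0,1)/rnormal(0,1)
    replace income = .0001 if income < 0.00001
    gen log_income = log(income)
    gen completed = mod(_n,2)

    * HYPOTHETICAL group and weight variables (assumptions, not from the original post)
    gen group = 1 + mod(_n,3)
    gen wt = 1 + 2*runiform()

    * Declare the sampling weights (strata and PSUs omitted because they are unknown here)
    svyset [pweight=wt]

    * Logit with a separate log-income slope for each group
    svy: logit completed c.log_income##i.group

    * Joint test of whether the log-income slope differs across groups
    testparm i.group#c.log_income

    * Average marginal effect of log income on Pr(completed), by group
    margins group, dydx(log_income)

    * Predicted probabilities over a grid of log income, by group, then plot them
    margins group, at(log_income=(-9(3)3))
    marginsplot

    Because income enters in logs, dydx(log_income) is the change in the probability of completion per unit change in log income, which is a semi-elasticity rather than an elasticity. The grid in at() should be adjusted to the range of log income in your own data.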
