
  • Wls

I want to obtain WLS (Weighted Least Squares) estimates using -[aweight]-, but I do not know what the value of -[aweight=?]- should be.
    My model is :
    reg reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f)
    And my dataset is :
    input byte(reject black) float(obrat high_ltv medium_ltv unem) byte(married credit_hist public_rec)
    0 0 37 0 0 3.2 1 1 0
    0 0 21 0 0 3.1 1 1 0
    0 0 7 0 1 10.6 1 1 0
    0 0 35 0 1 2 0 1 0
    0 0 29 0 0 1.8 1 1 0
    0 0 33 0 1 3.2 0 1 0
    0 0 40 0 0 3.9 0 1 0
    0 0 33 0 1 3.1 1 1 0
    0 0 37 1 0 3.2 1 1 0
    0 0 19 0 0 1.8 1 1 0
    0 0 34.7 0 0 3.2 0 1 0
    0 0 33 0 0 4.3 0 1 0
    0 0 35.2 0 1 10.6 1 1 0
    0 0 34 0 0 1.8 0 1 0
    1 0 7 0 0 3.6 1 1 0
    0 0 36 0 0 3.9 1 1 0
    0 0 33 0 0 3.2 1 1 0
    0 0 17.8 0 0 4.3 1 1 0
    0 0 41 0 0 3.2 1 1 0
    0 0 29 0 0 4.3 0 1 0
    0 0 42.1 0 0 5.3 1 1 0
    0 0 35.5 1 0 10.6 1 1 0
    0 0 17 0 0 4.3 1 1 0
    0 0 38 0 1 3.2 1 1 0
    0 0 33.1 0 0 10.6 1 1 0
    0 0 35 0 1 1.8 0 1 0
    0 0 33 0 1 1.8 1 1 0
    0 0 24 0 1 3.6 1 1 0
    0 0 35 0 1 3.1 1 1 0
    0 0 24 0 0 3.2 1 1 0
    0 0 31.1 0 1 3.2 0 1 0
    0 0 39 0 0 3.2 1 1 0
    0 0 35.6 0 0 1.8 1 1 0
    0 0 16 0 0 10.6 1 1 0
    0 0 22 0 1 3.2 0 1 0
    1 0 16 0 1 3.2 0 0 1
    0 0 31 0 1 4.3 1 1 0
    0 0 23 0 0 3.1 1 1 0
    0 0 37.2 0 1 3.2 1 1 0
    0 0 30.1 1 0 1.8 0 1 0
    0 0 35 0 1 2 1 1 0
    0 0 42 0 1 3.2 0 1 0
    0 0 24 0 0 5.3 1 1 0
    0 0 29 0 0 3.2 1 1 0
    1 0 66 0 1 3.2 0 1 0
    0 0 25 0 0 3.1 0 1 0
    0 0 30 0 1 3.2 1 1 0
    0 0 7 0 0 3.9 0 1 0
    1 0 32 1 0 3.2 1 1 0
    0 0 28 0 0 3.9 1 1 0
    0 0 34 0 1 1.8 1 1 0
    0 0 34.4 0 0 3.2 0 1 0
    0 0 36 0 1 3.2 1 1 0
    0 0 32 0 1 4.3 1 1 0
    0 0 30.2 0 1 1.8 0 1 0
    0 0 29 0 0 3.6 0 1 0
    0 0 33 0 0 4.3 0 1 0
    0 0 31 0 1 3.2 1 1 0
    0 0 35.2 0 1 3.2 0 1 0
    0 0 22 0 1 2 1 0 1
    0 0 37.2 0 1 3.1 1 1 0
    0 0 21.2 0 1 3.2 1 1 0
    0 0 35 0 1 3.2 0 1 0
    0 0 32 0 1 4.3 0 1 0
    0 0 33.8 0 0 3.2 0 1 0
    1 0 20 0 0 1.8 1 0 0
    0 0 38 0 1 3.2 1 1 0
    0 0 26.6 0 0 1.8 0 1 0
    0 0 33 0 0 3.2 1 1 0
    0 0 40 1 0 3.1 1 1 0
    0 0 32.4 0 1 3.2 1 1 0
    0 0 33.4 0 1 3.2 1 1 0
    0 0 34 0 0 10.6 1 1 0
    0 0 37 1 0 3.6 1 1 0
    0 0 24 0 0 3.2 1 1 0
    0 0 36.7 0 0 4.3 1 1 0
    0 0 21.7 0 0 3.9 1 1 0
    0 0 16 0 0 3.2 1 1 0
    0 0 43 0 0 4.3 1 1 0
    1 0 31.4 1 0 4.3 0 1 0
    0 0 33 0 0 3.2 0 1 0
    1 0 27 0 1 2 1 1 0
    0 0 30 1 0 3.2 0 1 0
    0 0 8 0 0 3.2 0 1 0
    0 0 15 0 0 3.2 1 1 0
    0 0 30 0 1 3.2 1 1 0
    0 0 31.2 0 1 1.8 0 1 0
    1 0 35 0 1 4.3 0 1 0
    0 0 20 0 0 3.2 1 1 0
    1 0 37 0 0 4.3 1 1 0
    0 0 28 0 1 3.9 1 1 0
    0 0 40 0 1 3.2 1 1 0
    0 0 27 0 0 3.9 1 1 0
    1 0 50.5 0 0 1.8 1 0 0
    0 0 39 0 1 3.2 0 1 0
    0 0 27.8 0 1 3.2 1 1 0
    0 0 27 0 0 1.8 1 0 1
    0 0 24 0 0 3.2 0 1 0
    0 0 43 0 1 3.2 1 1 0
    0 0 21 0 0 3.6 1 1 1
    end

  • #2
    At the least, you will have to explain more about where this data comes from and what your research goals are to get an answer to your question.

    But at least from the looks of what you posted, it seems that this data set is not suitable for a weighted least squares analysis at all. -aweight-s are used when the dependent variable represents the average of a series of measurements, and the number of measurements averaged differs across observations. In that situation, the -aweight- variable contains the number of measurements that were averaged to produce the outcome variable for the current observation. But it is almost inconceivable that this process could lead to a data set in which the outcome variable is always 0 or 1!

    (A typical application of -aweights- would be where each observation represents a group of people, such as a class of students in a school, and the outcome variable is the average score of all students in that class on some test. Then the aweight variable would be the number of students in the class who took the test.)
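    As a minimal sketch of that typical scenario (the dataset and the variable names class_avg_score, class_size, and n_students are invented here purely for illustration):

    ```stata
    * Hypothetical class-level data: each observation is one class.
    * class_avg_score is the mean test score of the students in that class;
    * n_students is how many students' scores were averaged.
    regress class_avg_score class_size [aweight = n_students]
    ```

    Here -aweight- tells Stata that observations built from more students carry less sampling variance and should get more weight.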

    So tell us more about what the data means and what your research question is, and perhaps somebody can suggest a better approach.



    • #3

      The research question: Is there any difference in mortgage application denial by race? I found heteroskedasticity across race. All of the fitted values must lie between 0 and 1, so I made the adjustment below:

      . quietly reg reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f)
      . predict prob_lpm, xb
      . replace prob_lpm = 0.000035 if prob_lpm < 0

      Variable definitions:

      married = 1 if applicant married
      public_rec = 1 if filed bankruptcy
      obrat = other debt obligations as a percentage of total income
      unem = unemployment rate by industry of applicant
      black = 1 if applicant black
      reject = 1 if mortgage application denied
      credit_hist = 0 if accounts delinquent >= 60 days
      high_ltv = 1 if loan-to-value ratio > 0.95
      medium_ltv = 1 if loan-to-value ratio between 0.8 and 0.95



      • #4
        I don't see anything in the nature of the data or the research question that calls for a weighted analysis.

        I see other problems here. If it is crucial to your work that the fitted values always fall between 0 and 1, you should not be using a linear regression at all, because for low-risk or high-risk cases, it can produce predicted values outside that range. It even appears that you have encountered that problem, because you included code to replace negative predicted values by an arbitrary positive number 0.000035, a practice which is, at best, very difficult to justify.

        I would do this differently. I would use logistic or probit regression: these models always produce fitted values between 0 and 1. If you are concerned about heteroscedasticity, use robust variance estimation. So something like this:

        Code:
        logit reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f) vce(robust)
        After that you can use -predict- and you will always get values between 0 and 1.
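        For example (continuing from the -logit- fit above; p_logit is just a name I chose):

        ```stata
        * -pr- is the default prediction after -logit-: fitted probabilities
        predict p_logit, pr
        * the minimum and maximum will lie strictly between 0 and 1
        summarize p_logit
        ```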



        • #5
          Let me post the whole question that I was asked to solve:

          Based on the model
          reg reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f)

          OLS estimators are inefficient in the linear probability model, since the conditional variance of Y depends on the regressors:
          Var[Y|X1,X2,X3,...,Xk] = P(X1,X2,X3,...,Xk)[1 - P(X1,X2,X3,...,Xk)]
          where
          P(X1,X2,X3,...,Xk) = B0 + B1*X1 + B2*X2 + ... + Bk*Xk
          We should expect heteroskedasticity of the particular form indicated in equation 8.47:
          h_hat_i = y_hat_i * (1 - y_hat_i)

          A. Note that equation 8.47 also implies the weights needed to estimate this equation via weighted least squares. Obtain the WLS estimates by hand (that is, without using Stata's -[aweight]-). Note that all of the fitted values (y_hat_i) must lie between 0 and 1; thus, you must make the adjustment:
          . quietly reg reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f)
          . predict prob_lpm, xb
          . replace prob_lpm = 0.000035 if prob_lpm <0


          B. Now verify your by-hand WLS estimates by showing that you obtain the same coefficient estimates and standard errors using Stata's -[aweight]-.
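          For what it is worth, parts A and B can be sketched along the following lines. The 0.000035 truncation comes from the question itself; the symmetric guard against fitted values above 1, and the names h_hat, w, *_s, and cons_s, are my own choices. The idea is that with Var(u_i) proportional to h_i = y_hat_i(1 - y_hat_i), dividing every variable (including the constant) by sqrt(h_hat_i) and running OLS without a constant gives the WLS estimates, and -[aweight = 1/h_hat]- should reproduce them:

          ```stata
          * Part A, step 1: LPM fitted values, truncated to lie in (0,1)
          quietly regress reject black obrat high_ltv medium_ltv unem credit_hist public_rec married
          predict prob_lpm, xb
          replace prob_lpm = 0.000035 if prob_lpm < 0
          replace prob_lpm = 0.999965 if prob_lpm > 1   // my addition, in case any fitted value exceeds 1

          * Step 2: the heteroskedasticity function 8.47 and the transform
          generate h_hat = prob_lpm * (1 - prob_lpm)
          generate w = 1 / sqrt(h_hat)

          * Step 3: transform every variable, including the constant
          foreach v of varlist reject black obrat high_ltv medium_ltv unem credit_hist public_rec married {
              generate `v'_s = `v' * w
          }
          generate cons_s = w

          * Step 4: WLS by hand = OLS on the transformed data, no constant
          regress reject_s black_s obrat_s high_ltv_s medium_ltv_s unem_s credit_hist_s public_rec_s married_s cons_s, noconstant

          * Part B: the same estimates via Stata's analytic weights
          regress reject black obrat high_ltv medium_ltv unem credit_hist public_rec married [aweight = 1/h_hat]
          ```

          If the by-hand transform is done correctly, the coefficient on cons_s in step 4 should match the constant in the -aweight- regression, and the other coefficients and standard errors should agree as well.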

