
  • Wls

I want to obtain WLS (Weighted Least Squares) estimates using -[aweight]-, but I do not know what the value of -[aweight=?]- should be.
    My model is :
    reg reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f)
    And my dataset is :
    input byte(reject black) float(obrat high_ltv medium_ltv unem) byte(married credit_hist public_rec)
    0 0 37 0 0 3.2 1 1 0
    0 0 21 0 0 3.1 1 1 0
    0 0 7 0 1 10.6 1 1 0
    0 0 35 0 1 2 0 1 0
    0 0 29 0 0 1.8 1 1 0
    0 0 33 0 1 3.2 0 1 0
    0 0 40 0 0 3.9 0 1 0
    0 0 33 0 1 3.1 1 1 0
    0 0 37 1 0 3.2 1 1 0
    0 0 19 0 0 1.8 1 1 0
    0 0 34.7 0 0 3.2 0 1 0
    0 0 33 0 0 4.3 0 1 0
    0 0 35.2 0 1 10.6 1 1 0
    0 0 34 0 0 1.8 0 1 0
    1 0 7 0 0 3.6 1 1 0
    0 0 36 0 0 3.9 1 1 0
    0 0 33 0 0 3.2 1 1 0
    0 0 17.8 0 0 4.3 1 1 0
    0 0 41 0 0 3.2 1 1 0
    0 0 29 0 0 4.3 0 1 0
    0 0 42.1 0 0 5.3 1 1 0
    0 0 35.5 1 0 10.6 1 1 0
    0 0 17 0 0 4.3 1 1 0
    0 0 38 0 1 3.2 1 1 0
    0 0 33.1 0 0 10.6 1 1 0
    0 0 35 0 1 1.8 0 1 0
    0 0 33 0 1 1.8 1 1 0
    0 0 24 0 1 3.6 1 1 0
    0 0 35 0 1 3.1 1 1 0
    0 0 24 0 0 3.2 1 1 0
    0 0 31.1 0 1 3.2 0 1 0
    0 0 39 0 0 3.2 1 1 0
    0 0 35.6 0 0 1.8 1 1 0
    0 0 16 0 0 10.6 1 1 0
    0 0 22 0 1 3.2 0 1 0
    1 0 16 0 1 3.2 0 0 1
    0 0 31 0 1 4.3 1 1 0
    0 0 23 0 0 3.1 1 1 0
    0 0 37.2 0 1 3.2 1 1 0
    0 0 30.1 1 0 1.8 0 1 0
    0 0 35 0 1 2 1 1 0
    0 0 42 0 1 3.2 0 1 0
    0 0 24 0 0 5.3 1 1 0
    0 0 29 0 0 3.2 1 1 0
    1 0 66 0 1 3.2 0 1 0
    0 0 25 0 0 3.1 0 1 0
    0 0 30 0 1 3.2 1 1 0
    0 0 7 0 0 3.9 0 1 0
    1 0 32 1 0 3.2 1 1 0
    0 0 28 0 0 3.9 1 1 0
    0 0 34 0 1 1.8 1 1 0
    0 0 34.4 0 0 3.2 0 1 0
    0 0 36 0 1 3.2 1 1 0
    0 0 32 0 1 4.3 1 1 0
    0 0 30.2 0 1 1.8 0 1 0
    0 0 29 0 0 3.6 0 1 0
    0 0 33 0 0 4.3 0 1 0
    0 0 31 0 1 3.2 1 1 0
    0 0 35.2 0 1 3.2 0 1 0
    0 0 22 0 1 2 1 0 1
    0 0 37.2 0 1 3.1 1 1 0
    0 0 21.2 0 1 3.2 1 1 0
    0 0 35 0 1 3.2 0 1 0
    0 0 32 0 1 4.3 0 1 0
    0 0 33.8 0 0 3.2 0 1 0
    1 0 20 0 0 1.8 1 0 0
    0 0 38 0 1 3.2 1 1 0
    0 0 26.6 0 0 1.8 0 1 0
    0 0 33 0 0 3.2 1 1 0
    0 0 40 1 0 3.1 1 1 0
    0 0 32.4 0 1 3.2 1 1 0
    0 0 33.4 0 1 3.2 1 1 0
    0 0 34 0 0 10.6 1 1 0
    0 0 37 1 0 3.6 1 1 0
    0 0 24 0 0 3.2 1 1 0
    0 0 36.7 0 0 4.3 1 1 0
    0 0 21.7 0 0 3.9 1 1 0
    0 0 16 0 0 3.2 1 1 0
    0 0 43 0 0 4.3 1 1 0
    1 0 31.4 1 0 4.3 0 1 0
    0 0 33 0 0 3.2 0 1 0
    1 0 27 0 1 2 1 1 0
    0 0 30 1 0 3.2 0 1 0
    0 0 8 0 0 3.2 0 1 0
    0 0 15 0 0 3.2 1 1 0
    0 0 30 0 1 3.2 1 1 0
    0 0 31.2 0 1 1.8 0 1 0
    1 0 35 0 1 4.3 0 1 0
    0 0 20 0 0 3.2 1 1 0
    1 0 37 0 0 4.3 1 1 0
    0 0 28 0 1 3.9 1 1 0
    0 0 40 0 1 3.2 1 1 0
    0 0 27 0 0 3.9 1 1 0
    1 0 50.5 0 0 1.8 1 0 0
    0 0 39 0 1 3.2 0 1 0
    0 0 27.8 0 1 3.2 1 1 0
    0 0 27 0 0 1.8 1 0 1
    0 0 24 0 0 3.2 0 1 0
    0 0 43 0 1 3.2 1 1 0
    0 0 21 0 0 3.6 1 1 1
    end

  • #2
    At the least, you will have to explain more about where this data comes from and what your research goals are to get an answer to your question.

    But at least from the looks of what you posted, it seems that this data set is not suitable for a weighted least squares analysis at all. -aweight-s are used when the dependent variable represents the average of a series of measurements, and the number of measurements averaged differs across observations. In that situation, the -aweight- variable contains the number of measurements that were averaged to produce the outcome variable for the current observation. But it is almost inconceivable that this process could lead to a data set in which the outcome variable is always 0 or 1!

    (A typical application of -aweights- would be where each observation represents a group of people, such as a class of students in a school, and the outcome variable is the average score of all students in that class on some test. Then the aweight variable would be the number of students in the class who took the test.)
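    As a minimal sketch of that typical scenario (the dataset and the variable names class_avg_score, class_size, and n_students are invented here purely for illustration):

    ```stata
    * Hypothetical class-level data: each observation is one class.
    * class_avg_score is the mean test score of the students in that class;
    * n_students is how many students' scores were averaged.
    regress class_avg_score class_size [aweight = n_students]
    ```

    Here -aweight- tells Stata that observations built from more students carry less sampling variance and should get more weight.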

    So tell us more about what the data means and what your research question is, and perhaps somebody can suggest a better approach.



    • #3

      The research question: Is there any difference in mortgage application denial by race? I found heteroskedasticity across race. All of the fitted values must lie between 0 and 1, so I made the adjustment below:

      . quietly reg reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f)
      . predict prob_lpm, xb
      . replace prob_lpm = 0.000035 if prob_lpm < 0

      Variable definitions:

      married = 1 if applicant married
      public_rec = 1 if filed bankruptcy
      obrat = other debt obligations as a percentage of total income
      unem = unemployment rate by industry of applicant
      black = 1 if applicant black
      reject = 1 if mortgage application denied
      credit_hist = 0 if accounts delinquent >= 60 days
      high_ltv = 1 if loan-to-value ratio > 0.95
      medium_ltv = 1 if loan-to-value ratio between 0.8 and 0.95



      • #4
        I don't see anything in the nature of the data or the research question that calls for a weighted analysis.

        I see other problems here. If it is crucial to your work that the fitted values always fall between 0 and 1, you should not be using a linear regression at all, because for low-risk or high-risk cases, it can produce predicted values outside that range. It even appears that you have encountered that problem, because you included code to replace negative predicted values by an arbitrary positive number 0.000035, a practice which is, at best, very difficult to justify.

        I would do this differently. I would use logistic or probit regression: these models always produce fitted values between 0 and 1. If you are concerned about heteroscedasticity, use robust variance estimation. So something like this:

        Code:
        logit reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f) vce(robust)
        After that you can use -predict- and you will always get values between 0 and 1.
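        For example (continuing from the -logit- fit above; p_logit is just a name I chose):

        ```stata
        * -pr- is the default prediction after -logit-: fitted probabilities
        predict p_logit, pr
        * the minimum and maximum will lie strictly between 0 and 1
        summarize p_logit
        ```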



        • #5
          Let me post the whole question that I was asked to solve:

          Based on the model
          reg reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f)

          OLS estimators are inefficient in the linear probability model, since the conditional variance of Y depends on the regressors:
          Var[Y|X1,X2,X3,...,Xk] = P(X1,X2,X3,...,Xk)[1 - P(X1,X2,X3,...,Xk)]
          where
          P(X1,X2,X3,...,Xk) = B0 + B1*X1 + B2*X2 + ... + Bk*Xk
          We should expect heteroskedasticity of the particular form indicated in equation 8.47:
          h_hat_i = y_hat_i * (1 - y_hat_i)

          A. Note that equation 8.47 also implies the weights needed to estimate this equation via weighted least squares. Obtain the WLS estimates by hand (that is, without using Stata's -[aweight]-). Note that all of the fitted values (y_hat_i) must lie between 0 and 1; thus, you must make the adjustment:
          . quietly reg reject black obrat high_ltv medium_ltv unem credit_hist public_rec married, cformat(%9.3f)
          . predict prob_lpm, xb
          . replace prob_lpm = 0.000035 if prob_lpm <0


          B. Now verify your by-hand WLS estimates by showing that you obtain the same coefficient estimates and standard errors using Stata's -[aweight]-.
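          For what it is worth, parts A and B can be sketched along the following lines. The 0.000035 truncation comes from the question itself; the symmetric guard against fitted values above 1, and the names h_hat, w, *_s, and cons_s, are my own choices. The idea is that with Var(u_i) proportional to h_i = y_hat_i(1 - y_hat_i), dividing every variable (including the constant) by sqrt(h_hat_i) and running OLS without a constant gives the WLS estimates, and -[aweight = 1/h_hat]- should reproduce them:

          ```stata
          * Part A, step 1: LPM fitted values, truncated to lie in (0,1)
          quietly regress reject black obrat high_ltv medium_ltv unem credit_hist public_rec married
          predict prob_lpm, xb
          replace prob_lpm = 0.000035 if prob_lpm < 0
          replace prob_lpm = 0.999965 if prob_lpm > 1   // my addition, in case any fitted value exceeds 1

          * Step 2: the heteroskedasticity function 8.47 and the transform
          generate h_hat = prob_lpm * (1 - prob_lpm)
          generate w = 1 / sqrt(h_hat)

          * Step 3: transform every variable, including the constant
          foreach v of varlist reject black obrat high_ltv medium_ltv unem credit_hist public_rec married {
              generate `v'_s = `v' * w
          }
          generate cons_s = w

          * Step 4: WLS by hand = OLS on the transformed data, no constant
          regress reject_s black_s obrat_s high_ltv_s medium_ltv_s unem_s credit_hist_s public_rec_s married_s cons_s, noconstant

          * Part B: the same estimates via Stata's analytic weights
          regress reject black obrat high_ltv medium_ltv unem credit_hist public_rec married [aweight = 1/h_hat]
          ```

          If the by-hand transform is done correctly, the coefficient on cons_s in step 4 should match the constant in the -aweight- regression, and the other coefficients and standard errors should agree as well.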

