  • A Tricky Regression

    Hi Statalist,

    Hope everyone is well!

    I'm running a regression and have run into a few problems that I honestly cannot seem to resolve, and I was hoping I could get some insight into them!

    Context
    • I'm running a multiple linear regression (MLR) with a continuous dependent variable (values between 0 and 2).
    • All my independent variables are categorical, and save two (Sex and Mortgage_Number), they all have three or more categories.
    • I'm also running an interaction between Race and Sex.
    • Previously, I did use a multinomial logistic regression, but this turned out to be unsuitable for my task.
    Problems
    • My base category is White Male; however, Stata doesn't give me the interaction coefficients and standard errors for Male#Black, Male#Asian, Male#Other, and Female#White. I need to work these out, but can't find a way to either (a) calculate them in Stata or (b) calculate them manually.
    • Also, since my independent variables are categorical, I don't think I can test whether my regression meets the assumptions of a multiple linear regression in the conventional ways.
    • Furthermore, I'm not too sure of ways to test for robustness and reliability of my results.
    • Please could someone help me!
    Thank you so much!

    Code:
    gen agegroup = x74r
    replace agegroup = 1 if x74r >= 18 & x74r <= 21
    replace agegroup = 2 if x74r >= 22 & x74r <= 29
    replace agegroup = 3 if x74r >= 30 & x74r <= 39
    replace agegroup = 4 if x74r >= 40 & x74r <= 49
    replace agegroup = 5 if x74r >= 50 & x74r <= 59
    replace agegroup = 6 if x74r >= 60 & x74r <= 69
    replace agegroup = 7 if x74r >= 70 & x74r <= 79
    replace agegroup = 8 if x74r >= 80 & x74r <= 99
    label define Age_Range 1 "Eighteen to Twenty-One" 2 "Twenty-Two to Twenty-Nine" 3 "Thirty to Thirty-Nine" 4 "Forty to Forty-Nine" 5 "Fifty to Fifty-Nine" 6 "Sixty to Sixty-Nine" 7 "Seventy to Seventy-Nine" 8 "Eighty to Ninety-Nine"
    label values agegroup Age_Range
    rename agegroup Age

    label define sex 1 "Male" 2 "Female"
    label values x75r sex
    rename x75r Sex

    label define education 1 "Some Schooling" 2 "High School" 3 "Technical School" 4 "College" 5 "College Graduate" 6 "Postgraduate Studies"
    label values x76r education
    rename x76r Education

    label define race 1 "White" 2 "Black" 3 "Asian" 4 "Other"
    label values x78r race
    rename x78r Race

    label define Household_Income 1 "Less Than $35,000" 2 "$35,000 to $49,999" 3 "$50,000 to $74,999" 4 "$75,000 to $99,999" 5 "$100,000 to $174,999" 6 "More Than $175,000"
    label values x83 Household_Income
    rename x83 Household_Income

    label define Risk_Attitudes 1 "High" 2 "Above Average" 3 "Average" 4 "Averse"
    label values x87 Risk_Attitudes
    rename x87 Risk_Attitudes

    label define Mortgage_Number 1 "First Mortgage" 2 "Not First Mortgage"
    label values first_mort_r Mortgage_Number
    rename first_mort_r Mortgage_Number
    mvdecode Mortgage_Number, mv(-2)

    rename ltv LTV
    rename score_orig_r Credit_Score
    mvdecode Credit_Score, mv(-2)
    drop if Credit_Score < 300
    drop if Credit_Score > 850

    recode x56a (3=0) (2=1) (1=2)
    label define mortgagelitone 0 "Not At All" 1 "Somewhat" 2 "Very"
    label values x56a mortgagelitone
    rename x56a Mortgage_Literacy_One

    recode x56b (3=0) (2=1) (1=2)
    label define mortgagelittwo 0 "Not At All" 1 "Somewhat" 2 "Very"
    label values x56b mortgagelittwo
    rename x56b Mortgage_Literacy_Two

    recode x56c (3=0) (2=1) (1=2)
    label define mortgagelitthree 0 "Not At All" 1 "Somewhat" 2 "Very"
    label values x56c mortgagelitthree
    rename x56c Mortgage_Literacy_Three

    recode x56d (3=0) (2=1) (1=2)
    label define mortgagelitfour 0 "Not At All" 1 "Somewhat" 2 "Very"
    label values x56d mortgagelitfour
    rename x56d Mortgage_Literacy_Four

    recode x56e (3=0) (2=1) (1=2)
    label define mortgagelitfive 0 "Not At All" 1 "Somewhat" 2 "Very"
    label values x56e mortgagelitfive
    rename x56e Mortgage_Literacy_Five

    recode x56f (3=0) (2=1) (1=2)
    label define mortgagelitsix 0 "Not At All" 1 "Somewhat" 2 "Very"
    label values x56f mortgagelitsix
    rename x56f Mortgage_Literacy_Six

    recode x56g (3=0) (2=1) (1=2)
    label define mortgagelitseven 0 "Not At All" 1 "Somewhat" 2 "Very"
    label values x56g mortgagelitseven
    rename x56g Mortgage_Literacy_Seven
    mvdecode Mortgage_Literacy_Seven, mv(-3)

    recode x56h (3=0) (2=1) (1=2)
    label define mortgageliteight 0 "Not At All" 1 "Somewhat" 2 "Very"
    label values x56h mortgageliteight
    rename x56h Mortgage_Literacy_Eight
    mvdecode Mortgage_Literacy_Eight, mv(-3)

    recode x56i (3=0) (2=1) (1=2)
    label define mortgagelitnine 0 "Not At All" 1 "Somewhat" 2 "Very"
    label values x56i mortgagelitnine
    rename x56i Mortgage_Literacy_Nine
    mvdecode Mortgage_Literacy_Nine, mv(-3)

    egen Mortgage_Literacy_Ten = rowmean(Mortgage_Literacy_One Mortgage_Literacy_Two Mortgage_Literacy_Three Mortgage_Literacy_Four Mortgage_Literacy_Five Mortgage_Literacy_Six Mortgage_Literacy_Seven Mortgage_Literacy_Eight Mortgage_Literacy_Nine)

    reg Mortgage_Literacy_Ten i.Sex##i.Race i.Education i.Household_Income i.Risk_Attitudes i.Age i.Mortgage_Number, allbaselevels

  • #2
    Also, here is a picture of the regression output!

    [Attached image: Output.jpg (regression output)]



    • #3
      My base category is White Male; however, Stata doesn't give me the interaction coefficients and standard errors for Male#Black, Male#Asian, Male#Other, and Female#White.
      Actually, it does. You're just looking for them in the wrong place.

      The first thing to remember is that in an interaction model, the "main effects" do not mean what they mean in a non-interaction model. Rather, they reflect the effects of those variables conditional on the other variable's being zero. This is another way of saying that these "main effects" coefficients are actually disguised interaction coefficients for terms where the other variable is in its base level.

      So, for example, the Male#Asian effect will be found as the coefficient of Asian in the Race variable. And the Female#White effect will be found as the coefficient of Female in the Sex variable. This is because Male is the base category for Sex in your model, and White the base category for Race.
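To see this concretely, here is a minimal sketch on simulated data (the variable y, the seed, and the effect sizes are made up for illustration; they are not from the original dataset). It fits only Sex##Race, with no other covariates, and uses -lincom- to recover the contrasts described above, including one that combines a main effect with an interaction term.

```stata
* Simulated data: names mirror the thread, values are invented for illustration
clear
set seed 12345
set obs 2000
gen Sex  = 1 + (runiform() < 0.5)     // 1 = Male (base), 2 = Female
gen Race = ceil(4 * runiform())       // 1 = White (base), 2 = Black, 3 = Asian, 4 = Other
gen y    = 1.3 - 0.10*(Sex == 2) - 0.05*(Race == 3) ///
           + 0.08*(Sex == 2)*(Race == 3) + rnormal(0, 0.3)

regress y i.Sex##i.Race

* Male Asian vs. Male White: reported as the "main effect" of Asian
lincom 3.Race

* Female White vs. Male White: reported as the "main effect" of Female
lincom 2.Sex

* Female Asian vs. Male White: both "main effects" plus the interaction term
lincom 2.Sex + 3.Race + 2.Sex#3.Race
```

Because this toy model contains nothing but Sex, Race, and their interaction, the coefficient on 3.Race equals the difference between the Male Asian and Male White cell means. With additional covariates in the model, the same coefficients are adjusted differences rather than raw ones.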



      • #4
        Hi Clyde,

        Amazing, thank you so much!

        How would I interpret the coefficients then? My understanding is that since my base category is White Male, I use "summarize Mortgage_Literacy_Ten if Sex==1 & Race==1" to get the mean value of Mortgage_Literacy_Ten for White Male, e.g. 1.3, and the coefficients represent the change from that mean value. For example, going from First Mortgage to Not First Mortgage would represent a 0.12 increase in Mortgage_Literacy_Ten from the mean value, so a 9% increase. Is this correct?

        Best wishes,
        Towhid



        • #5
          Well, actually, that approach doesn't work well anyway. The problem is that it is highly likely that the other variables besides race and sex differ among the values of race and sex, so when you then take means conditioned on the values of race and sex, you are partially undoing the efforts of the regression to adjust the analysis for those differences in the other variables. What you would be calculating with that approach is the expected values of the outcome variable in each race#sex category while holding all of the other variables at zero (i.e., for the categorical predictors, holding them at their base level).

          To get fully adjusted estimates of the expected values of the outcome in each race sex category, you should instead use the -margins- command following the regression. For this regression, the follow-up with -margins- is
          Code:
          margins Sex#Race
          If you are looking for expected values that suppress variation in the variables other than Race and Sex (which is what you would have gotten with your approach, appropriately modified based on what I explained in #3) it is probably more meaningful to constrain the other variables to their means rather than their base levels. You can get that with
          Code:
          margins Sex#Race, atmeans
          Or, if you want expected values that suppress variation in the variables other than Race and Sex but don't like using their mean values, you can choose the specific values you want with:
          Code:
          margins Sex#Race, at(Education = chosen_value_of_education Risk_Attitudes = chosen_value_of_risk_attitude ETC.)
          Replace the placeholder parts (chosen_value_of_education, chosen_value_of_risk_attitude, ETC.) by values and variables appropriate to your model.
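As a rough illustration of the three -margins- calls with the placeholders filled in, here is a sketch on simulated data. The outcome y, the seed, the coefficients, and the chosen levels (Education = 4, Risk_Attitudes = 3) are assumptions made only for this example.

```stata
* Simulated data: names mirror the thread, values are invented for illustration
clear
set seed 2023
set obs 3000
gen Sex            = 1 + (runiform() < 0.5)   // 1 = Male, 2 = Female
gen Race           = ceil(4 * runiform())     // 1 = White, ..., 4 = Other
gen Education      = ceil(6 * runiform())     // 6 levels, as in the thread
gen Risk_Attitudes = ceil(4 * runiform())     // 4 levels, as in the thread
gen y = 1.2 - 0.10*(Sex == 2) + 0.03*Education - 0.02*Risk_Attitudes + rnormal(0, 0.3)

regress y i.Sex##i.Race i.Education i.Risk_Attitudes

* Adjusted predictions, averaging over the observed values of the other covariates
margins Sex#Race

* Adjusted predictions with the other covariates held at their means
margins Sex#Race, atmeans

* Adjusted predictions with the other covariates fixed at chosen levels
margins Sex#Race, at(Education = 4 Risk_Attitudes = 3)
```

Each call returns one predicted value per Sex#Race cell (eight here), so the results can be compared cell by cell across the three specifications.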



          • #6
            Ah, I see. Thank you! Last question: my new methodology is:
            • Run regression without the interaction effects.
            • Use margins to calculate the mortgage literacy value of my base category.
            • Run the regression with the interaction effect - but only to use the interaction effects.
            Essentially, I use the first regression to analyse the individual variables and their significance - they follow the same trend, and then using the second regression only for the interaction effects.

            Does this sound like it would produce reliable results?



            • #7
              No. If I understand what you are saying, you would like to use both the model with and the model without interaction and draw conclusions from one of them for some variables and from the other for other variables. Definitely a bad idea. You cannot "mix and match" the terms from the two models. (There are certain very stringent conditions under which this would be OK, but they seldom apply in real-world observational data. Moreover, in those conditions, the coefficients of the non-interaction-involved variables would turn out to be the same, or very nearly so, in both models, so it really would make no difference at all which model you looked at for those.)

              Perhaps I have misunderstood what you have in mind. In that case, please try to explain more clearly what you are thinking of doing here.
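One way to check how much the two specifications actually differ is to put them side by side. The sketch below does this on simulated data (y, the seed, and the effect sizes are invented for illustration). It is a way of inspecting the models, not a licence to report terms from both.

```stata
* Simulated data: names mirror the thread, values are invented for illustration
clear
set seed 42
set obs 3000
gen Sex             = 1 + (runiform() < 0.5)   // 1 = Male, 2 = Female
gen Race            = ceil(4 * runiform())     // 1 = White, ..., 4 = Other
gen Mortgage_Number = 1 + (runiform() < 0.4)   // 1 = First, 2 = Not First
gen y = 1.3 - 0.10*(Sex == 2) - 0.05*(Race == 2) ///
        + 0.12*(Mortgage_Number == 2) + rnormal(0, 0.3)

* Model without the interaction
quietly regress y i.Sex i.Race i.Mortgage_Number
estimates store no_inter

* Model with the interaction, plus a joint test of the interaction terms
quietly regress y i.Sex##i.Race i.Mortgage_Number
testparm i.Sex#i.Race
estimates store inter

* Coefficients and standard errors from both models in one table
estimates table no_inter inter, b(%9.4f) se(%9.4f)
```

If the Mortgage_Number coefficient is nearly identical in both columns, the choice of model makes little difference for that variable. Either way, all reported results should come from a single model.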



              • #8
                Yes, you're right in understanding! I will definitely not do that then.

                Based on what you've said, I will do the following:
                • Run "reg Mortgage_Literacy_Ten i.Sex##i.Race i.Education i.Household_Income i.Risk_Attitudes i.Age i.Mortgage_Number, allbaselevels".
                • Use margins to get the mortgage literacy value of the base category.
                • Use this value to work out percentage changes in literacy levels based on the different variables, e.g. if the coefficient value for "Not First Mortgage" is +0.2, and the margin for Male#White is 1.3, then the percentage increase would be roughly 15%.
                Does this sound right? If so, do you have any thoughts on ways of testing whether my model meets the assumptions of a multiple linear regression? I have categorical variables, which makes things more difficult; in practice, the items that make up my dependent variable are also categorical, with values of 0, 1, or 2.
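The arithmetic in the last bullet, and one possible starting point for the assumption checks asked about here, can be sketched as follows. The data are simulated (y, the seed, and the effect sizes are invented), and the choice of -estat hettest- and vce(robust) is an assumption about what might be useful, not something recommended in the replies above.

```stata
* The thread's own numbers: a +0.2 coefficient against a base-cell margin of 1.3
display %4.1f 100 * 0.2 / 1.3      // about 15.4 percent

* Simulated data: names mirror the thread, values are invented for illustration
clear
set seed 99
set obs 3000
gen Sex             = 1 + (runiform() < 0.5)   // 1 = Male, 2 = Female
gen Race            = ceil(4 * runiform())     // 1 = White, ..., 4 = Other
gen Mortgage_Number = 1 + (runiform() < 0.4)   // 1 = First, 2 = Not First
gen y = 1.3 - 0.10*(Sex == 2) + 0.20*(Mortgage_Number == 2) + rnormal(0, 0.3)

regress y i.Sex##i.Race i.Mortgage_Number

* Adjusted prediction for the base cell (Male, White)
margins, at(Sex = 1 Race = 1)
matrix m = r(b)
scalar base_margin = m[1, 1]

* Coefficient expressed as a percentage of the base-cell margin
scalar pct = 100 * _b[2.Mortgage_Number] / base_margin
display "Percent change relative to base cell: " %5.1f pct

* Breusch-Pagan test for heteroskedasticity (run before any robust refit)
estat hettest

* Same model with heteroskedasticity-robust standard errors
regress y i.Sex##i.Race i.Mortgage_Number, vce(robust)
```

The percentage computed this way is a point estimate only and carries no standard error of its own.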



                • #9
                  If this doesn't sound right, would it be possible to trouble you for a call please? :)
