What non-parametric regression to use with average neighbourhood data

Vincent Scholing

Join Date: Jun 2023

Posts: 12
#1

What non-parametric regression to use with average neighbourhood data

13 Jun 2023, 03:43

*Stata 17*
For my thesis, I am trying to explain the effect of distance on several dependent variables, including stress in the past 4 weeks. These data are neighbourhood-level percentages (for example, in neighbourhood 1, "24% reported feeling a lot of stress in the past 4 weeks").
I only have 48 neighbourhoods and a wide range of control variables, which are also percentages (see below).
Running pwcorr on stress in the past 4 weeks and average distance to a park gives a significant negative correlation. I then used a scatterplot to check whether the relationship is linear, which in my opinion it is not (see image below).

pwcorr Having_stress_last_4_weeks Distance_to_park, star(0.05) obs
twoway (scatter Having_stress_last_4_weeks Distance_to_park) (lfit Having_stress_last_4_weeks Distance_to_park)
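Overlaying a lowess smoother on the same scatterplot can make a departure from linearity easier to judge than the straight line alone. This is a hedged sketch: the real data are not available here, so it first builds 48 simulated neighbourhoods (hypothetical values, arbitrary seed) under the two variable names from the post. Drop the simulation part and run the -twoway- command on your own data; the `///` line continuations assume it is run from a do-file.

```stata
* Simulated stand-in data: 48 hypothetical neighbourhoods (NOT the thesis data)
clear
set obs 48
set seed 2023
generate Distance_to_park = runiform(100, 1500)
generate Having_stress_last_4_weeks = 30 - 0.005*Distance_to_park + rnormal(0, 4)

* Scatter plus a straight-line fit and a lowess (locally weighted) smoother;
* a smoother that bends clearly away from the straight line suggests non-linearity
twoway (scatter Having_stress_last_4_weeks Distance_to_park) ///
       (lfit Having_stress_last_4_weeks Distance_to_park)    ///
       (lowess Having_stress_last_4_weeks Distance_to_park), ///
       legend(order(2 "Linear fit" 3 "Lowess smoother"))
```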

Because the scatterplot also shows an outlier, I also checked the correlation with Spearman's rho and Kendall's tau:

spearman Having_stress_last_4_weeks Distance_to_park, stats(rho p)
ktau Having_stress_last_4_weeks Distance_to_park, stats(taua taub p)

Both showed non-significant results (see image below).

My question now is how to proceed from here. Because the dependent variable is a percentage, it is unclear to me which model to use: is there a non-parametric regression that suits these data well, even though both Spearman's rho and Kendall's tau are non-significant?
A further problem is that most of the other variables are also percentages, so "Male_gender", for example, correlates almost perfectly with "Female_Gender". To some extent the same holds for "Education_Level", which consists of three separate percentages per neighbourhood (low, average and high). Could someone advise whether this works as is, or whether I should omit one of the genders / education levels?

I hope I have worded my question clearly; the examples referred to above are attached below as PNG images.

All variables: "Distance to park", "Percentage feeling stressed", "Percentage feeling lonely", "Percentage with low education", "Percentage with average education", "Percentage with high education", "Percentage of people working", "Income in absolute numbers", "Percentage male gender", "Percentage female gender", "Percentage age groups (3 in total)".
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17736
#2

13 Jun 2023, 04:26

Vincent:
I'm not entirely clear about what you're after.
As far as I understand your post:
1) you're dealing with a cross-sectional dataset with more than one dependent variable. Most/all of the dependent and independent variables are continuous and expressed as percentages;
2) it's unlikely that you can learn anything informative from pairwise correlations, as more than one predictor can contribute to explaining variation in the dependent variable(s);
3) therefore, you may want to take a look at -mvreg-;
4) outliers: except for blatant data-entry errors, outliers are often values that are totally legitimate given the underlying data-generating process.
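A minimal -mvreg- sketch, assuming two of the outcomes from the first post (stress and loneliness) and a few of the percentage controls. Every variable name other than Having_stress_last_4_weeks and Distance_to_park is hypothetical, and the data are simulated only so the block runs on its own. One share from each set that sums to 100 (female gender, low education) is deliberately left out and serves as the reference, so a coefficient such as Pct_edu_high reads as the change associated with moving one percentage point from low to high education, holding the other predictors fixed.

```stata
* Simulated stand-in data: 48 hypothetical neighbourhoods (NOT the thesis data)
clear
set obs 48
set seed 2023
generate Distance_to_park = runiform(100, 1500)
generate Pct_male     = runiform(45, 55)
generate Pct_female   = 100 - Pct_male
generate Pct_edu_low  = runiform(10, 40)
generate Pct_edu_high = runiform(10, 40)
generate Pct_edu_avg  = 100 - Pct_edu_low - Pct_edu_high
generate Pct_working  = runiform(50, 80)
generate Having_stress_last_4_weeks = 30 - 0.005*Distance_to_park ///
    - 0.1*Pct_edu_high + rnormal(0, 4)
generate Feeling_lonely = 20 - 0.003*Distance_to_park + rnormal(0, 3)

* One equation per outcome, same predictors in each; Pct_female and
* Pct_edu_low are omitted on purpose (they are the reference shares)
mvreg Having_stress_last_4_weeks Feeling_lonely = Distance_to_park ///
    Pct_male Pct_edu_avg Pct_edu_high Pct_working

* Because the equations are estimated jointly, cross-equation hypotheses
* can be tested, e.g. whether distance has the same coefficient in both
test [Having_stress_last_4_weeks]Distance_to_park = [Feeling_lonely]Distance_to_park
```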

Kind regards,
Carlo
(Stata 19.0)
Vincent Scholing

Join Date: Jun 2023

Posts: 12
#3

15 Jun 2023, 07:39

Thank you for recommending -mvreg-; I think it fits here.
My question now is how to handle two or three variables that fully determine each other, such as gender or level of education. Should I omit one of them entirely to make sure there is no multicollinearity in the final model?

Sincerely,

Vincent
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17736
#4

15 Jun 2023, 08:17

Vincent:
the first step is to use -fvvarlist- (factor-variable) notation.
Stata deals with potential collinearity automatically.
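To see what this automatic handling looks like for shares that always add up to 100, here is a small sketch on simulated data (hypothetical variable names; whole-number percentages so the totals are exactly 100). Since these shares are continuous rather than categorical, -regress- simply detects that the shares in each set sum to a constant and reports one share per set as omitted because of collinearity; which one it drops is Stata's choice, so leaving out a reference share yourself keeps the interpretation under your control.

```stata
* Simulated stand-in data: 48 hypothetical neighbourhoods (NOT the thesis data)
clear
set obs 48
set seed 2023
generate Pct_male     = round(runiform(40, 60))
generate Pct_female   = 100 - Pct_male
generate Pct_edu_low  = round(runiform(10, 40))
generate Pct_edu_high = round(runiform(10, 40))
generate Pct_edu_avg  = 100 - Pct_edu_low - Pct_edu_high
generate Having_stress_last_4_weeks = 25 + 0.1*Pct_male + rnormal(0, 4)

* Each set of shares sums to exactly 100 in every neighbourhood
egen gender_total = rowtotal(Pct_male Pct_female)
egen edu_total    = rowtotal(Pct_edu_low Pct_edu_avg Pct_edu_high)
assert gender_total == 100 & edu_total == 100

* Entering every share: regress keeps the constant and notes that one gender
* share and one education share are omitted because of collinearity
regress Having_stress_last_4_weeks Pct_male Pct_female ///
    Pct_edu_low Pct_edu_avg Pct_edu_high

* Same fit, but with the reference shares (female, low education) chosen explicitly
regress Having_stress_last_4_weeks Pct_male Pct_edu_avg Pct_edu_high
```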

Last edited by Carlo Lazzaro; 15 Jun 2023, 08:21.

Kind regards,
Carlo
(Stata 19.0)
