  • What non-parametric regression to use with average neighbourhood data

    *STATA17*
    For my thesis, I am trying to estimate the effect of distance to a park on several dependent variables, one of which is stress in the past 4 weeks. The data are neighbourhood-level percentages (for example, for neighbourhood 1: "24% reported feeling a lot of stress in the past 4 weeks").
    I have only 48 neighbourhoods and a wide range of control variables, which are also percentages (see below).
    Running pwcorr on stress in the past 4 weeks and average distance to a park gives a significant negative association. I then used a scatterplot to check whether the relationship between the two variables is linear, which in my opinion it is not (see image below).

    pwcorr Having_stress_last_4_weeks Distance_to_park, star(0.05) obs
    twoway (scatter Having_stress_last_4_weeks Distance_to_park) (lfit Having_stress_last_4_weeks Distance_to_park)

    Because the scatterplot also shows an outlier, I checked the correlation with both Spearman's rho and Kendall's tau:

    spearman Having_stress_last_4_weeks Distance_to_park, stats(rho p)
    ktau Having_stress_last_4_weeks Distance_to_park, stats(taua taub p)

    Neither was statistically significant (see image below).

    My question is how to proceed from here. Given that the dependent variable is a percentage, is there a non-parametric regression that suits these data, even though both Spearman's rho and Kendall's tau are insignificant?
    An additional problem is that many of the other variables are also percentages that are (almost) perfectly correlated with each other: "Male_gender" with "Female_Gender", and to some extent "Education_Level", which is recorded as three separate percentages per neighbourhood (low, average and high). Can I include all of them, or should I omit one of the genders / education levels?

    I hope my question and the examples attached below as PNG images are clear.

    All variables: "Distance to park", "Percentage feeling stressed", "Percentage feeling lonely", "Percentage with low education", "Percentage with average education", "Percentage with high education", "Percentage of people working", "Income in absolute numbers", "Percentage male gender", "Percentage female gender", "Percentage age groups (3 in total)"
    [Five screenshot attachments, not reproduced here: Schermafbeelding 2023-06-12 145421.png, Schermafbeelding 2023-06-12 144533.png, Schermafbeelding 2023-06-13 113903.png, Schermafbeelding 2023-06-12 144559.png, Schermafbeelding 2023-06-12 144846.png]

  • #2
    Vincent:
    I'm not entirely clear about what you're after.
    As far as I understand your post:
    1) you're dealing with a cross-sectional dataset with more than one dependent variable. Most/all of the dependent and independent variables are continuous and expressed as percentages;
    2) pairwise correlation is unlikely to tell you anything informative, as more than one predictor can contribute to explaining variation in the dependent variable(s);
    3) therefore, you may want to take a look at -mvreg-;
    4) outliers: apart from blatant data entry errors, outliers are often values that are perfectly legitimate given the underlying data generating process.
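
As a rough illustration of point 3, -mvreg- could be run along the lines below. This is only a sketch on simulated data: the variable names (pct_stressed, pct_lonely, distance_to_park, pct_working) and the numbers used to generate them are assumptions for illustration, not the actual thesis dataset.

```stata
* Sketch only: simulated data standing in for the 48 neighbourhoods.
* All variable names and generating values are hypothetical.
clear
set seed 12345
set obs 48
generate double distance_to_park = runiform(0.1, 3)
generate double pct_working      = runiform(40, 80)
generate double pct_stressed     = 25 - 2*distance_to_park + rnormal(0, 3)
generate double pct_lonely       = 30 + 0.1*pct_working + rnormal(0, 3)

* One equation per dependent variable, same regressors in each equation
mvreg pct_stressed pct_lonely = distance_to_park pct_working

* Joint test that distance has no effect in either equation
test [pct_stressed]distance_to_park [pct_lonely]distance_to_park
```

With the real data, the simulated variables would be replaced by the observed neighbourhood percentages and the full set of controls.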
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you for recommending mvreg; I think that would fit here.
      My question now is how to proceed with two or three variables that fully determine each other, such as the gender or education-level percentages. Should I omit one of them to make sure there is no multicollinearity in the final model?

      Sincerely,

      Vincent



      • #4
        Vincent:
        the first step to take is using -fvvarlist- notation (see -help fvvarlist-).
        Stata deals with potential collinearity automatically: perfectly collinear predictors are omitted from the model.
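
A minimal sketch of that behaviour, again on simulated data. The variable names are hypothetical, and pct_female is constructed as exactly 100 minus pct_male, which is an assumption about how the gender shares relate in the real data.

```stata
* Sketch only: simulated data; all variable names are hypothetical.
clear
set seed 2023
set obs 48
generate double distance_to_park = runiform(0.1, 3)
generate double pct_male         = runiform(45, 55)
generate double pct_female       = 100 - pct_male   // exact linear function of pct_male
generate double pct_stressed     = 25 - 2*distance_to_park + rnormal(0, 3)

* c. marks each regressor as continuous in factor-variable notation
regress pct_stressed c.distance_to_park c.pct_male c.pct_female

* e(rank) counts the coefficients actually estimated, constant included
display e(rank)
```

In the output, one of the two gender shares should be flagged as omitted because of collinearity. The same would apply to the three education shares if they sum to 100 in every neighbourhood.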
        Last edited by Carlo Lazzaro; 15 Jun 2023, 08:21.
        Kind regards,
        Carlo
        (Stata 19.0)

