Is this data analysis sound? - Help a self-taught beginner

Rowan Tate

Join Date: Aug 2022

Posts: 2
#1

06 Aug 2022, 05:36

I am a young and new Stata user trying to learn how to draw conclusions from datasets.
I have a small set of survey responses (attached) that I am trying to analyze.
As I am self-taught, could a more experienced user review my work/reasoning and offer corrections/suggestions?

I am interested to know:
What errors I am making

What issues I am not addressing

How my thinking can be more sophisticated

Whether conclusions can be drawn from this regression model

The dataset is attached and my do-file is below:

I will explore descriptive statistics to become familiar with the data.

Code:

codebook
sum
misstable sum

I am interested in exploring the following variables:

Code:

tab1 Q2 Q11 Q14 Q15, miss
bysort Q11: tab Q2 Q14
bysort Q15: tab Q2 Q11

I will focus on the following variables:

Code:

sum Q14, detail
sum Q15, detail
tab Q14 Q15, chi2 lrchi2

I see that missing responses are coded as negative values, which I will recode to Stata's missing value (.) so that I can run correlations and regressions.

Code:

replace Q14 = . if Q14 < 0
replace Q15 = . if Q15 < 0
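To check that the recode did what I intended, something like the following could be run afterwards. This is only a sketch: it assumes the dataset is loaded and that every negative code in Q14 and Q15 really denotes a missing response, which I have not verified against a codebook.

```stata
* Sketch only: assumes all negative codes in Q14 and Q15 mean "missing"

* No negative codes should remain after the recode
count if Q14 < 0
count if Q15 < 0

* Stata treats . as larger than any number, so these assertions
* also hold for the observations that were just recoded
assert Q14 >= 0
assert Q15 >= 0

* How many observations are now missing on each variable
misstable summarize Q14 Q15

* Re-run the summaries so the negative codes no longer distort them
summarize Q14 Q15, detail
```

The summaries and cross-tabulation I ran before this step still included the negative codes, so the re-run at the end is the version that should be interpreted.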

I am interested in how strongly related these variables are.

Code:

pwcorr Q14 Q15, sig star(.05)

Pearson's correlation indicates a moderate positive correlation between Q14 and Q15 (r = .6, p < .00005). Since r² = .36, using social media as a news source explains 36% of the variation in trusting the news.
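The 36% figure is just the squared correlation. A minimal sketch of how it could be reproduced from the stored results, assuming the dataset is loaded and the negative codes have already been recoded to missing:

```stata
* Sketch: with two variables, correlate and pwcorr use the same observations
quietly correlate Q14 Q15

* r(rho) and r(N) are stored by correlate
display "r   = " %5.3f r(rho)
display "r^2 = " %5.3f r(rho)^2
display "N   = " r(N)
```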

I will further explore this relationship by building the following models:

Code:

set showbaselevels on
asdoc reg Q15 i.Q14, robust nest append
asdoc reg Q15 i.Q14 i.ideocat, robust nest append
asdoc reg Q15 i.Q14 i.ideocat i.gencat i.racecat i.gender, robust nest append
estat vif

I run the regressions with robust standard errors to account for possible heteroskedasticity. The full model explains nearly 40% of the variance in trust (Q15) and shows a statistically significant relationship between Q15 and Q14 (p < .00005). The RMSE (.61) indicates that the model can predict the data fairly accurately.
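Because the RMSE is expressed in the units of Q15, it may be easier to judge next to the standard deviation of Q15 in the estimation sample. The sketch below re-runs the full model without asdoc (so that the e() results are certain to be in memory) and compares the two. It assumes the dataset is loaded and the recode has been applied.

```stata
* Sketch: re-estimate the full model quietly, then inspect stored results
quietly regress Q15 i.Q14 i.ideocat i.gencat i.racecat i.gender, robust

display "R-squared = " %5.3f e(r2)
display "RMSE      = " %5.3f e(rmse)

* Spread of the outcome among the observations actually used in the model
quietly summarize Q15 if e(sample)
display "SD of Q15 = " %5.3f r(sd)

* A ratio well below 1 means the model predicts better than the mean alone
display "RMSE / SD = " %5.3f e(rmse)/r(sd)
```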

The following predictors are statistically significant in explaining trust: the degree to which news is acquired from social media, ideology, race, gender, and generation (the Generation X and Baby Boomer categories).
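Each coefficient on a categorical predictor only compares one category with the base level, so significance of individual categories is not the same as significance of the variable as a whole. A possible way to test each factor jointly is testparm, sketched here under the assumption that the full model is the one of interest:

```stata
* Sketch: joint Wald tests that all categories of a factor are zero
quietly regress Q15 i.Q14 i.ideocat i.gencat i.racecat i.gender, robust

testparm i.Q14
testparm i.ideocat
testparm i.gencat
testparm i.racecat
```

With robust standard errors these are Wald tests based on the robust variance estimate.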

None of the predictors have VIF > 10 or 1/VIF < .1, suggesting that there is no multicollinearity.

Code:

pwcorr Q15 Q14 gencat ideocat racecat gender, sig star(.05)

I run a correlation matrix for all variables in the model. There is a strong relationship between using social media as a news source and trusting the news, more moderate correlations between generation and ideology, and inverse relationships for gender and race.

I can conclude that the degree to which one trusts social media as a news source depends on their race, gender, age, and ideology.

Attached Files

Dataset.dta (1.65 MB, 1 view)