Inexperienced in STATA- How to interpret Wilcoxon matched pairs output

Mauro Salas

Join Date: Jun 2015

Posts: 2
#1


26 Jun 2015, 10:50

Hello All,

I am currently working on a project for my graduate program that requires me to work with STATA. That said, I am very new to it and could use some assistance. I am working with data from a written survey that produced Likert-style responses: we asked a series of questions on the topic, each answered on a scale of 1 to 6. I was told that a Wilcoxon matched-pairs test for each question would give my audience a clearer picture of what was happening, so I ran signrank c1a = c1b (these correspond to section C, question 1, on tests A and B). I would do this for each question to look for a significant difference between the pre-test (test A) and the post-test (test B). My first output is here (http://imgur.com/b6VAAxp). While I understand the p-value, I am not sure what else I can take from this output. Can someone please give me a quick rundown of what the other numbers mean? Are the magnitude and sign of the z score important? Is there anything else I should pay attention to? Thanks
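Since the output image is not reproduced here, the following is a minimal Python sketch (not Stata code) of the quantities a Wilcoxon matched-pairs signed-rank table typically reports: counts and rank sums for positive, negative, and zero differences, the expected rank sum under the null, the variance, and z. It assumes, rather than quotes, the zero and tie handling described in Stata's signrank documentation (zero differences are ranked but carry no sign); the function names and data are hypothetical. With signrank c1a = c1b the differences are c1a minus c1b, so, because lower answers mean more knowledge, a positive z would indicate that post-test answers tend to be lower than pre-test answers. The magnitude of z is how many standard errors the positive rank sum lies from its expected value; it determines the p-value but is not by itself a measure of effect size.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def signrank(x, y):
    """Sketch of a Wilcoxon matched-pairs signed-rank test of H0: x = y.

    Assumption: zero differences are ranked together with the others but carry
    no sign, so they drop out of the statistic and of its variance.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    r = average_ranks([abs(v) for v in d])
    table = {"positive": [0, 0.0], "negative": [0, 0.0], "zero": [0, 0.0]}  # [obs, sum of ranks]
    for diff, rank in zip(d, r):
        key = "positive" if diff > 0 else "negative" if diff < 0 else "zero"
        table[key][0] += 1
        table[key][1] += rank
    # Under H0 the ranks of the non-zero differences split evenly between the signs.
    expected = (n * (n + 1) / 2 - table["zero"][1]) / 2
    # Variance of the positive rank sum, already adjusted for ties and zeros.
    variance = sum(rank ** 2 for diff, rank in zip(d, r) if diff != 0) / 4
    if variance == 0:
        raise ValueError("all differences are zero; the test is undefined")
    z = (table["positive"][1] - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal approximation
    return {"table": table, "expected": expected, "variance": variance, "z": z, "p": p}


# Made-up answers for one item (1 = expert ... 6 = completely unfamiliar).
pre = [3, 4, 2, 5, 3, 2, 4, 3]
post = [2, 2, 2, 3, 3, 1, 4, 4]
result = signrank(pre, post)
for sign, (obs, rank_sum) in result["table"].items():
    print(sign, obs, rank_sum)
print("expected:", result["expected"], "z:", round(result["z"], 3), "p:", round(result["p"], 3))
```

For these made-up data the sketch gives 4 positive pairs with rank sum 25, 1 negative pair with rank sum 5, 3 zero pairs with rank sum 6, an expected rank sum of 15, and z of about 1.46. The three zero pairs take the lowest ranks yet contribute nothing to z.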
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

26 Jun 2015, 12:18

Hello, Mauro,

Welcome to the Stata Forum!

First, I suggest you type -help signrank- to get background information on the Wilcoxon signed-rank test and to get acquainted with this important resource. That said, I took a look at the output, and it indicates that you might reject the null hypothesis. On the other hand, please keep in mind that you have a large number of zeros (pairs with no difference between the two answers; in fact, the majority of your observations), and these are excluded from the calculation.

Additionally, I fear the signed-rank test may not be appropriate for your design, not only because the 1-to-6 range is short but, fundamentally, because you have "a series of questions". If you performed repeated tests like this, you would risk inflating the familywise error rate.
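To make the familywise-error concern concrete, here is a small Python sketch. The first function gives the probability of at least one false positive across m tests, assuming the tests are independent (answers from the same students are not, so treat it as rough intuition). The second applies the Holm step-down adjustment, which is one standard correction (not one proposed in this thread) and does not require independence. The function names and p-values are hypothetical.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) across m *independent* tests each run at level alpha."""
    return 1 - (1 - alpha) ** m


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values; controls the familywise error rate
    without assuming the tests are independent."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        running_max = max(running_max, min(1.0, (m - step) * pvalues[i]))
        adjusted[i] = running_max  # keep the adjusted values monotone
    return adjusted


print(round(familywise_error_rate(0.05, 20), 3))  # 0.642
# Hypothetical per-question p-values from separate signed-rank tests.
print(holm_adjust([0.01, 0.04, 0.03, 0.2]))
```

With 20 questions each tested at 0.05, the chance of at least one spurious "significant" item is already about 64% under independence.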

You didn't tell us much about your study question, but if you want to analyze the answers to the questionnaire from an integrated perspective, SEM (structural equation modeling) with time effects and latent variables could be one option, among others.

On a second note, considering the complexity as well as the fact that you mentioned it is a graduate project, perhaps you should ask your professor for advice and find out what is really expected. Maybe it is just a descriptive analysis, who knows?

Finally, please write "Stata" rather than "STATA", i.e., capitalize only the S.

Hopefully that helps.

Best,

Marcos

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17742
#3

27 Jun 2015, 08:04

Mauro:
setting aside the Wilcoxon test for a while, I share Marcos' concern that your answers are not independent, because the same person answered a set of items.
I would take a look at -manova- and see whether, with some tweaks (i.e., using ranks instead of the values of continuous variables), that approach matches your research goal.

Kind regards,
Carlo
(Stata 19.0)
Mauro Salas

Join Date: Jun 2015

Posts: 2
#4

27 Jun 2015, 12:02

Thank you for your responses. Perhaps I should give a bit more information. The study involved a survey administered to medical students to assess asthma management and knowledge. The survey consisted of three sets of questions, each in a Likert-style format. A response of 1 indicated "expert" while 6 indicated "completely unfamiliar"; similarly, a later section ran from "always perform behavior" to "never". Either way, lower responses suggest greater knowledge or better adherence to recommended practices. The pre-test was administered on paper before an educational intervention, basically a PowerPoint presentation explaining certain environmental triggers of asthma. Immediately after the presentation, an identical post-test was administered (except for the basic demographic questions asked in the pre-test). Three months later, an email reminder was sent asking participants to fill out a follow-up survey, which contained the same questions but was completed online. In total, we reached about 700 students; however, I am working specifically with the 245 completed sets of pre-, post-, and follow-up tests. These are my 245 matched respondents.
Initially, the data were formatted so that each individual had a separate entry for each test. My first approach was to use Cronbach's alpha to see whether each block of questions could form a scale, so I ran alpha on all questions in each section; alpha was greater than .88 for each scale. The distributions of the scales I generated were skewed, so on someone else's advice I tried dichotomizing them: I recoded each scale as 1 if the respondent's mean score on the scale was 1 (i.e., every item answered 1) and 0 otherwise, as a proxy for better knowledge or practices. This left me with three new variables indicating a mean score of 1 versus not 1 on each scale.
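The standard Cronbach's alpha formula and the "mean score equals 1" recode described above can be sketched in Python as follows. This is an illustration with hypothetical data and function names, assuming unstandardized items; Stata's -alpha- command offers more options. Because 1 is the lowest possible answer, a mean score of exactly 1 means the respondent answered 1 on every item of the scale.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(items[0])
    item_variances = [variance([row[j] for row in items]) for j in range(k)]
    total_variance = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def all_ones_indicator(items):
    """1 if a respondent's mean score is exactly 1 (every item answered 1), else 0."""
    return [1 if mean(row) == 1 else 0 for row in items]


# Hypothetical answers: 5 respondents x 3 items on the 1-6 scale.
answers = [[1, 1, 1], [2, 2, 3], [3, 3, 3], [4, 5, 4], [1, 2, 1]]
print(round(cronbach_alpha(answers), 3))  # 0.963
print(all_ones_indicator(answers))        # [1, 0, 0, 0, 0]
```

The ratio of variances is the same whether sample or population variances are used, as long as the choice is consistent.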
Earlier, I had created another variable, TEST, coded 1, 2, or 3 for the pre-, post-, or follow-up test respectively (derived from the ID code in Excel). This TEST variable becomes important in my next step. I ran logistic CScalealways i.TEST, and did the same for all three scales. The results indicated significantly increased odds of scoring 1 on the post-test compared with the pre-test. They also showed increased odds of scoring 1 on the follow-up compared with the pre-test, but to a smaller extent than for the post-test. I gave my adviser a rundown of what I did and awaited a response.
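For a logistic model whose only predictor is a categorical variable such as i.TEST, the estimated odds ratios equal the odds ratios read off the cross-tabulation, so they can be checked by hand. The Python sketch below does that with hypothetical data and a hypothetical function name. It reproduces point estimates only: it does not address the fact, raised in this thread, that the three tests come from the same students, and a group in which nobody (or everybody) scores 1 would break the division.

```python
def odds_ratios_vs_base(outcome, group, base=1):
    """Odds of outcome == 1 within each group, and odds ratios versus the `base` group.

    With a single categorical predictor, a logistic model's fitted probabilities
    equal the observed group proportions, so these match its odds ratios.
    """
    counts = {}
    for y, g in zip(outcome, group):
        ones, others = counts.get(g, (0, 0))
        counts[g] = (ones + (y == 1), others + (y != 1))
    odds = {g: ones / others for g, (ones, others) in counts.items()}
    ratios = {g: odds[g] / odds[base] for g in odds if g != base}
    return odds, ratios


# Hypothetical data: 10 records per TEST level (1 = pre, 2 = post, 3 = follow-up).
scale_is_one = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4 + [1] * 4 + [0] * 6
test = [1] * 10 + [2] * 10 + [3] * 10
odds, ratios = odds_ratios_vs_base(scale_is_one, test)
print(ratios)  # post vs pre: 6.0, follow-up vs pre: about 2.67
```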
He replied that the most legitimate statistical approach for this type of analysis is factor analysis. He thinks each section is likely trying to measure some "latent" variable (knowledge in a specific area), but that this approach may be difficult for me to implement given the time constraints and its complexity. He says he is fine with the simpler solution of summing all the items, but that I could also run a Wilcoxon signed-rank test on each question. It may be tedious, but it may give my readers a clearer picture of what happens to each item after the intervention.

So I went back to the drawing board to review the Wilcoxon signed-rank test (to be honest, to learn it for the first time). From what I understand, the matched-pairs test can be used to test for a difference between two paired distributions. I was confused about how to run signrank given how my data were formatted. To make the test work, I had to reshape my data from long to wide format. So now, instead of having respondents xxx-yy-01, xxx-yy-02, and xxx-yy-03 as separate entries, I have xxx-yy as one entry with three variables, each representing the same question on a different test version (for example, Q1Test1, Q1Test2, Q1Test3). This brings me to the signrank command: I ran signrank Q1Test1 = Q1Test2 to look for a significant difference. Admittedly, I am not sure exactly what he wanted me to do with the Wilcoxon results, but I have emailed him and am awaiting his response.
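The long-to-wide step can be sketched in Python as below, assuming respondent codes of the form xxx-yy-NN where NN is the test number; the codes, answers, and function names are hypothetical. In Stata the usual tool is -reshape wide-; for example, with a long-format item variable named Q1Test, the TEST variable, and a respondent identifier (called id here for illustration), reshape wide Q1Test, i(id) j(TEST) should produce Q1Test1, Q1Test2, and Q1Test3. The pairing helper keeps only respondents who have both tests, since a matched-pairs test needs complete pairs.

```python
def long_to_wide(records, item="Q1"):
    """Turn long records (respondent_code, answer) into one dict per respondent.

    Assumes codes look like 'xxx-yy-01', where the part after the last '-'
    is the test number and 'xxx-yy' identifies the respondent.
    """
    wide = {}
    for code, answer in records:
        person, test = code.rsplit("-", 1)
        wide.setdefault(person, {})[f"{item}Test{int(test)}"] = answer
    return wide


def complete_pairs(wide, first, second):
    """Paired lists for two wide variables, keeping respondents who have both."""
    rows = [row for row in wide.values() if first in row and second in row]
    return [row[first] for row in rows], [row[second] for row in rows]


# Hypothetical respondent codes following the xxx-yy-NN pattern.
records = [("101-ab-01", 3), ("101-ab-02", 2), ("101-ab-03", 2),
           ("102-cd-01", 4), ("102-cd-02", 2), ("103-ef-01", 5)]
wide = long_to_wide(records)
print(wide["101-ab"])  # {'Q1Test1': 3, 'Q1Test2': 2, 'Q1Test3': 2}
print(complete_pairs(wide, "Q1Test1", "Q1Test2"))  # ([3, 4], [2, 2])
```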
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17742
#5

28 Jun 2015, 08:19

Mauro:
thanks a lot for providing further details.
I agree with your supervisor's first proposal, as you have many dependent variables. As far as factor analysis is concerned, Stata has a dedicated command (please see -help factor- and the related entry in the Stata 13.1 PDF manual).
Conversely, I don't think that the so-called simplified solution (the Wilcoxon signed-rank test) is the right tool for multivariate analysis.

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

29 Jun 2015, 09:20

Hello, Mauro,

It seems we all (your advisor, Carlo, and I) are making the same point: you would have too many comparisons to perform, the scale has a short range, and, besides, you may need to add latent variables and time effects. In Stata, you can handle these situations too, including principal component analysis, with the -sem- commands.

Just type "help sem" and see how you like it. You may also take a look at confirmatory factor analysis under SEM.

Best,

Marcos
