Finding out statistically significant data

Miraz Mahmud

Join Date: Aug 2014

Posts: 19
#1

31 Aug 2014, 15:19

Sir, I am using Stata to analyze survey data to find the combination of the factors whose p-value is greater than 1. How can I easily get the combination with the highest number of factors?
I have to find at least 10 factors in rows 1 and 2 (please see the attached .smcl file).
Attached Files

data smcl.smcl (6.2 KB, 1 view)

data txt.txt (31.2 KB, 1 view)

data.dta (31.3 KB, 2 views)

data.xlsx (62.5 KB, 1 view)
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

31 Aug 2014, 17:22

by definition a p-value canNOT be >1; your question makes no sense; please clarify
2 likes
Miraz Mahmud

Join Date: Aug 2014

Posts: 19
#3

02 Sep 2014, 08:02

Sorry, Sir, it was a typing mistake; it should be less than 0.1 (< 0.1). It would be very helpful if I could get a solution.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#4

02 Sep 2014, 08:17

the combination of the factors

It would help if you explained more clearly and carefully precisely what you mean by this, perhaps by reference to a published article that has implemented the approach. (If you cite work, please give complete bibliographic references as the FAQ asks.)
Miraz Mahmud

Join Date: Aug 2014

Posts: 19
#5

02 Sep 2014, 09:01

Sir, I conducted a survey on drivers' choice of the message type shown on a VMS (variable message sign) on the roads and on the factors on which that choice depends. We had 3 message types (the 3 categories of the dependent variable) and 84 factors like age, gender, etc. (the 84 independent variables) on which the choice may depend. I am now using Stata to find the statistically significant factors (those whose p-value is less than 0.1). I am using the command "mlogit". I got only 8 significant variables for message type 3 and 4 significant variables for message type 1, taking message type 2 as the base. But I need 15 or more significant variables (p-value less than 0.1) for each message type. The .dta and .smcl files are attached. I got the result by manually entering random combinations of variables, like mlogit msgtyp age... (please see the .smcl file). I need to know whether there is an easier way to get the combination of independent variables (age, gender, etc.) with p-value < 0.1 without entering the variables manually; that is, how can I get the highest number of significant independent variables? I am badly in need of a solution.
Attached Files

105427.smcl (203 Bytes, 1 view)

data.dta (31.3 KB, 2 views)
Miraz Mahmud

Join Date: Aug 2014

Posts: 19
#6

02 Sep 2014, 09:26

Please check the attached .smcl file.
Attached Files

105427.smcl (6.7 KB, 1 view)
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#7

02 Sep 2014, 09:28

I need 15 or more significant variables

Why do you need to meet this criterion? Where does it come from?

Have you looked at help stepwise? Would stepwise methods do what you're wanting to do? (I suppose you are aware that there is a large literature that advises against using such methods for model selection -- you can search Statalist for posts and references.)

I think it's worth you responding about these bigger issues rather than providing us with data and output. (And, please, have a look at the FAQ's advice about the best ways to provide information to the list.)
Miraz Mahmud

Join Date: Aug 2014

Posts: 19
#8

02 Sep 2014, 09:39

Sir, it is my undergraduate thesis project, and our supervisor suggested that we find at least 15 variables to make it a good project. Sir, I am new to Stata and am not familiar with stepwise methods. Thank you, Sir, for your kind advice.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#9

02 Sep 2014, 10:00

I'm not sure I understand exactly what you are trying to do here. But looking at your data, it seems you have a bunch of potential predictor variables, all of which are categorical in nature. (By the way, v23 is strange both in having an odd name, and taking on values 0, 1, and, in a single observation, 10. Is that 10 a data entry error? Is v23 itself some kind of import mistake?) And you want to identify at least 15 of them as significant (p < 0.10) predictors in bivariate analysis with your three-level outcome variable. So, I would not use -mlogit- for this purpose. Rather, I would do cross-tabulations and chi square tests for the outcome vs each variable. Since you have 84 variables to test, and want to identify the significant ones, I would do this in a loop over the predictor variables, and as we go I would build up a local macro containing the names of those variables that show up with p < 0.1.

Code:

set more off

// INITIALIZE LOCAL MACRO LISTING SIGNIFICANT RESULTS
// AS EMPTY
local significant

foreach v of varlist gender-fir {
    tab msgtyp `v', chi2
    // CHECK SIGNIFICANCE OF RESULTS
    // IF SIGNIFICANT, ADD `v' TO LIST
    if r(p) < 0.1 {
        local significant `significant' `v'
    }
}

// SHOW THE RESULTS
display as text "The following variables have p < 0.1: "
display as result "`significant'"

Whether there actually are 15 variables that will achieve this level of significance, I don't know. Also, I don't think this is a particularly good way to select variables for inclusion in a larger model, but that is a discussion for another day. The code above should do what you have requested if there really are 15 such variables.
Miraz Mahmud

Join Date: Aug 2014

Posts: 19
#10

02 Sep 2014, 11:29

Sir, I surveyed drivers in a city. I asked them to choose one of the 3 message types (the categories of the dependent variable) to be displayed on a VMS (variable message sign, a system for showing different messages to drivers), and I also asked them about their characteristics, such as age and gender: 84 characteristics in total (the independent variables), on which their choice of message type may depend. Now I want to find only the significant characteristics (independent variables), those on which their choice really depends (whose p-value is less than 0.1). A photo of a VMS is shown here.

1 Photo
Miraz Mahmud

Join Date: Aug 2014

Posts: 19
#11

02 Sep 2014, 11:35

Here the independent variables are the predictor variables. After getting all the predictor variables together, I will develop the equation to predict the message type. As for v23, it is a data entry error.

Last edited by Miraz Mahmud; 02 Sep 2014, 11:50.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#12

02 Sep 2014, 11:57

OK, but did you try the code I posted? If so, how is the result different from what you are looking for? I just ran it on the data set you posted previously, and it identified and named 22 predictor variables that are significant at the 0.1 level as predictors of msgtyp in bivariate analysis. If that isn't what you need, please explain more fully what you do need.

Now, bear in mind that when you throw these variables into the multivariable regression (mlogit), because they are not independent of each other, not all of them will remain significant. Your model may end up with only a handful, or even none, being individually identified as significant in the context of all the others.
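A minimal sketch of that second step, offered only as an illustration. It assumes the posted data set is in memory, that the outcome is msgtyp coded so that message type 2 has the value 2, and that the candidate predictors run from gender to fir as in the loop posted earlier. The loop is repeated here because a local macro exists only within a single do-file run, so the screening and the mlogit call have to be run together.

```stata
* Sketch only: screen each predictor against msgtyp with a chi-square test
local significant
foreach v of varlist gender-fir {
    quietly tabulate msgtyp `v', chi2
    if r(p) < 0.1 {
        local significant `significant' `v'
    }
}

* Fit the multinomial logit on the screened variables, using message
* type 2 as the base outcome (assumes msgtyp takes the value 2)
mlogit msgtyp `significant', baseoutcome(2)
```

Predictors with more than two categories would probably need the i. factor-variable prefix rather than being entered as they are here. The p-values in the mlogit output are conditional on all the other variables in the model, so they need not agree with the bivariate results.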
Miraz Mahmud

Join Date: Aug 2014

Posts: 19
#13

02 Sep 2014, 12:15

Yes, Sir, I ran your code. I got a result, but also some errors. Sir, would you please send me the .smcl file produced by running the code on my data set?
I want to use mlogit, but I would be very grateful if you could give me the .smcl file with the 22 predictor variables. The .smcl file I got is attached here.
Attached Files

last.smcl (2.0 KB, 1 view)
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#14

02 Sep 2014, 12:23

It looks like you ran the code by copying and pasting it into the command line rather than running it from a do-file. Stata doesn't recognize // as indicating comments when running commands interactively. You need to copy Clyde's code into the Do-file Editor and run it from there. You can also replace the // at the beginning of lines with *. Doing that will allow you to run the commands interactively. However, for the purpose of reproducing your results you should save a do-file with the commands.
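One way to follow this advice, sketched under assumptions: the loop has been saved as screen_predictors.do (a hypothetical file name) in the current working directory, and data.dta is in the same place. The lines below can then be typed or pasted into the Command window.

```stata
* Lines starting with * are comments that Stata accepts interactively.
* File names below are hypothetical; adjust them to your own.
use data.dta, clear
log using screen_predictors.smcl, replace
do screen_predictors.do
log close
```

The resulting .smcl log records both the commands and their output, so the run can be repeated and shared.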
Miraz Mahmud

Join Date: Aug 2014

Posts: 19
#15

02 Sep 2014, 12:42

Thank you so much, Sarah Edgington, Clyde Schechter, Stephen Jenkins, and Rich Goldstein, for helping me. The command worked. Please let me know if there is any way to do this using mlogit, because it is my thesis project and my supervisor told me to do it using mlogit, but I will try to do it by this method.