Integer related issue

James Lindsey

Join Date: Aug 2020

Posts: 4
#1


08 Aug 2020, 20:18

I'm new to Stata. I used Excel before but recently purchased Stata, and I'm learning as I go. Here is the issue I'd like help with. Below is the error message I received:

    manova GeneratesSatisfactionSAT = Transformational Transactional PassiveAvoidant
    Transformational: factor variables may not contain noninteger values
    r(452);

The data comes from a survey that I championed. Multiple survey questions feed into each of the variables I'm using in the MANOVA. All values are positive numbers; there are no negatives. Some are decimals, for example 4, 3, 3.75, .4, .8, and so on. Other than rounding the numbers to make them integers, what can I do? I'm just starting. My goal is to determine whether there are statistically significant differences among the variables. If so, I then planned to run post hoc t-tests.

Jim
Clyde Schechter

Join Date: Apr 2014

Posts: 30127
#2

08 Aug 2020, 20:27

The predictors in (M)ANOVA are, by definition, categorical variables. If your variables have values like 3.75, .4, and .8 then one of two things must be true:

a) Those numerical values are meaningless--they are just tokens representing different categories in a categorical variable. In that case you should just recode those variables, giving each of the categories a non-negative integer value. Note: do NOT do this by rounding, because 3.75 will round to 4 and the distinction between those two categories will be lost!

b) Those numerical values are, indeed, meaningful. There's a reason that it's 3.75 and not 3 or 4, and, in fact, 3.75 corresponds to something that is truly between 3 and 4, closer to 4. In that case, these variables are simply not appropriate to use as predictors in (M)ANOVA.
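As an illustration of case (a), here is a minimal Stata sketch that uses a few of the values quoted in #1 as hypothetical data. It contrasts rounding with recoding through the -egen- function group(), which gives each distinct value its own integer code (1, 2, 3, ... in sorted order). The variable names rounded and Transformational_cat are made up for the example.

```stata
* Hypothetical toy data: decimal values that are only labels for categories
clear
input double Transformational
3.75
4
.4
.8
3
end

* What to avoid: rounding maps both 3.75 and 4 to 4, merging two categories
generate int rounded = round(Transformational)

* Recoding: each distinct value gets its own nonnegative integer code
egen int Transformational_cat = group(Transformational)

list Transformational rounded Transformational_cat
tabulate rounded
tabulate Transformational_cat
```

Using the raw decimal values as a categorical predictor is what triggers r(452); the recoded Transformational_cat is a legal factor variable. This only makes sense if case (a) applies; under case (b) the scores should stay continuous.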
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#3

08 Aug 2020, 20:42

Further to what Clyde is saying, in your case -regress- would do just fine. You just need to figure out the nature of your variables along the lines that Clyde explained, because the nature of the variables determines how you include them in -regress-.
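For case (b), a hedged sketch of what -regress- with continuous predictors could look like. The data are simulated stand-ins (165 respondents, scores in steps of 0.25 between 0 and 4), and the relationship built into the made-up outcome is arbitrary; only the variable names come from the original command. The c. prefix tells Stata to treat each score as continuous, so noninteger values are not a problem.

```stata
* Hypothetical simulated data standing in for the survey (165 respondents).
* Each score mimics a mean of 4 items on a 0-4 scale (multiples of 0.25).
clear
set seed 2020
set obs 165
generate double Transformational = runiformint(0, 16) / 4
generate double Transactional    = runiformint(0, 16) / 4
generate double PassiveAvoidant  = runiformint(0, 16) / 4
* Made-up outcome, for illustration only
generate double GeneratesSatisfactionSAT = 1 + 0.5*Transformational + rnormal()

* The scores enter as continuous predictors (c. prefix), so no r(452)
regress GeneratesSatisfactionSAT c.Transformational c.Transactional c.PassiveAvoidant
```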
James Lindsey

Join Date: Aug 2020

Posts: 4
#4

09 Aug 2020, 09:58

Gentlemen, let me see if I can supply a bit more information.
The survey I used had 45 questions. Each question used a 0-to-4 Likert scale with options 0, 1, 2, 3, or 4, where 0 = unsure and 4 = always or nearly always. I believe these responses are ordinal. For discussion, let's say that questions 1, 4, 9, and 40 of the 45 measured the participant's preference for a transactional leadership style. The first participant may have answered these four questions with 2, 3, 0, and 4, which sum to 9 and give a mean Transactional score of 9/4 = 2.25. I have 165 observations, so this calculation is repeated for transactional leadership 165 times, once per participant, with different ratings. The mean score may be 4, 3, 2.25, .8, and so on, varying across the 165 participants. The same is true for the transformational and passive/avoidant leadership styles.
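The per-respondent mean score described above can be computed with the -egen- function rowmean(). A minimal sketch, assuming the four transactional items are stored in variables named q1, q4, q9, and q40 (hypothetical names) and using the example responses 2, 3, 0, 4:

```stata
* Hypothetical item names: q1, q4, q9, q40 hold the four transactional items
clear
input q1 q4 q9 q40
2 3 0 4
end

* Mean of the nonmissing items for each respondent: (2 + 3 + 0 + 4) / 4 = 2.25
egen double Transactional = rowmean(q1 q4 q9 q40)
list q1 q4 q9 q40 Transactional
```

Note that rowmean() averages whichever items are nonmissing, so a respondent who skipped an item gets the mean of the remaining ones.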

In addition to the 45 standard survey questions, I added five demographic questions: birth generation (five categories), gender (two categories), employee reports (two categories), and two tenure questions (each with its own set of categories).

The questions I want to answer from the data (which I previously did in Excel and am now trying to learn to do in Stata) are:
1. What are the overall mean preference scores for the transformational, transactional, and passive/avoidant leadership styles?
2. Is there a statistically significant difference in preferred leadership style between genders?
3. Is there a statistically significant difference in preferred leadership style between birth generations?
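One way these three questions could map onto Stata commands, sketched on simulated data. Assumptions: gender is coded 0/1 and birth generation 1 to 5 under the hypothetical names gender and birthgen; the real survey coding may differ. For questions 2 and 3 the leadership score is the outcome and the demographic variable is a categorical predictor, so it takes the i. prefix.

```stata
* Hypothetical simulated data: 165 respondents; gender coded 0/1 and
* birthgen coded 1-5 (illustrative codes, not the survey's real coding)
clear
set seed 2020
set obs 165
generate byte gender   = runiformint(0, 1)
generate byte birthgen = runiformint(1, 5)
generate double Transformational = runiformint(0, 16) / 4
generate double Transactional    = runiformint(0, 16) / 4
generate double PassiveAvoidant  = runiformint(0, 16) / 4

* Question 1: overall mean (and SD, N) of each style score
tabstat Transformational Transactional PassiveAvoidant, statistics(mean sd n)

* Questions 2 and 3: each style score is the outcome; the demographic
* variables are categorical predictors, hence the i. prefix
foreach style in Transformational Transactional PassiveAvoidant {
    regress `style' i.gender
    regress `style' i.birthgen
}
```

With a single categorical predictor, the overall F test that -regress- reports is the same test a one-way ANOVA would give.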

I first used ANOVAs to check for statistical significance, then post hoc t-tests to determine whether the differences were significant at the 95% level or higher.
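For the post hoc step, Stata can compare all pairs of group means after the regression and adjust for multiple comparisons, instead of running separate t-tests. A minimal sketch on simulated data, assuming a five-category variable with the hypothetical name birthgen; Bonferroni is only one of several adjustments the mcompare() option accepts.

```stata
* Hypothetical simulated data: 165 respondents, birthgen coded 1-5
clear
set seed 2020
set obs 165
generate byte birthgen = runiformint(1, 5)
generate double Transformational = runiformint(0, 16) / 4

* Overall F test for birthgen (equivalent to a one-way ANOVA)
regress Transformational i.birthgen

* All 10 pairwise differences between generation means,
* with p-values and CIs adjusted by the Bonferroni method
pwcompare birthgen, mcompare(bonferroni) effects
```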

Jim

Last edited by James Lindsey; 09 Aug 2020, 10:03.
Clyde Schechter

Join Date: Apr 2014

Posts: 30127
#5

09 Aug 2020, 10:12

Yes, Likert scale responses are ordinal variables. And although some disagree with doing so, they are often treated as interval-level information: taking a mean score implicitly treats them as interval-level data and, in particular, as continuous variables. So they are not suitable predictors for (M)ANOVA. (By contrast, it sounds like your demographic variables, employee reports, and tenure questions are categorical variables. By themselves, those would be suitable for (M)ANOVA, but the need to include these survey scores rules that out.)

Your analysis should be regression based.

As an aside, from what you have described, even if you had all categorical variables, I don't see anything that would call for using MANOVA instead of ANOVA. The M in MANOVA stands for multivariate, and it is used when you have more than one outcome variable, which does not appear to be the case here. This has some relevance to how you proceed because you will have a similar choice between -reg- and -mvreg-, and, at least based on what you have said so far, there would be no reason to use the latter.
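To make the -reg- versus -mvreg- distinction concrete: with a single outcome, -regress- is the natural choice, and -mvreg- only comes in when several outcomes are modeled together. A sketch on simulated data (hypothetical gender coded 0/1) showing that the per-equation coefficients from -mvreg- match separate -regress- fits; what -mvreg- adds is the estimated covariance across equations, which is needed only for tests that span outcomes.

```stata
* Hypothetical simulated data: gender coded 0/1, three leadership scores
clear
set seed 2020
set obs 165
generate byte gender = runiformint(0, 1)
generate double Transformational = runiformint(0, 16) / 4
generate double Transactional    = runiformint(0, 16) / 4
generate double PassiveAvoidant  = runiformint(0, 16) / 4

* One outcome: ordinary regression
regress Transformational i.gender
scalar b_reg = _b[1.gender]

* Several outcomes at once: mvreg fits one equation per outcome
mvreg Transformational Transactional PassiveAvoidant = i.gender
scalar b_mv = _b[Transformational:1.gender]

* The point estimates agree (difference is zero up to rounding error)
display scalar(b_reg) - scalar(b_mv)
```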

I can't really comment on what analyses you did in Excel, and whether they have any validity or not. Excel is not a statistical package. It is a spreadsheet. At least in its earlier versions, its statistical procedures were known for doing many things flat-out wrong. I have heard that some of those errors have been corrected, but as I never use those features of Excel, I can't say anything about its current level of correctness. Suffice it to say, if you asked it to do an ANOVA and gave it variables like the ones you show, then one of the following must be true:

1. Behind your back it treated them as discrete variables, completely misrepresenting the scaling properties of your data.
2. It substituted regression for ANOVA without telling you what it did.
3. It did something else that is completely mystifying, but apparently didn't see fit to explain it to you, so you are left with no way of knowing what it did with your data. The results you got may mean something, but it is anybody's guess what that might be.

Excel should NEVER be used for statistical analysis. Even if its routines have been corrected, it leaves no audit trail of what you have done. Excel is fine for sharing data with people who do not have a common statistical package because just about every other data software can read and write Excel files. It is often helpful for creating visually appealing displays of the data. But analysis itself requires an analytic package, and it requires a complete record of everything that has been done, from the first opening of the raw data set to the final analyses. If in the future somebody hands you an analysis report and it says that analysis was done using Excel, my best advice would be to stop reading right there.
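In Stata, that complete record is usually a do-file plus a log. A minimal skeleton, assuming a hypothetical log name leadership_analysis.log; the shipped auto dataset stands in for the survey data so the sketch runs as-is.

```stata
* Minimal do-file skeleton (log file name is hypothetical)
version 16.1
capture log close
log using "leadership_analysis.log", text replace

* Every step, from loading the raw data to the final models, goes here.
* The built-in auto data are only a placeholder for the survey file.
sysuse auto, clear
regress price c.mpg i.foreign

log close
```

Rerunning the do-file reproduces the whole analysis, and the log records both the commands and their output.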

Last edited by Clyde Schechter; 09 Aug 2020, 10:26.
James Lindsey

Join Date: Aug 2020

Posts: 4
#6

09 Aug 2020, 10:36

Clyde,
Thank you for the engagement and helpful insights. I did try ANOVA first but got the same type of error. Without fully understanding MANOVA, I tried it as a last attempt, so I copied the MANOVA error message rather than the earlier ANOVA one. I'm not at my PC now (I'm using my iPad), so I will have to experiment with regression later. I don't work with statistics every day; any helpful tips for me?

EDIT:
The Excel version I used is very current and had the statistical add-in installed. I did some comparison checks of its results and found them to be accurate. I started to use SPSS (for a very short time), but the cost of continuing was prohibitive for me. I've decided to switch to Stata after taking an abbreviated three-class webinar, and I purchased a perpetual license for version 16.1. The learning curve with Stata is rather steep, but I will get there.

I’m open to other persons’ perspective and insights too.

Respectfully,

Jim

Last edited by James Lindsey; 09 Aug 2020, 11:00.
Clyde Schechter

Join Date: Apr 2014

Posts: 30127
#7

09 Aug 2020, 11:00

The best advice I can give you is the advice that few people follow: read the instruction manual first! Most of the documentation in the Stata PDF manuals is excellent. It won't teach you statistics, but it tells you what the command syntax is and explains what the command does. For most commands there are good worked examples that show how it is done in practice. Overall, the time invested reading the manual section before you use a command is amply repaid.

In particular, before you go ahead and do a regression analysis on this data, make sure you understand the basics of regression. If you have taken a statistics course that includes regression, review your notes or that part of the textbook before you proceed. Also, read the chapter on -regress- in the PDF manuals that are installed along with Stata. (The simplest way to find it is to type -help regress- in the Command window. The Viewer window will open, and near the top, in blue, there is a link to the PDF manual. Click there and you will be taken to it.) That way you will know both the basics of regression and the basics of how Stata does it. Then try out your command. You may find yourself a bit perplexed by the large number of options available on some commands. Stata's commands are, for the most part, very well designed, so the default (which is what you get if you specify nothing for any option that isn't required) is usually broadly applicable and will be right most of the time.

Use factor-variable notation. Precede your categorical variables with i. and your continuous variables with c.. But know before you go: read the PDF manual chapter on it. You can get there by starting with -help fvvarlist- and then clicking on the blue link. Using factor-variable notation will give you a neater, more readable regression output. And if you later need to use the -margins- command, you will have laid the groundwork for doing that.
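A short sketch of factor-variable notation followed by -margins-, on simulated data with the hypothetical variable gender coded 0/1. After the regression, -margins gender- reports model-based mean outcomes for each gender, and -margins, dydx()- reports the slope of the continuous predictor.

```stata
* Hypothetical simulated data: gender coded 0/1, one continuous score
clear
set seed 2020
set obs 165
generate byte gender = runiformint(0, 1)
generate double Transformational = runiformint(0, 16) / 4
generate double GeneratesSatisfactionSAT = 1 + 0.5*Transformational + rnormal()

* i. marks a categorical predictor, c. a continuous one
regress GeneratesSatisfactionSAT i.gender c.Transformational

* Because factor-variable notation was used, margins knows gender is
* categorical and reports an adjusted mean outcome for each level
margins gender

* Average slope of the continuous predictor (equals its coefficient here)
margins, dydx(Transformational)
```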

Once you have done your regression, you will want to see whether your results are sensible. Is R² appropriate? Is it large enough to say that the regression model explains a meaningful amount of the variance in the data? Is it small enough that it doesn't suggest a mistake in the data or in the command setup? Do the coefficients have the signs and the general magnitudes you expect? If not, can you explain why not? How does the residuals-versus-fitted plot look (-help rvfplot-)?
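Several of these checks can be run right after -regress-. A sketch on simulated data (names borrowed from #1, values made up): R² and adjusted R² are stored in e(r2) and e(r2_a), -rvfplot- draws the residuals-versus-fitted plot, and -predict- saves fitted values and residuals for closer inspection.

```stata
* Hypothetical simulated data, for illustration only
clear
set seed 2020
set obs 165
generate double Transformational = runiformint(0, 16) / 4
generate double GeneratesSatisfactionSAT = 1 + 0.5*Transformational + rnormal()

regress GeneratesSatisfactionSAT c.Transformational

* R-squared and adjusted R-squared are stored in e()
display "R-squared = " %6.4f e(r2) "   adjusted = " %6.4f e(r2_a)

* Residuals-versus-fitted plot; look for curvature or a fanning pattern
rvfplot, yline(0)

* The same fitted values and residuals, saved as variables
predict double fitted, xb
predict double resid, residuals
summarize resid
```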
James Lindsey

Join Date: Aug 2020

Posts: 4
#8

09 Aug 2020, 11:16

The manual is a good suggestion, but IMHO it is a challenge to decipher at times. I purchased A Gentle Introduction to Stata from Amazon, and I've been working the problems presented in the book using the example datasets supplied with Stata (loaded with -sysuse-). Thank you, Dr. Schechter, for your insights. Yes, I've taken college-level probability and statistics courses... about 35 years ago. In 2017 I took an inferential statistics course as part of my doctorate. Sadly, due to issues, the section on regression analysis was not covered.

Respectfully,