  • Nested between-subject design (non-binary independent variable) analysis in Stata

    Hello everyone,

    I am new here, so feel free to tell me if I have formatted anything incorrectly!

    First let me explain the research a bit to give you a general understanding of what I am trying to achieve. I obtained data from participants who designed an ice cream with different ingredients. They were randomised over 3 different groups: a Control Group, a Measurement Group and a Certificate Group. There were a total of 4 categories of ingredients: chocolate, nuts, fruit and other.
    - In the Control Group I just asked participants to design a delicious ice cream.
    - In the Measurement Group I asked people to design a delicious ice cream and I told them I would measure the amount of chocolate/nuts/fruit ingredients (one of those was shown).
    - In the Certificate Group I asked people to design a delicious ice cream, told them I would measure the amount of chocolate/nuts/fruit ingredients (one of those was shown), and told them I would give them a certificate for the strategy (delicious ice cream).


    However, due to an error in the randomisation algorithm, I received mostly chocolate observations, and the nuts/fruit measurements were basically useless.

    My results had one variable (treatment) that recorded which group each participant was in: a categorical variable with the values 0 - Control Group, 1 - Measurement Group, 2 - Certificate Group.
    I created dummy variables based on the ingredient that was measured. Because nuts and fruit had too few observations to be useful, I will only show the chocolate dummies. I also created another dataset for these dummies that does not include the nuts & fruit observations.

    ChocolateMeasurementGroupdummy: 1 - Chocolate Measurement Group, 0 - Control Group or Chocolate Certificate Group >>>> remember that the nuts/fruit observations were removed
    ChocolateCertificateGroupdummy: 1 - Chocolate Certificate Group, 0 - Control Group or Chocolate Measurement Group
    ChocolateControlGroupdummy: 1 - Control Group, 0 - Chocolate Certificate Group or Chocolate Measurement Group

    My dependent variable is Chocolate Amount, which is a continuous variable with a min of 0 and a max of 5.

    Now I am testing 2 things:
    H1 The Measurement Group put significantly more chocolate in their ice creams than the Control Group
    H2 The Measurement Group put significantly less chocolate in their ice creams than the Certificate Group

    I need to know if H1 is true before I can test H2

    I was told to use one-tailed tests with an alpha of 10% (0.1). The issue is that ANOVA, which would be the most logical choice, cannot be used for this because F-statistics are not designed for one-tailed tests. Instead, I created 2 separate datasets from the original chocolate-only dataset and ran t-tests using Stata's ttest command.
    - Dataset 1. included only Chocolate Measurement Group observations & Control Group observations
    - Dataset 2. included only Chocolate Measurement Group observations & Certificate Group observations

    Stata code:
    In dataset 1. to test H1 I used:
    Code:
    ttest ChocolateAmount, by(ChocolateMeasurementGroupdummy)
    -> in dataset 1. I removed Chocolate Certificate Group observations, so only Measurement and Control remain

    In dataset 2. to test H2 I used:
    Code:
    ttest ChocolateAmount, by(ChocolateMeasurementGroupdummy)
    -> in dataset 2. I removed Control Group observations, so only Measurement and Certificate remain

    Now Question 1: Does this make sense? Or should I still include every treatment group in both t-tests?

    After this I wanted to perform regressions to test H1 and H2 further. I was thinking about using a normal linear regression, and I know this is possible with dummy variables, but I cannot seem to wrap my head around how to do it so that it makes sense. I was told I had a nested between-subjects design. Frankly, I have no idea how to do this in Stata. My teacher recommended running the following Stata code in the original chocolate-only dataset, which includes only the chocolate observations and the Control Group (so no nuts/fruit):

    Code:
    regress ChocolateAmount ChocolateMeasurementGroupdummy ChocolateCertificateGroupdummy
    However, with this code I don't know what I am testing or how to interpret the result. It seems to me that I am not testing H1, because the Certificate Group observations are still in there. And I also do not seem to be testing H2, because of the Control Group observations.

    So Question 2: How do I perform a regression that answers my H1 & H2 and that I can interpret?

    I am aware that there are very complex models and commands that do this, but I do not have enough statistical knowledge and background to use these. So the tools I have are basically simple regressions and t-tests.

    Thank you very much for your help in advance!

    Last edited by Olivia Fisher; 03 Jan 2019, 04:16.

  • #2
    I don't understand in what sense there is any nesting in this design--perhaps there is more to it than you have explained. Anyway, I'm going to ignore that.

    Basically you have a three-group randomized trial, and the groups actually form an ordinal variable from control to measurement to certificate. So you need a single group variable with 0 = control, 1 = measurement, 2 = certificate.
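
    A minimal sketch of setting up that single variable, assuming the existing 0/1/2 variable from post #1 is named treatment and that the nuts/fruit observations have already been dropped. The names group and grouplbl are only illustrative:

    ```stata
    * Assumption: treatment is coded 0 = control, 1 = measurement, 2 = certificate
    clonevar group = treatment

    * Attach readable labels (grouplbl is a hypothetical label name)
    label define grouplbl 0 "Control" 1 "Measurement" 2 "Certificate"
    label values group grouplbl

    * Check that only the three expected groups are present
    tabulate group
    ```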

    Your hypothesis is that the outcome variable, chocolate amount, will increase monotonically from group 0 to group 1 and from group 1 to group 2.

    You can test this as follows:

    Code:
    regress ChocolateAmount i.group
    margins group
    margins ar.group, contrast
    The -regress- command is just a different way of doing ANOVA. The first -margins- command will show you the expected outcome in each group. The second -margins- command will show you the contrast between each group and the one "below" it, along with significance tests. The significance tests are based on F-statistics, but those are just the squares of the t-statistics. The p-values you get are two-sided, but you can just divide the p-value by 2 to get the one-sided p-value if the difference between the groups runs in the direction predicted by your hypothesis. If it runs in the opposite direction, the one-sided p-value is 1 minus that halved value.
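
    One possible way to get the one-sided p-values directly, rather than halving by hand, is sketched below. It assumes the group variable is coded 0/1/2 as described above, and it relies on -lincom- leaving r(estimate), r(se) and r(df) behind after -regress-. It is an illustration, not the only way to do this:

    ```stata
    regress ChocolateAmount i.group

    * H1: measurement (1) > control (0); the coefficient on 1.group is that difference
    lincom 1.group
    display "H1 one-sided p = " ttail(r(df), r(estimate)/r(se))

    * H2: certificate (2) > measurement (1)
    lincom 2.group - 1.group
    display "H2 one-sided p = " ttail(r(df), r(estimate)/r(se))
    ```

    Because ttail() returns the upper-tail probability of the t distribution, a difference that runs against the hypothesized direction gives a p-value above 0.5, so no separate check of the sign is needed.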

    As an aside, I will remark that using both one-tailed testing and a significance level of 0.10 is an extremely lenient approach.



    • #3
      Thank you Clyde! This helps a lot.
      Yes, I am aware that one-tailed and alpha 0.1 is extremely lenient and not my preferred option either.
