
  • Analyze effect of subgroups of variables on the whole group of variables

    Hi,

    I am a total beginner with Stata, so please excuse the messy title.

    I have a dataset which consists of people's preferences (rated 1-5, with 5 being the highest value) for different types of animals.
    It looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id age_person cat1 cat2 cat3 dog1 dog2)
    1 34 2 5 5 4 2
    2 45 3 4 5 5 2
    3 75 5 4 5 3 3
    4 36 3 5 5 3 5
    5 22 2 3 4 5 2
    end
    I would like to address the hypothesis that "people prefer dogs over cats".
    I know that I could do this with descriptive statistics and just use the mean scores for cats and dogs.

    However, I thought that if I created a variable "animal" holding the aggregated score for all dogs and cats per person, plus two variables "dog" and "cat" holding the aggregates for the respective animal, and then ran a regression, I could also control for the age of the respondent. (My actual dataset has more respondent-specific variables, such as income, which I would also like to include.)

    Is that something I could do? Or am I missing the point here?

    Thank you

  • #2
    Paul:
    welcome to this forum.
    What follows is what springs to my mind (I've also added a squared term for -age_person- to search for a turning point; no evidence of one was detected in your data excerpt, though, nor any evidence that cats are preferred to dogs):
    Code:
    . egen mean_cat=rowmean(cat*)
    
    . egen mean_dog=rowmean(dog*)
    
    . g diff_mean_cat_dog= mean_cat-mean_dog
    
    . regress diff_mean_cat_dog c.age_person##c.age_person
    
          Source |       SS           df       MS      Number of obs   =         5
    -------------+----------------------------------   F(2, 2)         =      3.40
           Model |  1.99992025         2  .999960124   Prob > F        =    0.2275
        Residual |  .588968217         2  .294484108   R-squared       =    0.7725
    -------------+----------------------------------   Adj R-squared   =    0.5450
           Total |  2.58888847         4  .647222116   Root MSE        =    .54266
    
    -------------------------------------------------------------------------------------------
            diff_mean_cat_dog | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    --------------------------+----------------------------------------------------------------
                   age_person |   .0812229    .087661     0.93   0.452    -.2959519    .4583977
                              |
    c.age_person#c.age_person |   -.000463   .0008604    -0.54   0.644    -.0041651    .0032391
                              |
                        _cons |  -1.863632   1.952927    -0.95   0.441     -10.2664    6.539134
    -------------------------------------------------------------------------------------------
    
    .
    Just out of curiosity: how could you test hypotheses via descriptive statistics?
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hi Carlo,

      thank you very much for your answer. I will look further into this.

      Regarding your question: I can't! That was very poorly worded by me, so please just ignore it.

      Thanks a lot



      • #4
        Paul:
        my question was tongue in cheek (and deliberately so!).
        Enjoy staying with the forum.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hi again,

          I just wondered: 1) You say that there is no evidence that cats are preferred to dogs, how did you come to that conclusion? I cannot seem to get that information out of the table you posted...

          And: 2) Given 'age' was significant, would the correct interpretation be: 'With increased age, one is more likely to prefer cats over dogs'?

          Thanks



          • #6
            Paul:
            1) _cons does not reach statistical significance;
            2) the linear and squared terms for -age- are not statistically significant. Therefore, your statement cannot be defended.
            Kind regards,
            Carlo
            (Stata 19.0)
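
            A possible way to read the regression from #2 at ages that actually occur in the data: with -age_person- and its square in the model, _cons is the predicted cat-minus-dog difference at age 0, so -margins- can be used to evaluate the prediction elsewhere. This is a minimal sketch, assuming the five example rows from #1; the ages 25, 45 and 65 are illustrative values chosen here, not taken from the thread.

            ```stata
            * Self-contained sketch: rebuild the example data from post #1
            clear
            input float(id age_person cat1 cat2 cat3 dog1 dog2)
            1 34 2 5 5 4 2
            2 45 3 4 5 5 2
            3 75 5 4 5 3 3
            4 36 3 5 5 3 5
            5 22 2 3 4 5 2
            end

            * Per-person mean rating for each animal, and their difference
            egen mean_cat = rowmean(cat*)
            egen mean_dog = rowmean(dog*)
            generate diff_mean_cat_dog = mean_cat - mean_dog

            * Same model as in post #2 (linear and squared age)
            regress diff_mean_cat_dog c.age_person##c.age_person

            * Predicted cat-minus-dog difference at illustrative ages
            * (25, 45, 65 are assumptions made for this sketch)
            margins, at(age_person=(25 45 65))
            ```

            Because the squared term was entered with factor-variable notation, -margins- updates both age terms together. The overall F test in the regression header (Prob > F = 0.2275 in #2) already tests the two age terms jointly, since they are the only predictors.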



            • #7
              Paul:
              let's focus on the example of a single variable only:
              Code:
              . input float(id age_person cat1 cat2 cat3 dog1 dog2)
              
                          id  age_per~n       cat1       cat2       cat3       dog1       dog2
                1. 
              . 1 34 2 5 5 4 2
                2. 
              . 2 45 3 4 5 5 2
                3. 
              . 3 75 5 4 5 3 3
                4. 
              . 4 36 3 5 5 3 5
                5. 
              . 5 22 2 3 4 5 2
                6. 
              . end
              
              . egen mean_cat=rowmean(cat*)
              
              . egen mean_dog=rowmean(dog*)
              
              . g diff_mean_cat_dog= mean_cat-mean_dog // if this variable>0, the sample on hand prefers cats to dogs
              
              . regress diff_mean_cat_dog
              
                    Source |       SS           df       MS      Number of obs   =         5
              -------------+----------------------------------   F(0, 4)         =      0.00
                     Model |           0         0           .   Prob > F        =         .
                  Residual |  2.58888847         4  .647222116   R-squared       =    0.0000
              -------------+----------------------------------   Adj R-squared   =    0.0000
                     Total |  2.58888847         4  .647222116   Root MSE        =     .8045
              
              ------------------------------------------------------------------------------
              diff_mean_~g | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                     _cons |         .6   .3597839     1.67   0.171    -.3989201     1.59892
              ------------------------------------------------------------------------------
              
              . mean diff_mean_cat_dog
              
              Mean estimation                                   Number of obs = 5
              
              -------------------------------------------------------------------
                                |       Mean   Std. err.     [95% conf. interval]
              ------------------+------------------------------------------------
              diff_mean_cat_dog |         .6   .3597839     -.3989201     1.59892
              -------------------------------------------------------------------
              
              .
              
              As we can see from calculating the mean of -diff_mean_cat_dog- with two equivalent procedures, there's no evidence that cats
              are preferred to dogs in the sample under investigation.
              Kind regards,
              Carlo
              (Stata 19.0)
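
              The same comparison can also be written as a paired t test, which may be a more familiar way to state the hypothesis. This is a minimal sketch, assuming the five example rows from #1. A paired t test on -mean_cat- and -mean_dog- is the same test as a one-sample t test on their difference, so both calls should reproduce the t statistic and two-sided p-value shown for _cons above.

              ```stata
              * Self-contained sketch: rebuild the example data from post #1
              clear
              input float(id age_person cat1 cat2 cat3 dog1 dog2)
              1 34 2 5 5 4 2
              2 45 3 4 5 5 2
              3 75 5 4 5 3 3
              4 36 3 5 5 3 5
              5 22 2 3 4 5 2
              end

              * Per-person mean rating for each animal, and their difference
              egen mean_cat = rowmean(cat*)
              egen mean_dog = rowmean(dog*)
              generate diff_mean_cat_dog = mean_cat - mean_dog

              * Paired t test of H0: mean(mean_cat - mean_dog) = 0
              ttest mean_cat == mean_dog

              * Equivalent one-sample t test on the difference
              ttest diff_mean_cat_dog == 0
              ```

              -ttest- also reports the one-sided alternatives, which is where a directional hypothesis such as "people prefer dogs over cats" (a negative difference here) would be read.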

