dividing sample by mean for values over fifty percent and below fifty percent of a variable

aaati ali

Join Date: Mar 2021

Posts: 16
#1

11 May 2022, 08:31

I have a variable CSR in a panel dataset, and I want to split it at its mean so that I can perform a t test on it. I want to divide the observations into a higher-CSR group and a lower-CSR group at the mean. Can someone help me with this?

Nick Cox

Join Date: Mar 2014

Posts: 36058
#2

11 May 2022, 09:15

If you split at the mean you will only rarely get 50% above and 50% below. Small print aside, that's the definition of the median.

But wanting a t test is puzzling here, with nothing else said. I may be misreading your intentions, but consider this concrete example.

Code:

. sysuse auto, clear
(1978 automobile data)

. su mpg, meanonly

. gen above = mpg > r(mean)

. tab above

      above |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         43       58.11       58.11
          1 |         31       41.89      100.00
------------+-----------------------------------
      Total |         74      100.00

. ttest mpg, by(above)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       0 |      43     17.4186    .3768385    2.471095    16.65811     18.1791
       1 |      31    26.67742    .8313574    4.628802    24.97956    28.37528
---------+--------------------------------------------------------------------
Combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
---------+--------------------------------------------------------------------
    diff |           -9.258815    .8326686               -10.91871    -7.59892
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t = -11.1194
H0: diff = 0                                     Degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. ret li

scalars:
              r(level) =  95
                 r(sd) =  5.785503209735141
               r(sd_2) =  4.628801837583124
               r(sd_1) =  2.471095362862147
                 r(se) =  .8326685828712946
                r(p_u) =  1
                r(p_l) =  1.37516360994e-17
                  r(p) =  2.75032721988e-17
                  r(t) =  -11.11944763395385
               r(df_t) =  72
               r(mu_2) =  26.67741935483871
                r(N_2) =  31
               r(mu_1) =  17.41860465116279
                r(N_1) =  43

.
I showed convincingly (look at the P-value, much less than 1 in a quadrillion) that cars with higher mpg have on average higher mpg than those with lower mpg. The consequence follows from the set-up, but is not interesting.

If you want something different, fine, and please explain what that is.
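Two variations on the split shown above may be useful. This is only a sketch on the same auto data, and the variable names above_mean and above_median are made up for illustration. The first variation guards against missing values: Stata treats a missing value as larger than any number, so mpg > r(mean) on its own would put missing values in the upper group (mpg has no missing values in the auto data, so the tabulation above is unaffected). The second splits at the median, which is the cut that gets as close to 50:50 as ties allow.

```stata
sysuse auto, clear

* split at the mean, leaving missing values of mpg as missing
summarize mpg, meanonly
generate above_mean = mpg > r(mean) if !missing(mpg)

* split at the median; summarize, detail leaves the median in r(p50)
summarize mpg, detail
generate above_median = mpg > r(p50) if !missing(mpg)

* compare the two groupings
tabulate above_mean above_median
```

With panel data, the same idea could be applied within panels by first computing a panel-specific mean, for example with egen's mean() function under bysort, if a within-panel split is what is wanted.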

aaati ali

Join Date: Mar 2021

Posts: 16
#3

13 May 2022, 09:06

Sir, thank you for the reply. Actually, I have to test a nonlinear relationship in which both an extreme negative condition and an extreme positive condition of one variable increase another variable, and I have to estimate it somehow by regression. It is like the presence of insurance: insurance offers attention and so increases the attention level, but the absence of insurance also increases attention because it poses a threat. However, I cannot measure the absence of insurance, because insurance has only positive values, and attention also has only positive values. So how can I relate insurance to attention in a model?

My idea was that if I take the mean of insurance, then values below the mean may represent a negative state, the threat from the absence of insurance, and values above the mean may represent the effect of the presence of insurance. That is why I was testing for a difference between the two states with a t test, so that I could relate one to the negative or absence effect and the other to the positive effect. Moreover, how can I run a regression when both states increase attention? I have no idea how to estimate this. Can you help in this regard? Is there some nonlinear model that can capture this sort of relationship?
Nick Cox

Join Date: Mar 2014

Posts: 36058
#4

13 May 2022, 09:27

Sorry, but I can't follow that easily except that I see that you want to do something different from #2.

But backing up, #2 already showed that if you calculate the mean of something with, say, summarize, then you can identify observations with values above and below that mean.

Backing up yet further, I sense that dichotomizing simplifies by leaving out most of the information. If I had amount of insurance as a predictor, I wouldn't want to dichotomize it.
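One way to keep a predictor continuous while still allowing both low and high values to go with a higher outcome is to add a quadratic term. What follows is a minimal sketch, not a recommendation for the data in question: it uses the auto data as a stand-in, with price and weight playing the roles of the outcome and the predictor, and it assumes that a U-shaped curve is a reasonable approximation.

```stata
sysuse auto, clear

* factor-variable notation enters weight and weight squared
regress price c.weight##c.weight

* predicted price over a grid of weights, to see the shape of the curve
margins, at(weight=(2000(500)4500))
marginsplot
```

A positive coefficient on c.weight#c.weight, with the turning point inside the observed range of the predictor, would be consistent with a U shape. For panel data, xtreg accepts the same factor-variable notation once the data have been xtset.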
aaati ali

Join Date: Mar 2021

Posts: 16
#5

14 May 2022, 12:14

Sir, the problem is that there are only ratings for insurance, and these are positive, so I cannot work out how to measure the absence of insurance or the threat posed by its absence. The second thing is that both states, insurance and no insurance, result in an increase in attention. Can you suggest a model in which both negative and positive states result in an increase in attention? How can I map that? It means that both values of insurance result in an increase in the attention level. Is there any modelling that can be done?