
  • #16
    Ronald, you could perhaps have been a little more helpful to us than just asking for a working solution. We can't help you unless you help us. Please do read the FAQ section, where it says that you are expected to state the nature of your problem clearly and to let us know exactly what you have done and exactly what you received as output. Above, people spent time changing the ID variable of your dataset, and you then told us it was a waste of time, because you had unnecessarily steered us away from the original data type. Not everyone looks at a problem domain from the same angle, so stating the nature of the problem clearly helps everyone and increases your chance of getting an efficient reply. Now, as I said before, I am not sure what analysis you are trying to undertake. From your post #1: if you are trying to plot the group (male/female) means by year with their 95% CI, you need to bring the group means into one column (example below). Have a look at the graph below, made from some mock data, and see if that is what you want:

    Code:
    replace mean_male=mean_female if mean_male==.
    
    /* The easiest way to get the graph is to install "lgraph" from SSC. Type: ssc install lgraph */
    
    lgraph mean_male year gender /*this will give you the graph below*/



    [Graph attachment missing]





    Code:
         +-------------------------------+
         | id   year   gender    mean_mf |
         |-------------------------------|
      1. |  1   2012     Male   1.907674 |
      2. |  1   2013     Male   6.505144 |
      3. |  1   2014     Male   3.187489 |
         |-------------------------------|
     28. | 10   2012   Female   1.801927 |
     29. | 10   2013   Female   5.023422 |
     30. | 10   2014   Female   5.370842 |
         +-------------------------------+
    You have a panel dataset, so many more statistical modelling possibilities are open to you. But first of all, you need to let us know what you want.

    Best,

    Roman



    • #17
      For some reason the server failed to upload the graph.
      Roman



      • #18
        Thank you Roman, and sorry if I might have provided only limited information to discuss the problem.
        The main problem I currently still have is calculating the difference, as mentioned in my prior post.

        I have the following data and would like the new variable to display the difference between the yearly means of the two categories (male and female), e.g. 2012 = 3.6 - 1.8 = 1.8 and so on. The sample period is not the same for all IDs: for some I have 3 years, for others 5 years. That is probably why the initial approach, bysort year (id) : gen difference = mean_male - mean_female[_n-1], was not providing the correct results.
        ID   Gender   Year   Income   mean_male   mean_female
        1    Male     2012   3        3.6         .
        1    Male     2013   4        3.7         .
        1    Male     2014   5        3.9         .
        2    Female   2012   1        .           1.8
        2    Female   2013   2        .           2.1
        2    Female   2014   3        .           2.8
        3    Male     2012   5        3.6         .
        3    Male     2013   6        3.7         .
        It would be a huge help if you could help me with that issue.



        • #19
          I think that the main reason you haven't received a working answer is that you tried to tell a story about what you have and what you want, rather than show what you have and what you want. Several participants actually spent some time trying to find out.

          Here is just a suggestion for a more practical organization of your data: generate a single mean value:
          Code:
          generate meanx = mean_male
          replace meanx = mean_female if missing(meanx)



          • #20
            Thank you for your comment. I thought I showed what I have (excerpt of my data -> table) and what I want (expressed in my first post). How does the single mean value help me in calculating the difference? As I wrote, I would like a new variable to display the difference between the yearly means of the two categories (male and female), e.g. difference for 2012 = mean_male (2012) - mean_female (2012) = 3.6 - 1.8 = 1.8 and so on. However, I do not have code that calculates the desired outcome, as the sample periods vary for most of the groups (ID). Hence, the help of the community would be highly appreciated. Thanks



            • #21
              Question: Do you call your variables mean_male and mean_female because they represent the mean of several measurements? For example, does mean_male = 3.6 in the first observation represent the mean of several measurements, or is it a single measurement? If it is a single measurement, including "mean" in the variable name is designed to confuse the reader. If it is the mean of several measurements, do you have the original measurements, which are needed if you want to calculate standard errors and confidence intervals? Also, if it is a mean of several measurements, do you then want to calculate the mean of means? In your example you show two observations of males in 2012; should they both contribute to the mean for males in 2012?



              • #22
                It is a mean per group (here "Gender") for every separate year, taken over several measurements (all observations for one group in that year). I have the original measurements (in my example "income"). I want to calculate the simple difference between the mean of one group and the mean of the other group per year (basically, subtracting mean_female from mean_male).
                Yes the two observations of males in 2012 contribute to the mean for males 2012.
                Thank you
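
                A minimal sketch of that calculation, assuming the original measurements are in income, that gender is a string variable holding "Male" and "Female", and that year identifies the period. The new variable names (mean_m, mean_f, difference) are hypothetical:

                ```stata
                * Sketch only: assumes income, year, and a string variable gender
                * with the values "Male" and "Female"; new variable names are hypothetical.

                * Mean income of males in each year, copied to every row of that year
                bysort year: egen mean_m = mean(cond(gender == "Male", income, .))

                * Mean income of females in each year, copied to every row of that year
                bysort year: egen mean_f = mean(cond(gender == "Female", income, .))

                * Yearly difference: male mean minus female mean
                generate difference = mean_m - mean_f
                ```

                Because each yearly mean is spread across all rows of that year, the subtraction does not rely on the sort order or on how many years each ID contributes.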



                • #23
                  Ronald, I am not sure why you want to calculate a column for the difference when you could easily merge your "mean columns" into one and test for male-female differences with 95% CI. However, if that is what you want, you need to merge the two columns into one anyway. Try the following, with an example built from your data:

                  First I merged the two columns and renamed the result "mean_mf":

                  Code:
                  replace mean_male=mean_female if mean_male==. & mean_female!=.
                  drop mean_female
                  ren mean_male mean_mf
                  This gives me a dataset like the following:


                  Code:
                       +----------------------------------------+
                       | id    gender   year   income   mean_mf |
                       |----------------------------------------|
                    1. |  1     Male    2012        3       3.6 |
                    2. |  1     Male    2013        4       3.7 |
                    3. |  1     Male    2014        5       3.9 |
                       |----------------------------------------|
                    4. |  2   Female    2012        1       1.8 |
                    5. |  2   Female    2013        2       2.1 |
                    6. |  2   Female    2014        3       2.8 |
                       |----------------------------------------|
                    7. |  3     Male    2012        5       3.6 |
                    8. |  3     Male    2013        6       3.7 |
                       +----------------------------------------+
                  Now run the following code:

                  Code:
                  gen dif=.
                  
                  levelsof year,local(levels)
                  foreach l of local levels {
                      di "`var'"
                      qui ttest mean_mf if year==`l',by(gender)
                      replace dif=(`r(mu_2)'-`r(mu_1)') if year==`l'
                  }
                  This creates a column named "dif" holding the male-female difference for each year. As you can see below, each difference is repeated on every row belonging to that year.


                  Code:
                       +----------------------------------------------+
                       | id    gender   year   income   mean_mf   dif |
                       |----------------------------------------------|
                    1. |  1     Male    2012        3       3.6   1.8 |
                    2. |  1     Male    2013        4       3.7   1.6 |
                    3. |  1     Male    2014        5       3.9   1.1 |
                       |----------------------------------------------|
                    4. |  2   Female    2012        1       1.8   1.8 |
                    5. |  2   Female    2013        2       2.1   1.6 |
                    6. |  2   Female    2014        3       2.8   1.1 |
                       |----------------------------------------------|
                    7. |  3     Male    2012        5       3.6   1.8 |
                    8. |  3     Male    2013        6       3.7   1.6 |
                       +----------------------------------------------+
                  If the repetition does not bother you, you are done. If it does, clear it up with the following commands:

                  Code:
                  egen tag_=tag(year dif)
                  replace dif=. if tag_==0
                  drop tag_
                  It's cleared up.

                  Code:
                       +----------------------------------------------+
                       | id    gender   year   income   mean_mf   dif |
                       |----------------------------------------------|
                    1. |  1     Male    2012        3       3.6   1.8 |
                    2. |  1     Male    2013        4       3.7   1.6 |
                    3. |  1     Male    2014        5       3.9   1.1 |
                       |----------------------------------------------|
                    4. |  2   Female    2012        1       1.8     . |
                    5. |  2   Female    2013        2       2.1     . |
                    6. |  2   Female    2014        3       2.8     . |
                       |----------------------------------------------|
                    7. |  3     Male    2012        5       3.6     . |
                    8. |  3     Male    2013        6       3.7     . |
                       +----------------------------------------------+
                  Best,
                  Roman
                  Last edited by Roman Mostazir; 04 Dec 2014, 18:27.



                  • #24
                    Roman, thank you very much for your help! Your approach is working perfectly fine. One question out of curiosity: why do you use qui ttest? For a Stata novice, your approach (with the macros) looks quite complicated.

                    Also how can I display the yearly upper and lower 95% confidence intervals now that you helped me calculate the differences?

                    Thanks again!



                    • #25
                      A t-test calculates the difference between two means, and Stata saves the relevant information from the test as scalars in r() (type "return list" after a t-test to see what it saves). The idea was therefore to capture the two saved means, calculate their difference, and store it in the column, all inside a loop that repeats this for each year. The 'qui' prefix stands for 'quietly', meaning that we were not interested in seeing the output from the t-tests. You can leave it out if you want to see the t-test results. Also, I just noticed that the line "di `var'" in the code does nothing. You can ignore that too.
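
                      As a small illustration of the stored results, here is a sketch using the auto dataset that ships with Stata (not the data from this thread). The variables mpg and foreign simply stand in for the outcome and the grouping variable:

                      ```stata
                      * Illustration only, using Stata's bundled auto data
                      sysuse auto, clear

                      * Two-sample t test of mpg between domestic and foreign cars
                      ttest mpg, by(foreign)

                      * List everything ttest left behind in r()
                      return list

                      * Difference between the two group means, from the stored results
                      display r(mu_1) - r(mu_2)
                      ```

                      Prefixing ttest with quietly suppresses the printed table but leaves the r() results in place, which is all the loop needs.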

                      Originally posted by Ronald Biefinger:

                      Also how can I display the yearly upper and lower 95% confidence intervals now that you helped me calculate the differences?
                      This is where you have failed to help us. If you want to know the DIFFERENCE BETWEEN MALE AND FEMALE AT EACH YEAR WITH 95% CI, WE NEED INDIVIDUAL LEVEL ORIGINAL MEASUREMENTS FROM WHERE THE MEANS ARE BEING CALCULATED.

                      I will try to explain. To calculate the CI for, e.g., year 2012 for male vs. female, we need the original measurements from which the means for both groups in 2012 were calculated. For example, if you have N=10 males and N=15 females for 2012, and we have a mean of 3.6 for males and 1.8 for females in 2012, then we also have a standard deviation (SD) for males in 2012 and for females in 2012.

                      The SD and N of each group are needed to calculate the standard error (SE), from which the 95% CI is calculated. From your data, we only know that the mean for males in 2012 was 3.6 and the mean for females in 2012 was 1.8. We do not know the SD and N of each group, and therefore it is not possible to estimate their CI.
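
                      To make the point concrete: once N, mean, and SD are known for both groups, Stata's immediate command ttesti can produce the CI for the difference without the individual rows. In the sketch below, the Ns and means are the ones from the example above, while the two SDs (0.9 and 0.7) are invented purely for illustration:

                      ```stata
                      * Sketch only: the two SDs are hypothetical, not taken from the thread's data
                      * Syntax: ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2
                      ttesti 10 3.6 0.9 15 1.8 0.7

                      * The same 95% limits by hand, from the stored results
                      display (r(mu_1) - r(mu_2)) - invttail(r(df_t), 0.025) * r(se)
                      display (r(mu_1) - r(mu_2)) + invttail(r(df_t), 0.025) * r(se)
                      ```

                      Without the SDs and Ns there is nothing to put in the third and sixth (or first and fourth) positions, which is why means alone cannot give a CI.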

                      For example, look at the MOCK data below, where each INDIVIDUAL's income (in thousands) is given. I repeat: each id represents an individual, male or female, with their income in each year. This DIFFERS from your data, where each row holds the average income for males/females in each year:

                      Code:
                           +-------------------------------+
                           | id   year   gender   year_inc |
                           |-------------------------------|
                        1. |  1   2012     Male   1.907674 |
                        2. |  1   2013     Male   6.505144 |
                        3. |  1   2014     Male   3.187489 |
                           |-------------------------------|
                       28. | 10   2012   Female   1.801927 |
                       29. | 10   2013   Female   5.023422 |
                       30. | 10   2014   Female   5.370842 |
                           |-------------------------------|
                       31. | 11   2012     Male   2.710236 |
                       32. | 11   2013     Male   6.399412 |
                       33. | 11   2014     Male   3.395019 |
                           |-------------------------------|
                      133. | 45   2012   Female   2.492032 |
                      134. | 45   2013   Female   5.018729 |
                      135. | 45   2014   Female   5.418658 |
                           +-------------------------------+
                      Now, because these are INDIVIDUAL-LEVEL data, it is possible to calculate the average income for males in each year and the average income for females in each year, and to test whether males and females differ year by year. What you gave us is mean-level data, where the N and SD information is lost.

                      Please don't get me wrong. The frustration of this long thread is that you were asked several times by several members to tell us exactly what you want, and you repeatedly said that you want to see the male-female difference in a column. That solution has been given, and now you are coming back to a basic question that could have been settled with a simple answer had you presented your original individual-level data. A 95% CI for a single row!!! That statistical concept does not exist.

                      IF YOU WANT the male vs. female mean difference by year, present the original data in the form where you have individual-level data, i.e. one row for each person, male or female, in each year.
                      Roman



                      • #26
                        Thank you again for your detailed explanation.

                        Maybe it would be easier to use mock data in order to understand the intuition behind the code for calculating the 95% CI for the difference of the yearly means?

                        It is straightforward to use ci income if gender == "Male", by(year) and ci income if gender == "Female", by(year) to get the yearly means, SE and 95% CI. But as I wrote earlier I am stuck with figuring out how to get the 95% CI for the difference which I calculated with your help. Could you explain the basic procedure by using the mock data / placeholder variables?

                        Best
                        Ronald



                        • #27
                          Assume you have the following data, with each individual's yearly income in millions, measured repeatedly over three years:

                          Code:
                               +-------------------------------+
                               | id   year   gender     income |
                               |-------------------------------|
                            1. |  1   2012     Male   1.907674 |
                            2. |  2   2012     Male   2.238695 |
                            3. |  3   2012     Male   2.825735 |
                            4. |  4   2012     Male   1.942716 |
                            5. |  5   2012     Male   2.918935 |
                            6. |  6   2012     Male   2.903867 |
                            7. |  7   2012     Male   2.543149 |
                            8. |  8   2012     Male   2.767552 |
                            9. |  9   2012     Male   3.269608 |
                           10. | 10   2012   Female   1.801927 |
                           11. | 11   2012     Male   2.710236 |
                           12. | 12   2012   Female   1.932059 |
                           13. | 13   2012   Female   2.468674 |
                           14. | 14   2012     Male   2.781168 |
                           15. | 15   2012     Male   2.088007 |
                           16. | 16   2012     Male   3.187452 |
                           17. | 17   2012     Male    2.39654 |
                           18. | 18   2012     Male   2.305332 |
                           19. | 19   2012     Male   3.219035 |
                           20. | 20   2012   Female   2.328676 |
                           21. | 21   2012     Male   2.286691 |
                           22. | 22   2012     Male   2.649525 |
                           23. | 23   2012     Male   3.325634 |
                           24. | 24   2012     Male   1.915123 |
                           25. | 25   2012     Male    1.99407 |
                           26. | 26   2012   Female   2.071749 |
                           27. | 27   2012   Female   2.346393 |
                           28. | 28   2012     Male   2.132564 |
                           29. | 29   2012     Male   1.914288 |
                           30. | 30   2012     Male   1.910289 |
                           31. | 31   2012     Male   2.248585 |
                           32. | 32   2012   Female   2.063762 |
                           33. | 33   2012   Female   2.041844 |
                           34. | 34   2012     Male   2.142874 |
                           35. | 35   2012     Male   2.100614 |
                           36. | 36   2012   Female   2.356926 |
                           37. | 37   2012     Male   2.216128 |
                           38. | 38   2012   Female   2.460236 |
                           39. | 39   2012     Male   3.473259 |
                           40. | 40   2012     Male    2.31481 |
                           41. | 41   2012   Female   1.906167 |
                           42. | 42   2012     Male   2.527757 |
                           43. | 43   2012     Male   2.129541 |
                           44. | 44   2012     Male   3.488268 |
                           45. | 45   2012   Female   2.492032 |
                           46. | 46   2012     Male   2.896644 |
                           47. | 47   2012   Female   2.263464 |
                           48. | 48   2012   Female   1.953104 |
                               |-------------------------------|
                           49. |  1   2013     Male   6.505144 |
                           50. |  2   2013     Male   5.764479 |
                           51. |  3   2013     Male    6.04374 |
                           52. |  4   2013     Male   5.828332 |
                           53. |  5   2013     Male   6.237039 |
                           54. |  6   2013     Male   6.324605 |
                           55. |  7   2013     Male   5.638552 |
                           56. |  8   2013     Male   6.662349 |
                          Run the following code to calculate the male-female difference ("dif") and its 95% upper ("ul") and lower ("ll") limits for each year:

                          Code:
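                          * Placeholders for the difference and its 95% confidence limits
                          gen dif=.
                          gen ll=.
                          gen ul=.
                          levelsof year, local(levels)
                          foreach l of local levels {
                              * Two-sample t test of income by gender within one year
                              qui ttest income if year==`l', by(gender)
                              * Difference of the two group means (group 1 minus group 2)
                              replace dif=(`r(mu_1)'-`r(mu_2)') if year==`l'
                              * Limits: difference -/+ t critical value (df = N_1+N_2-2) times the SE
                              replace ll=(`r(mu_1)'-`r(mu_2)')-invttail(((`r(N_1)'-1)+(`r(N_2)'-1)),0.025)*`r(se)' if year==`l'
                              replace ul=(`r(mu_1)'-`r(mu_2)')+invttail(((`r(N_1)'-1)+(`r(N_2)'-1)),0.025)*`r(se)' if year==`l'
                          }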
                          To delete the duplicates, run the code below. Skip this step if you want to keep them:

                          Code:
                          egen tagdif=tag(year dif ll ul)
                          foreach var of varlist dif ll ul {
                              replace `var'=. if tagdif==0
                          }
                          drop tagdif
                          Further, if you want to cross-check the calculated difference and the limits:

                          Code:
                          levelsof year, local(levels)
                          foreach l of local levels {
                              di " "
                              di "Year:`l'"
                              ttest income if year==`l', by(gender)
                          }
                          Roman



                          • #28
                            By the way, I still believe you could have achieved what you are trying to do without making it so complicated. However, just to confirm one more time: you presented us with mean-level data, and this solution is for individual-level data.
                            Roman



                            • #29
                              Roman, thank you very much, your solution works perfectly fine. Sorry for complicating things, I will try my best to improve!
                              Best



                              • #30
                                Good luck !!
                                Roman
