
  • Display line with difference and 95% confidence intervals

    Hello,

    an easy one for you I guess, but I got stuck:

    I would like to calculate the difference between two variables by year and group and then display the difference as a line with 95% confidence interval.

    I tried the following, but I am probably missing something...
    [CODE]bysort year (group) : gen difference = var1 - var2

    My data: I have several individuals (each either male or female) observed over several years (panel data), and two relevant variables in this context (Var1 = yearly mean income of all male individuals; Var2 = yearly mean income of all female individuals). I would like to calculate the difference between the yearly means. Each observation has a value for either Var1 or Var2 but never both, which is why I cannot simply use gen difference = var1 - var2

    I would like to display the difference as a line (twoway line difference year) and would like to show the 95% confidence interval as well.

    Could you help me out please? Thank you.

  • #2
    You will need the raw data to calculate confidence intervals. Use statsby and ci to create a dataset for graphing.

    Note that code delimiters occur in pairs (how could it be otherwise?). Writing [CODE] is necessary to start but you need to write down its sibling to finish. See FAQ Advice.
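
    A minimal sketch of the statsby and ci route suggested above. The variable names Income, Year and Gender are assumptions (they are not given at this point in the thread), and the raw person-year data are assumed to be in memory. It yields one mean and one 95% confidence interval per gender and year, not yet the interval for the difference.

    ```stata
    * Assumed layout: one row per person-year with Income, Year and Gender.
    * In Stata 13 and earlier, write "ci Income" instead of "ci means Income".
    statsby mean=r(mean) lb=r(lb) ub=r(ub) n=r(N), by(Gender Year) clear: ///
        ci means Income

    * One panel per gender: yearly mean with its 95% confidence limits
    twoway (rcap lb ub Year) (connected mean Year), by(Gender)
    ```

    Note that statsby with the clear option replaces the data in memory with the collected results, so save the raw data first.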



    • #3
      OK, thank you for the quick reply. Could you also help me with calculating the difference? I probably just forgot one part of the command...
      Code:
      bysort year (group) : gen difference = var1 - var2
      -> is just returning missing values...



      • #4
        I confirm that the problem is a subtraction.

        Presumably you are getting missings because the data are not aligned properly, namely groups define disjoint blocks of observations.

        I haven't time to think this through and the problem may need you to do some programming. But I do think you would benefit from using statsby.



        • #5
          Ronald, I think you will get a lot of helpful replies if you post a sample of your data. Please see the FAQ on how to post sample data.
          Roman



          • #6
            Example:
            ID Year Income mean_male mean_female
            Male1 2012 3 3.6 .
            Male1 2013 4 3.7 .
            Male1 2014 5 3.9 .
            Female1 2012 1 . 1.8
            Female1 2013 2 . 2.1
            Female1 2014 3 . 2.8
            Male2 2012 5 3.6 .
            Male2 2013 6 3.7 .
            Now the new variable should display the difference between the yearly means, e.g. 2012 = 3.6 - 1.8 = 1.8 and so on



            • #7
              Any hints? Thank you.



              • #8
                Hi Ronald,

                Does
                Code:
                bysort year (id) : gen difference = mean_male - mean_female[_n-1]
                give you what you want? (or possibly the other way around, female - male, depending on whether your ID variable is encoded or string)

                The problem with your original code was that it told Stata to subtract one value from another in the same row (observation), and clearly this will always result in missing values since mean_male and mean_female are never both non-missing in the same observation.

                David.



                • #9
                  David, thank you.
                  Unfortunately the command is not providing the correct results...



                  • #10
                    Hi Ronald,

                    Could you provide an example, similar to the one you posted previously, showing the incorrect results? Also please give the precise Stata commands you used. Did you try switching "male" and "female" in my suggested command, in case my example was the wrong way around for your data?

                    The code worked fine when I tried it on the fragment of data you provided, but there may be reasons why it would fail in your larger dataset.

                    e.g.
                    Code:
                    clear
                    input str7 ID Year Income mean_male mean_female
                    Male1 2012 3 3.6 .
                    Male1 2013 4 3.7 .
                    Male1 2014 5 3.9 .
                    Female1 2012 1 . 1.8
                    Female1 2013 2 . 2.1
                    Female1 2014 3 . 2.8
                    Male2 2012 5 3.6 .
                    Male2 2013 6 3.7 .
                    end
                    
                    gen obs=_n        // store original sort order
                    bysort Year (ID) : gen difference = mean_male - mean_female[_n-1]
                    sort obs
                    drop obs
                    list, clean noobs
                    
                    /*
                             ID   Year   Income   mean_m~e   mean_f~e   differ~e  
                          Male1   2012        3        3.6          .        1.8  
                          Male1   2013        4        3.7          .        1.6  
                          Male1   2014        5        3.9          .        1.1  
                        Female1   2012        1          .        1.8          .  
                        Female1   2013        2          .        2.1          .  
                        Female1   2014        3          .        2.8          .  
                          Male2   2012        5        3.6          .          .  
                          Male2   2013        6        3.7          .          .  
                    */
                    Thanks,

                    David.

                    Last edited by David Fisher; 18 Nov 2014, 04:33.



                    • #11
                      David, thank you again.

                      The problem is (I guess) that I have an unbalanced panel dataset. Sometimes I have 3 years for an individual, sometimes 5 years, etc. So I would need a missing value if data are not available for a year, but the correct result if data are available. I believe the [_n-1] option might be causing the error. To be clear, I do not get an error message from Stata, just the "wrong" results. I hope you have an alternative solution. Thanks!
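
                      One way to avoid depending on which row happens to come first within a year is sketched below. It is not taken from the thread: it copies each year's male and female mean onto every row of that year and then subtracts. The helper names mm and mf are hypothetical, and the data are the fragment posted earlier.

                      ```stata
                      clear
                      input str7 ID Year Income mean_male mean_female
                      Male1 2012 3 3.6 .
                      Male1 2013 4 3.7 .
                      Male1 2014 5 3.9 .
                      Female1 2012 1 . 1.8
                      Female1 2013 2 . 2.1
                      Female1 2014 3 . 2.8
                      Male2 2012 5 3.6 .
                      Male2 2013 6 3.7 .
                      end

                      * egen's max() ignores missing values, so every row of a year
                      * receives that year's male mean (mm) and female mean (mf).
                      bysort Year: egen mm = max(mean_male)
                      bysort Year: egen mf = max(mean_female)

                      * Missing only when a year has no male or no female mean at all
                      gen difference = mm - mf
                      list Year ID mean_male mean_female difference, sepby(Year) noobs
                      ```

                      Because nothing here refers to [_n-1], the result does not change with the number of individuals per year or with the sort order inside a year.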



                      • #12
                        Hi Ronald,
                        OK, I'm lost now. I don't think I've understood your setup properly, as I can't see how my code wouldn't work even if data were not available for a year.
                        I'm going to take a wild guess and say maybe it's your IDs that are the problem. Do they all take the form "Male`i'" or "Female`i'" where `i' is an integer? If so, try this:

                        Code:
                        gen id2 = real(substr(ID,-1,1))
                        bysort Year (id2 ID) : gen difference = mean_male - mean_female[_n-1]
                        This strips off the final integer and saves it in a new variable, "id2", which is also used in the sort. Now, the "[_n-1]" code should only calculate a difference for the same year and for the same value of id2.

                        Does that help at all? If not, I can only suggest copying-and-pasting a sample of your data again, highlighting which rows have incorrect results.

                        Thanks,

                        David.



                        • #13
                          Hi Ronald, you may find that David's code for creating 'id2' does not work, because the IDs are entered as text rather than numbers. An alternative that extracts a numeric id2 from your ID is provided (see code below). While David's code for calculating the difference worked perfectly when I tried it on your data, I do find the structure of your dataset unusual. You have a panel dataset but two response columns, one for the male mean and one for the female mean, when they could easily sit in one response column and be analyzed as panel data. It is not clear what you are trying to do, but what I am suggesting would give you the difference between mean_male and mean_female year by year, with a 95% CI. Nice-looking graphs are possible too. Before anything else, though, you need to explain a bit more about your hypothesis, i.e. what you are trying to do, and provide a few more rows of data so that we can experiment and assist you.


                          Code:
                          /*Alternative code to extract a numeric id2 from ID*/
                          
                          gen id2 = real(substr(ID, strpos(ID, "le") + 2, .))   // text after "le", converted to a number
                          Roman



                          • #14
                            Thank you for your replies. Well I think I made an error. The ID is unspecific, it is just starting from 1 up to N. I just wrote Male1, Female2, etc as an example... So I cannot identify the gender by looking at ID. The gender is indicated in another variable called "gender". Sorry if I caused some confusion.
                            ID Gender Year Income mean_male mean_female
                            1 Male 2012 3 3.6 .
                            1 Male 2013 4 3.7 .
                            1 Male 2014 5 3.9 .
                            2 Female 2012 1 . 1.8
                            2 Female 2013 2 . 2.1
                            2 Female 2014 3 . 2.8
                            3 Male 2012 5 3.6 .
                            3 Male 2013 6 3.7 .
                            Last edited by Ronald Biefinger; 20 Nov 2014, 06:59.
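
                            With this layout, the yearly difference and its 95% confidence interval could be computed from the raw Income values instead of the precomputed means. The following is only a sketch under stated assumptions: Gender is a string variable with exactly the values "Male" and "Female", every year has at least two observations per gender, and the names gender_num, diff, se, df, lb and ub are hypothetical.

                            ```stata
                            * encode numbers the labels alphabetically: Female = 1, Male = 2
                            encode Gender, gen(gender_num)

                            * One two-sample t test per year; keep the male-minus-female
                            * difference, its standard error and the degrees of freedom.
                            statsby diff=(r(mu_2) - r(mu_1)) se=r(se) df=r(df_t), by(Year) clear: ///
                                ttest Income, by(gender_num)

                            * 95% limits from the t distribution
                            gen lb = diff - invttail(df, 0.025) * se
                            gen ub = diff + invttail(df, 0.025) * se

                            twoway (rarea lb ub Year, color(gs13)) (line diff Year), ///
                                ytitle("Difference in mean income (male - female)") ///
                                legend(order(2 "Difference" 1 "95% CI"))
                            ```

                            By default ttest assumes equal variances in the two groups; adding its unequal option relaxes that. As statsby with clear replaces the data in memory, save the raw data beforehand.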



                            • #15
                              Any suggestions? I still don't have a working solution...
                              Thank you!
