Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding observations in Stata

    Hello dear fellow Stata-users,

    I have registered here because I could not find anything useful in the archives.

    The situation:
    I use a dataset which contains strings and integers. I want to merge two countries into one for each month (one variable) of each year (another variable) as well as for each sex and each destination country. So in the end female refugees who arrived in Spain in April 2002 from country A shall be added to female refugees who arrived in Spain in April 2002 from country B.

    I tried egen newvar = total countryA countryB, by sex age year month destination

    but only receive my favourite error message: r(198);


    Thanks in advance for your help!

  • #2
    Originally posted by Mirko Stapel View Post
    Hello dear fellow Stata-users,

    I have registered here because I could not find anything useful in the archives.

    The situation:
    I use a dataset which contains strings and integers. I want to merge two countries into one for each month (one variable) of each year (another variable) as well as for each sex and each destination country. So in the end female refugees who arrived in Spain in April 2002 from country A shall be added to female refugees who arrived in Spain in April 2002 from country B.

    I tried egen newvar = total countryA countryB, by sex age year month destination

    but only receive my favourite error message: r(198);


    Thanks in advance for your help!
    Can you use dataex to include a sample of your dataset? That would make it a bit easier to see and hopefully solve the problem. Ideally include a tiny table with the desired output based on the sample.

    Comment


    • #3
      while I agree with Jesse about seeing some data (dataex can be found and downloaded using the "search dataex" command), I can see that you are missing parentheses in several places; please carefully examine the help file for egen (h egen)

      on the other hand, it is not clear that this is what you want; e.g., do you want there to be fewer observations after summing? if so, look at the help for collapse

      I am still confused about the difference between your subject title ("adding observations") and your description; maybe you just made a typo? at any rate, I have assumed that your text, not your title, is the more accurate

      Comment


      • #4
        I don't know dataex but I can include a table
        origin destination age sex refugees flags year month
        string string string string integer string integer integer
        What I hope to get
        origin destination age sex refugees flags year month
        Country A Afghanistan na f 25 2002 10
        Country A Afghanistan na m 43 2002 10
        Country B + C Eritrea and Ethiopia na f 15 M 2002 4
        Country B + C Eritrea and Ethiopia na m 18 M 2002 4
        flags are just a letter indicating a potential problem with the data; as you can see, each country is present twice per time period: once for males and once for females

        I hope that helps



        Edit: Oh yes of course! Sorry, of course I want to merge two observations. Yes, I thought about collapse but my colleagues told me that collapse was built to collapse entire datasets, not just a small subset. Concerning parantheses: When I write: newvar = total(countryA countryB), by sex age year month destination

        it tells me countryAcountryB not found, as if this were one name, despite the blank inbetween..
        Last edited by Mirko Stapel; 14 Jul 2016, 10:57.

        Comment


        • #5
          I think the syntax you are looking for is

          Code:
          bysort origin destination age year month: egen totalRefugees = total(refugees)
          This will sum up the value of refugees for the first two rows (25+43), as well as sum up the last two (15+18).

          This variables after bysort define the group within which you want to perform the egen function, in this case people with the same origin, destination, age, year and month (but crucially not sex). The variable between ()'s after total shows which variable you want to perform the egen function on, in this case summing up refugees.

          Comment


          • #6
            That's the thing Jesse, I do NOT want to some up the values for country A (and neither for countries D to Z). Only countries B and C shall be summed up. Reason is that some datasets do not destinguish. E.g. some datasets list Ethiopia and Eritrea as one country. To retain comparability I have to merge those countries..

            Comment


            • #7
              Originally posted by Mirko Stapel View Post
              That's the thing Jesse, I do NOT want to some up the values for country A (and neither for countries D to Z). Only countries B and C shall be summed up. Reason is that some datasets do not destinguish. E.g. some datasets list Ethiopia and Eritrea as one country. To retain comparability I have to merge those countries..
              It's hard to do this without an actual datasample (dataex really is useful), but this might work.
              Code:
              bysort destination age year month: egen totalRefugees = total(refugees) if origin == "Country B" | origin == "Country C"
              replace refugees = totalRefugees if ~missing(totalRefugees)
              drop if origin == "Country C"
              replace origin = "Country B + C" if origin == "Country B"

              Comment


              • #8
                when you use "by" at the end of an "egen" command, you need to included parentheses there also

                Comment


                • #9
                  Thanks a lot, that helped a great deal!! I changed the code a little:

                  bysort sex destination age year month: egen totalRefugees = total(refugees) if origin == "Country A" | origin == "Country B"

                  (I added sex because I want to add females to females and males to males)

                  replace totalRefugees = . if refugees== .

                  (everything after this line is the same)

                  I have a lot of missings and egen turns missings into zeros which is bad, so I added the replace command. There is just one problem: In cases in which data is missing for one country but not for the other I should also code a missing, but as far as I see it, the code does not do that at the moment. Do you have an idea how to fix this? In exel I would use an IF(OR( command but I don't know how to do that in Stata..

                  Comment

                  Working...
                  X