  • Create dataset

    Dear all,
    my apologies if the question is quite simple, but I am not used to creating datasets and I need your help. I have the values below and I need to obtain a distribution from them. For example, the first income range is 0-1, where the total income is negative and there are 283468 taxpayers. When I copy and paste the values into STATA, they are all strings. How can I create my dataset and obtain a normal distribution? I hope someone can help me.
    Thank you very much
    Range income          Total income  N. taxpayers
    0-1                 -4 835 516 000       283 468
    1-10000             14 002 838 000     2 506 533
    10000-20000         71 204 885 000     4 749 939
    20000-30000        119 519 432 000     4 784 796
    30000-40000        142 509 645 000     4 102 418
    40000-50000        129 949 670 000     2 906 925
    50000-100000       383 600 971 000     5 641 489
    100000-500000      264 531 927 000     1 674 865
    >500000             72 251 237 000        56 397
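On the strings problem, here is a minimal sketch. It assumes the pasted columns arrive as string variables because of the blanks used as thousands separators. The variable names and string widths are illustrative and not from the post. destring with its ignore() option is one way to convert such columns to numbers:

```stata
* Three rows of the table entered as strings, as they might look after a
* copy-and-paste (names and widths are assumptions for illustration)
clear
input str15 range_income str20 total_income str12 n_taxpayers
"0-1"         "- 4 835 516 000" "283 468"
"1-10000"     "14 002 838 000"  "2 506 533"
"10000-20000" "71 204 885 000"  "4 749 939"
end

* Drop the blanks and convert to numeric; destring chooses a storage type
* that holds every value without loss
destring total_income n_taxpayers, replace ignore(" ")

describe
list, noobs
```

The range column is left as a string here because it holds two limits rather than one number.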

  • #2
    Hello Simona,

    Welcome to the Stata Forum.

    I'm not sure I understood your query correctly. I assume by "create" you mean "input".

    If so, you may type:

    Code:
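    * Declare nine observations, one per income range in your table
    set obs 9
    * Note: without an explicit type, input stores values as float, which cannot hold totals this large exactly; writing double before total_income avoids that.
    * After the next command, type one row of values at each prompt. For "range_income" you may create 9 groups (say, 1 to 9) and afterwards define and use value labels.
    input range_income total_income n_taxpayers
    * Finally, if you wish to have a huge dataset according to the number of taxpayers, you may type:
    expand n_taxpayers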
    Hopefully that helps.
    Last edited by Marcos Almeida; 23 Jan 2017, 12:51.
    Best regards,

    Marcos
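To make the steps in this reply concrete, here is a minimal sketch of a do-file version. It assumes group codes 1 to 9 for the nine income ranges and explicit storage types so the large totals are kept exactly. The scaled count n_scaled is a hypothetical helper that only keeps the example small.

```stata
clear
* Each line between input and end is one observation; interactively, these
* are the values typed at the "1.", "2.", ... prompts
input byte range_income double total_income long n_taxpayers
1   -4835516000   283468
2   14002838000  2506533
3   71204885000  4749939
4  119519432000  4784796
5  142509645000  4102418
6  129949670000  2906925
7  383600971000  5641489
8  264531927000  1674865
9   72251237000    56397
end

* One observation per 1,000 taxpayers (a scaling assumption); typing
* "expand n_taxpayers" instead gives one observation per taxpayer
generate long n_scaled = round(n_taxpayers / 1000)
expand n_scaled
count
```

Expanding by n_taxpayers itself would create roughly 26.7 million observations, so the scaled version is easier to experiment with.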



    • #3
      Thank you for the welcome and for your answer. I am quite new to this, hence my question.
      Yes, I meant "input" the variables. When I run the first two lines, I get

      input range_income total_income n_taxpayers
      range_i~e total_i~e n_taxpa~s
      1.

      Should I type my values after the "1." prompt? I ask because when I click "Data Editor", the window does not open.
      Thank you again

      Best regards,
      Simona



      • #4
        Simona:
        the main issue with your dataset is that your variables contain ranges, not single values.
        So you have to split the ranges of your variables and create a new variable for each lower and upper limit of the range.
        Obviously, I cannot say whether the suggested approach is in line with what you're after.
        As far as your second query is concerned, I find it difficult to believe that a normal distribution is realistic for variables like the ones you're dealing with.
        Kind regards,
        Carlo
        (Stata 19.0)
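One possible reading of the splitting suggestion is sketched below, assuming the range is held in a string variable. The names range_income, lower_limit and upper_limit are illustrative. The open-ended top bracket is given a missing upper limit, which is an assumption about how to treat it.

```stata
clear
input str15 range_income
"0-1"
"1-10000"
"100000-500000"
">500000"
end

* Split on the hyphen: creates the string variables limit1 and limit2
split range_income, parse("-") generate(limit)

* The top bracket has no hyphen, so limit2 is empty there; strip the ">"
replace limit1 = subinstr(limit1, ">", "", .)

* Convert both limits to numbers; the empty string becomes missing
destring limit1 limit2, replace
rename (limit1 limit2) (lower_limit upper_limit)
list, noobs
```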



        • #5
          Dear Carlo,
          thank you very much for your answer. My variables are ranges because that is how they appear in the Excel file I received. You suggest creating a new variable for each lower and upper limit of the range (var1 for 0, var2 for 1, var3 for 10000, and so on), but what should I input as values for 0 and for 1, given that they share the same income and number of taxpayers?
          From these simple data I need to produce a normal distribution, if that is possible.
          Thank you

          Best regards,
          Simona



          • #6
            Simona:
            your interpretation of my previous reply is correct.
            However, I don't believe that your dataset, as it is, allows any substantive statistical analysis in Stata.
            Obviously, I may be mistaken: if you clarify the goal of your research, some more positive replies may come along.
            Finally, I do not follow your need to rely on a normal distribution; income often follows a Gamma distribution (or a skewed one, at any rate).
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Hello Simona,

              As I pointed out in #2,

              For "range_income" you may create 9 groups (say, 1 to 9, one per range in your table) and afterwards define and use value labels.
              It is a simple procedure in Stata and you will find it well described in the manual.

              To give you an example, you may type 1 for the first range, 2 for the second, etc. Then you will just need to define a label and apply it.

              For this, just type:

              Code:
              . help label
              And see some interesting examples.

              You may also wish to use the Variables Manager instead, if you like.
              Best regards,

              Marcos
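As a minimal sketch of the define-and-apply step, assuming the codes 1 to 9 stand for the nine ranges in the table (the label name range_lbl is made up for illustration):

```stata
clear
set obs 9
generate byte range_income = _n

* Define the value label once; /// continues a command across lines in a do-file
label define range_lbl 1 "0-1" 2 "1-10000" 3 "10000-20000" ///
    4 "20000-30000" 5 "30000-40000" 6 "40000-50000" ///
    7 "50000-100000" 8 "100000-500000" 9 ">500000"

* Attach it to the variable; the stored values remain 1 to 9
label values range_income range_lbl
tabulate range_income
```

The variable stays numeric, so it can still be used in by-groups and models while tables display the range text.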



              • #8
                Thank you again, Carlo, for the quick answer.
                I work on income distribution with deductible donations. At the beginning, I need some form of distribution within the income groups: either assume a normal (Gaussian) distribution within each income group, or assume an equal distribution (group members equally distributed within the group). I don't know which is more appropriate. I also need to split the number of taxpayers between "single" and "married", because the country I study makes this distinction too (the number of taxpayers above is the total: single + married).

                Thank you

                Best regards,
                Simona
                Range_income          Total_income   N_taxpayers
                0-1                 -4 835 516 000       283 468
                1-10000             14 002 838 000     2 506 533
                10000-20000         71 204 885 000     4 749 939
                20000-30000        119 519 432 000     4 784 796
                30000-40000        142 509 645 000     4 102 418
                40000-50000        129 949 670 000     2 906 925
                50000-100000       383 600 971 000     5 641 489
                100000-500000      264 531 927 000     1 674 865
                >500000             72 251 237 000        56 397
                Total            1 192 735 089 000    26 706 830

                Range_income     N_taxpayers (single+married)  Tot_Donations (/1000)
                0-1                                    61 380                827 891
                1-10000                               343 305                137 215
                10000-20000                         1 271 314                423 459
                20000-30000                         1 462 863                478 909
                30000-40000                         1 438 834                543 478
                40000-50000                         1 159 082                514 188
                50000-100000                        2 696 888              1 356 600
                100000-500000                       1 090 703              1 304 582
                >500000                                47 499              1 135 606
                Total                               9 571 868              6 721 927

                Range_income     N_taxpayers (single+married)  Donations_paid (/1000)
                0-1                                     4 136                     880
                1-10000                               251 311                  51 031
                10000-20000                         1 106 972                 284 679
                20000-30000                         1 309 756                 383 235
                30000-40000                         1 329 616                 452 724
                40000-50000                         1 085 995                 414 963
                50000-100000                        2 551 674               1 152 886
                100000-500000                       1 044 651                 903 827
                >500000                                46 462                 544 834
                Total                               8 730 573               4 189 059

                Range_income        Single  Donation_paid (/1000)
                0-1                  2 372                    382
                1-10000            211 584                 40 467
                10000-20000        844 750                199 328
                20000-30000        776 637                200 222
                30000-40000        658 336                202 929
                40000-50000        423 428                149 040
                50000-100000       557 379                242 770
                100000-500000      145 512                140 178
                >500000              8 761                153 390
                Total            3 628 759              1 328 706
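The two within-group assumptions mentioned in this post (equal versus normal) can be compared with a rough sketch like the one below. It rests on several assumptions that are not in the post: only three closed brackets are used, counts are scaled down by 100,000, and the normal draws are centred on the bracket midpoint with a standard deviation of one sixth of the bracket width. All variable names are illustrative.

```stata
clear
set seed 12345
input double lower_limit double upper_limit long n_taxpayers
1      10000  2506533
10000  20000  4749939
20000  30000  4784796
end
generate byte bracket = _n

* Scaled-down expansion (assumption): one observation per 100,000 taxpayers
expand round(n_taxpayers / 100000)

* Equal distribution: uniform draws between the bracket limits
generate double income_uniform = runiform(lower_limit, upper_limit)

* Normal within the bracket: mean at the midpoint, sd = width / 6 (assumed);
* a few draws can fall outside the bracket
generate double income_normal = ///
    rnormal((lower_limit + upper_limit) / 2, (upper_limit - lower_limit) / 6)

bysort bracket: summarize income_uniform income_normal
```

Neither variant reproduces the reported total income of a bracket, so the bracket means would still need to be checked against total income divided by the number of taxpayers.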



                • #9
                  Simona:
                  thanks for further clarifications.
                  As far as I know, the Statalister who has the widest experience with that stuff is Stephen Jenkins.
                  While waiting for him to chime in, you may want to search for some of his previous posts (along with related works) and (hopefully) retrieve some suggestions.
                  Sorry I cannot be more helpful.
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Thank you very much Carlo.
                    Of course, I have looked through the different topics to try to find something, mainly on how to input the variables properly, because otherwise my dataset will be wrong.

                    Best regards,
                    Simona



                    • #11
                      Simona:
                      the main issue is that you have (for some variables, at least) a sort of grouped dataset.
                      Perhaps you may want to calculate a midpoint value wherever you have a range and then input the data accordingly.
                      Kind regards,
                      Carlo
                      (Stata 19.0)
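The midpoint idea can be sketched as follows, assuming the limits have already been entered as separate numeric variables (lower_limit and upper_limit are illustrative names). The mean income per taxpayer is added only for comparison and is not part of the suggestion above.

```stata
clear
input double lower_limit double upper_limit double total_income long n_taxpayers
1      10000   14002838000  2506533
10000  20000   71204885000  4749939
20000  30000  119519432000  4784796
end

* Midpoint of each closed range
generate double midpoint = (lower_limit + upper_limit) / 2

* Average income implied by the reported totals, for comparison
generate double mean_income = total_income / n_taxpayers

list, noobs
```

The open-ended bracket (>500000) has no midpoint, so it would need a separate assumption.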



                      • #12
                        Dear all,

                        I am writing here again because I found another way to present my dataset. However, I do not know how to input my variables in STATA so that I can work on them.
                        I have created dist1 = 0.004457153 in STATA with the command ge dist1=0.004457153

                        Now I have to input all my taxpayers, a great many of them, starting from 1. The first taxpayer has income 1, the second has income 1 + dist1, the third has the second taxpayer's income + dist1, and so on (see the small Excel table below). As I need as many observations as there are taxpayers, I would like to know how to do this for all of them. Each value is just the previous value plus the fixed increment 0.004457153.
                        I know this is a simple question for you experts, but I do not know how to do it. I need it for thousands and thousands of taxpayers, and since the computation is always the same, I would really appreciate it if you could explain or suggest how to write the command.
                        You can see from the table that taxpayer 2 has income 1 + 0.004457153; taxpayer 3 has income 1.00445715 + 0.004457153; taxpayer 4 has income 1.00891431 + 0.004457153, and so on.

                        Thank you very much
                        Best regards,
                        Simona


                        1 1
                        2 1.00445715
                        3 1.00891431
                        4 1.01337146
                        5 1.01782861
                        6 1.02228576
                        7 1.02674292
                        8 1.03120007



                        • #13
                          http://www.statalist.org/forums/help#spelling

                          Code:
                          gen double newvar = 1 + (_n - 1) * 0.004457153
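For anyone who wants to check the one-liner against the table in #12, here is a small sketch with eight observations. The scalar name dist1 and the variable names are illustrative. It also shows the running-sum version described in the question, which gives the same values.

```stata
clear
set obs 8
scalar dist1 = 0.004457153

* Closed form: observation _n gets 1 plus (_n - 1) increments
generate double income = 1 + (_n - 1) * scalar(dist1)

* Running sum: each value is the previous one plus the increment;
* replace works through the observations in order, so [_n-1] is already updated
generate double income_sum = 1 in 1
replace income_sum = income_sum[_n-1] + scalar(dist1) in 2/l

format income income_sum %10.8f
list, noobs
assert reldif(income, income_sum) < 1e-12
```

Storing the result as double matters here: a float keeps only about seven significant digits, which is too few for increments this small.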



                          • #14
                            Dear Nick Cox,
                            thank you very much. I finally understand how to create my dataset. Following the advice above and now your comment, I wrote a loop and I have what I needed.
                            Thanks to all of you who answered my post.

                            Best regards,
                            Simona

