  • Divide groups by size in panel data

    Dear Statalist,

    I am very new to Stata. I want to divide my panel data into two groups by firm size (small firms and large firms). Based on the median of firm size in the most recent year (2015), I want two groups of ids: one group includes every id whose firm size in 2015 is >= the median, and the other includes the remaining ids, whose firm size in 2015 is < the median. My thought is that I should create a dummy variable for the ids (companies) and then run two regressions, one on the sub-sample of large firms and one on the sub-sample of small firms. However, my syntax, shown below, did not work:

    sort id
    by id: gen newid=1 if year=2015 & firm size>=732047.9 => syntax did not work!

    Can anyone help me correct the syntax or walk me through the process? Many thanks!

    Trang

  • #2
    Welcome to Statalist, Trang.

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

    Section 12.1 is particularly pertinent:

    12.1 What to say about your commands and your problem

    Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
    ...
    Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.
    In the case of the code you posted
    Code:
    by id: gen newid=1 if year=2015 & firm size>=732047.9
    the most obvious thing we can say is that it didn't work because your variable name "firm size" appears to have a space in the middle of it, which isn't allowed by Stata. But I'm guessing that isn't the actual code you used, and that the symptom of it not working is something different.

    It would be helpful if you were to post a small hand-made example, perhaps with just a few variables and observations, showing the data before the process and how you expect it to look after the process. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.



    • #3
      Trang:
      as an aside to William's helpful advice, please note that the way you selected observations will allow you to perform OLS only (not a panel data regression), as there's only one year included (2015).
      If that is not what you're seeking help with, then, as William said, please pose your question in a more detailed way. Thanks.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Dear William and Carlo. Thank you for your posts. I tried to edit my initial post but could not find a way to do so!

        I have panel data with id (companies) and year (2007-2015). I want to run separate regressions on two sub-samples, one for large firms and one for small firms. I define large and small firms as those above and below the median of size (firm size measured by total assets) in 2015. I think I should create a dummy variable newid: newid=0 for all observations of firms whose size in 2015 is smaller than the median size of all firms in 2015, and newid=1 for all observations of the remaining firms, whose size in 2015 is at or above that median. I implemented this idea in Stata as follows:

        by id: gen newid=0 if size<732047.9 & year==2015
        replace newid=1 if size>=732047.9 & year==2015

        My intention was that newid=0 for all observations (2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014 and 2015) of the firms whose size in 2015 is < 732047.9. But the result is that only the 2015 observation is 0; the observations for the other years are missing (.).

        First, did I follow the right process if I want to run two regressions on two sub-samples? Do you have another way to solve it?
        Second, if it is correct, how can I replace the dot (.) in all years, as below (I typed the result by hand)?

        newid   I want
        .       0
        .       0
        .       0
        .       0
        .       0
        .       0
        .       0
        .       0
        0       0

        I hope I have explained clearly what I want for my very first project with Stata. I much appreciate all the replies!

        Many thanks

        Trang



        • #5
          .
          Last edited by Trang Phan; 06 Mar 2017, 06:04.



          • #6
            I have panel data (firms and years from 2007-2015), and I want to run two regressions on two sub-samples, one for large firms and one for small firms. Firms whose size is at or below the 2015 median (the median over all firms (id) in 2015) belong to the small group. The rest form the large group.

            In the beginning, I ran the command:

            xtabond2 debtmat l.debtmat tax firmqual levb assetmat lnta cacl mvtobv mabsebit ratevol if size<=732047.9, gmm(l.debtmat,laglimits(1 5) ) iv(lnta,eq(lev) ) twostep robust

            (732047.9 number is the median of size in 2015)

            But I recognized that if I do it this way, a firm can be in the small group in one year and in the large group in another year, depending on whether its size in that year is below or above the median line.
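
            A rough way to see how many firms would switch groups under that condition is sketched below. It assumes the variables are named id and size as in this thread; below, everbelow, alwaysbelow and firsttag are names made up for illustration.

            Code:
            * 1 if this firm-year is at or below the cutoff, missing if size is missing
            gen byte below = size <= 732047.9 if size < .
            * per firm: was it ever at or below the cutoff, and was it always?
            egen byte everbelow = max(below), by(id)
            egen byte alwaysbelow = min(below), by(id)
            * tag one observation per firm so each firm is counted once
            egen byte firsttag = tag(id)
            * number of firms that fall on both sides of the cutoff over the years
            count if firsttag & everbelow != alwaysbelow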

            Then, I thought that I should divide the data into two groups with an indicator based on the median of size in 2015: 0 if the firm's size in 2015 is smaller than the median and 1 if it is larger. If a firm's size in 2015 is smaller than the 2015 median, the indicator is 0 in all the other years (2007-2014) as well. But after Carlo's comment, I understand that even if I can do this (so far I have not managed to generate the dummy variable described in my previous explanation), I would no longer have panel data in hand, because each group (small and large firms) would have just one id (0 and 1 respectively).

            In the end, I first just wanted to ask for help correcting the syntax, but I no longer need that. Now, can anyone tell me how to run two regressions on two sub-samples? I am stuck without any further idea.
            Many thanks!
            Last edited by Trang Phan; 06 Mar 2017, 06:09.



            • #7
              For the record, a problem in #1 was the use of = rather than ==.
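
              A minimal sketch of that fix, assuming the variable is actually called size (as in #4) rather than "firm size". The extra condition size < . is an addition here, because Stata treats missing values as larger than any number:

              Code:
              * == tests equality; a single = is assignment and is not allowed in an if condition
              gen byte newid = 1 if year == 2015 & size >= 732047.9 & size < .

              On its own this marks only the 2015 observations; every other year is left missing.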

              You need an indicator based on size in 2015. As 2015 is the last year in your panel,

              Code:
              bysort id (year) : gen big = size[_N]  > 732047.9 if size[_N] < .
              and then your regressions are if big == 1 and if big == 0.
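
              If 2015 were not the last year for every firm, or if you would rather not type the median by hand, something along these lines may work. It is only a sketch, assuming variables named id, year and size as in this thread; med2015, size2015 and big2 are names made up for illustration.

              Code:
              * median of size across firms in 2015, taken from summarize's stored results
              summarize size if year == 2015, detail
              scalar med2015 = r(p50)
              * copy each firm's 2015 size to all of its years (missing if no 2015 value)
              egen double size2015 = max(cond(year == 2015, size, .)), by(id)
              * strict > as in the line above; use >= if firms at the median should count as large
              gen byte big2 = size2015 > med2015 if size2015 < .
              * check the split using one observation per firm
              tabulate big2 if year == 2015

              Each firm then keeps the same value of big2 in every year, so restricting to big2 == 1 or big2 == 0 leaves whole panels intact.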

              See http://www.stata-journal.com/sjpdf.h...iclenum=dm0055 for a review of technique.

              There must be better ways to model the effects of size, however....



              • #8
                Dear Nick,

                Thank you so much for your suggestion!

