
  • Getting a balanced panel from an unbalanced panel whose yearly observations have gaps


    Hi, I have unbalanced panel data on 1,850+ companies with 26 yearly observations:

    xtdes

    coid: 1, 2, ..., 1955 n = 1873
    year: 1990, 1991, ..., 2015 T = 26
    Delta(year) = 1 unit
    Span(year) = 26 periods
    (coid*year uniquely identifies each observation)

    The panel is unbalanced and has gaps:

    xtset coid year
    panel variable: coid (unbalanced)
    time variable: year, 1990 to 2015, but with gaps
    delta: 1 unit

    I know the following commands will give me a balanced panel with 15 years:

    by coid: gen nyear=[_N]

    keep if nyear == 15


    But my problem is that some companies in my data do not have data for 2014 or 2015, so with the commands above, companies with data from 1998 to 2013 also get nyear = 15, and I want to discard them.

    My specific requirement:
    In my balanced panel, I want to retain only those companies whose data are available from 2000 to 2015 without any gap. Keep in mind that some of my companies also lack yearly observations for 2003, 2014 or 2015.

    Can someone help me either adjust the two commands above or write a new command for my requirement?

    Mubeen

  • #2
    Note that while

    Code:
    by coid: gen nyear=[_N]
    would work, in general it's best to use square brackets only to indicate observation numbers.

    As I understand it, you want this:

    Code:
    egen wanted = total(inrange(year, 2000, 2015)), by(coid)
    and that variable will have the value 16 if and only if all 16 years (not 15) from 2000 to 2015 are present in each panel.
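A minimal sketch of the full workflow on hypothetical toy data (the four firms and their spans are invented for illustration): the egen step keeps firms with all 16 years, and a second keep drops years outside 2000-2015, because a firm observed from 1990 onward also gets wanted == 16 and would otherwise keep its earlier years.

```stata
* Hypothetical toy data: 4 firms, each starting with 1990-2015
clear
set obs 4
generate coid = _n
expand 26
bysort coid: generate year = 1989 + _n
drop if coid == 2 & year < 2000      // firm 2 starts in 2000
drop if coid == 3 & year == 2003     // firm 3 has a gap in 2003
drop if coid == 4 & year > 2013      // firm 4 ends in 2013

* Number of years each firm has inside 2000-2015 (16 means complete)
egen wanted = total(inrange(year, 2000, 2015)), by(coid)
keep if wanted == 16

* Firm 1 still has 1990-1999, so also keep only the window itself
keep if inrange(year, 2000, 2015)
xtset coid year
```

With these data, firms 1 and 2 survive with 16 years each and xtset reports a strongly balanced panel.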



    • #3
      Thank you, Sir. That was helpful for getting a balanced panel!

      I have one more query related to this. As we know, when constructing a balanced panel, increasing the time dimension means losing cross-sectional units. It is always a trade-off between the time and cross-sectional dimensions.

      In my example above, I arbitrarily chose 2000 to 2015 (a 16-year panel). What is your clever way or command for deciding how many years to have in a balanced panel? Is there any command that summarizes the balanced-panel possibilities, e.g., how many firms are left with 16 years, with 20 years, with 12 years?

      To give background: I am working on emerging economies (China, India, Turkey, Brazil, Korea, Indonesia, Pakistan).

      Take Turkey: I had an unbalanced panel of 375 firms. Selecting 2000 to 2015 (a 16-year time dimension) leaves me with 104 firms, and a 12-year time dimension leaves me with 137 firms. I will do the same for the other countries. As I have not finalized the time dimension of my panel, I need suggestions for commands that take this trade-off into account.

      Regards and Stay Blessed
      Mubeen



      • #4
        What is your clever way/command of deciding how many years to have in Balance Panel?
        I really don't have one. As you say, it's a trade-off problem and the solution is likely to be substantive as much as statistical.

        I don't work in this area at all; I just know some basic tools for data management.

        That leaves the question wide open for everyone else.
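One mechanical aid for the trade-off, offered as a sketch rather than a recommendation: for each candidate window length T, count the firms observed in every year of the window. It assumes the window always ends in 2015; the toy data and the scalar names firms12, firms16, ... are hypothetical.

```stata
* Hypothetical toy data: 5 firms, each starting with 1990-2015
clear
set obs 5
generate coid = _n
expand 26
bysort coid: generate year = 1989 + _n
drop if coid == 2 & year < 2000      // firm 2 starts in 2000
drop if coid == 3 & year < 2004      // firm 3 starts in 2004
drop if coid == 4 & year == 2010     // firm 4 has a gap in 2010
drop if coid == 5 & year > 2013      // firm 5 ends in 2013

* Tag one observation per firm so each firm is counted once
egen firmtag = tag(coid)

* For each length T, count firms with all T years in (2016 - T) to 2015;
* windows are assumed to end in 2015
foreach T of numlist 12 16 20 26 {
    local first = 2016 - `T'
    quietly egen n_in = total(inrange(year, `first', 2015)), by(coid)
    quietly count if firmtag & n_in == `T'
    scalar firms`T' = r(N)
    display "T = `T' (`first'-2015): " r(N) " firms"
    drop n_in
}
```

The printed table shows how many cross-sectional units each time dimension leaves; the choice among them remains a substantive one.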



        • #5
          A good start would be: why do you need a balanced panel? Most estimators work just fine with unbalanced data.
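To illustrate with hypothetical toy data (three firms observed for 4, 6 and 8 years; x and y are invented), a fixed-effects regression runs on the unbalanced panel and uses every available observation.

```stata
* Hypothetical toy data: 3 firms with 4, 6 and 8 years (unbalanced)
clear
input coid T
1 4
2 6
3 8
end
expand T
bysort coid: generate year = 2007 + _n
generate x = sin(_n)
generate y = 2*x + coid + cos(3*_n)/10
xtset coid year                      // reports "unbalanced"

* Fixed-effects estimation uses all 18 observations; no balancing needed
xtreg y x, fe
```

The header of the xtreg output reports the minimum, average and maximum number of observations per group.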



          • #6
            Originally posted by Nick Cox (#2)
            Dear Sir,

            With reference to the query above, after using
            Code:
            egen wanted = total(inrange(year, 2000, 2015)), by(coid)
            and

            Code:
            keep if wanted == 16
            I am still not able to get a perfectly balanced panel: egen also gives wanted = 16 to firms whose data are available from 1991 to 2015, so firms with all 25 years of data are still there, and that keeps my panel from being perfectly balanced. I have thought of the following code:
            Code:
            drop if year == 1991 & year == 1992 & ...... & year == 1998 & year == 1999
            I want to ask two things:
            a) Is there any other way of getting a balanced panel, after deciding on the time dimension 2000 to 2015, that does not require the drop-years code?
            b) How can that drop if year code be shortened if we want to drop a series of years, e.g., from 1980 to 1999, or keep 2000 to 2015?

            Regards
            Mubeen



            • #7
              As for (b):

              Code:
              drop if year >= 1991 & year <= 1999
              Note that your current code wouldn't work, as no single observation can have a year value equal to 1991 AND 1992 AND 1993 AND ...
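A quick check on hypothetical toy data (one firm observed 1991-2015) shows the difference: the &-chain drops nothing, while the range condition drops the nine years 1991-1999. drop if inrange(year, 1991, 1999) is an equivalent spelling of the range.

```stata
* Hypothetical toy data: one firm observed 1991-2015
clear
set obs 25
generate coid = 1
generate year = 1990 + _n

* No observation can equal 1991 and 1992 at once, so nothing is dropped
drop if year == 1991 & year == 1992
count

* The range condition drops 1991-1999 and leaves 2000-2015
drop if year >= 1991 & year <= 1999
count
```

The first count shows 25 observations; the second shows 16.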



              • #8
                Thank you, Jesse, for the answer to (b)! Silly me for using AND instead of OR. By the way, what is the character for "OR"?



                • #9
                  |

                  See help operator
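For example, on hypothetical toy data covering 1985-2020, | lets a single drop remove the years on both sides of the 2000-2015 window, and ! ("not") combined with inrange() selects exactly the same years.

```stata
* Hypothetical toy data: years 1985-2020
clear
set obs 36
generate year = 1984 + _n

* | means "or": flag years before 2000 or after 2015
generate byte outside = year < 2000 | year > 2015
* ! means "not": the same years, written with inrange()
generate byte outside2 = !inrange(year, 2000, 2015)
assert outside == outside2

* One drop removes both tails and leaves 2000-2015
drop if year < 2000 | year > 2015
```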



                  • #10
                    Originally posted by Nick Cox (#2)

                    Hi All,

                    Dear Nick,
                    The way you showed the use of total(inrange(year, 2000, 2015)) was really helpful at the start of my data preparation. Again, I need your suggestion on a data-management query.

                    Now my panel data structure is as follows:
                    Code:
                    xtset coidn year
                    xtdes
                    coidn: 1, 2, ..., 55 n = 55
                    year: 2000, 2001, ..., 2015 T = 16
                    Delta(year) = 1 unit
                    Span(year) = 16 periods

                    After my preliminary analysis, I am now using the following:

                    Code:
                    ** Generating Variable showing % Change in Shares
                    generate newshp = ((shoutst - L1.shoutst)/L1.shoutst)*100
                    label variable newshp "Yearly % Increase in Shares / % Issued"
                    generate equitychange = ((shoutst - L15.shoutst)/L15.shoutst)*100
                    label variable equitychange "Total % Change from 2000 to 2015"
                    I am having trouble creating a dummy the way I want. See the following.

                    Let's take the equitychange variable. I want to create a dummy that is 1 if equitychange is positive and 0 if it is negative, for the whole coidn (in all years from 2000 to 2015). Because equitychange was created with L15., every coidn has 15 missing values (2000 to 2014) and only the 2015 value is present.
                    Thus, using the following

                    Code:
                    generate dum = equitychange >0 & equitychange<.
                    gives 0 to all missing values (years 2000 to 2014 in a coidn), but I want dum to be 1 in every year from 2000 to 2015 for any coidn whose equitychange in 2015 is positive, not only in 2015.

                    The following did not work either:

                    Code:
                    egen dum if year = 2015 | equitychange>0, by(coidn)
                    I look forward to all of your valuable suggestions.

                    Muhammad Mubeen



                    • #11
                      Try

                      Code:
                       generate dum_temp = equitychange >0 if equitychange<.
                      bysort coidn: egen dum = mean(dum_temp)
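A check of this approach on hypothetical toy data: two firms observed 2000-2015, with shoutst invented so that firm 1's shares grow and firm 2's shrink. Because equitychange is non-missing only in 2015, the firm mean of the 0/1 indicator simply copies the 2015 value to every year.

```stata
* Hypothetical toy data: 2 firms, 2000-2015; shoutst is invented so that
* firm 1's shares grow and firm 2's shares shrink
clear
set obs 2
generate coidn = _n
expand 16
bysort coidn: generate year = 1999 + _n
generate shoutst = cond(coidn == 1, 100 + (year - 2000), 100 - (year - 2000))
xtset coidn year
generate equitychange = ((shoutst - L15.shoutst)/L15.shoutst)*100

* 1 or 0 in 2015 (the only non-missing year), missing in 2000-2014
generate dum_temp = equitychange > 0 if equitychange < .
* mean() skips missing values, so each year of a firm gets its 2015 value
bysort coidn: egen dum = mean(dum_temp)
```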



                      • #12
                        Originally posted by Jesse Wursten (#11)
                        Thank you!

                        Just before reading your reply, I was able to do the same as follows (after spending 5 to 6 hours, which of course was worth it for me, considering my current situation):

                        Code:
                        egen ecmax = max(equitychange), by(coidn)
                        generate dum = 0
                        replace dum = 1 if ecmax>0 & ecmax<.
                        It was more about applying the code intelligently than about knowing the code!
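A sketch comparing the two approaches from #11 and #12 on hypothetical toy data. They agree whenever a firm has exactly one non-missing equitychange (its 2015 value). They differ for a firm with no 2015 value: the mean() version leaves dum missing, while the max() version sets it to 0.

```stata
* Hypothetical toy data: equitychange is non-missing only in 2015,
* and firm 3 has no 2015 value at all
clear
set obs 3
generate coidn = _n
expand 2
bysort coidn: generate year = 2013 + _n      // 2014 and 2015
generate equitychange = .
replace equitychange = 15 if coidn == 1 & year == 2015
replace equitychange = -15 if coidn == 2 & year == 2015

* Approach from #11: firm mean of a 0/1 indicator that is missing outside 2015
generate dum_temp = equitychange > 0 if equitychange < .
bysort coidn: egen dum_mean = mean(dum_temp)

* Approach from #12: firm maximum, then a 0/1 dummy
egen ecmax = max(equitychange), by(coidn)
generate dum_max = 0
replace dum_max = 1 if ecmax > 0 & ecmax < .

list coidn year dum_mean dum_max, sepby(coidn)
```

Which behaviour is preferable for firms without a 2015 value is a judgment call; the listing makes the difference visible.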

                        Regards
                        Mubeen



                        • #13
                          You know what they say: a solution handed to you is remembered for a day; a solution found yourself lasts several decades.



                          • #14
                            [Attached image: bc.PNG]

                            Dear all,

                            I have a problem with my panel data.
                            Stata shows that it is "strongly balanced", but when I run the regression I get the message shown in the attached image!
                            Any help, please?

                            best regards
                            Sedki



                            • #15
                              xtset only checks the panel and time variables: "strongly balanced" means every panel has the same time values ("your data is rectangular"). It does not check whether this is true for all the variables in your data, and a regression drops every observation with a missing value in any of its variables. I think Nick Cox (and others, maybe even Stata) have written some programs that help figure out what is missing when. You can always just have a look at your data (browse), which should point out the issue soon enough.
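One way to see what is missing where, sketched on hypothetical toy data: y and x stand in for the regression variables (assumed names). rowmiss() counts missing values per observation, the per-panel total shows how many observations a regression could actually use, and the official misstable summarize gives an overview.

```stata
* Hypothetical toy data: balanced in coid/year, but x has a missing value
clear
set obs 2
generate coid = _n
expand 3
bysort coid: generate year = 2000 + _n
generate y = _n
generate x = _n * 2
replace x = . if coid == 2 & year == 2002
xtset coid year                      // still reports "strongly balanced"

* Missing values across the regression variables, per observation
egen nmiss = rowmiss(y x)
* Observations per panel that a regression of y on x can actually use
bysort coid: egen n_usable = total(nmiss == 0)
tabulate coid n_usable
misstable summarize y x
```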

