Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding gap years in the observations

    Hello everyone,

    I have three variables in my dataset - company name, year, and profit. The year variable is from 2001 to 2019. Some companies don't have observations for some years, let's say the company A has years and profits only for only 2002 and 2004. I want to add missing years for the companies, that will include all years from 2001 to 2019. The proft for those added gap years can be missing observations. I would highly appreciate if you could advise me on how I can do it on Stata.


    Thank you very much in advance,
    Nick

  • #2
    Adding extra observations in this case with missing values for profit could be done by fillin, but what is the advantage in doing that?

    You just inflate the dataset with observations that aren't useful.

    Code:
    . clear 
    
    . set obs 21 
    Number of observations (_N) was 0, now 21.
    
    . gen id = cond(_n <= 19, 1, 2)
    
    . label def id 1 "B" 2 "A"
    
    . label val id id 
    
    . gen year = cond(_n <= 19, 2000 + _n, cond(_n == 20, 2002, 2004))
    
    . gen profit = _n
    
    . 
    . list, sepby(id)
    
         +--------------------+
         | id   year   profit |
         |--------------------|
      1. |  B   2001        1 |
      2. |  B   2002        2 |
      3. |  B   2003        3 |
      4. |  B   2004        4 |
      5. |  B   2005        5 |
      6. |  B   2006        6 |
      7. |  B   2007        7 |
      8. |  B   2008        8 |
      9. |  B   2009        9 |
     10. |  B   2010       10 |
     11. |  B   2011       11 |
     12. |  B   2012       12 |
     13. |  B   2013       13 |
     14. |  B   2014       14 |
     15. |  B   2015       15 |
     16. |  B   2016       16 |
     17. |  B   2017       17 |
     18. |  B   2018       18 |
     19. |  B   2019       19 |
         |--------------------|
     20. |  A   2002       20 |
     21. |  A   2004       21 |
         +--------------------+
    
    . 
    . tsset id year 
    
    Panel variable: id (unbalanced)
     Time variable: year, 2001 to 2019, but with a gap
             Delta: 1 unit
    
    . fillin id year 
    
    . 
    . list, sepby(id)
    
         +------------------------------+
         | id   year   profit   _fillin |
         |------------------------------|
      1. |  B   2001        1         0 |
      2. |  B   2002        2         0 |
      3. |  B   2003        3         0 |
      4. |  B   2004        4         0 |
      5. |  B   2005        5         0 |
      6. |  B   2006        6         0 |
      7. |  B   2007        7         0 |
      8. |  B   2008        8         0 |
      9. |  B   2009        9         0 |
     10. |  B   2010       10         0 |
     11. |  B   2011       11         0 |
     12. |  B   2012       12         0 |
     13. |  B   2013       13         0 |
     14. |  B   2014       14         0 |
     15. |  B   2015       15         0 |
     16. |  B   2016       16         0 |
     17. |  B   2017       17         0 |
     18. |  B   2018       18         0 |
     19. |  B   2019       19         0 |
         |------------------------------|
     20. |  A   2001        .         1 |
     21. |  A   2002       20         0 |
     22. |  A   2003        .         1 |
     23. |  A   2004       21         0 |
     24. |  A   2005        .         1 |
     25. |  A   2006        .         1 |
     26. |  A   2007        .         1 |
     27. |  A   2008        .         1 |
     28. |  A   2009        .         1 |
     29. |  A   2010        .         1 |
     30. |  A   2011        .         1 |
     31. |  A   2012        .         1 |
     32. |  A   2013        .         1 |
     33. |  A   2014        .         1 |
     34. |  A   2015        .         1 |
     35. |  A   2016        .         1 |
     36. |  A   2017        .         1 |
     37. |  A   2018        .         1 |
     38. |  A   2019        .         1 |
         +------------------------------+

    Comment


    • #3
      Hello Nick, thank you so much for your reply. However, I am having some difficulties with running the code. I am not sure how to implement it on my dataset. Below, please see my data extraction.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int Company_identifier float(Year Profits)
       4 2017   605731072
       4 2018  1449750016
       4 2003       40000
       4 2007     1600000
       4 2008       90000
       4 2015      150000
       4 2006       27794
       4 2019       23890
       4 2005       72566
       4 2013       37000
       4 2021   1.302e+10
       4 2012       32200
       7 2013     5610080
       7 2011           .
       7 2003           .
       7 2005           .
       7 2017           .
       7 2015           .
       7 2004           .
       7 2007           .
       7 2014           .
       7 2009           .
       7 2002           .
       7 2008           .
       7 2019           .
       7 2010           .
       7 2012           .
       7 2006       40000
       9 2013       10000
       9 2010    6.80e+07
       9 2007           .
       9 2008           .
       9 2017           .
       9 2002           .
       9 2005           .
       9 2014       55368
       9 2009           .
       9 2012           .
       9 2015           .
       9 2018           .
       9 2019           .
       9 2003           .
       9 2004           .
       9 2006       33278
       9 2011       14400
       9 2016           .
       9 2021     6458000
      10 2013      428922
      10 2018      193696
      10 2000       26100
      10 2001       51000
      10 2006       14800
      10 2019   125144000
      10 2016      232002
      10 2021      360420
      10 2015       27000
      10 2004      116720
      10 2017      276646
      10 2002       11600
      10 2005      206920
      10 2007       38960
      10 2008      100500
      10 2014      892046
      10 2009       93260
      10 2011      522024
      10 2020       26800
      10 2003       48580
      10 2012      105220
      10 2010      144000
      16 2009           .
      16 2007    1.50e+08
      16 2018      680764
      16 2004           .
      16 2015    53702000
      16 2008    7.63e+07
      16 2005           .
      16 2003           .
      16 2006    2.00e+07
      16 2014      100000
      16 2012           .
      16 2011    16015000
      16 2013       24000
      16 2005           .
      16 2002           .
      16 2016    90013400
      16 2020    1.60e+07
      16 2021 1.29809e+10
      16 2017    45430000
      16 2010       60600
      16 2019    1.50e+07
      16 2000       35000
      29 2002       10000
      29 2003           .
      29 2004       16676
      29 2010      600000
      29 2007      362438
      29 2016       10200
      29 2019       19128
      29 2018       13858
      33 2012           .
      end
      ------------------ copy up to and including the previous line ------------------


      Regarding your note on inflating the dataset, I agree, but the same command I will use for other variables that have not reported some financial statistics when it was 0. Later, for running a panel regression, I will need those missing observations with replaced by 0s.

      Thank you very much for your support.

      Comment


      • #4
        What are those difficulties then? I can see from your data example that duplicates need to be cleaned up:

        Code:
        . duplicates list Company Year
        
        Duplicates in terms of Company_identifier Year
        
          +-----------------------+
          | Obs   Compan~r   Year |
          |-----------------------|
          |  76         16   2005 |
          |  83         16   2005 |
          +-----------------------+
        
        .


        Comment

        Working...
        X