Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting the first year of data available for one variable

    Hello everyone,
    I need to calculate the founding year of each company based on the first year of data available for Total Asset. For example for Company with ID 001000 founding year is 1961 and for company with ID 001111 the founding year is 1986.

    ID Year Total Asset
    001000 1961 44.025
    001000 1962 38.586
    001000 1963 23.905
    001000 1964 1.416
    001000 1965 2.31
    001000 1966 2.43
    001000 1967 2.456
    001000 1968 5.922
    001111 1985 .
    001111 1986 33.45
    001111 1987 29.33
    001111 1988 19.907
    001111 1989 21.771
    001111 1990 25.638

    Could you please tell me which command I can use to extract founding year for each company in my sample?
    ​​​​​​​Thanks in advance.

  • #2
    Welcome to Statalist, Fimi! Please go through the FAQ for advice on how to most effectively post questions to Statalist. Read especially #12, which covers amongst other things, how you should use the dataex command to provide data examples. That would be much better than copy-pasting the data the way you did.

    Given data as follows:

    Code:
    clear
    input str6 ID int Year float Total_Asset
    "001000" 1961 44.025
    "001000" 1962 38.586
    "001000" 1963 23.905
    "001000" 1964 1.416
    "001000" 1965 2.31
    "001000" 1966 2.43
    "001000" 1967 2.456
    "001000" 1968 5.922
    "001111" 1985 .
    "001111" 1986 33.45
    "001111" 1987 29.33
    "001111" 1988 19.907
    "001111" 1989 21.771
    "001111" 1990 25.638
    end
    you just need to do

    Code:
    egen wanted = min(cond(!missing(Total_Asset), Year, .)), by(ID)
    which gives you

    Code:
    . list, noobs sep(0) abbrev(20)
    
      +--------------------------------------+
      |     ID   Year   Total_Asset   wanted |
      |--------------------------------------|
      | 001000   1961        44.025     1961 |
      | 001000   1962        38.586     1961 |
      | 001000   1963        23.905     1961 |
      | 001000   1964         1.416     1961 |
      | 001000   1965          2.31     1961 |
      | 001000   1966          2.43     1961 |
      | 001000   1967         2.456     1961 |
      | 001000   1968         5.922     1961 |
      | 001111   1985             .     1986 |
      | 001111   1986         33.45     1986 |
      | 001111   1987         29.33     1986 |
      | 001111   1988        19.907     1986 |
      | 001111   1989        21.771     1986 |
      | 001111   1990        25.638     1986 |
      +--------------------------------------+
    Last edited by Hemanshu Kumar; 16 Jun 2023, 22:23.

    Comment


    • #3
      Thank you so much for yourcomments, Hemanshu. I really appreciate your help.

      Comment


      • #4
        For more on Hemanshu Kumar's method. see Section 9 in https://journals.sagepub.com/doi/pdf...867X1101100210

        Note also how a search for FAQs yields pointers:


        Code:
        . search first , faq
        
        Search of official help files, FAQs, Examples, and Stata Journals
        
        FAQ     . . . . . . . . . . . . . . . . . . . . . . . Generating the last date
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                7/11    How can I generate a variable containing the last of
                        several dates?
                        http://www.stata.com/support/faqs/data-management/
                        generating-last-date/
        
        FAQ     . . . . . . . . . . . . . . . First and last occurrences in panel data
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                3/07    How can I identify first and last occurrences
                        systematically in panel data?
                        http://www.stata.com/support/faqs/data-management/
                        first-and-last-occurrences/

        Comment


        • #5
          The method in #2 is a great method -- which I have publicised too -- but it might lead a reader to wonder how one could work out such a solution for yourself.

          Here is another approach, which may or may not appeal. The first idea is that we need to segregate missing values, which makes it easier to pick the smallest (first) value of year for each panel. With an indicator for missing total assets, the observations with missing values can be sorted to the end of each block of observations for each panel.

          The code catches the extreme case of a panel holding missing values only, presumably not much use for research, but nothing in the question implies that you might not have such a case.

          Code:
          clear
          input str6 ID int Year float Total_Asset
          "001000" 1961 44.025
          "001000" 1962 38.586
          "001000" 1963 23.905
          "001000" 1964 1.416
          "001000" 1965 2.31
          "001000" 1966 2.43
          "001000" 1967 2.456
          "001000" 1968 5.922
          "001111" 1985 .
          "001111" 1986 33.45
          "001111" 1987 29.33
          "001111" 1988 19.907
          "001111" 1989 21.771
          "001111" 1990 25.638
          end
          
          * get an indicator (0 for not missing, 1 for missing) 
          gen ismissing = missing(Total_Asset)
          
          * sort to segregate missings 
          bysort ID (ismissing Year) : gen first = cond(Total_Asset[1] < ., Year[1], .) 
          
          list, sepby(ID)
          
          * sort back to more usual ordr 
          sort ID Year 
          
          list, sepby(ID)
          Here are the listings:

          Code:
          . list, sepby(ID)
          
               +---------------------------------------------+
               |     ID   Year   Total_~t   ismiss~g   first |
               |---------------------------------------------|
            1. | 001000   1961     44.025          0    1961 |
            2. | 001000   1962     38.586          0    1961 |
            3. | 001000   1963     23.905          0    1961 |
            4. | 001000   1964      1.416          0    1961 |
            5. | 001000   1965       2.31          0    1961 |
            6. | 001000   1966       2.43          0    1961 |
            7. | 001000   1967      2.456          0    1961 |
            8. | 001000   1968      5.922          0    1961 |
               |---------------------------------------------|
            9. | 001111   1986      33.45          0    1986 |
           10. | 001111   1987      29.33          0    1986 |
           11. | 001111   1988     19.907          0    1986 |
           12. | 001111   1989     21.771          0    1986 |
           13. | 001111   1990     25.638          0    1986 |
           14. | 001111   1985          .          1    1986 |
               +---------------------------------------------+
           
          . list, sepby(ID)
          
               +---------------------------------------------+
               |     ID   Year   Total_~t   ismiss~g   first |
               |---------------------------------------------|
            1. | 001000   1961     44.025          0    1961 |
            2. | 001000   1962     38.586          0    1961 |
            3. | 001000   1963     23.905          0    1961 |
            4. | 001000   1964      1.416          0    1961 |
            5. | 001000   1965       2.31          0    1961 |
            6. | 001000   1966       2.43          0    1961 |
            7. | 001000   1967      2.456          0    1961 |
            8. | 001000   1968      5.922          0    1961 |
               |---------------------------------------------|
            9. | 001111   1985          .          1    1986 |
           10. | 001111   1986      33.45          0    1986 |
           11. | 001111   1987      29.33          0    1986 |
           12. | 001111   1988     19.907          0    1986 |
           13. | 001111   1989     21.771          0    1986 |
           14. | 001111   1990     25.638          0    1986 |
               +---------------------------------------------+
          That is still a little tricksy. Consider this line

          Code:
          bysort ID (ismissing Year) : gen first = cond(Total_Asset[1] < ., Year[1], .)
          We could have gone

          Code:
          bysort ID (ismissing Year) : gen first = Year[1] if Total_Asset[1] < .
          but then we would need to spread the value in the first observation to all the others. That is a standard copying down as explained at https://www.stata.com/support/faqs/d...issing-values/

          Code:
          ​​​​​​​bysort ID (ismissing Year) : replace first = first[_n-1] if missing(first)
          Here just as the so-called subscript

          Code:
          [_n-1]
          indicates under by: the previous observation in the same panel, so also

          Code:
          [1]
          indicates under by: the first observation in the same panel.

          Comment


          • #6
            Nick, Thank you for your complete explanations. It was very helpful

            Comment

            Working...
            X