
  • Dealing with Product wise and Year wise firm data

    I am working with a data set that contains the following variables: firms, years, and the product codes of the goods produced by these firms during their reported years. So, for example, firm A produces goods 1, 2 and 3 in 2004, goods 2 and 4 in 2005, and so on. Now I want to create two variables: one that indicates the number of products added by a firm during each reported year, and another that indicates the number of products dropped by that firm during each year. Can anyone please help me with these two questions?

  • #2
    As here, trying to describe a data set in words often fails to give the information needed to solve the problem. There are different ways in which this data could be organized in Stata that are all consistent with your description. To get a useful answer to your question, you need to show an actual example of your Stata data set. The way to do that, so that those who want to help you can actually make use of it, is to use the -dataex- command. -dataex- is not part of official Stata: you have to run -ssc install dataex- to get it. Then run -help dataex- to read the simple instructions for using it.

    If you post back with that, I'm confident you will get the help you need.
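
    A minimal sketch of that workflow follows. It uses Stata's bundled auto data set as a stand-in; the variable list and the `in 1/5` range are illustrative assumptions, so substitute your own variables. In recent Stata releases -dataex- may already be installed, in which case the install step is skipped.

```stata
// Install -dataex- from SSC only if Stata cannot already find it
capture which dataex
if _rc {
    ssc install dataex
}

// Load a bundled example data set (stand-in for your own data)
sysuse auto, clear

// Print a copy-and-paste-ready excerpt: three variables, first five observations
dataex make price mpg in 1/5
```

    The output of the last command, including its [CODE] delimiters, is what gets pasted into the forum post.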


    • #3
      Hi, thank you so much for your reply. I have attached a small sample of my data here.
      Attached Files


      • #4
        There are aspects of your question that are ambiguous, so let me start by setting out how I have interpreted it:

        1. I consider a product to be dropped in the final year that the company offers it (as opposed to the following year).
        2. I consider a product to be introduced in the first year that it appears in the data set, even though, if that is the company's first year in the data set they may have been carrying it in the years before they became part of the data set.
        3. If a product is introduced, dropped, and then brought back, I am counting the year it is brought back as a new product introduction.
        4. All products are considered dropped in the final year that the company appears in the data set, even though the company may remain in business beyond the data set and continue to carry the product.

        With those things clarified:

        Code:
        clear*
        use sample_data
        isid co_code products_product_code year
        
        gen long obs_no = _n
        
        
        //    MARK SPELLS OF PRODUCTS WITHIN FIRMS
        by co_code products_product_code (year), sort: gen spell = sum(year != year[_n-1]+1)
        
        //    IDENTIFY WHEN PRODUCTS INTRODUCED AND TERMINATED
        by co_code products_product_code spell (year), sort: gen year_introduced = year[1]
        by co_code products_product_code spell (year): gen year_dropped = year[_N]
        
        rangestat (count) products_added = obs_no, by(co_code year) interval(year_introduced year year)
        rangestat (count) products_dropped = obs_no, by(co_code year) interval(year_dropped year year)
        mvencode products_added products_dropped, mv(0)
        To run this, you will have to install Robert Picard, Nick Cox, and Roberto Ferrer's -rangestat- command: run -ssc install rangestat-.

        It would not have been possible to answer your question without the sample data to work with, so thank you for posting it. However, there is a better way to post data, as explained in #2: the -dataex- command (written by the same Robert Picard). For these purposes, the -dataex- version is easier to use than an attached data file. In the future, please use -dataex- to post example data, not attachments.
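
        As a minimal illustration of the spell logic in the code above, the sketch below applies the same three -by:- lines to a hypothetical four-row toy data set (the firm "A" and the product codes "p1" and "p2" are made up for the example).

```stata
// Toy data (hypothetical): firm A makes p1 in 2004-2005 and again in 2007, p2 in 2005 only
clear
input str1 co_code str2 products_product_code int year
"A" "p1" 2004
"A" "p1" 2005
"A" "p1" 2007
"A" "p2" 2005
end

// A new spell starts whenever the year is not the previous year plus one
by co_code products_product_code (year), sort: gen spell = sum(year != year[_n-1]+1)

// First and last year of each spell
by co_code products_product_code spell (year), sort: gen year_introduced = year[1]
by co_code products_product_code spell (year): gen year_dropped = year[_N]

list, sepby(co_code products_product_code spell) noobs
```

        Here p1 gets two spells (2004-2005 and 2007), so 2007 counts as a re-introduction under interpretation 3, and p2's single year, 2005, is both its introduction year and its drop year.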


        • #5
          Hi, thank you very much for all your help. It really means a lot. I am, however, considering a product to be dropped in year 'i' if a company was producing that product during year 'i-1' but discontinued its production during year 'i'. Could you please help me in finding that out? Building on these commands, my next task is to find the number of products added and dropped by each firm over a period of 3 years. So instead of a 1-year lag, I will then have to use a 3-year lag to determine the same variables. For example, if a company reports data for the years 2000-2007, then the number of products added in the year 2004 will be decided by comparing the product codes in the years 2001 and 2004. Thank you once again.
          And I will definitely share data using -dataex- from next time onwards.


          • #6
            So, changing the definition of the year in which a product is dropped is simple enough:
            Code:
            // CHANGE THIS LINE:
            // by co_code products_product_code spell (year): gen year_dropped = year[_N]
            
            // TO:
            by co_code products_product_code spell (year): gen year_dropped = year[_N]+1
            But I'm not clear about what you want in the 3 year lag. Your description is internally inconsistent. So let's focus on what we want to report in year 2004. Is it

            A. The number of products discontinued between 2001 and 2003 inclusive?
            B. The number of products discontinued between 2002 and 2004 inclusive?
            C. The number of products discontinued in 2001 only?
            D. The number of products discontinued between 2001 and 2004 inclusive?



            • #7
              I'm extremely sorry for not being clear while explaining my queries. For a particular firm in the year 2004, I want to consider the following:
              a. For product added, I mean the following,
              If a product code appears in year 2004 but doesn't feature in the year 2001, then it is 'product added'
              b. For product dropped, I mean the following:
              If a product code appears in year 2001, but doesn't feature in the year 2004, then it is 'product dropped'
              This means that, while reporting new variables (products added and dropped) for the year 2004, I want to focus on the products produced during the year 2001 only and likewise for other years.
              Thank you so much for all your help. I am learning a lot.
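
              The definition above can be sketched directly as a comparison of the product sets in year t and year t-3. This is a -merge- based illustration under stated assumptions, not the -rangestat- or -rangejoin- approach used elsewhere in this thread: the toy data and the names three_years_earlier, reported_t, reported_t3, added, and dropped are hypothetical.

```stata
// Toy data (hypothetical): firm A makes p1, p2 in 2001 and p2, p3 in 2004
clear
input str1 co_code str2 products_product_code int year
"A" "p1" 2001
"A" "p2" 2001
"A" "p2" 2004
"A" "p3" 2004
end
isid co_code products_product_code year

// Copy of the data with every year shifted forward by 3, so that a product
// made in year t-3 carries the label t
preserve
replace year = year + 3
tempfile three_years_earlier
save `three_years_earlier'
restore

// _merge == 1: made in t but not in t-3 (added)
// _merge == 2: made in t-3 but not in t (dropped)
// _merge == 3: made in both years
merge 1:1 co_code products_product_code year using `three_years_earlier'

// Keep only firm-years where the firm reports data in both t and t-3
by co_code year, sort: egen byte reported_t  = max(_merge != 2)
by co_code year, sort: egen byte reported_t3 = max(_merge != 1)
keep if reported_t & reported_t3

gen byte added   = _merge == 1
gen byte dropped = _merge == 2
collapse (sum) products_added = added products_dropped = dropped, by(co_code year)
list, noobs
```

              With this toy input the only surviving firm-year is 2004, with one product added (p3) and one dropped (p1). Years without a matching report three years earlier or later are excluded by the -keep- step; whether that is the desired treatment is a choice the analyst has to make.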


              • #8
                And I tried making that change in the command that generates the variable 'year_dropped', but I could not get correct values for the variable 'products_dropped'. Should we also change the command that generates 'products_dropped'? Could you please help me clear up this doubt?
                Also, do you know of any good reference for understanding the command 'rangestat'? It is a very helpful command for me at this stage, and I want to learn more about it. Thank you once again.


                • #9
                  OK. I've rewritten it using a slightly different approach that avoids some of the pitfalls of the earlier version. Here, I create a separate file that contains just the number of introductions and discontinuations for each company in each year, and then I -rangejoin- that back with the original data at the end, matching each year to three years forward. (So, for example, the 2001 introductions and discontinuations get paired with the 2004 observations in the original data.)

                  To run this you will now need to install -rangejoin-, also by Robert Picard. It combines some of the functions of -merge- or -joinby- with -rangestat- and is just the tool for this approach to this task. Run -ssc install rangejoin- and read -help rangejoin-. There is, to my knowledge, no other documentation of either -rangejoin- or -rangestat- beyond their help files, unfortunately.

                  If you look at the output from the code I show here, you will see that there is one aspect of the problem that may not be what you intended. If a product is dropped or added in 2016, you are asking for that to show up in the company's observations for year 2019. But there is no observation for year 2019. Situations like this are characterized in these results by a missing value for the year variable: it means that the year that Stata was trying to post these introductions and discontinuations in doesn't exist in the original data set for that company.

                  Code:
                  clear*
                  use sample_data
                  isid co_code products_product_code year
                  
                  gen long obs_no = _n
                  
                  
                  //    MARK SPELLS OF PRODUCTS WITHIN FIRMS
                  by co_code products_product_code (year), sort: gen spell = sum(year != year[_n-1]+1)
                  
                  //    IDENTIFY WHEN PRODUCTS INTRODUCED AND TERMINATED
                  by co_code products_product_code spell (year), sort: gen year_introduced = year[1]
                  by co_code products_product_code spell (year): gen year_dropped = year[_N]+1
                  
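                  //    REDUCE TO THE VARIABLES NEEDED FOR COUNTING
                  //    (-encode- REQUIRES products_product_code TO BE A STRING VARIABLE)
                  keep co_code products_product_code year_introduced year_dropped
                  encode products_product_code, gen(product_code)
                  
                  //    COUNT INTRODUCTIONS BY FIRM AND YEAR; SAVE IN A TEMPORARY FILE
                  preserve
                  drop year_dropped
                  duplicates drop
                  collapse (count) introductions = product_code, by(co_code year_introduced)
                  rename year_introduced year
                  tempfile introductions
                  save `introductions'
                  
                  //    COUNT DISCONTINUATIONS BY FIRM AND YEAR
                  restore
                  drop year_introduced
                  duplicates drop
                  collapse (count) discontinuations = product_code, by(co_code year_dropped)
                  rename year_dropped year
                  
                  //    COMBINE THE TWO COUNTS; A YEAR PRESENT IN ONLY ONE FILE GETS A ZERO FOR THE OTHER
                  merge 1:1 co_code year using `introductions', nogenerate
                  mvencode introductions discontinuations, mv(0)
                  gen match_year = year+3
                  rename (introductions discontinuations) =_3_yr_ago
                  
                  //    PAIR EACH FIRM-YEAR'S COUNTS WITH THE ORIGINAL OBSERVATIONS THREE YEARS LATER
                  rangejoin year match_year match_year using sample_data, by(co_code)
                  rename year year_added_or_dropped
                  rename year_U year
                  sort co_code year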


                  • #10
                    Hi,
                    Thank you so much for all your help. I have learnt a lot from the new Stata code that you introduced in your posts.
                    Thank you so much.


                    • #11
                      Hi,

                      I am working on the same dataset and looking for help to create the variable for "products introduced" as defined in the previous discussion.

                      I used the code shared by @Clyde Schechter on the sample data but was unable to run the rangejoin command; the error message says: file sample_data.dta not found.

                      Please help so that I can proceed.

                      Thank you.


                      • #12
                        I assume you are referring to the code in #9. That code begins with -use sample_data-. In writing that code, I used "sample_data" as a stand in for whatever the actual name of the data set you are starting with. I did that because the original poster of this thread used that name. Perhaps you are using a different name for that data file. Whatever the actual name of that file is, you need to substitute that name wherever you see "sample_data" in the code shown.

                        To be honest, I don't really understand how you got past the -use sample_data- command but then got "file not found" at the end when -rangejoin- tried to access the same file. None of the code in between changes working directories, nor does it erase or rename the file. So I would have expected the code to break right at the beginning, never letting you even get to -rangejoin-.


                        • #13
                          Thank you for the response.

                          I am using the same data file, named "sample_data". All the preceding code (before rangejoin) works fine, but this line specifically throws an error:
                          rangejoin year match_year match_year using sample_data, by(co_code)

                          I don't understand the possible reasons for this code not working.

                          Another issue I am facing when applying this code to the larger data set is that 1:1 merging does not work with the line below, whereas m:1 is able to merge the files.
                          merge 1:1 co_code year using `introductions', nogenerate
                          Last edited by Geeta Tiwari; 31 Mar 2023, 22:36.


                          • #14
                            Neither of these problems looks even possible. I cannot believe that you are using the code shown in #9 and getting those error messages. As I indicated in #12, if sample_data.dta cannot be found, then the code should already break at the top with the -use sample_data- command. As for the -merge- problem, it occurs after a -collapse- command that is guaranteed to produce a result in which co_code and year uniquely identify the observations, so there is no reason -merge 1:1- should fail.

                            I can only conclude that the code you are using has been in some ways modified from what is shown in #9. So I must ask you to post back showing the exact code that you are running. I also ask that you show the exact and complete output that Stata is giving you in the Results window. Do not edit it in any way--there is no such thing as a minor change. I also ask that you include example data, using the -dataex- command. With full information in hand I will try to solve this puzzle.
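
                            Two quick checks that bear on these error messages can be sketched as follows. The file name sample_data.dta comes from this thread; the three-row toy data set and the variable dup are hypothetical and only illustrate what duplicate co_code-year pairs look like.

```stata
// 1. Is the file where Stata is looking? Show the working directory and test for the file.
pwd
capture confirm file "sample_data.dta"
display "confirm file return code: " _rc    // 0 means the file was found

// 2. Do co_code and year uniquely identify observations?
//    Toy data (hypothetical) containing one duplicated co_code-year pair.
clear
input str1 co_code int year byte introductions
"A" 2004 2
"A" 2004 1
"A" 2005 3
end
capture isid co_code year
display "isid return code: " _rc            // nonzero means duplicates exist
duplicates report co_code year
duplicates tag co_code year, gen(dup)
list if dup > 0, noobs
```

                            On real data, the second check would be run on each of the two files just before the -merge 1:1- line; any observations listed there are the ones preventing a 1:1 match.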
