  • How to replace inconsistent industry codes with the majority code within 6 years?

    I have 140,000 firms for 2010-2015. I want to check whether each firm's industrial code (indcode) is consistent over 2010-2015. If a firm produces the same product in all 6 years, its indcode should be identical in all 6 years. But in some cases the indcode is not identical even though the firm produces the same product throughout the 6 years. I have to replace each inconsistent industrial code with the firm's majority indcode over the six years. I used the following commands to identify inconsistent indcodes and then edited them manually:

    Code:
    bysort firmcode: gen nyear = _N
    by firmcode: egen check1 = total(indcode)
    gen check2 = check1/indcode
    edit year firmcode indcode nyear check1 check2 if check2 != nyear

    As a result, I have to correct the industrial codes of 17,844 firms manually by checking their historical data and type of product. This is time-consuming. Can I do this with Stata commands instead of by hand? Could you please share the Stata commands to replace inconsistent industrial codes with each firm's majority indcode over the 6 years?
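As an aside on the checking step: the total()-based test only lists rows whose code differs from the firm's average code, so some rows of an inconsistent firm may not appear in the editor. A simpler flag marks every row of a firm that has more than one code. The sketch below uses the variable names from the question and hypothetical toy data.

```stata
* Toy data (hypothetical): firm 1 is consistent, firm 2 is not
clear
input firmcode year indcode
1 2010 22
1 2011 22
1 2012 22
2 2010 10
2 2011 20
2 2012 30
end

* Within each firm, sort by indcode: the firm is consistent only if its
* first and last codes after sorting agree (a missing code counts as a difference)
bysort firmcode (indcode): gen byte inconsistent = indcode[1] != indcode[_N]

* Count firms (not rows) with more than one code
egen byte firm_tag = tag(firmcode)
count if inconsistent & firm_tag
```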
    Last edited by Aisah Aisah; 25 Mar 2019, 17:07.

  • #2
    STOP! STOP! STOP!

    You should never hand-edit your data like this. With so much data to edit you are almost guaranteed to make a lot of mistakes.

    You do not show an example of your data, and I don't understand the variable names you have, but assuming you have a variable that identifies firms, called firm, and another that gives the industry, called industry:

    Code:
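     * count how often each industry is recorded for each firm
     by firm industry, sort: gen mentions = _N
     * within firm, sort by mentions so the most frequent industry is last
     by firm (mentions), sort: gen edited_industry = industry[_N]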
    will create a new variable, edited_industry, in which any inconsistent recording of industry is automatically replaced by whichever industry is mentioned most often.

    There is one hitch: you did not say what you want to do if there is a tie. For example, if out of 6 years, 3 are listed as industry A and the other 3 as industry B, you do not say which industry you want to pick. The code above will do so at random, and, in particular, it will not necessarily pick the same one each time you run the code. If you must have a reproducible, non-random choice made, then you have to identify a rule for which way to break ties, and then modify the code accordingly.
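For example, one reproducible rule (an assumption for illustration only, not necessarily the right one for these data) is to keep, among tied industries, the one recorded in the most recent year. A sketch, assuming one observation per firm-year and a variable called year, with hypothetical toy data:

```stata
* Toy data (hypothetical): firm 1 is a 3-3 tie, firm 2 has a clear majority
clear
input firm year industry
1 2010 22
1 2011 22
1 2012 22
1 2013 32
1 2014 32
1 2015 32
2 2010 15
2 2011 15
2 2012 17
2 2013 15
2 2014 15
2 2015 15
end

* How often each industry is mentioned for each firm
by firm industry, sort: gen mentions = _N
* Assumed tie-break: the latest year in which each industry was recorded
by firm industry: egen last_year = max(year)
* Most-mentioned industry sorts last; among ties, the most recently recorded one does
by firm (mentions last_year), sort: gen edited_industry = industry[_N]
```

With one observation per firm-year, two different industries cannot share the same last_year within a firm, so the result no longer depends on how Stata orders tied observations. The -egen- function mode() with its minmode or maxmode option is another deterministic choice, but it breaks ties by the smallest or largest code, which has no substantive meaning.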

    In the future, when asking for help with coding, always show example data, and use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    • #3
      Thank you very much for your advice. I did check the firms carefully one by one, looking at the industrial code and the product name. I am also considering what my rule should be if 3 years are listed as industry A and 3 as industry B. When editing manually, I decided based on the consistency between the list of products for each industrial code and the name of the product the firm produces. I read the list of products for each industrial code in the industrial code handbook, then corrected the code to the appropriate one. For example, if A is 22 (manufacture of rubber and plastic products) and B is 32 (other manufacturing), I look at the firm's product name. This firm's product is listed as a product made from fiber, so in this case I correct the industrial code to 32, which matches the product description more closely. This is the case that I cannot handle with a Stata command.

      For firms where one industrial code appears more than 3 times, following the majority of the listed industrial codes over the 6 years is likely to be correct.
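The rule described here (use the majority code when there is one, decide ties by product name) can be partly automated so that only the ties are reviewed, and even those are fixed in a do-file rather than by hand. A sketch, assuming the variables are named firmcode, year and indcode and that indcode is numeric; the toy data are hypothetical. It relies on the documented behavior of -egen-'s mode() function, which returns missing when a group has more than one mode. Note that mode() picks the most frequent code (a plurality), not necessarily one that appears more than 3 times.

```stata
* Toy data (hypothetical): firm 1 has a 3-3 tie, firm 2 a clear majority
clear
input firmcode year indcode
1 2010 22
1 2011 22
1 2012 22
1 2013 32
1 2014 32
1 2015 32
2 2010 15
2 2011 15
2 2012 17
2 2013 15
2 2014 15
2 2015 15
end

* Most frequent code per firm; missing if two or more codes tie
bysort firmcode: egen majority_indcode = mode(indcode)
* Non-missing codes per firm, to tell ties apart from all-missing firms
by firmcode: egen n_codes = count(indcode)
gen byte tie = missing(majority_indcode) & n_codes > 0

* Corrected code: the majority code where one exists; ties stay missing for now
gen new_indcode = majority_indcode

* Count and list only the tied firms for review against product names
egen byte firm_tag = tag(firmcode)
count if tie & firm_tag
list firmcode year indcode if tie, sepby(firmcode)

* Record each tie decision as a command (hypothetical firm and code)
replace new_indcode = 32 if firmcode == 1
```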
      Last edited by Aisah Aisah; 25 Mar 2019, 18:21.
