Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate new column with specified data based on an existing column

    Hi,
    I have dataset with several companies, and their NAICS code. I want to make a new (additional) column that says e.g. utilities if NAIC-code starts on 21 or 22. .. Thus, I want one additional column containing a "description", ie.the name of the industry, based on the NAICS code.

    I have tried several commands, but don't make it work.

    This works:
    gen NAICS_two_digit = substr(NAICS_CORE_CODE, 1, 2)

    But the following don't work:

    gen peer_industry = "Agriculture" if NAICS_two_digit = 11
    "Utilities" if NAICS_two_digit = 21 | 22
    "Manufacturing" if NAICS_two_digit = 31 | 32 | 33
    "Construction" if NAICS_two_digit = 23
    "Wholesale" if NAICS_two_digit = 42
    "Retail" if NAICS_two_digit = 44 | 45
    "Transportation" if NAICS_two_digit = 48 | 49

    I belive this is a quite trivial problem, however, I don't manage to do it.

    Hope someone can help med out there !


  • #2
    Hi Felix,

    gen peer_industry = "Agriculture" if NAICS_two_digit = 11

    shall be changed to

    gen peer_industry = "Agriculture" if NAICS_two_digit == 11

    "Utilities" if NAICS_two_digit = 21 | 22

    shall be changed to

    replace peer_industry="Utilities" if NAICS_two_digit ==1 | NAICS_two_digit == 22

    Comment


    • #3
      There are four problems here.

      1. When testing for equality you need to use == not =. See help operators.

      2. Your new variable is still string, so references (e.g.) to 11 need to be to "11". Otherwise you can work with a numeric variable. I was not familiar with the details of NAICS but my understanding is that the first digit is never 0, and if so real(substr(NAICS_CORE_CODE, 1, 2)) would work fine to define a numeric variable.

      3.

      Code:
      "Utilities" if NAICS_two_digit = 21 | 22
      for example needs to be

      Code:
      replace  peer_industry = "Utilities" 
      followed by an if qualifier.

      4. Consider a statement of the form

      Code:
      gen foo = "whatever" if NAICS_two_digit == 21 | 22
      where I have assumed that the two digit variable is numeric and fixed the == operator. You're hoping that this is equivalent to

      Code:
      * this is legal
      gen foo = "whatever" if NAICS_two_digit == 21 | NAICS_two_digit == 22
      but it isn't. Stata will parse that statement as

      Code:
      gen foo = "whatever" if (NAICS_two_digit == 21) | 22
      which is always true as 22 is always true by the rule that non-zero numeric arguments count as true. Note that "22" presented to a logical operator would trigger a type mismatch. "

      You need either the device flagged as "this is legal" above or

      Code:
      * this is legal too
      gen foo = "whatever" if inlist(NAICS_two_digit, 21, 22)
      inrange() would work here too.

      A Tip discussing problem #4 -- which is subtle and has bit many people here -- is at press for Stata Journal 23(1),which is no help right now.

      EDIT Jin Zhang in #2 caught problems numbered 1 3 and 4 here, but 2 remains.
      Last edited by Nick Cox; 20 Feb 2023, 02:27.

      Comment


      • #4
        Thanks for the help!

        Now I tried:

        gen NAICS_two_digit = substr(NAICS_CORE_CODE, 1, 2)

        replace peer_industry == "Agriculture" if NAICS_two_digit == 11
        "Utilities" if NAICS_two_digit == 21 | 22
        "Manufacturing" if NAICS_two_digit == 31 | 32 | 33
        "Construction" if NAICS_two_digit == 23
        "Wholesale" if NAICS_two_digit == 42
        "Retail" if NAICS_two_digit == 44 | 45
        "Transportation" if NAICS_two_digit == 48 | 49

        However I get: variable peer_industry not found

        Hope you can help me once more!

        Comment


        • #5
          The suggestion in both #2 and #3 of

          Code:
          replace peer_industry =
          was a direct response to your own code in #1

          Code:
          gen peer_industry =
          In both cases I leave out details following = .

          The replace won't work if you didn't issue that generate command.

          The code you want appears to be something like

          Code:
          gen NAICS_two_digit = real(substr(NAICS_CORE_CODE, 1, 2)) 
          
          gen peer_industry = "Agriculture" if NAICS_two_digit == 11
          replace peer_industry = "Utilities" if inlist(NAICS_two_digit, 21,  22)
          and so on. Here you should

          Code:
          drop NAICS_two_digit
          if it exists.

          Comment

          Working...
          X