  • r(109) type mismatch - recode string values in numeric values

    Hello everyone,

    I am trying to generate a dummy for company size from a variable that contains the number of employees in 2012. Some companies did not report their numbers; for those cases the value is "n.v.". I do not want to delete these observations, as I may add values from other years later on.

    Data looks like that:

    company_id numberemployees_2012
    19765 1000
    87644 n.v.
    77931 25

    Code:
    encode numberofemployees2012, gen(number_employees2012_n)

    generate number_emp_large_2012 = 1
    replace number_emp_large_2012 = 0 if (number_employees2012_n <= 250 | number_employees2012_n=="n.v.")

    This gives the error message r(109), "type mismatch".

    As a solution I am trying to recode the "n.v." values of numberofemployees_2012_n from string to numeric values (=0). But I don't know how to do that. I read in another post about replacing n.v. using label value?

    Thank you for your help!


  • #2
    Hi Jessica,

    -encode- converts string to numeric, so you cannot compare number_employees2012_n with a string such as "n.v.". You could either find out which numeric value corresponds to "n.v." using the Data Editor or, alternatively, use -real()-, which returns missing (.) for any string that cannot be read as a number:

    Code:
    help real 
    
    gen number_employees2012_n = real(numberofemployees2012)
    
    generate number_emp_large_2012 = 1
    replace number_emp_large_2012 = 0 if (number_employees2012_n <= 250 | number_employees2012_n == .)
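
A minimal sketch of how this plays out on the three example observations from #1, assuming the string variable is called numberemployees_2012 as in the data listing. Note that Stata treats missing as larger than any number, so the missing case has to be tested explicitly.

```stata
clear
input str5 company_id str4 numberemployees_2012
"19765" "1000"
"87644" "n.v."
"77931" "25"
end

* real() returns system missing (.) for strings that are not numbers
generate double number_employees2012_n = real(numberemployees_2012)

* missing counts as larger than any number, so exclude it explicitly
generate byte number_emp_large_2012 = ///
    number_employees2012_n > 250 & !missing(number_employees2012_n)

list, noobs
```

With this, only company 19765 is flagged as large; the "n.v." company gets 0, as requested in #1.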



    • #3
      You could try something like that below.

      . version 14.1

      . clear *

      . set more off

      . input str5 company_id str4 numberemployees_2012

           company~d  numb~2012
        1. 19765 1000
        2. 87644 n.v.
        3. 77931 25
        4. end

      . *
      . * Begin here
      . *
      . quietly destring numberemployees_2012, generate(number_employees2012_n) ignore("nv.")

      . quietly replace number_employees2012_n = .n if numberemployees_2012 == "n.v."

      . label define NV .n "n.v."

      . label values number_employees2012_n NV

      . generate byte number_emp_large_2012 = ///
      >          !(number_employees2012_n <= 250 | number_employees2012_n == .n)

      . list, noobs abbreviate(33)

        +------------------------------------------------------------------------------------+
        | company_id   numberemployees_2012   number_employees2012_n   number_emp_large_2012 |
        |------------------------------------------------------------------------------------|
        |      19765                   1000                     1000                       1 |
        |      87644                   n.v.                     n.v.                       0 |
        |      77931                     25                       25                       0 |
        +------------------------------------------------------------------------------------+

      . /* Alternative:
      > quietly replace numberemployees_2012 = ".n" if numberemployees_2012 == "n.v."
      > quietly destring numberemployees_2012, generate(number_employees2012_n)
      > etc. */

      . exit

      end of do-file



      • #4
        Just to spell it out, encode is not only wrong but also very dangerous in cases like these. Here is a simple example to make the point.

        We first simulate what happens when somehow something that "should be" numeric is input as string. We try encode as a solution; there is no error message and the describe and list output looks fine. But strip away the value labels and we find that the underlying numeric values are not even in the correct numeric order.

        Principle and problem: Without other instructions, encode uses alphanumeric sort order to map distinct values to integers 1 up. For values that mostly are numeric read as strings, the result is basically garbage, but may not seem so.

        Don't.

        Code:
        . input str5 shouldbenumeric
        
             shouldb~c
          1. "1"
          2. "2"
          3. "111111"
          4. "299"
          5. "3"
          6. "n.v."
          7. end
        
        . encode shouldbenumeric, gen(willbenumeric)
        
        . l
        
             +---------------------+
             | should~c   willbe~c |
             |---------------------|
          1. |        1          1 |
          2. |        2          2 |
          3. |    11111      11111 |
          4. |      299        299 |
          5. |        3          3 |
             |---------------------|
          6. |     n.v.       n.v. |
             +---------------------+
        
        . d
        
        Contains data
          obs:             6                          
         vars:             2                          
         size:            78 (99.9% of memory free)
        -------------------------------------------------------------------------------
                      storage  display     value
        variable name   type   format      label      variable label
        -------------------------------------------------------------------------------
        shouldbenumeric str5   %9s                    
        willbenumeric   long   %8.0g       willbenumeric
        
        . l, nola
        
             +---------------------+
             | should~c   willbe~c |
             |---------------------|
          1. |        1          1 |
          2. |        2          3 |
          3. |    11111          2 |
          4. |      299          4 |
          5. |        3          5 |
             |---------------------|
          6. |     n.v.          6 |
             +---------------------+
        See also http://www.stata-journal.com/sjpdf.h...iclenum=dm0057
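
For contrast, a small sketch that puts encode next to real() on the same values. The variable names encoded and converted are made up for this illustration.

```stata
clear
input str6 shouldbenumeric
"1"
"2"
"11111"
"299"
"3"
"n.v."
end

* encode maps distinct strings to 1, 2, ... in alphanumeric order
encode shouldbenumeric, generate(encoded)

* real() reads the digits themselves; "n.v." becomes missing
generate double converted = real(shouldbenumeric)

* nolabel shows the integers that encode really stored
list shouldbenumeric encoded converted, nolabel noobs
```

Here "299" is stored as 4 by encode, because it is the fourth string in sort order, while real() gives 299.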



        • #5
          Thank you all for your advice! The code suggested by Joseph worked.

          If I want to add values of the variable number_employees2010_n if number_employees2012_n=="n.v.", which command could I use? I tried "fillin", but the output is wrong.

          Code:
          fillin number_employees_2010_n number_employees2012_n
          replace number_employees2012_n=number_employees_2010_n if (number_employees2012_n == .n)

          I also tried a suggestion from another post:
          sort graduate_id, stable
          foreach number_employees2012_n in frog toad newt {
          by graduate_id: replace number_employees2012_n=number_employees_2010_n if number_employees2012_n== .n
          }

          In both cases the values are changed, but the results are wrong. The numbers of all variables are changed into a random (?) number...can you help?



          • #6
            if i want to add values of the variable number_employees2010_n if number_employees2012_n=="n.v."
            I don't understand what motivated you to try using -fillin- for this. So perhaps I don't really understand what you want to do. But the direct answer to this question, following on Joseph Coveney's code is:

            Code:
            summ employees_2010_n if employees_2012_n == .n
            display r(sum)
            If you want to do this separately for each company and have the sums in a new variable called x:

            Code:
            by company_id, sort: egen x = total(cond(employees_2012_n == .n, employees_2010_n, .))



            • #7
              Hi Clyde,

              What I want to do is fill in my employees_2012_n variable, wherever it is "n.v.", with the value of employees_2010_n, if that value is available.

              Example:
              employees_2010_n   employees_2012_n   -->   employees_2012_n
              100                100                      100
              30                 50                       50
              80                 n.v.                     80
              n.v.               n.v.                     n.v.
              I need a command to say "put in the value of 2010 each time 2012 is n.v.".



              • #8
                Clyde already suggested code. Did you try it? What happened?



                • #9
                  Yes, I tried it, but the values of the new variable x are wrong.

                  One example:
                  employees_2010_n employees_2012_n --> x
                  76 n.v. 809248



                  • #10
                    Please use dataex (SSC) to provide an example of problematic data, together with the exact code you have been using.

                    If you are still seeing "n.v." displayed for your variable, it seems that you were not using Joseph's or Clyde's code from #3 and #6.

                    More generally please read http://www.statalist.org/forums/help#stata
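
For the request as restated in #7 ("put in the value of 2010 each time 2012 is n.v."), a minimal sketch might look like the following. It assumes both variables are already numeric, with "n.v." stored as the extended missing value .n as in #3; the four observations are the example rows from #7, not real data.

```stata
clear
input employees_2010_n employees_2012_n
100 100
30 50
80 .n
.n .n
end

* copy the 2010 value only where 2012 is .n and 2010 is available
replace employees_2012_n = employees_2010_n ///
    if employees_2012_n == .n & !missing(employees_2010_n)

list, noobs
```

Only the third observation changes (to 80); the last one stays .n because 2010 is missing as well. Unlike total(), this copies a value rather than summing anything.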
