  • How to manage missing values in my dataset?

    Good morning.
    I have 10 columns, A to J (Year, Industry, Ticker, EVA, Size, Leverage, growth, Por, Volatility, Intangible assets).
    The columns EVA, Size, Leverage, growth, Por, Volatility and Intangible assets contain missing values (#VALUE!, #N/A N/A, #DIV/0!).
    How can I handle these in Stata?
    Last edited by Lucas Bordure; 13 Apr 2021, 02:13.

    Lucas BORDURE
    Student MSc. in International Finance
    Rennes School of Business
    Stata SE 16.1

  • #2
    Stata uses a period (.) as the default missing value. There are also 26 extended missing values, .a to .z, that users can assign manually. Unless you need to distinguish between types of missing data, the default one is sufficient most of the time.

    You can use -browse- in Stata to open a spreadsheet view. The black numbers are numeric and seem to have been imported correctly. If you scroll down you'll see some cells containing just a period; those are the default missing values.

    You can also see some red numbers. Those are string variables (akin to character) and currently cannot be used as numbers. The "#N/A N/A" entries were interpreted as characters, which caused the whole column to be imported as text. You can recover the numerical data with this code:
    Code:
    gen growth_num = real(GROWTH)
    gen intangible_asset_num = real(INTANGIBLEASSETS)
    or:
    Code:
    destring GROWTH, gen(growth_num2) force
    destring INTANGIBLEASSETS, gen(intangible_asset_num2) force
    To learn more about missing in Stata, check out: https://www.stata.com/manuals/u12.pd...1Missingvalues and https://www.stata.com/manuals/dmissingvalues.pdf
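    Before relying on -force-, it can help to see exactly which text values will be turned into missing, and the extended missing values mentioned above can record why each one is missing. The following is only a minimal sketch on toy data: the variable names raw and value and the four example rows are made up for illustration and are not from the dataset in this thread.
    Code:
    clear
    input str10 raw
    "1.5"
    "#N/A N/A"
    "#DIV/0!"
    "2.25"
    end
    * show the text values that real() cannot convert to a number
    tabulate raw if missing(real(raw)) & raw != ""
    * convert, then tag each error type with its own extended missing value
    generate value = real(raw)
    replace value = .a if raw == "#N/A N/A"
    replace value = .b if raw == "#DIV/0!"
    list
    Extended missing values are still treated as missing in estimation and summary commands, so this tagging only matters if you later want to tell the error types apart.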
    Last edited by Ken Chui; 13 Apr 2021, 06:45.


    • #3
      Thank you for your reply.
      I can see period (.) in many columns right now but I still observe #N/A N/A for GROWTH and INTANGIBLEASSETS.

      • #4
        I can see period (.) in many columns right now but I still observe #N/A N/A for GROWTH and INTANGIBLEASSETS.
        Of course you do; they are still their old selves. The new numerical versions are the two new variables "growth_num" and "intangible_asset_num". If you plan to produce any statistical summary, use these two new ones.

        As for GROWTH and INTANGIBLEASSETS, there is no point in replacing those "NA" entries with empty values. That will not change the fact that they are string variables, which are unsuitable for most statistical summaries and analyses.

        Just as an example, try running:
        Code:
        mean GROWTH
        mean growth_num
        If for any reason you want to keep the old names, delete the old variable and give its name to the new one. For example:
        Code:
        drop GROWTH
        rename growth_num GROWTH
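        As a sketch of a shorter route, assuming you are certain that every non-numeric entry in these columns should become missing and that the original string variables still exist, -destring- with the -replace- option converts a variable in place, so no separate drop and rename is needed:
        Code:
        * overwrite the string variables with numeric versions;
        * force turns anything non-numeric (such as "#N/A N/A") into missing
        destring GROWTH INTANGIBLEASSETS, replace force
        The original text is not recoverable after this, so the generate-then-rename approach is safer if you still want to inspect the raw values.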


        • #5
          OK. Thank you very much for your explanations, Ken Chui. You helped me a lot!
          Have a nice day.
