  • Max value changes because of different format

    Dear Statalist,

    This is my first post on this forum; until now I have always found answers to my questions here.
    This time, however, I'm struggling with a question I can't find an answer to.
    I am using panel data on companies in different countries and years. I want to create a dummy variable called Dominant that marks a firm as dominant if it has the highest net sales in its industry in its country in a given year. So for each country, industry, and year, one firm should be assigned a 1. I did this with the following commands:
    Code:
    egen Max_Sales = max( Net_Sales ), by( cntrycde SICCODE year)
    gen Dominant=1 if Net_Sales== Max_Sales
    replace Dominant=0 if Dominant==.
    When the net sales aren't too big, this works perfectly. However, when the net sales are very big, they are displayed as 1.46e+09, for example. At that point Stata doesn't recognize the max sales as being the same value as the net sales:
    [Image: example1.png]

    The selected part is year 2010 for South Korean firms with the same industry code. The biggest value of net sales (8.32e+09) is the same for the max_sales variable I created, but still the Dominant value in row 21151 doesn't get a 1.

    I tried to fix this by changing the format and therefore using the following commands:
    Code:
    format %15.0g Net_Sales
    egen Max_Sales = max( Net_Sales ), by( cntrycde SICCODE year)
    format %15.0g Max_Sales
    gen Dominant=1 if Net_Sales== Max_Sales
    replace Dominant=0 if Dominant==.
    Now I can see that Stata changes the value instead of copying the exact maximum of net sales into Max_Sales. Compare the first and last columns: observation 21152 still does not get a 1 for Dominant, although it should:
    [Image: example 2.png]

    I hope my problem is clear. I need the Max_Sales value in row 21152 to be exactly the same as the Net_Sales value, but I can't figure out how to achieve this.

    Any ideas would be highly appreciated.

    Thank you in advance,
    Lotte

  • #2
    Welcome to Statalist.

    Your diagnosis of the problem is not quite correct. You appear to be confusing the precision with which Stata stores a variable, which is a function of its storage type, with the precision with which it displays the variable, which is a function of its display format.

    I believe that if you do
    Code:
    describe Net_Sales
    you will find that it is stored as a double rather than a float. But by default Max_Sales is being created as a float rather than a double. This results in a loss of precision, as discussed in the output of help precision. From the table further down in this post, you will see that every integer up to 16,777,216 in magnitude can be stored exactly in a float variable. Beyond that, only some integers can be stored exactly; the rest are stored with "rounding" (base-2 rounding) errors.

    Try changing your command to
    Code:
    egen double Max_Sales = max( Net_Sales ), by( cntrycde SICCODE year)
    and see if that takes care of the problem.
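    To make the difference concrete, here is a minimal sketch on made-up data. The variable names mirror the thread, but the values, the grouping variable grp, and the helper variables Max_float and Max_double are assumptions for illustration only, not taken from the original dataset. It creates the group maximum once with the default storage type and once as a double, then counts how many observations match each version.

```stata
* Hypothetical toy data: two firms in one country-industry-year cell
clear
input byte firm double Net_Sales
1 8320000123
2 1460000001
end
gen byte grp = 1

* Default storage type (float): the maximum is rounded when stored
egen Max_float = max(Net_Sales), by(grp)

* Explicit double: the maximum is stored exactly
egen double Max_double = max(Net_Sales), by(grp)

* Show full digits instead of scientific notation
format %15.0f Net_Sales Max_float Max_double
list, noobs

* The equality test finds the top firm only against the double copy
count if Net_Sales == Max_float
count if Net_Sales == Max_double

* One-line alternative to gen ... if / replace for building the dummy
gen byte Dominant = (Net_Sales == Max_double)
```

    With these toy values, the float copy of the maximum differs from Net_Sales in its last digits, so the first count should find no matches while the second finds the top firm. Note that the one-line dummy also returns 1 where both sides are missing, so groups with no sales data at all may need separate handling.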



    Here are the limits on storage of decimal integers with full accuracy in the various numeric storage types. The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.
    Type     Bits     Minimum                       Maximum
    byte     7        -127                          100
    int      15       -32,767                       32,740
    long     31       -2,147,483,647                2,147,483,620
    float    24       -16,777,216                   16,777,216
    double   53       -9,007,199,254,740,992        9,007,199,254,740,992
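
    The float and double limits in the table can be checked directly from the command line. This is only a sketch: it uses Stata's float() function, which rounds its argument to float precision, and the particular test values are chosen here for illustration.

```stata
* 2^24 = 16,777,216 is still exactly representable as a float
display %12.0f float(16777216)

* One past the limit is rounded back to 16,777,216
display %12.0f float(16777217)

* So an equality test against the unrounded value fails (displays 0)
display float(16777217) == 16777217

* 2^53 plays the same role for doubles
display %20.0f 2^53
```

    The same float() function is useful in the other direction: comparing a float variable with a literal such as 0.1 usually needs float(0.1) on the right-hand side for the test to succeed.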


    To improve future posts, please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

    With dataex rather than a screen shot, we would have immediately seen how Net_Sales and Max_Sales are stored.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.


    • #3
      Thank you so much!
      With the 'double' added in my command everything is perfect now!
