  • Difference between variables doesn't work

    Hello everybody,

    I've been reading the very useful tips on this forum, for which I am really grateful. I am new to Stata and I'm struggling a bit to become familiar with it. I have an issue which I think is pretty silly, but it is driving me crazy.

    I have a dataset (made up of numeric variables; I have already checked) and I simply need to take the difference between two variables, which I did as
    Code:
    gen dev= Actual - SmartEstimate
    but I got nonsensical numbers.
    Here's a screenshot of the dataset
    [Screenshot: dataset.JPG]

    The dataset contains US macro variables sorted by date and name. I need to take the difference between the Actual value of each variable and its estimate (SmartEstimate), but as you can see in column dev, the numbers make no sense at all. SD is the standard deviation, computed as egen SD=sd(SmartEstimate), by(Name).

    Thank you very much in advance for your help,

    Federica

  • #2
    Federica:
    the issue seems to be that your blue numbers (blue in more than one sense, sadly) were probably produced by -encode-, in an attempt to convert them from -string- to numeric format (-encode- works terribly for this purpose, as its entry in the Stata .pdf manual wisely warns). -encode- maps each distinct string to an integer code (1, 2, ...) and attaches the original text as a value label, so any arithmetic operates on the codes, not on the numbers you see displayed.
    The following toy example gives the idea:
    Code:
    . set obs 1
    number of observations (_N) was 0, now 1
    
    . g A="32"
    
    . g B="20"
    
    . help encode
    
    . encode A , generate(new_A)
    
    . encode B , generate(new_B)
    
    . g diff= new_A-new_B
    
    . generate wanted_A= real(A)
    
    . generate wanted_B= real(B)
    
    . g real_diff= wanted_A-wanted_B
    
    . list
    
         +-----------------------------------------------------------------+
         |  A    B   new_A   new_B   wanted_A   wanted_B   diff   real_d~f |
         |-----------------------------------------------------------------|
      1. | 32   20      32      20         32         20      0         12 |
         +-----------------------------------------------------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Carlo Lazzaro is right, and just possibly there is more bad news. The deeper question is why encode was used in the first place, which is usually because (a) the data were imported as string and (b) someone then thought encode was the right thing to do. (a) is usually a question of non-numeric content, such as column headers, footnotes, or missing value codes, somewhere in the variable. So, you may need to go back to the original and see what's what. Usually destring is a better bet than encode.
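
      A minimal sketch of the destring route, assuming the two variables were imported as clean numeric strings. The three data rows are invented for illustration; only the variable names Actual, SmartEstimate and dev come from the original question.

      ```stata
      * Toy data (hypothetical values): numbers that arrived as strings
      clear
      input str6 Actual str6 SmartEstimate
      "2.5" "2.1"
      "0.3" "0.4"
      "150" "145"
      end

      * Convert the digits themselves to numeric, replacing the string variables
      destring Actual SmartEstimate, replace

      * The difference is now computed on the real values, not on encode's codes
      generate dev = Actual - SmartEstimate
      list
      ```

      If destring refuses with "contains nonnumeric characters", the strings still hold something other than numbers and need inspecting first.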


      • #4
        Dear Nick Cox and Carlo Lazzaro, thank you very much for your replies! The destring command finally works and creates a numeric variable, which lets me take the difference.
        Sorry for taking so long to reply; from now on I will pay more attention to forum notifications.


        • #5
          Hello everybody, I am also a novice, and before posting anything in this forum I try at least to find a solution in the BoK already in place. But Nick Cox, Carlo Lazzaro and Federica Vassalli, I'm facing much the same problem. I'm working with a dataset downloaded from the Eurostat website. Even after checking for footnotes and the like (and also deleting variable names or using the firstrow option when importing), I get a column of numbers stored as string (type "str8", format "%9s"), and the only way I can turn it into numbers is with the "encode" command. With the "destring" command I get the following message:

          ". destring (Workers), generate (Workforce)
          Workers: contains nonnumeric characters; no generate"

          I tried the same with the "replace" option and got much the same output.
          I really don't understand why this is so difficult. Thanks

          Davide


          • #6
            Davide:
            welcome to this forum.
            Before using -destring- you should probably strip the nonnumeric characters from the variable of interest, using one of the tools listed under -help string functions-.
            That said, -encode- should be handled with care.
            Kind regards,
            Carlo
            (Stata 19.0)
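
            A rough sketch of that kind of cleaning, using subinstr() from the string functions. The values below are invented, and the offending characters (an embedded space, a comma used as a thousands separator, and ":" standing for "not available") are assumptions for illustration; the real file may contain something else entirely.

            ```stata
            * Toy data (hypothetical): a str8 column with stray characters
            clear
            input str8 Workers
            "1 250"
            "3,400"
            ":"
            "980"
            end

            * Remove embedded spaces and commas (assumed to be thousands separators)
            generate str8 Workers_clean = subinstr(Workers, " ", "", .)
            replace Workers_clean = subinstr(Workers_clean, ",", "", .)

            * Treat the assumed "not available" marker as empty, i.e. missing
            replace Workers_clean = "" if Workers_clean == ":"

            * Now every remaining value is a plain number, so destring succeeds
            destring Workers_clean, generate(Workforce)
            list
            ```

            destring also has an ignore() option that strips listed characters during conversion, which may be enough when the debris is simple. If a comma is a decimal mark rather than a thousands separator in your file, removing it as above would be wrong.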


            • #7
              There are many twists here and here's another. encode was in Stata long before destring, but destring was first written because for one reason or another import into Stata was resulting in string variables that should have been numeric. Now (perhaps) there is more need in some quarters for destring than for encode.

              If destring baulks, you just need to know why it is doing that, which boils down to finding the non-numeric characters it is telling you about. That is easier than you might fear because


              Code:
              tab Workers if missing(real(Workers))
              shows you the values that can't be converted to numeric if you push the strings through real(). Sometimes the output may be copious, but there is one simple thing you need to fix. Sometimes the output is simple and there is one simple thing you need to fix. Sometimes ... Well, don't expect the complete truth here.
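
              To see the check in action, here is a small sketch on invented data. The variable name Workers comes from the thread; the four values are hypothetical.

              ```stata
              * Toy data (hypothetical): two clean numbers and two problem values
              clear
              input str8 Workers
              "1200"
              "n.a."
              "850"
              "1,5"
              end

              * real() returns missing for anything it cannot read as a number,
              * so this tabulates only the problem strings
              tabulate Workers if missing(real(Workers))

              * How many observations are affected
              count if missing(real(Workers))
              ```

              Here the table lists "n.a." and "1,5", since real() does not accept a comma inside a number.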


              • #8
                Well, I got it! Thanks a lot, Nick Cox and Carlo Lazzaro. It was actually a trivial issue of the order of commands. There were 3 records with unspecified values. If I run the command that selects records before running destring, everything works; if I do the opposite and try to convert the data first, the problem appears and has to be resolved in a way that, in my case, is totally unnecessary. So, for the future: first select the records you need, and only then operate on them!
                Thanks a lot!
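
                The order described above might look like the following sketch. The data and the ":" placeholder are invented for illustration, and dropping via real() is just one possible way to select records; the actual selection command used was not shown.

                ```stata
                * Toy data (hypothetical): one record has no usable value
                clear
                input str8 Workers
                "1200"
                ":"
                "850"
                end

                * Step 1: keep only the records whose value reads as a number
                drop if missing(real(Workers))

                * Step 2: the column is now clean, so destring converts it without complaint
                destring Workers, generate(Workforce)
                list
                ```

                If the unusable records must stay in the dataset, destring's force option converts them to missing instead, at the cost of hiding whatever was there.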
