
  • summarizing in Stata

    Hi, I'm a rookie at using Stata and I am stuck at this point. I have an issue using the sum function. I have a data set of 4907 observations and I have encoded my data from string variables to numeric (long) variables. When opening the data editor, I now have the first variables in a column with string values (colored yellow) and a new column of generated numeric values (colored blue). From earlier experience, the data should be colored white (?).

    The numeric variable, which I have called "ncomprice", was created with the following command: -encode comprice, gen(ncomprice)- because the values were recognized as strings.

    The problem is that when I run the -sum- command on ncomprice, I do not get the mean of the values in the observations, which range from 3,000 to 19,000. Instead I get the mean or median of the number of observations, meaning I get 577,1427 when having 4907 observations. What I want is the mean of the values across all observations over time. I hope I am explaining myself well enough.

    When I list the observations there are values for each observation.

    From reading some earlier posts, you would probably like some info:

    . describe ncomprice

                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------
    ncomprice       long    %9.0g      ncomprice  Comprice

    . count
      1,156

    . summarize ncomprice, detail

                              Comprice
    -------------------------------------------------------------
          Percentiles      Smallest
     1%           12              1
     5%           58              2
    10%          116              3       Obs               1,156
    25%        288.5              4       Sum of Wgt.       1,156

    50%        577.5                      Mean           577.1427
                            Largest       Std. Dev.      332.9293
    75%        865.5           1150
    90%         1038           1151       Variance       110841.9
    95%         1096           1152       Skewness      -.0003597
    99%         1142           1153       Kurtosis        1.79956


    Can someone explain what I need to do to get the summarized results I need? I would like to get the mean of the actual value of the 4907 different observations, the standard deviation, min and max value.

    Thank you for your help in advance.

  • #2
    Guest:
    welcome to this forum.
    Stata output is telling you that you actually have 4907 observations (count 4907). The remaining statistics refer to -ncomprice-.
    Hence, what's your concern about Stata output and, more substantively, what are you looking for?
    Last edited by sladmin; 16 Nov 2020, 05:36. Reason: anonymize original poster
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Originally posted by Guest
      [...] and I have encoded my data from string variables to numeric (long) variables.
      [...]
      "ncomprice" is encoded by using the following command: -encode comprice, gen(ncomprice)- because they were recognized as strings
      Without reading much more, the following from

      Code:
      help encode
      probably applies:

      Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.



      • #4
        Carlo:
        Thank you. The 4907 observations have different values. They reflect the daily bitcoin price from the middle of July 2017 until the 21st of September. Therefore the value varies from one observation to the next. I would like to get the mean of this price for the whole time period, and also the min and max values in addition to the standard deviation.

        It seems like Stata is not considering the value in each cell.

        Guest
        (Stata/SE 16.0)
        Last edited by sladmin; 16 Nov 2020, 05:36. Reason: anonymize original poster



        • #5
          Your problem is not with the summarize command (not a function, by the way) but with what was done earlier.

          encode is quite wrong as a solution for prices that present as string. Your encode command will sort prices alphanumerically and then map them to integers 1 up. The question arises of why they appear as strings, but even well-behaved prices will be reduced to nonsense this way. Prices such as "1.00" "2.00" "10.00" "20.00" will be sorted to "1.00" "10.00" "2.00" "20.00" and then mapped to 1 2 3 4. You do not want that.
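
The sorting problem described here can be reproduced on a toy dataset. This is only a minimal sketch: the variable names price, nprice, and price_num are made up for illustration, and the four prices are the ones from the example sentence.

```stata
* Toy data: prices stored as strings (variable names are illustrative)
clear
input str5 price
"1.00"
"2.00"
"10.00"
"20.00"
end

* encode sorts the distinct strings and numbers them 1, 2, 3, ...
* so "10.00" is coded 2 and "2.00" is coded 3
encode price, generate(nprice)
list price nprice, nolabel

* summarize then describes the codes 1 to 4, not the prices
summarize nprice
assert r(mean) == 2.5

* destring converts the text to the numbers it represents
destring price, generate(price_num)
summarize price_num
assert r(mean) == 8.25
```

The mean of the encoded variable depends only on how many distinct strings there are, which is why the reported mean tracks the number of observations rather than the price level.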

          destring is what you need and you will need to look at its options such as dpcomma and ignore().

          It's quite likely that encode was wrongly applied to other variables too.

          The help for encode warns of this problem:

          Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
          For more detail, see https://www.stata-journal.com/articl...article=dm0098 if you have access.

          If this is not enough explanation, it would be a good idea to use dataex to give us a data example. See FAQ Advice #12.



          • #6
            Daniel Klein:
            Thank you for your answer. When using "generate newvar = real(varname)" I get the following result:

            . generate ncomprice = real(comprice)
            (1,156 missing values generated)

            The values in the first 6 cells of my data editor are written like this:

            date            comprice
            sep. 21, 2020   10 462,26
            sep. 20, 2020   10 938,27
            sep. 19, 2020   11 094,35
            sep. 18, 2020   10 944,59
            sep. 17, 2020   10 948,99
            sep. 16, 2020   10 974,90

            Do you know what I am doing wrong?

            Regards
            Guest



            • #7
              #6 crossed with #5

              Code:
              destring comprice, replace ignore(" ") dpcomma
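
As a sketch of what this command does, the six values shown in #6 can be typed in and converted. The str12 width and the name price_alt are assumptions for illustration. It also assumes the separator inside the numbers is an ordinary space, which may not hold for the real file.

```stata
* Six values copied from #6, stored as strings
clear
input str12 comprice
"10 462,26"
"10 938,27"
"11 094,35"
"10 944,59"
"10 948,99"
"10 974,90"
end

* real() cannot parse the space or the decimal comma, so it returns missing
display real("10 462,26")

* Alternative route: strip spaces, swap the comma for a point, then real()
generate double price_alt = real(subinstr(subinstr(comprice, " ", "", .), ",", ".", .))

* The command from this post: drop spaces, treat the comma as decimal separator
destring comprice, replace ignore(" ") dpcomma

* Both routes should agree, and summarize now reports actual prices
assert abs(comprice - price_alt) < 1e-6
summarize comprice
```

This also shows why generate with real() produced only missing values in #6: the strings contain characters that real() does not accept.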



              • #8
                Nick Cox:
                Thank you very much, that helped me solve my problem.

                You will probably be reading more from me on this forum in the coming weeks. Have a good one!

                Guest



                • #9
                  Good, but note that anything else encoded is probably garbage too.
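
One possible way to act on this is to list every variable that carries a value label, since encode attaches one to each variable it generates. This is a minimal sketch to run on the data in memory. It assumes no value labels were attached for other reasons, so the result is only a set of candidates to inspect.

```stata
* Variables with a value label attached (encode always creates one)
ds, has(vallabel)

* Show the mappings for those variables: label text that looks numeric
* suggests encode was applied where destring was needed
foreach v in `r(varlist)' {
    label list `: value label `v''
}

* String variables still in the data, i.e. the originals to destring
ds, has(type string)
```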
