  • Standardization / z-transformation of matrix

    Hi there,
    I want to standardize a large matrix (several thousand rows and columns) so that it has a mean of 0 and a standard deviation of 1.
    However, I cannot figure out how to do that.
    Can somebody help me with this?
    I would be grateful for any help!

  • #2
    That means that I need to take the following steps:
    1. Calculate the mean over all elements and subtract it from every element (so that the new mean is then 0).
    2. Calculate the standard deviation of all elements and divide all elements by this standard deviation (so that the new standard deviation is 1).

    However, I do not know how to do that ;-)
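
    The two steps above amount to the usual z-transformation, applied over all elements of the matrix at once. As a sketch, with x_{ij} denoting the elements of an r x c matrix (these symbols are introduced here only for illustration) and assuming the sample standard deviation with an n - 1 denominator:

    ```latex
    \bar{x} = \frac{1}{n}\sum_{i=1}^{r}\sum_{j=1}^{c} x_{ij}, \qquad
    s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{r}\sum_{j=1}^{c} \left(x_{ij}-\bar{x}\right)^{2}}, \qquad
    z_{ij} = \frac{x_{ij}-\bar{x}}{s}, \qquad n = rc
    ```

    With these definitions the z_{ij} have mean 0 and sample standard deviation 1.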


    • #3
      Let X be your original matrix. One way to accomplish your end would be to first create X as a vector:
      Xvector = vec(X)
      M = mean(Xvector)
      SD = sqrt(variance(Xvector))
      Z = (X:-M)/SD  // Z is the standardized matrix


      • #4
        Good Morning,

        Thank you very much for your advice!

        Unfortunately, I ran into another problem with this solution.
        The first line works if I modify it using quotes as follows:
        . Xvector = vec("B")

        The second line, however, does not work:
        . M = mean(Xvector)
        mean(): 3253 Xvector[1,1] found where real required
        <istmt>: - function returned error

        Can anybody help me with this? What might be wrong here?



        • #5
          Stata is complaining that your vector does not contain all real values. Can you show the result of

          Edit: 1000+ columns, no need to show this. Just define another vector

          Xvector2= Re(Xvector)
          and compare this to Xvector. If you have the same number of elements and they are identical, use Xvector2 in place of Xvector.
          Last edited by Andrew Musau; 05 Dec 2019, 05:42.


          • #6
            Thank you for that advice.
            Unfortunately, I get another error message:

            : Xvector = vec("distance_hd")
            : Xvector2= Re(Xvector)
            Re(): 3251 nonnumeric found where numeric required
            <istmt>: - function returned error

            Maybe the problem is that my vector is just too long? It has 16,540,489 elements.
            Or should this also work for such long vectors?

            Thank you very much, everybody, for your help!


            • #7
              Maybe the problem is that my vector is just too long?
              No. You cannot use a string matrix here. It just has to contain numbers. Why do you have nonnumeric characters in your matrix "distance_hd"?


              • #8
                Xvector = vec(B)
                instead of
                Xvector = vec("B")
                Kind regards



                • #9
                  Thank you everybody for your help!

                  @ Niels: The second version does not work.

                  @ Andrew: I thought that I only had numbers in my matrix. Unfortunately, I cannot check all 16,540,489 elements individually. But from the way they were constructed, and based on reviewing a sample, there should be only numbers in the matrix.

                  Sorry, I cannot figure out how to solve this problem.

                  My colleague meanwhile found a (less elegant and more code intensive) solution by transforming the matrix into variables (using svmat), then performing the necessary operations and transforming the data back into a matrix (using mkmat). Our concrete problem is therefore solved, though it would of course be nice to know why the matrix and vector solution does not work. Any further suggestions?
                  Unfortunately, none of my own attempts worked out ;-)

                  Thank you everybody for your help!


                  • #10
                    My colleague meanwhile found a (less elegant and more code intensive) solution by transforming the matrix into variables (using svmat)
                    I would have done the same, at least to check out what these non-numeric characters are. I do not have the expertise to do this in Mata but it should be very easy to diagnose in Stata.


                    • #11
                      The second line is your incorrect code. Also, you can't mix strings and numbers in a Mata matrix, so all values are either strings or numbers.
                      Just check one by:
                      or use the eltype function.
                      What happens in your code
                      Xvector = vec("B")
                      is that you create a 1 by 1 string matrix with the value "B".
                      Kind regards
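
                      A minimal sketch of how the pieces in this thread might fit together, assuming that B is a Stata matrix (created with the matrix command) rather than a Mata matrix. That assumption is not confirmed anywhere above, but it would explain why vec(B) fails inside Mata while vec("B") only yields the string "B". Under that assumption, st_matrix() is the Mata function that copies a Stata matrix into Mata. The small 2 x 2 matrix B and the result name Z are made up for illustration, and the code is meant to be run from a do-file:

                      ```stata
                      // Hypothetical small Stata matrix standing in for the real one
                      matrix B = (1, 2 \ 3, 4)

                      mata:
                      X = st_matrix("B")             // copy the Stata matrix B into Mata as a real matrix
                      eltype(X)                      // show the element type; it should not be "string"
                      Xvector = vec(X)               // stack all columns into one column vector
                      M  = mean(Xvector)             // mean over all elements
                      SD = sqrt(variance(Xvector))   // standard deviation; variance() divides by n-1
                      Z  = (X :- M) :/ SD            // Z is the standardized matrix
                      st_matrix("Z", Z)              // copy the result back to a Stata matrix named Z
                      end

                      matrix list Z
                      ```

                      If eltype(X) reports anything other than real, the statistics functions will fail in the same way as in the error messages quoted earlier in the thread.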