Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Mata find minimum distance across matrices

    Dear Statalists,

    I am very new to Mata and I am trying to solve a problem that involves finding the minimum distance index.

    Code:
    sysuse auto, clear
    mata:
        st_view(A=.,.,("weight","price","gear_ratio"))
        st_view(B=.,.,("mpg","length"))
        st_view(C=.,.,("displacement","gear_ratio"))
    end
    So in the above example, suppose we have three matrices A,B, and C. For each element in A, I would like to find the closest point in B (in absolute value) and record the position index, then I want to find the value in the same position in C. For example, suppose A[1,1]=23, the element in B that is closest to 23 is B[10,2]=22.9, then in a new Matrix of the same size with A, call it D, we should have D[1,1]=C[10,2]. We will do so for each element of A.

    I find the logic quite simple but get stuck with carrying it out in code. My idea is to use for loop and minindex, but I get it wrong over and over again.
    One of my shot is
    Code:
    sysuse auto, clear
    mata:
        st_view(A=.,.,("weight","price","gear_ratio"))
        st_view(B=.,.,("mpg","length"))
        st_view(C=.,.,("displacement","gear_ratio"))
        for (i=1;i<=rows(A);i++)
        {
            for (j=1;j<=cols(A);j++)
            {
                D = abs(B:-A[i,j])
                for (n=1;n<=rows(D);n++)
                {
                    minindex(D(n,.),1,in,w)
                }
            }
        }
    end
    which returns me, "<istmt>: 3499 D() not found".

    Any suggestions will be welcomed! I am particularly curious about whether we can work it out without calling the for loop. Calling so many for loop seems to be very inefficient.

  • #2
    I can discern and suggest solutions to the syntax errors, but at this point in my evening cannot improve on the basic code. The two problems are that all arguments to functions need to be declared, including those being returned by the function, and subscripts are surrounded by brackets "[ ]" not parentheses "( )"
    Code:
    sysuse auto, clear
    sysuse auto, clear
    mata:
        st_view(A=.,.,("weight","price","gear_ratio"))
        st_view(B=.,.,("mpg","length"))
        st_view(C=.,.,("displacement","gear_ratio"))
      // declare arguments to receive returned values, type doesn't matter, these will be replaced
        in = .
        w = .
        for (i=1;i<=rows(A);i++)
        {
            for (j=1;j<=cols(A);j++)
            {
                D = abs(B:-A[i,j])
                for (n=1;n<=rows(D);n++)
                {
                    minindex(D[n,.],1,in,w)
                }
                
            }
        }
    end
    Last edited by William Lisowski; 27 Jun 2021, 19:29.

    Comment


    • #3
      Behind the scenes, I think minindex() is sorting the data, so it can be inefficient or at least inelegant to call it to sort the same data many times. How about you turn B and C into column vectors and then reorder then both in parallel so that B is sorted. Then the job is just to find which pair of rows of B each element of A falls between. You don't really need to record the position indexes.

      Something like this:

      Code:
      A = runiform(2,3)
      B = runiform(2,3)
      C = runiform(2,3)
      
      vecB = colshape(B, 1)
      p = order(vecB, 1)
      vecB = vecB[p]
      vecC = colshape(C, 1)[p]
      
      D = J(rows(A), cols(B), .)
      for (i=rows(A); i; i--)
        for (j=cols(A); j; j--) {
          t = sum(A[i,j] :>= vecB)
          if (t) {
            if (t < length(vecB))
              if (vecB[t+1] - A[i,j] < A[i,j] - vecB[t])
                t++
          } else
            t = 1
          D[i,j] = vecC[t]
        }
      Last edited by David Roodman; 27 Jun 2021, 20:50.

      Comment


      • #4
        Originally posted by William Lisowski View Post
        I can discern and suggest solutions to the syntax errors, but at this point in my evening cannot improve on the basic code. The two problems are that all arguments to functions need to be declared, including those being returned by the function, and subscripts are surrounded by brackets "[ ]" not parentheses "( )"
        Code:
        sysuse auto, clear
        sysuse auto, clear
        mata:
        st_view(A=.,.,("weight","price","gear_ratio"))
        st_view(B=.,.,("mpg","length"))
        st_view(C=.,.,("displacement","gear_ratio"))
         // declare arguments to receive returned values, type doesn't matter, these will be replaced
        in = .
        w = .
        for (i=1;i<=rows(A);i++)
        {
        for (j=1;j<=cols(A);j++)
        {
        D = abs(B:-A[i,j])
        for (n=1;n<=rows(D);n++)
        {
        minindex(D[n,.],1,in,w)
        }
        
        }
        }
        end
        Thanks for pointing out the syntax errors! I am naive in Mata and get struggle with syntax errors.

        Comment


        • #5
          Originally posted by David Roodman View Post
          Behind the scenes, I think minindex() is sorting the data, so it can be inefficient or at least inelegant to call it to sort the same data many times. How about you turn B and C into column vectors and then reorder then both in parallel so that B is sorted. Then the job is just to find which pair of rows of B each element of A falls between. You don't really need to record the position indexes.

          Something like this:

          Code:
          A = runiform(2,3)
          B = runiform(2,3)
          C = runiform(2,3)
          
          vecB = colshape(B, 1)
          p = order(vecB, 1)
          vecB = vecB[p]
          vecC = colshape(C, 1)[p]
          
          D = J(rows(A), cols(B), .)
          for (i=rows(A); i; i--)
          for (j=cols(A); j; j--) {
          t = sum(A[i,j] :>= vecB)
          if (t) {
          if (t < length(vecB))
          if (vecB[t+1] - A[i,j] < A[i,j] - vecB[t])
          t++
          } else
          t = 1
          D[i,j] = vecC[t]
          }
          Many thanks to you! This is exactly what I want. Very brilliant!
          I have also thought to translate B and C into column vectors, but I was still trying to use "minindex" to find the position index in the new column vector B, not to reorder new B and C and to locate elements of A in the new column vector B.
          Thanks again!

          Comment


          • #6
            whoops, "J(rows(A), cols(B), .)" should really be "J(rows(A), cols(A), .)"

            Ah, but even better is this. Precompute the halfway points between those sorted elements of B. Then you just need to count how many of those each element of A exceeds:

            Code:
            A = runiform(2,3)
            B = runiform(2,3)
            C = runiform(2,3)
            
            vecB = colshape(B, 1)
            p = order(vecB, 1)
            vecC = colshape(C, 1)[p]
            
            vecB = vecB[p]
            vecB = 0.5 * (vecB[|2\.|] + vecB[|.\rows(vecB)-1|])
            
            D = A
            for (i=rows(A); i; i--)
              for (j=cols(A); j; j--)
                D[i,j] = vecC[1 + sum(A[i,j] :> vecB)]
            If speed were a big issue, which it probably isn't, one could replace the logic in the final line with a binary search through vecC.

            Comment


            • #7
              Thanks again, David. I corrected the cols(B) typo in my own code before.

              May I ask you another question about Mata function. I am not experienced with Mata and have spent lots of time searching the forum but did not find a solution.

              My project involves Mata to speed up, part of the code uses the code you provide above. The code also involves a loop outside the Mata part, which as you probably know, breaks when the "end" of Mata appears. So I am trying to put the Mata part outside the loop and save it as a Mata function. But it did not work out. My code is
              Code:
              use "${Working_data}/TRARS_Unit_Density.dta", clear
              glevelsof GroupID, local(groupid)
              foreach n of local groupid{
                  use "${Working_data}/TRARS_Unit_Density.dta", clear     // First dataset
              
                  keep FIPS UnitId GroupID AppYield beta_NASS Res_NASS CDF
                  keep if GroupID ==`n'
              
                  su FIPS
                  local fipsn = r(mean)
              
                  mata:
                      st_view(beta=.,.,("UnitId","beta_NASS","AppYield"))     // Get the data from the Stata dataset
                      beta = uniqrows(beta)
                      st_view(r=.,.,"Res_NASS")
              
                      X1 = J(rows(r),rows(beta),.)
                      for (x=1; x<=rows(beta); x++)
                      {    
                          X1[.,x] = (1-0.75)*beta[x,3]/100/beta[x,3]:+ r/beta[x,2]
                      }
                      mata matsave ${Working_data} X1, replace
                  end
              
                  use  "${Working_data}/AYP_ind_exp1.dta", clear   // A new dataset
                  keep if FIPS == `fipsn'
                  mata:
                      st_view(A1=.,.,("County_APH"))                        
                      st_view(A2=.,.,("Exp_c"))
                      st_view(A3=.,.,("CountyYield_Pred_2008"))
              
                      X1 = -(X1:-A3[1,1])                                    //X1 is saved by the above Mata part
                      p = order(A1,1)
                      A1 = A1[p]
                      A2 = A2[p]
              
                      B = J(rows(X1),cols(X1),.)
                      for (i=rows(X1);i;i--){
                          for (j=cols(X1);j;j--){
                              t=sum(X1[i,j]:>=A1)
                              if (t){
                                  if (t<length(A1))
                                  if (abs(A1[t+1]-X1[i,j])<abs(X1[i,j]-A1[t]))
                                  t++
                              }else
                              t=1
                              B[i,j]=A2[t]
                          }
                      }
                  end
              }
              The code works perfectly fine without the "foreach" loop. But as you see, the "foreach" loop breaks with the "end" line of the first Mata block. I tried to save this Mata part as a Mata function
              Code:
                  
              mata:
                      function matchc(ind1,ind2,ind3,ind4)
                      {
                      st_view(beta=.,.,(ind1,ind2,ind3))
                      beta = uniqrows(beta)
                      st_view(r=.,.,ind4)
              
                      X1 = J(rows(r),rows(beta),.)
                      for (x=1; x<=rows(beta); x++)
                      {    
                          X1[.,x] = (1-0.75)*beta[x,3]/100/beta[x,3]:+ r/beta[x,2]
                      }
                      return(X1)
                  }
                  end
              and then call it inside the foreach loop by
              Code:
              mata: X1 = matchc(UnitId,beta_NASS,AppYield,Res_NASS)
              Stata complains "<istmt>: 3499 UnitId not found".

              My question is that how should I construct the Mata function and then call it inside the foreach loop.

              Also, as you can see, there is a second Mata block inside the foreach loop, which uses the code you provide above (thanks again!). This Mata block uses the matrix "X1" saved in the first Mata block. I think this part should also be recoded as a Mata function, but I am not able to work it out, especially connect it with the first one.

              The post is long and I hope I have made my point clear. Thanks!
              Last edited by Alpha Che; 28 Jun 2021, 15:58.

              Comment


              • #8
                Oh, I think you just need to do:
                Code:
                mata: X1 = matchc("UnitId","beta_NASS","AppYield","Res_NASS")
                For Mata, those variable names are nothing more than strings, so you need to put them in quotes.

                Since Mata allows string vectors, you could also write it to take just two arguments:
                Code:
                mata:
                function matchc(ind1, ind2) {
                    st_view(beta=.,.,ind1)
                    beta = uniqrows(beta)
                    st_view(r=.,.,ind2)
                    X1 = J(rows(r),rows(beta),.)
                    for (x=1; x<=rows(beta); x++)
                        X1[.,x] = (1-0.75)*beta[x,3]/100/beta[x,3]:+ r/beta[x,2]
                    return(X1)
                }
                end
                which you should then be able to call with
                Code:
                mata: X1 = matchc(("UnitId", "beta_NASS", "AppYield"), "Res_NASS")
                And yes, if I understand your second question, it does seem like the second Mata block in the loop would also cause an error. You can put it inside a function in the same way, and that's probably good style. A quick-and-dirty alternative is to prefix every line with "mata" like this, using my second solution above:

                Code:
                    mata st_view(A1=.,.,"County_APH")
                    mata st_view(A2=.,.,"Exp_c")
                    mata st_view(A3=.,.,"CountyYield_Pred_2008")
                
                    mata B = X1 = -(X1:-A3[1,1])                                   //X1 is saved by the above Mata part
                    mata p = order(A1,1); A1 = A1[p]; A2 = A2[p]
                    mata A1 = 0.5 * (A1[|2\.|] + A1[|.\rows(A1)-1|])
                
                    mata for (i=rows(X1);i;i--) for (j=cols(X1);j;j--) B[i,j] = A2[1 + sum(X1[i,j] :> A1)]
                Last edited by David Roodman; 28 Jun 2021, 16:35.

                Comment


                • #9
                  Thank you so much for your help. I think now I am very close to the destination. But before that, another problem arises.

                  I have tried to code both Mata blocks into Mata functions, the first one works perfectly and returns the matrix "X1" that I want. But the second one returns an empty matrix "B"
                  Code:
                  clear all
                  mata:        
                      function matchc(ind1,ind2)        
                      {        
                          st_view(beta=.,.,ind1)        
                          beta = uniqrows(beta)        
                          st_view(r=.,.,ind2)          
                          X1 = J(rows(r),rows(beta),.)        
                          for (x=1; x<=rows(beta); x++)        
                           {                
                               X1[.,x] = (1-0.75)*beta[x,3]/100/beta[x,3]:+ r/beta[x,2]        
                               }        
                               return(X1)    
                               }    
                  end
                  
                  mata:
                      function matchi(ine1,ine2,ine3)
                      {
                      st_view(A1=.,.,ine1)
                      st_view(A2=.,.,ine2)
                      st_view(A3=.,.,ine3)
                  
                      X1 = -(X1:-A3[1,1])
                      p = order(A1,1)
                      A1 = A1[p]
                      A2 = A2[p]
                  
                      B = J(rows(X1),cols(X1),.)
                      for (i=rows(X1);i;i--){
                          for (j=cols(X1);j;j--){
                              t=sum(X1[i,j]:>=A1)
                              if (t){
                                  if (t<length(A1))
                                  if (abs(A1[t+1]-X1[i,j])<abs(X1[i,j]-A1[t]))
                                  t++
                              }else
                              t=1
                              B[i,j]=A2[t]
                          }
                      }
                      return(B)
                      }
                  end
                  
                      use "${Working_data}/TRARS_Unit_Density.dta", clear
                      keep FIPS UnitId GroupID AppYield beta_NASS Res_NASS CDF
                  
                      keep if GroupID ==170011
                      su FIPS
                      local fipsn = r(mean)
                  
                      mata: X1 = matchc(("UnitId","beta_NASS","AppYield"),"Res_NASS")
                      use  "${Working_data}/AYP_ind_exp1.dta", clear
                      keep if FIPS == `fipsn'
                      mata: X1                                                           // Can see that X1 what I needed
                    
                      mata: B = matchi("County_APH","Exp_c","CountyYield_Pred_2008")
                      mata: B                                                            // B is empty
                  But the code works if I do not use a second Mata function but keeps it as a Mata block (no for loop involved).
                  Code:
                  mata:        
                      function matchc(ind1,ind2)        
                      {        
                          st_view(beta=.,.,ind1)        
                          beta = uniqrows(beta)        
                          st_view(r=.,.,ind2)          
                          X1 = J(rows(r),rows(beta),.)        
                          for (x=1; x<=rows(beta); x++)        
                           {                
                               X1[.,x] = (1-0.75)*beta[x,3]/100/beta[x,3]:+ r/beta[x,2]        
                               }        
                               return(X1)    
                               }    
                  end
                  
                      use "${Working_data}/TRARS_Unit_Density.dta", clear
                      keep FIPS UnitId GroupID AppYield beta_NASS Res_NASS CDF
                  
                      keep if GroupID ==170011
                      su FIPS
                      local fipsn = r(mean)
                  
                      mata: X1 = matchc(("UnitId","beta_NASS","AppYield"),"Res_NASS")
                      use  "${Working_data}/AYP_ind_exp1.dta", clear
                      keep if FIPS == `fipsn'
                      mata: X1
                                                                          
                      mata:
                          st_view(A1=.,.,("County_APH"))
                          st_view(A2=.,.,("Exp_c"))
                          st_view(A3=.,.,("CountyYield_Pred_2008"))
                  
                          X1 = -(X1:-A3[1,1])
                          p = order(A1,1)
                          A1 = A1[p]
                          A2 = A2[p]
                  
                          B = J(rows(X1),cols(X1),.)
                          for (i=rows(X1);i;i--){
                              for (j=cols(X1);j;j--){
                                  t=sum(X1[i,j]:>=A1)
                                  if (t){
                                      if (t<length(A1))
                                      if (abs(A1[t+1]-X1[i,j])<abs(X1[i,j]-A1[t]))
                                      t++
                                  }else
                                  t=1
                                  B[i,j]=A2[t]
                              }
                          }
                      end
                      Mata: B                                       // B is not empty.
                  Can you help me check where the mistake is? Do I miss anything in connecting the two Mata functions?
                  Sorry that I have not checked your new code for the second Mata block. I want to focus on the most emergent issue and I will definitely study these new codes to learn more from you. Thanks!
                  Last edited by Alpha Che; 28 Jun 2021, 17:47.

                  Comment


                  • #10
                    It seems like it should work.... I would start putting dump statements inside the function(s), analogous to your "mata: X1" to see if you can figure out what is going on.

                    Comment


                    • #11
                      Thanks. I will try it.

                      Comment

                      Working...
                      X