

I'm working on optimising some R code written by a researcher at the University of Sheffield, and it's very much a war of attrition! There's no easily optimisable hotspot and there's no obvious way to leverage parallelism. Progress is being made by steadily identifying places here and there where we can do a little better: 10% here and 20% there can eventually add up to something worth shouting about.

One such micro-optimisation we discovered involved multiplying two matrices together where one of them needed to be transposed.
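Something along these lines; the matrix size and seed below are illustrative rather than the values from the real application:

```r
# Set random seed for reproducibility
set.seed(42)

# Small square matrices; n = 10 is an illustrative size
n <- 10
a <- matrix(runif(n * n), nrow = n, ncol = n)
b <- matrix(runif(n * n), nrow = n, ncol = n)

# Multiply the matrix a by the transpose of b
c <- a %*% t(b)
```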
When the speed of linear algebra computations is an issue in R, it makes sense to use a version that is linked to a fast implementation of BLAS and LAPACK, and we are already doing that on our HPC system. Here, I am using version 3.3.3 of Microsoft R Open, which links to Intel's MKL (an implementation of BLAS and LAPACK), on a Windows laptop.

In R, there is another way to do the computation c = a %*% t(b): we can make use of the tcrossprod function (there is also a crossprod function for when you want to do t(a) %*% b), giving c_new = tcrossprod(a, b). Comparing the two results element by element gives nothing but TRUE values.
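A sketch of the alternative, plus a couple of quick checks that the two results agree: an element-wise comparison (which produces the row of TRUE values), and the size of the largest difference for when floating point noise creeps in:

```r
# Same result as a %*% t(b), computed in a single call
c_new <- tcrossprod(a, b)

# Element-wise comparison: ideally a matrix full of TRUE values
c == c_new

# If some entries come back FALSE, the actual differences should be tiny
max(abs(c - c_new))
```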
Sometimes, when comparing the two methods, you may find that some of those entries are FALSE, which may worry you! This happens from time to time when dealing with floating point arithmetic. If that happens, computing the difference between the two results should convince you that all is OK and that the differences are just numerical noise.

Let's time the two methods using the microbenchmark package. We time just the matrix multiplication part of the code above.
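A sketch of the timing comparison, with just the two multiplication expressions inside the microbenchmark call:

```r
library(microbenchmark)

# Time only the multiplication itself, not the matrix set-up
microbenchmark(
  naive      = a %*% t(b),
  tcrossprod = tcrossprod(a, b)
)
```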

We are only saving microseconds here, but that's more than a factor of 4 speed-up in this small-matrix case. If that computation is being performed a lot in a tight loop (and for our real application, it was), it can add up to quite a difference. As the matrices get bigger, the speed benefit in percentage terms gets lower, but tcrossprod always seems to be the faster method; for example, the same comparison can be repeated for 1000 x 1000 matrices, as sketched below.

While writing this blog post, I accidentally used the CRAN version of R, and the cost of not using an optimised version of BLAS and LAPACK quickly became apparent.
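A sketch of the larger comparison; only the 1000 x 1000 size is fixed here, and the seed and number of repetitions are illustrative:

```r
library(microbenchmark)

# Set random seed for reproducibility
set.seed(42)

n <- 1000
a <- matrix(runif(n * n), nrow = n, ncol = n)
b <- matrix(runif(n * n), nrow = n, ncol = n)

microbenchmark(
  naive      = a %*% t(b),
  tcrossprod = tcrossprod(a, b),
  times      = 10   # fewer repetitions for the bigger matrices
)
```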
