Wiki
Clone wikim4ri / M4RI-20091101
M4RI-20091101 Release Notes
M4RI-20091101 was released on November 2nd, 2009. It is available at:
http://m4ri.sagemath.org/downloads/
About M4RI
M4RI is a library for fast arithmetic with dense matrices over F2. The name M4RI comes from the first implemented algorithm: The "Method of the Four Russians" inversion algorithm published by Gregory Bard. This algorithm in turn is named after the "Method of the Four Russians" multiplication algorithm which is probably better referred to as Kronrod's method. M4RI implements asymptotically fast matrix multiplication, linear system solving, reduced row echelon forms, PLUQ factorisation and basic arithmetic. M4RI is used by the Sage mathematics software and the PolyBoRi library. M4RI is available under the General Public License Version 2 or later (GPLv2+).
New in 20091191
Faster Elimination
by Martin Albrecht and Clément Pernet
Since LQUP factorisation requires fewer column swaps in the 'sparse' case, we switched to LQUP instead of PLUQ. Furthermore, a variety of improvements allow for faster elimination of dense and sparse-ish matrices. Using the performance_regression.py script, we demonstrate the improvement on a few platforms:
sage.math (Core 2 Duo, Xeon)
('elim', 10000, 10000, 'm4ri'): 1.760 / 1.730 = 1.017 ('elim', 10000, 10000, 'pluq'): 0.810 / 0.880 = 0.920 ('elim', 10000, 20000, 'm4ri'): 9.310 / 9.290 = 1.002 ('elim', 10000, 20000, 'pluq'): 2.990 / 3.050 = 0.980 ('elim', 16384, 16384, 'm4ri'): 13.980 / 14.420 = 0.969 ('elim', 16384, 16384, 'pluq'): 3.410 / 5.670 = 0.601 ('elim', 16384, 32768, 'm4ri'): 44.760 / 44.740 = 1.000 ('elim', 16384, 32768, 'pluq'): 12.250 / 15.560 = 0.787 ('elim', 20000, 20000, 'm4ri'): 27.520 / 27.520 = 1.000 ('elim', 20000, 20000, 'pluq'): 5.620 / 6.020 = 0.934 ('elim', 20000, 40000, 'm4ri'): 81.540 / 81.440 = 1.001 ('elim', 20000, 40000, 'pluq'): 20.160 / 21.200 = 0.951 ('elim', 32000, 32000, 'm4ri'): 97.210 / 96.970 = 1.002 ('elim', 32000, 32000, 'pluq'): 23.560 / 24.860 = 0.948 ('elim', 32000, 64000, 'm4ri'): 232.700 / 232.330 = 1.002 ('elim', 32000, 64000, 'pluq'): 66.280 / 70.890 = 0.935 ('elim', 'hfe25_5', 'm4ri'): 2.250 / 2.170 = 1.037 ('elim', 'hfe25_5', 'pluq'): 2.190 / 2.700 = 0.811 ('elim', 'hfe30_5', 'm4ri'): 27.690 / 27.730 = 0.999 ('elim', 'hfe30_5', 'pluq'): 13.260 / 16.220 = 0.818 ('elim', 'hfe35_5', 'm4ri'): 116.880 / 114.140 = 1.024 ('elim', 'hfe35_5', 'pluq'): 72.230 / 85.480 = 0.845 ('elim', 'mutant_matrix', 'm4ri'): 26.630 / 26.570 = 1.002 ('elim', 'mutant_matrix', 'pluq'): 8.740 / 12.060 = 0.725 ('elim-sparse', 10000, 10000, 0.0002, 'm4ri'): 0.710 / 0.690 = 1.029 ('elim-sparse', 10000, 10000, 0.0002, 'pluq'): 1.440 / 2.930 = 0.491 ('elim-sparse', 10000, 10000, 0.0004, 'm4ri'): 0.770 / 0.750 = 1.027 ('elim-sparse', 10000, 10000, 0.0004, 'pluq'): 1.510 / 3.460 = 0.436 ('elim-sparse', 10000, 10000, 0.0006, 'm4ri'): 0.560 / 0.550 = 1.018 ('elim-sparse', 10000, 10000, 0.0006, 'pluq'): 1.520 / 3.670 = 0.414 ('elim-sparse', 10000, 10000, 0.0008, 'm4ri'): 0.560 / 0.550 = 1.018 ('elim-sparse', 10000, 10000, 0.0008, 'pluq'): 1.490 / 4.200 = 0.355 ('elim-sparse', 10000, 10000, 0.0010, 'm4ri'): 0.710 / 0.710 = 1.000 ('elim-sparse', 10000, 10000, 0.0010, 'pluq'): 1.420 / 3.150 = 0.451 ('mul', 10000): 1.480 / 1.470 = 1.007 ('mul', 16384): 5.640 / 5.560 = 1.014 ('mul', 20000): 10.530 / 10.540 = 0.999 ('mul', 32000): 39.410 / 39.280 = 1.003
prai243 (Opteron)
('elim', 10000, 10000, 'm4ri'): 2.664 / 2.660 = 1.002 ('elim', 10000, 10000, 'pluq'): 1.548 / 2.432 = 0.637 ('elim', 10000, 20000, 'm4ri'): 9.985 / 9.797 = 1.019 ('elim', 10000, 20000, 'pluq'): 6.152 / 8.253 = 0.746 ('elim', 16384, 16384, 'm4ri'): 11.737 / 12.313 = 0.953 ('elim', 16384, 16384, 'pluq'): 7.384 / 10.949 = 0.674 ('elim', 16384, 32768, 'm4ri'): 51.571 / 56.016 = 0.921 ('elim', 16384, 32768, 'pluq'): 31.022 / 34.298 = 0.904 ('elim', 20000, 20000, 'm4ri'): 23.321 / 24.162 = 0.965 ('elim', 20000, 20000, 'pluq'): 11.453 / 15.029 = 0.762 ('elim', 20000, 40000, 'm4ri'): 91.194 / 94.910 = 0.961 ('elim', 20000, 40000, 'pluq'): 65.392 / 67.060 = 0.975 ('elim', 32000, 32000, 'm4ri'): 105.739 / 103.194 = 1.025 ('elim', 32000, 32000, 'pluq'): 52.239 / 61.864 = 0.844 ('elim', 32000, 64000, 'm4ri'): 322.896 / 320.752 = 1.007 ('elim', 32000, 64000, 'pluq'): 179.067 / 212.049 = 0.844 ('elim', 'hfe25_5', 'm4ri'): 3.616 / 3.612 = 1.001 ('elim', 'hfe25_5', 'pluq'): 4.408 / 8.449 = 0.522 ('elim', 'hfe30_5', 'm4ri'): 25.178 / 24.750 = 1.017 ('elim', 'hfe30_5', 'pluq'): 26.682 / 35.242 = 0.757 ('elim', 'hfe35_5', 'm4ri'): 144.389 / 148.553 = 0.972 ('elim', 'hfe35_5', 'pluq'): 162.110 / 209.845 = 0.773 ('elim', 'mutant_matrix', 'm4ri'): 24.102 / 23.837 = 1.011 ('elim', 'mutant_matrix', 'pluq'): 19.093 / 41.243 = 0.463 ('elim-sparse', 10000, 10000, 0.0002, 'm4ri'): 2.560 / 2.436 = 1.051 ('elim-sparse', 10000, 10000, 0.0002, 'pluq'): 2.756 / 5.304 = 0.520 ('elim-sparse', 10000, 10000, 0.0004, 'm4ri'): 2.044 / 2.052 = 0.996 ('elim-sparse', 10000, 10000, 0.0004, 'pluq'): 2.672 / 5.564 = 0.480 ('elim-sparse', 10000, 10000, 0.0006, 'm4ri'): 1.504 / 1.512 = 0.995 ('elim-sparse', 10000, 10000, 0.0006, 'pluq'): 2.664 / 5.784 = 0.461 ('elim-sparse', 10000, 10000, 0.0008, 'm4ri'): 1.244 / 1.268 = 0.981 ('elim-sparse', 10000, 10000, 0.0008, 'pluq'): 2.504 / 6.784 = 0.369 ('elim-sparse', 10000, 10000, 0.0010, 'm4ri'): 1.424 / 1.496 = 0.952 ('elim-sparse', 10000, 10000, 0.0010, 'pluq'): 2.444 / 6.480 = 0.377 ('mul', 10000): 2.896 / 2.940 = 0.985 ('mul', 16384): 12.313 / 12.177 = 1.011 ('mul', 20000): 22.793 / 20.441 = 1.115 ('mul', 32000): 81.589 / 81.865 = 0.997
See also http://www.informatik.uni-bremen.de/cgi-bin/cgiwrap/malb/blosxom.pl/2009/06/27
Better Cross-Platform Support
by Martin Albrecht (with help by David Kirkby)
- dropped the check for the number of CPUs from the configure script which was unused and didn't work across platforms
- implemented optional tuning code to calculate the L1 and L2 cache size (not enabled by default yet) to replace the code in the configure script which does not work well on non-x86 platforms.
Other Changes
by Martin Albrecht
- slightly faster
mzd_transpose()
Wiki macro error: Changeset 140a8626e2ca not found. - fixed a SEGFAULT in
mzd_row_add_offset()
Wiki macro error: Changeset 90b24eba9644 not found.
Platforms
make check passes on the following platforms
- x86_64 Linux (Core2Duo, sage.math);
- x86 OSX (Core2Duo, bsd);
- ia64 Linux (iras);
- x86 Linux (VirtualBox);
- UltraSPARC T2 Solaris using GCC (t2.math.washington.edu)
- x86 Cygwin (winxp)
The code also builds in Visual Studio under Windows.
Updated