M4RI-20091101 Release Notes

M4RI-20091101 was released on November 2nd, 2009. It is available at:

http://m4ri.sagemath.org/downloads/

About M4RI

M4RI is a library for fast arithmetic with dense matrices over F2. The name M4RI comes from the first implemented algorithm: The "Method of the Four Russians" inversion algorithm published by Gregory Bard. This algorithm in turn is named after the "Method of the Four Russians" multiplication algorithm which is probably better referred to as Kronrod's method. M4RI implements asymptotically fast matrix multiplication, linear system solving, reduced row echelon forms, PLUQ factorisation and basic arithmetic. M4RI is used by the Sage mathematics software and the PolyBoRi library. M4RI is available under the General Public License Version 2 or later (GPLv2+).

New in 20091191

Faster Elimination

by Martin Albrecht and Clément Pernet

Since LQUP factorisation requires fewer column swaps in the 'sparse' case, we switched to LQUP instead of PLUQ. Furthermore, a variety of improvements allow for faster elimination of dense and sparse-ish matrices. Using the performance_regression.py script, we demonstrate the improvement on a few platforms:

sage.math (Core 2 Duo, Xeon)

          ('elim', 10000, 10000, 'm4ri'):   1.760 /   1.730 = 1.017
          ('elim', 10000, 10000, 'pluq'):   0.810 /   0.880 = 0.920
          ('elim', 10000, 20000, 'm4ri'):   9.310 /   9.290 = 1.002
          ('elim', 10000, 20000, 'pluq'):   2.990 /   3.050 = 0.980
          ('elim', 16384, 16384, 'm4ri'):  13.980 /  14.420 = 0.969
          ('elim', 16384, 16384, 'pluq'):   3.410 /   5.670 = 0.601
          ('elim', 16384, 32768, 'm4ri'):  44.760 /  44.740 = 1.000
          ('elim', 16384, 32768, 'pluq'):  12.250 /  15.560 = 0.787
          ('elim', 20000, 20000, 'm4ri'):  27.520 /  27.520 = 1.000
          ('elim', 20000, 20000, 'pluq'):   5.620 /   6.020 = 0.934
          ('elim', 20000, 40000, 'm4ri'):  81.540 /  81.440 = 1.001
          ('elim', 20000, 40000, 'pluq'):  20.160 /  21.200 = 0.951
          ('elim', 32000, 32000, 'm4ri'):  97.210 /  96.970 = 1.002
          ('elim', 32000, 32000, 'pluq'):  23.560 /  24.860 = 0.948
          ('elim', 32000, 64000, 'm4ri'): 232.700 / 232.330 = 1.002
          ('elim', 32000, 64000, 'pluq'):  66.280 /  70.890 = 0.935
             ('elim', 'hfe25_5', 'm4ri'):   2.250 /   2.170 = 1.037
             ('elim', 'hfe25_5', 'pluq'):   2.190 /   2.700 = 0.811
             ('elim', 'hfe30_5', 'm4ri'):  27.690 /  27.730 = 0.999
             ('elim', 'hfe30_5', 'pluq'):  13.260 /  16.220 = 0.818
             ('elim', 'hfe35_5', 'm4ri'): 116.880 / 114.140 = 1.024
             ('elim', 'hfe35_5', 'pluq'):  72.230 /  85.480 = 0.845
       ('elim', 'mutant_matrix', 'm4ri'):  26.630 /  26.570 = 1.002
       ('elim', 'mutant_matrix', 'pluq'):   8.740 /  12.060 = 0.725
('elim-sparse', 10000, 10000, 0.0002, 'm4ri'):   0.710 /   0.690 = 1.029
('elim-sparse', 10000, 10000, 0.0002, 'pluq'):   1.440 /   2.930 = 0.491
('elim-sparse', 10000, 10000, 0.0004, 'm4ri'):   0.770 /   0.750 = 1.027
('elim-sparse', 10000, 10000, 0.0004, 'pluq'):   1.510 /   3.460 = 0.436
('elim-sparse', 10000, 10000, 0.0006, 'm4ri'):   0.560 /   0.550 = 1.018
('elim-sparse', 10000, 10000, 0.0006, 'pluq'):   1.520 /   3.670 = 0.414
('elim-sparse', 10000, 10000, 0.0008, 'm4ri'):   0.560 /   0.550 = 1.018
('elim-sparse', 10000, 10000, 0.0008, 'pluq'):   1.490 /   4.200 = 0.355
('elim-sparse', 10000, 10000, 0.0010, 'm4ri'):   0.710 /   0.710 = 1.000
('elim-sparse', 10000, 10000, 0.0010, 'pluq'):   1.420 /   3.150 = 0.451
                          ('mul', 10000):   1.480 /   1.470 = 1.007
                          ('mul', 16384):   5.640 /   5.560 = 1.014
                          ('mul', 20000):  10.530 /  10.540 = 0.999
                          ('mul', 32000):  39.410 /  39.280 = 1.003

prai243 (Opteron)

          ('elim', 10000, 10000, 'm4ri'):   2.664 /   2.660 = 1.002                                 
          ('elim', 10000, 10000, 'pluq'):   1.548 /   2.432 = 0.637                                 
          ('elim', 10000, 20000, 'm4ri'):   9.985 /   9.797 = 1.019
          ('elim', 10000, 20000, 'pluq'):   6.152 /   8.253 = 0.746
          ('elim', 16384, 16384, 'm4ri'):  11.737 /  12.313 = 0.953
          ('elim', 16384, 16384, 'pluq'):   7.384 /  10.949 = 0.674
          ('elim', 16384, 32768, 'm4ri'):  51.571 /  56.016 = 0.921
          ('elim', 16384, 32768, 'pluq'):  31.022 /  34.298 = 0.904
          ('elim', 20000, 20000, 'm4ri'):  23.321 /  24.162 = 0.965
          ('elim', 20000, 20000, 'pluq'):  11.453 /  15.029 = 0.762
          ('elim', 20000, 40000, 'm4ri'):  91.194 /  94.910 = 0.961
          ('elim', 20000, 40000, 'pluq'):  65.392 /  67.060 = 0.975
          ('elim', 32000, 32000, 'm4ri'): 105.739 / 103.194 = 1.025
          ('elim', 32000, 32000, 'pluq'):  52.239 /  61.864 = 0.844
          ('elim', 32000, 64000, 'm4ri'): 322.896 / 320.752 = 1.007
          ('elim', 32000, 64000, 'pluq'): 179.067 / 212.049 = 0.844
             ('elim', 'hfe25_5', 'm4ri'):   3.616 /   3.612 = 1.001
             ('elim', 'hfe25_5', 'pluq'):   4.408 /   8.449 = 0.522
             ('elim', 'hfe30_5', 'm4ri'):  25.178 /  24.750 = 1.017
             ('elim', 'hfe30_5', 'pluq'):  26.682 /  35.242 = 0.757
             ('elim', 'hfe35_5', 'm4ri'): 144.389 / 148.553 = 0.972
             ('elim', 'hfe35_5', 'pluq'): 162.110 / 209.845 = 0.773
       ('elim', 'mutant_matrix', 'm4ri'):  24.102 /  23.837 = 1.011
       ('elim', 'mutant_matrix', 'pluq'):  19.093 /  41.243 = 0.463
('elim-sparse', 10000, 10000, 0.0002, 'm4ri'):   2.560 /   2.436 = 1.051
('elim-sparse', 10000, 10000, 0.0002, 'pluq'):   2.756 /   5.304 = 0.520
('elim-sparse', 10000, 10000, 0.0004, 'm4ri'):   2.044 /   2.052 = 0.996
('elim-sparse', 10000, 10000, 0.0004, 'pluq'):   2.672 /   5.564 = 0.480
('elim-sparse', 10000, 10000, 0.0006, 'm4ri'):   1.504 /   1.512 = 0.995
('elim-sparse', 10000, 10000, 0.0006, 'pluq'):   2.664 /   5.784 = 0.461
('elim-sparse', 10000, 10000, 0.0008, 'm4ri'):   1.244 /   1.268 = 0.981
('elim-sparse', 10000, 10000, 0.0008, 'pluq'):   2.504 /   6.784 = 0.369
('elim-sparse', 10000, 10000, 0.0010, 'm4ri'):   1.424 /   1.496 = 0.952
('elim-sparse', 10000, 10000, 0.0010, 'pluq'):   2.444 /   6.480 = 0.377
                          ('mul', 10000):   2.896 /   2.940 = 0.985
                          ('mul', 16384):  12.313 /  12.177 = 1.011
                          ('mul', 20000):  22.793 /  20.441 = 1.115
                          ('mul', 32000):  81.589 /  81.865 = 0.997

Better Cross-Platform Support

by Martin Albrecht (with help by David Kirkby)

dropped the check for the number of CPUs from the configure script which was unused and didn't work across platforms
implemented optional tuning code to calculate the L1 and L2 cache size (not enabled by default yet) to replace the code in the configure script which does not work well on non-x86 platforms.

Other Changes

by Martin Albrecht

slightly faster mzd_transpose() Wiki macro error: Changeset 140a8626e2ca not found.
fixed a SEGFAULT in mzd_row_add_offset() Wiki macro error: Changeset 90b24eba9644 not found.

Platforms

make check passes on the following platforms

x86_64 Linux (Core2Duo, sage.math);
x86 OSX (Core2Duo, bsd);
ia64 Linux (iras);
x86 Linux (VirtualBox);
UltraSPARC T2 Solaris using GCC (t2.math.washington.edu)
x86 Cygwin (winxp)

The code also builds in Visual Studio under Windows.