GMP Development Projects

An up-to-date version of this file is available at http://www.swox.com/gmp/projects.html.

This file lists projects suitable for volunteers. Please see the file tasks, or http://www.swox.com/gmp/tasks.html for smaller tasks.

If you want to work on any of the projects below, please let tege@swox.com know. If you want to help with a project that already somebody else is working on, please talk to that person too, tege@swox.com can put you in touch. (There are no email addresses of volunteers below, due to spamming problems.)

C++ wrapper
A C++ wrapper for GMP is highly desirable. Several people have started writing one, but nobody has so far finished it. For a wrapper to be useful, one needs to pay attention to efficiency.
1. Write arithmetic functions for direct applications of most elementary C++ types. This might be necessary to avoid ambiguities, but it is also desirable from an efficiency standpoint.
2. Avoid running constructors/destructors when not necessary. For example, implement a += b by directly applying mpz_add.
Faster multiplication
The current multiplication code is 3-way Toom-Cook based and runs at O(n^1.465) when the operands are both around n digits large. Several new developments are desirable:
1. Handle multiplication of operands with different digit count better than today. We now split the operands in a very inefficient way, see mpn/generic/mul.c.
2. Consider N-way Toom-Cook. See Knuth's Seminumerical Algorithms for details on the method. Code implementing it exists. This is asyptotically inferior to FFTs, but is finer grained since it can split into any number of pieces, not just powers of 2 (or of 2 and 3 or whatever).
3. Implement some variant of FFT-based multiplication. I believe the most promising variant is the original Schönhage-Strassen mod 2^n+1 algorithm. Paul Zimmermann and Robert Harley are working on this, Paul Zimmerman has working code.
4. It's possible CPU dependent effects like cache locality will have a greater impact on speed than algorithmic improvements.
Division and modular multiplication
GMP's current division routines are O(n^2), or to be accurate, O(n(m-n)) where n is the digit count of the divisor, and m is the digit count of the dividend. Modular reduction code rely on division, and is thus also slow.
Desirable new developments:
1. Implement O(n^1.58) division. There is an algorithm published recently that accomplish this.
2. Implement division based on first computing an approximation to the reciprocal of the divisor, using Newton's algorithm, and then multiplying the dividend by the approximate reciprocal.
3. Implement modular multiplication based on Peter Montgomery's REDC algorithm. This algorithm is O(n^2) and should therefore be used only for operands under a certain limit. Operands greater than that limit need to reply on computing the divisor reciprocal, see above.
4. Make mpz_powm, mpz_powm_ui, etc, use modular multiplication for the operand ranges where that helps. Actually, these functions should probably be implemented at the mpn level as part of this development.
5. Tune the aging plain division functions. (Mostly done by Torbjörn.) Perhaps write new functions with better interfaces, (mpn_tdiv_q, mpn_tdiv_r, mpn_tdiv_qr) replacing mpn_divmod and (the slow!) mpn_divrem.
Assembly routines
Write new and tune existing assembly routines. The functions in mpn/tests are useful for testing and timing the routines you write. See the README file in that directory for instructions on how to use the various test files.
Please make sure your new routines are fast for these three situations:
1. Operands that fit into the cache.
2. Small operands of less than, say, 10 limbs.
3. Huge operands that does not fit into the cache.
The most important routines are mpn_addmul_1, mpn_mul_basecase and mpn_sqr_basecase. The latter two don't exist for all machines, while mpn_addmul_1 exists for almost all machines.
Standard techniques for these routines are unrolling, software pipelining, and specialization for common operand values. For machines with poor integer multiplication, it is often possible to improve the performance using floating-point operations, or SIMD operations such as MMX or Sun's VIS.
Using floating-point operations is interesting but somewhat tricky. Since IEEE double has 53 bit of mantissa, one has to split the operands in small prices, so that no result is greater than 2^53. For 32-bit computers, splitting one operand in 16-bit pieces works. For 64-bit machines, one operand can be split in 21-bit pieces and the other in 32-bit pieces. (A 64-bit operand can be split in just three 21-bit pieces if one allows the split operands to be negative!)
Configuration
We want to recognize the processor very accurately, more accurately than other GNU packages. config.guess does not currently make the distinctions we would like it to do and a --target often needs to be set explicitly.
For example, "sparc" is not very useful as a machine architecture denotation. We want to distinguish old 32-bit SPARC without multiply support from newer 32-bit SPARC with such support. We want to recognize a SuperSPARC, since its implementation of the UDIV instruction is not complete, and will trap to the OS kernel for certain operands. And we want to recognize 64-bit capable SPARC processors as such. While the assembly routines can use 64-bit operations on all 64-bit SPARC processors, one can not use 64-bit limbs under all operating system. E.g., Solaris 2.5 and 2.6 doesn't preserve the upper 32 bits of most processor registers. For SPARC we therefore sometimes need to choose GMP configuration depending both on processor and operating system.
Other things autoconf should care about:
1. Find out whether there's an alloca available and how to use it. AC_FUNC_ALLOCA has various system dependencies covered, but we don't want its alloca.c replacement. (One thing current cpp tests don't cover: HPUX 10 C compiler supports alloca, but cannot find any symbol to test in order to know if we're on HPUX 10. Damn.)
2. Identify Mips processor under Irix: `hinv -c processor'
3. Identify Alpha processor under OSF: "/usr/sbin/sizer -c". Unfortunately, sizer is not available before some revision of Dec Unix 4.0, and it also returns some rather cryptic names for processors. Perhaps the implver and amask assembly instructions are better, but that doesn't differentiate between ev5 and ev56.
4. Get lots of information about a Solaris system: prtconf -vp
5. For specific target machines, figure out assembly language variant (like for x86 shldl).
6. For some target machines and some compilers, specific options are needed (sparcv8/gcc needs -mv8, sparcv8/cc needs -cg92, Irix64/cc needs -64, Irix32/cc might need -n32, etc). Some are set already, add more, see configure.in.
7. Options to be passed to the assembler (via the compiler, using whatever syntax the compiler uses for passing options to the assembler).
8. On Solaris 7, check if gcc supports native v9 64-bit arithmetic. If not compile using "cc -fast -xarch=v9". (Problem: -fast requires that we link with -fast too, which might not be very good. Pass "-xO4 -xtarget=native" instead?)
In general, getting the exact right configuration, passing the exact right options to the compiler, etc, might mean that the GMP performance more than doubles.
When testing, make sure to test at least the following for all out target machines: (1) Both gcc and cc (and c89). (2) Both 32-bit mode and 64-bit mode (such as -n32 vs -64 under Irix). (3) Both the system `make' and GNU `make'. (4) With and without GNU binutils.
We'd ultimately like to build multiple variants of the library under certain systems. An example is -n32, -o32, and -64 on Irix.
Improve config.guess. It should say mips2, mips3, and mips4. It should say supersparc, microsparc, ultrasparc1, ultrasparc2. Etc. Likewise for HPPA. (Alpha and x86 work already. Ensure config.sub accepts the guesses.)
Faster extended GCD
We currently have extended GCD based on Lehmer's method. But the binary method can quite easily be improved for bignums in a similar way Lehmer improved Euclid's algorithm. The result should beat Lehmer's method by about a factor of 2. Furthermore, the multiprecision step of Lehmer's method and the binary method will be identical, so the current code can mostly be reused. It should be possible to share code between GCD and GCDEXT, and probably with Jacobi symbols too.
Math functions for the mpf layer
Implement the functions of math.h for the GMP mpf layer! Check the paper "Pi and the AGM" for ideas how to do this. These functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
Faster sqrt
The current code for computing square roots use a Newton iteration that rely on division. It is possible to avoid using division by computing 1/sqrt(A) using this formula:
```
                                    2
                   x   = x  (3 - A x )/2.
                    i+1   i         i  
```
The final result is then computed without division using,
```
                     sqrt(A) = A x .
                                  n  
```
The resulting code should be substantially faster than the current code.
Nth root
Implement, at the mpn level, a routine for computing the nth root of a number. The routine should use Newton's method, preferably without using division.
If the routine becomes fast enough, perhaps square roots could be computed using this function.
Better random number generators
Write random number routines that accept bit counts instead of limb counts. A state should be passed to each random number generator; no global state should be used, since that would be non-reentrant.
Partially done by Linus Nordberg. Ask for the current code.