This file lists itemized GMP development tasks. Not all the tasks listed here are suitable for volunteers, but many of them are. Please see the file projects, or http://www.swox.com/gmp/projects.html for more sizeable projects.
* _mpz_realloc with a small (1 limb) size.
* mpz_XXX(a,a,a).
* For exponents that approach the limits of mp_exp_t, the precision of
  __mp_bases[base].chars_per_bit_exactly is insufficient and mpf_get_str
  aborts. Detect and compensate.
* Make mpz_get_si work properly for the MIPS N32 ABI (and other machines
  that use long long for storing limbs).
* The mpz_*_ui division routines: perhaps make them return the real
  remainder instead? That changes the return type to signed long int.
* mpf_eq is not always correct when one operand is 1000000000... and the
  other operand is 0111111111..., i.e., when the operands are extremely
  close. There is a special case in mpf_sub for this situation; put
  similar code in mpf_eq.
* mpn_divrem cannot throw away the quotient when XSIZE != 0. Document or
  fix.
* __builtin_constant_p is unavailable? Same problem with MacOS X.
* dump_abort in mp?/tests/*.c.
* mpz_get_si returns 0x80000000 for -0x100000000.
* Avoid count_leading_zeros if the bit is set. This is an optimization on
  all machines, and significant on machines with a slow
  count_leading_zeros.
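
  A minimal C sketch of the idea, assuming "the bit" means the most
  significant bit (helper names are hypothetical; in GMP the primitive is
  the count_leading_zeros macro in longlong.h):

      #include <limits.h>

      /* Portable reference count of leading zeros; stands in for a slow
         count_leading_zeros.  Requires x != 0.  */
      static int
      clz_slow (unsigned long x)
      {
        unsigned long top = 1UL << (sizeof (unsigned long) * CHAR_BIT - 1);
        int n = 0;
        while ((x & top) == 0)
          { n++; x <<= 1; }
        return n;
      }

      /* Skip the slow primitive when the high bit is already set, in
         which case the count is known to be 0.  Requires x != 0.  */
      static int
      leading_zeros (unsigned long x)
      {
        unsigned long top = 1UL << (sizeof (unsigned long) * CHAR_BIT - 1);
        if (x & top)
          return 0;
        return clz_slow (x);
      }
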
* count_trailing_zeros is used on more or less uniformly distributed
  numbers. For some CPUs count_trailing_zeros is slow and it's probably
  worth handling the frequently occurring 0 to 2 trailing zeros cases
  specially.
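
  A small C sketch of that special-casing (helper names are hypothetical;
  the real primitive is the count_trailing_zeros macro in longlong.h):

      /* Portable reference count of trailing zeros; stands in for a slow
         count_trailing_zeros.  Requires x != 0.  */
      static int
      ctz_slow (unsigned long x)
      {
        int n = 0;
        while ((x & 1) == 0)
          { n++; x >>= 1; }
        return n;
      }

      /* Peel off the common 0, 1 and 2 trailing zeros cases with cheap
         tests before falling back to the slow primitive.  Requires x != 0.  */
      static int
      trailing_zeros (unsigned long x)
      {
        if (x & 1)
          return 0;                     /* about 1/2 of uniform inputs */
        if (x & 2)
          return 1;                     /* about 1/4 */
        if (x & 4)
          return 2;                     /* about 1/8 */
        return ctz_slow (x);            /* 3 or more trailing zeros */
      }
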
* Change code that uses udiv_qrnnd for inverting limbs to instead use
  mpn_invert_limb.
* Rewrite umul_ppmm to use floating-point for generating the most
  significant limb (if BITS_PER_MP_LIMB <= 52 bits). (Peter Montgomery
  has some ideas on this subject.)
* The umul_ppmm code in longlong.h: add the partial products with fewer
  operations.
* Make mpn_get_str and mpn_set_str run in the sub-O(n^2) range, using
  some divide-and-conquer approach, preferably without using division.
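
  A minimal mpz-level sketch of the divide-and-conquer direction (the
  function name is hypothetical, and it still leans on division, which
  the task would prefer to avoid; the real work would be at the mpn
  level):

      #include <string.h>
      #include <gmp.h>

      /* Fill buf[0..digits-1] with the base-`base' digits of n, padded on
         the left with '0'.  Assumes 0 <= n < base^digits and base <= 10.  */
      static void
      dc_get_str (char *buf, const mpz_t n, size_t digits, int base)
      {
        if (digits <= 32)
          {
            char s[40];                  /* room for up to 32 digits + NUL */
            size_t len;
            mpz_get_str (s, base, n);
            len = strlen (s);
            memset (buf, '0', digits - len);
            memcpy (buf + (digits - len), s, len);
          }
        else
          {
            size_t lo = digits / 2;
            mpz_t pow, q, r;
            mpz_init (pow); mpz_init (q); mpz_init (r);
            mpz_ui_pow_ui (pow, base, lo);                 /* base^(digits/2) */
            mpz_tdiv_qr (q, r, n, pow);                    /* n = q*pow + r */
            dc_get_str (buf, q, digits - lo, base);        /* high digits */
            dc_get_str (buf + (digits - lo), r, lo, base); /* low digits */
            mpz_clear (pow); mpz_clear (q); mpz_clear (r);
          }
      }
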
* mpn_get_str to mpf/get_str. (Talk to Torbjörn about this.)
* mpz_size, mpz_set_ui, mpz_set_q, mpz_clear, mpz_init, mpz_get_ui,
  mpz_scan0, mpz_scan1, mpz_getlimbn, mpz_init_set_ui,
  mpz_perfect_square_p, mpz_popcount, mpf_size, mpf_get_prec,
  mpf_set_prec_raw, mpf_set_ui, mpf_init, mpf_init2, mpf_clear,
  mpf_set_si.
* Write mpn_addmul_1, mpn_submul_1, and mpn_mul_1 for the 21264. On the
  21264, they should run at 4, 3, and 3 cycles/limb respectively, if the
  code is unrolled properly. (Ask Torbjörn for his xm.s and xam.s
  skeleton files.)
* Write mpn_addmul_1, mpn_submul_1, and mpn_mul_1 for the 21164. These
  should use both integer multiplies and floating-point multiplies. For
  the floating-point operations, the single-limb multiplier should be
  split into three 21-bit chunks.
* Improve mpn_add_n and mpn_sub_n for sparc64. The current sparc64 code
  uses MOVcc instructions, which take about 6 cycles on UltraSPARC. The
  correct approach is probably to use conditional branching. That should
  lead to loops that run at 4 cycles/limb. (Torbjörn has code that just
  needs to be finished.)
* mpn_addmul_1, mpn_submul_1, and mpn_mul_1. Should use floating-point
  operations, and split the invariant single-limb multiplier into 21-bit
  chunks. Should give about 18 cycles/limb, but the pipeline will become
  very deep. (Torbjörn has C code that is useful as a starting point.)
* mpn_lshift and mpn_rshift. Should give 2 cycles/limb. (Torbjörn has
  code that just needs to be finished.)
* mpn_addmul_1, mpn_submul_1, and mpn_mul_1. The current development code
  runs at 11 cycles/limb, which is already very good. But it should be
  possible to saturate the cache, which will happen at 7.5 cycles/limb.
* umul_ppmm. Important in particular for mpn_sqr_basecase.
* mpn_mul_basecase and mpn_sqr_basecase for important machines.
* mpn_lshift/mpn_rshift. Will bring time from 1.75 to 1.25 cycles/limb.
* mpn_lshift for shifts by 1. (See Pentium code.)
* count_leading_zeros.
* udiv_qrnnd. (Ask Torbjörn for the file test-udiv-preinv.c as a starting
  point.)
* mpn_add_n and mpn_sub_n. It should just require 3 cycles/limb, but the
  current code propagates carry poorly. The trick is to add carry-in
  later than we do now, decreasing the number of operations used to
  generate carry-out from 4 to 3.
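
  A generic-C illustration of the reformulation (a sketch of the idea
  only; the real gain is in assembly instruction scheduling, which this
  does not capture):

      typedef unsigned long limb_t;      /* stand-in for mp_limb_t */

      /* Reference add_n: compute the word sum and its carry-out first,
         then fold in the incoming carry, which can generate at most one
         further carry.  Returns the final carry-out.  */
      static limb_t
      add_n_ref (limb_t *rp, const limb_t *ap, const limb_t *bp, long n)
      {
        limb_t cy = 0;
        long i;
        for (i = 0; i < n; i++)
          {
            limb_t a = ap[i], b = bp[i];
            limb_t s = a + b;            /* sum without carry-in */
            limb_t c = s < a;            /* carry-out of a + b */
            limb_t r = s + cy;           /* add carry-in last */
            c += r < s;                  /* at most one extra carry */
            rp[i] = r;
            cy = c;
          }
        return cy;
      }
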
* mpn_lshift. The pipeline is now extremely deep, perhaps unnecessarily
  deep. Also, r5 is unused. (Ask Torbjörn for a copy of the current
  code.)
* mpn_rshift based on new mpn_lshift.
* mpn_add_n and mpn_sub_n. Should run at just 3.25 cycles/limb. (Ask for
  xxx-add_n.s as a starting point.)
* mpn_mul_basecase and mpn_sqr_basecase. This should use a "vertical
  multiplication method" to avoid carry propagation, splitting one of the
  operands into 11-bit chunks.
* mpn_mul_basecase and mpn_sqr_basecase. The same comment applies to this
  as to the same functions for Fujitsu VPP.
* count_leading_zeros for 64-bit machines:

      if ((x >> (W_TYPE_SIZE - W_TYPE_SIZE/2)) == 0)
        { x <<= W_TYPE_SIZE/2; cnt += W_TYPE_SIZE/2; }
      if ((x >> (W_TYPE_SIZE - W_TYPE_SIZE/4)) == 0)
        { x <<= W_TYPE_SIZE/4; cnt += W_TYPE_SIZE/4; }
      ...
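
  A self-contained version of the full binary search, as a sketch (the
  function name is hypothetical; GMP's actual macro lives in longlong.h
  and returns the count through an argument):

      #include <stdint.h>

      /* Count leading zeros of a 64-bit word by binary search.
         Requires x != 0.  */
      static int
      clz64 (uint64_t x)
      {
        int cnt = 0;
        if ((x >> 32) == 0) { x <<= 32; cnt += 32; }
        if ((x >> 48) == 0) { x <<= 16; cnt += 16; }
        if ((x >> 56) == 0) { x <<= 8;  cnt += 8;  }
        if ((x >> 60) == 0) { x <<= 4;  cnt += 4;  }
        if ((x >> 62) == 0) { x <<= 2;  cnt += 2;  }
        if ((x >> 63) == 0) { cnt += 1; }
        return cnt;
      }
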
* mpz_get_nth_ui. Return the nth word (not necessarily the nth limb).
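
  A hypothetical sketch of the intended semantics for 32-bit words,
  expressed with the existing mpz interface (word n meaning bits
  32n..32n+31 of the absolute value, regardless of the host's limb size):

      #include <gmp.h>

      static unsigned long
      get_nth_word32 (const mpz_t x, unsigned long n)
      {
        mpz_t t;
        unsigned long w;
        mpz_init (t);
        mpz_tdiv_q_2exp (t, x, 32 * n);     /* shift word n to the bottom */
        w = mpz_get_ui (t) & 0xffffffffUL;  /* take its low 32 bits */
        mpz_clear (t);
        return w;
      }
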
* mpz_crr (Chinese Remainder Reconstruction).
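
  A hypothetical sketch of what such a function might compute, shown for
  the two-modulus case (assumes gcd(m1, m2) = 1 and 0 <= r1 < m1; returns
  0 if the moduli are not coprime):

      #include <gmp.h>

      /* Find x with x = r1 (mod m1) and x = r2 (mod m2), 0 <= x < m1*m2.  */
      static int
      crr2 (mpz_t x, const mpz_t r1, const mpz_t m1,
                     const mpz_t r2, const mpz_t m2)
      {
        mpz_t d, t;
        int ok;
        mpz_init (d);
        mpz_init (t);
        ok = mpz_invert (d, m1, m2);    /* d = m1^-1 mod m2, if it exists */
        if (ok)
          {
            mpz_sub (t, r2, r1);
            mpz_mul (t, t, d);
            mpz_mod (t, t, m2);         /* t = (r2 - r1) * m1^-1 mod m2 */
            mpz_mul (x, t, m1);
            mpz_add (x, x, r1);         /* x = r1 + m1*t */
          }
        mpz_clear (d);
        mpz_clear (t);
        return ok;
      }
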
* mpq_set_f for assignment from mpf_t (cf. mpq_set_d).
* Make mpz_init (and mpq_init) do lazy allocation. Set ALLOC(var) to 0,
  and have mpz_realloc special-handle that case. Update functions that
  rely on a single limb (like mpz_set_ui, mpz_[tfc]div_r_ui, and others).
* Add mpf_out_raw and mpf_inp_raw. Make sure the format is portable
  between 32-bit and 64-bit machines, and between little-endian and
  big-endian machines.
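
  One way to get that portability is to define the format in terms of
  bytes rather than limbs. A hypothetical sketch for the magnitude part,
  handled here as an mpz (an mpf_t's sign and exponent would be stored
  separately):

      #include <stdio.h>
      #include <gmp.h>

      /* Write |z| as a 4-byte big-endian length prefix followed by the
         magnitude bytes, most significant first.  The result is
         independent of the host's limb size and endianness.  */
      static void
      write_mag (FILE *fp, const mpz_t z)
      {
        size_t nbytes = (mpz_sizeinbase (z, 2) + 7) / 8;
        size_t i;
        mpz_t t;

        putc ((nbytes >> 24) & 0xff, fp);
        putc ((nbytes >> 16) & 0xff, fp);
        putc ((nbytes >> 8) & 0xff, fp);
        putc (nbytes & 0xff, fp);

        mpz_init (t);
        for (i = nbytes; i-- > 0; )
          {
            mpz_tdiv_q_2exp (t, z, 8 * i);    /* shift byte i down */
            putc (mpz_get_ui (t) & 0xff, fp); /* emit MSB first */
          }
        mpz_clear (t);
      }
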
* gmp_errno.
* gmp_fprintf, gmp_sprintf, and gmp_snprintf.
* mpq input and output functions.
* Extend mpz_jacobi to a full range of inputs. Kevin Ryde is working on
  this.
* Make mpz_div and mpz_divmod use rounding analogous to mpz_mod.
  Document, and list as an incompatibility.
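
  For reference, the rounding difference in question, shown with the
  existing truncating and flooring divisions (mpz_mod gives the flooring,
  non-negative remainder for a positive divisor):

      #include <stdio.h>
      #include <gmp.h>

      int
      main (void)
      {
        mpz_t n, d, q, r;
        mpz_init_set_si (n, -7);
        mpz_init_set_si (d, 2);
        mpz_init (q);
        mpz_init (r);

        mpz_tdiv_qr (q, r, n, d);    /* truncating: q = -3, r = -1 */
        printf ("tdiv: q=%ld r=%ld\n", mpz_get_si (q), mpz_get_si (r));

        mpz_fdiv_qr (q, r, n, d);    /* flooring:   q = -4, r = 1 */
        printf ("fdiv: q=%ld r=%ld\n", mpz_get_si (q), mpz_get_si (r));

        mpz_clear (n); mpz_clear (d); mpz_clear (q); mpz_clear (r);
        return 0;
      }
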
* Call malloc separately for each TMP_ALLOC block, so a redzoning malloc
  debugger could be used during development.
* Make sure that unsigned long int is used for bit counts/ranges, and
  that mp_size_t is used for limb counts.
* mpz_inp_str (etc) doesn't say when it stops reading digits.

Please send comments about this page to tege@swox.com.
Copyright (C) 1999, 2000 Torbjörn Granlund.