I finally got up off the couch and finished testing my rational number type, and I decided to put it, along with the unbounded integer and big decimal code, into a single library. If you’re interested, see https://www.cstdbill.com/bignum/bignum.html.
Andrew Dalke says
I saw the note about converting floating point values to rational. I am not a numerics person; I am primarily a Python developer, and I can point to its double to_integer_ratio at https://github.com/python/cpython/blob/ace4d7ff9a247cbe7350719b996a1c7d88a57813/Objects/floatobject.c#L1591 for a well-tested implementation you might be able to compare against.
I’ve used Python’s fractions.limit_denominator at https://github.com/python/cpython/blob/ace4d7ff9a247cbe7350719b996a1c7d88a57813/Lib/fractions.py#L357 to approximate the value to reduce the sizes of the numerator and denominator. In my case, I work with Jaccard similarity scores with values a/b where 0 <= a <= b <= 32768. By converting the double to its nearest fraction in that range, I could replace some division calculations with faster int16 multiplications.
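To make that concrete, here’s roughly what that approximation step looks like with the standard fractions module (the 32768 bound is from my use case above; the sample score is just an illustration):

    from fractions import Fraction

    score = 0.2857142857142857                  # a double-valued similarity score
    approx = Fraction(score).limit_denominator(32768)
    print(approx)                               # 2/7: closest fraction with denominator <= 32768

Once the scores are small fractions, comparisons can be done by integer cross-multiplication (a.numerator * b.denominator versus b.numerator * a.denominator) instead of double division.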
As a Python developer, I'm used to its decimal module, based on the General Decimal Arithmetic specification and IEEE 854. I think the docs could use a bit more about why bigdec focuses on the needs of SQL over other decimal uses.
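For instance, the decimal module follows the spec’s context model, where precision is a property of the arithmetic context rather than of a declaration the way SQL’s DECIMAL(p, s) scale is:

    >>> from decimal import Decimal, getcontext
    >>> getcontext().prec = 6        # context-wide precision, not a per-value scale
    >>> Decimal(1) / Decimal(7)
    Decimal('0.142857')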
billseymour says
Andrew, thanks for the comment.
I couldn’t find “to_integer_ratio” in the first link. In any event, I think that the good old continued fractions algorithm is the right one: it converges fairly rapidly and the answer you get is guaranteed to be in lowest terms. The question I have is whether I can rip out the “no longer converging” test.
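For reference, here is a minimal Python sketch of that classic loop; it is not the actual bignum code (which is C++ and carries the “no longer converging” test in question):

    from fractions import Fraction
    from math import floor, isfinite

    def to_rational(x, max_terms=64):
        # Classic continued-fractions conversion of a finite float.
        # A sketch only: it guards against overflow to inf but omits
        # the "no longer converging" test discussed above.
        terms = []
        for _ in range(max_terms):
            a = floor(x)
            terms.append(a)
            frac = x - a
            if frac == 0.0:        # the expansion terminated exactly
                break
            x = 1.0 / frac         # can overflow to inf for a tiny frac
            if not isfinite(x):
                break
        # fold a0 + 1/(a1 + 1/(a2 + ...)) back into one fraction;
        # convergents come out in lowest terms automatically
        result = Fraction(terms[-1])
        for a in reversed(terms[:-1]):
            result = a + 1 / result
        return result

For example, to_rational(3.5) gives Fraction(7, 2).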
As for why the bigint and bigdec types are intended to be C++ equivalents of SQL types, that’s the use I had for them. I don’t claim that they’re particularly useful for serious numerical work. Indeed, folks who do numerics probably already have better types to use.
Andrew Dalke says
The first link points to code in the function float_as_integer_ratio_impl(), which is the C implementation of the Python method as_integer_ratio() for its float type, which is a wrapper around C’s double. The “float_part = frexp(self_double, &exponent);” line I linked to is the start of the algorithm proper. I’ve had enough experience with the Python implementation that I overlook the complexity of mapping from a Python method to its C implementation.
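In outline, the C code is just the frexp() decomposition followed by scaling; a Python rendering of the same idea (a sketch, not a line-for-line translation):

    import math

    def as_integer_ratio(x):
        # Mirror of the approach in float_as_integer_ratio_impl():
        # returns the exact (numerator, denominator) of a finite double.
        if math.isnan(x) or math.isinf(x):
            raise ValueError("cannot express NaN or infinity as a ratio")
        mantissa, exponent = math.frexp(x)       # x == mantissa * 2**exponent
        while mantissa != math.floor(mantissa):  # at most ~53 doublings
            mantissa *= 2.0
            exponent -= 1
        numerator, denominator = int(mantissa), 1
        if exponent > 0:
            numerator <<= exponent
        else:
            denominator <<= -exponent
        return numerator, denominator

It agrees with the built-in method: both as_integer_ratio(0.1) and (0.1).as_integer_ratio() return (3602879701896397, 36028797018963968).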
I did a comparison using random numbers and found that bignum::rational(0.0869406496067503) is a value for which the algorithm does not converge, and 1.1100695288645402e-29 gives a “Can’t convert NaNs or infinities to bigint” error because the val0 = 1.0 / (val0 - static_cast<double>(int0)); step results in an inf.
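The overflow mechanism is easy to reproduce in isolation: once the remaining fractional part is small enough, its reciprocal exceeds the largest finite double (exactly which iterate of the 1.1100695288645402e-29 expansion triggers it is internal to the loop):

    >>> 1.0 / 5e-324        # reciprocal of the smallest positive (subnormal) double
    inf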
I also found bignum::rational(5e-324) (which is nextafter(0, 1)) takes 120 milliseconds while the Python version takes 4 microseconds.
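For anyone who wants to reproduce the Python side of that measurement, something along these lines works (absolute timings will of course vary by machine):

    import timeit

    # per-call cost of the exact conversion for the smallest subnormal
    t = timeit.timeit("(5e-324).as_integer_ratio()", number=1_000_000)
    print(t / 1_000_000)    # seconds per call; on the order of microseconds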
While the Knuth algorithm (section 4.5.3) nearly always finds a ratio whose terms are as small as or smaller than the Python algorithm’s, there are a couple of cases I found where it is slightly worse. One is 0.9999999999999999 (= nextafter(1, 0)), which bignum::rational expresses as 9007199254740992/9007199254740993 while Python finds 9007199254740991/9007199254740992, which is 1 less in both numerator and denominator. For what it’s worth, the smallest equivalent rational I found is 6004799503160661/6004799503160662.
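All three fractions are easy to check from Python, since converting a Fraction to float is correctly rounded:

    >>> from fractions import Fraction
    >>> x = 0.9999999999999999
    >>> x.as_integer_ratio()    # the exact value of the double
    (9007199254740991, 9007199254740992)
    >>> float(Fraction(9007199254740992, 9007199254740993)) == x
    True
    >>> float(Fraction(6004799503160661, 6004799503160662)) == x
    True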
In my limited experience with this topic, I’ve learned that it’s tricky to handle all the numerical corner cases. I urge you to look toward an existing library if all you need is SQL interoperability.