# ISReal: A C++ Library for Floating-point Computation with Controlled Error

## Introduction

ISReal is a C++ library for floating-point computation with controlled error. Given a floating-point expression f and a required upper bound e on the error, ISReal automatically and dynamically adjusts the precision of all intermediate computations to produce a result y such that |y-f|<e. ISReal could potentially be combined with existing approaches to floating-point program analysis to provide more informed results [1,2].

ISReal relies on GNU MP for big integer computation. ISReal supports the following operations with degree and radian modes for (inverse) trigonometric functions:

• Special constants: e and pi;
• Elementary operations: +, -, *, and /;
• Trigonometry: sin, cos, tan, cot, sec and csc;
• Inverse trigonometry: arcsin, arccos, arctan and arccot;
• Logarithm and exponent: ln, log and ^;
• Hyperbolic functions: sinh and cosh;
• Rounding: floor and ceil;
• Factorial: !.

## Example

Let us consider the expression $f_1$ from [4]:
$f_1 = 333.75b^6 + a^2 (11a^2 b^2- b^6 -121b^4 -2) +5.5b^8+\frac{a}{2b}$
where a = 77617.0 and b = 33096.0.

Its actual value is -0.82739605994682136814116509579816....

In ISReal form, it is:
333.75*33096.0^6+77617.0^2*(11*77617.0^2*33096.0^2-33096.0^6-121*33096.0^4-2) +5.5*33096.0^8+77617.0/(2*33096.0)

• The results of our tool ISReal under the three precisions 8, 16 and 20 were:
  • Precision 8: $-0.82739606$
  • Precision 16: $-0.8273960599468214$
  • Precision 20: $-0.82739605994682136814$
• On an IBM S/370 computer, the results obtained by Rump [4] using a Fortran program were:
  • single: $1.172603...$
  • double: $1.1726039400531...$
  • extended: $1.172603940053178...$
• On a Pentium 4-based workstation running Linux, the results using GCC were [5]:
  • single: $2.0317\times10^{29}$;
  • double: $5.960604\times10^{20}$;
  • double-extended: $- 9.38724\times10^{- 323}$.
• The results with the Forte Developer 6 Update 2 Fortran 95 compiler (IEEE 754) from Sun Microsystems Inc. were [6]:
  • single: $- 6.338253\times10^{29}$;
  • double: $- 1.1805916207174113\times10^{21}$;
  • quadruple: $1.1726039400531786318588349045201838$.
• The results on Windows 8.1 with 8GB RAM and an AMD64 A10-6700 3.70GHz CPU (Win8 for short) using Microsoft VS2012 were (using all modes: precise, strict and fast; long double is mapped to double in Microsoft VS2012):
  • single: $-9.8750123322992547\times10^{29}$;
  • double: $-1.1805916207174113\times10^{21}$.
• The results on Ubuntu 16.10 with an Intel Core i7 (Ubuntu for short) using GCC 6.2.0 were:
  • single: $-9.8750123322992547249669537792\times10^{29}$;
  • double: $-1.180591620717411303424\times10^{21}$;
  • long double: $5.764607523034234891875\times 10^{17}$.
• Using Maple 15 with 8, 16 and 20 significant digits, the results were:
  • 8 significant digits: $7\times10^{29}$;
  • 16 significant digits: $1\times10^{21}$;
  • 20 significant digits: $-9.9999999999999998827\times10^{16}$.
• The results using Matlab R2012a were:
  • Direct computation: $-1.1806\times10^{21}$;
  • 8 significant digits (Math Toolbox): $-1.1805916\times10^{21}$;
  • 16 significant digits (Math Toolbox): $-1.180591620717411\times10^{21}$;
  • 20 significant digits (Math Toolbox): $-1.1805916207174113034\times10^{21}$.

All the tools have been tested on Windows XP/7/8. To build the source code, we recommend using Microsoft Visual Studio 2010 or a later version.

## Usage of the library

To use our library real.dll, you have to link gmp.dll and real.dll, and include the header file real.h. real.h defines a type real and contains the following 12 functions for computing, plus a function for output. The angles in the calc functions are in radians, and those in the calcd functions are in degrees.

1. The following 4 functions first calculate the value of the expression in str, and then assign the result to r.
1.1 real &calc(real &r, expr string &str); // correct to 10 decimal places
1.2 real &calc(real &r, expr string &str, int n); // correct to n decimal places
1.3 real &calcd(real &r, expr string &str); // correct to 10 decimal places
1.4 real &calcd(real &r, expr string &str, int n); // correct to n decimal places
2. The following 4 functions calculate the value of the expression in r, and the result is stored back in r.
2.1 real &calc(real &r); // correct to 10 decimal places
2.2 real &calc(real &r, int n); // correct to n decimal places
2.3 real &calcd(real &r); // correct to 10 decimal places
2.4 real &calcd(real &r, int n); // correct to n decimal places
3. The following 4 functions first calculate the value of the expression stored in the variable pointed to by str, and then assign the result to r.
3.1 real &calc(real &r, char *str); // correct to 10 decimal places
3.2 real &calc(real &r, char *str, int n); // correct to n decimal places
3.3 real &calcd(real &r, char *str); // correct to 10 decimal places
3.4 real &calcd(real &r, char *str, int n); // correct to n decimal places

The ISReal GUI-based calculator demonstrates how to use real.dll.

## Experiments

We conducted an experimental study to check accuracy and scalability using the following expressions:

• $f_1=333.75b^6+a^2(11a^2b^2-b^6-121b^4-2) +5.5b^8+\frac{a}{2b}$
• $f_2=((\frac{1}{3}-0.333333333333333235)+(\frac{1}{3}-0.333333333333333759)\times0.008 )\times10^{20}$
• $f_3={\sf sin}(2^{100})$
• $f_4=e^{\pi\sqrt{163}}-262537412640768744$
• $f_5 =20^{65} -e^{65 {\sf ln}(20)}$
• $f_6 =13^{40.7}({\sf tan}(3\times 5.2^{10.5})-{\sf tan}(5.2^{10.5}){\sf tan}(\frac{\pi}{3}-5.2^{10.5}){\sf tan}(\frac{\pi}{3}+5.2^{10.5}))$
• $f_7 = 15.1^{65} - e^{65 \times {\sf ln}(15.1)} + 20.2^{60.2} -e^{60.2\times{\sf ln}(20.2)}$
• $f_8 = ({\sf arctan}(10.5^{5.4}) - {\sf arcsin}(\frac{10.5^{5.4}}{ \sqrt{1+ (10.5^{5.4})^2}}) )\times 10^{53}$
• $f_9 = ({\sf cot}(y)+{\sf cot}(123.567)- \frac{ {\sf sin}(y+123.567)}{{\sf sin}(y)\times {\sf sin}(123.567)})\times {10^{50}}$
• $f_{10} = (1+\frac{{\sf sin}^2(7.85395)}{1-{\sf sin}^2(7.85395)} - \frac{1}{{\sf cos}^2(7.85395+568\pi)})\times 10.9^{40.9}$
• $f_{11} = (1+\frac{{\sf cos}^2(6.28318)}{1-{\sf cos}^2(6.28318)} - \frac{1}{{\sf sin}^2(6.28318+1200\pi)})\times 11.94^{50.2}$
• $f_{12} = \frac{{\sf sin}(\frac{9.4247779}{2}) }{{\sf cos}(\frac{9.4247779}{2}) } - \frac{{\sf sin}(9.4247779)}{1+{\sf cos}(9.4247779)}$
• $f_{13} = ({\sf sin}(c)\times \frac{{\sf sin}(d)}{{\sf tan}(c)} -\frac{{\sf sin}(c+d)+{\sf sin}(d-c)}{2})\times 20.1^{36.54}$
• $f_{14} = {\sf tan}(x_1) {\sf tan}(x_2+9876\pi) {\sf cos}(x_1){\sf cos}(x_2)+\frac{{\sf cos}(x_1+x_2+98\pi)-{\sf cos}(x_1-x_2)}{2}$
• $f_{15} = {\sf tan}(387674.1042493) - \frac{2 \times{\sf {tan}(\frac{387674.1042493}{2})}} {1- {\sf {tan}^2(\frac{387674.1042493}{2})}}$

$f_1$, $f_3$ and $f_4$ are taken from [3,4], while the others were created manually and their actual values are determined by axioms. The experiments with ISReal were conducted on Win8. C++ programs using the standard library (i.e., math.h) were also implemented to compute these expressions, both on Win8 using VS2012 and on Ubuntu using GCC 6.2.0, with all available precisions. C++ programs using MPFR [3] were run on Ubuntu with a precision of 10000 bits and round-to-nearest mode. Matlab R2012a and Maple 15 were tested with 16 significant digits. All the tested programs are available in the Download section.

The results are shown in the above Table. We only present 10 significant digits of the results for C++ programs when they are sufficient to reveal inaccuracy. To analyze the scalability of ISReal, we consider all the expressions $f_1,...,f_{15}$ parametrised by the precision n, where n ranges from 0 to 500 in steps of 10. The results are shown in the following Figure.

## References

1. Shizhong Zhao. 2016. A reliable computing algorithm and its software ISReal for arithmetic expressions (in Chinese). SCIENTIA SINICA Informationis 46, 6 (2016), 698–713.
2. Shizhong Zhao and Fu Song. 2017. A reliable computing algorithm and its software ISReal for arithmetic expressions. Technical Report. East China Normal University, ShanghaiTech University.
3. Paul Zimmermann. 2010. Reliable computing with GNU MPFR. In International Congress on Mathematical Software. Springer, 42–
4. Siegfried M. Rump. 1988. Reliability in Computing: The Role of Interval Methods in Scientific Computing. Boston: Academic Press, Chapter Algorithms for verified inclusion: theory and practice, 109–126.
5. Jean-Michel Muller, Nicolas Brisebarre, Florent de Dinechin, Claude-Pierre Jeannerod, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, Damien Stehlé, and Serge Torres. 2010. Handbook of Floating-Point Arithmetic. Birkhäuser.
6. Eugene Loh and G. William Walster. 2002. Rump’s Example Revisited. Reliable Computing 8, 3 (2002), 245–248.