What is a number?
Overview of the problems of working out just what a number is.
One of the major divides in PHP is the concept of variable types. PHP is loosely typed, so it does not complain if you provide a string of characters as an input to a simple numeric calculation. It will magically assume that the string is a number and do the necessary conversion. If PHP were strictly typed, there would be an error that the data is of the wrong type, and one would have to decide just how to convert the string of characters into a variable of the type 'number'.
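As a rough illustration of the strictly typed side of that divide, here is a small sketch in Python, which refuses to mix strings and numbers. The helper names `to_number` and `strict_add` are made up for this example, and this is only one way of making the conversion decision explicit.

```python
def to_number(text):
    """Convert a string to int if possible, otherwise to float.

    Raises ValueError if the text is not a number at all.
    """
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


def strict_add(a, b):
    """Add two values, returning None where the language reports a type error."""
    try:
        return a + b
    except TypeError:
        return None


if __name__ == "__main__":
    print(strict_add(5, "10"))             # None: int + str is rejected
    print(strict_add(5, to_number("10")))  # 15: explicit conversion works
```

The point is that the decision about what the string means is made once, visibly, rather than by whatever default conversion happens to apply.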
One of the problems that arises is that the magic conversion may actually give a wrong answer, and it is how these default conversions are handled that is now creating yet another layer of complexity in PHP. Currently there are a couple of mathematical extensions used with PHP, BCMath and GMP, on top of the basic built-in maths functions. Just as with handling multibyte character strings, there is no consistent base to build on, and libraries may be using one or other option internally, leading to even more confusion.
My own interest in this area comes not from the maths extensions, but rather from the database drivers. Since the material I am working with is nearly always stored in a database, and specifically in my case a Firebird database, the conversion of numbers has to be consistent with that base. While there is still some work to be done, the current Firebird versions are basically consistent with the SQL standard. It would be nice if that standard was followed by all databases, but parity is still lacking, so defining database schemas that produce consistent maths across all databases is still something of a lottery.
Turning the problem around a little, a recent discussion on how GMP should be modified to handle floating point support flagged up something which I tend to take for granted. The calculation "10 / 3" is always an interesting debating point, and there are actually a number of answers. The correct answer is "3 remainder 1", and this is the only accurate result. As soon as one tries to fold the remainder back into the quotient, one ends up with a fractional result that is only as accurate as the mechanism used to store it. There is a tendency to use floating point as a means of storing the result, but this may not be the most reliable path, since rounding errors are not so easily controlled. It may actually be safer to retain the remainder as an integer, when handling currency conversions for example. Alternatively, defining a fixed fractional accuracy makes a lot of sense in these specific cases. For this reason, the SQL standard provides the NUMERIC field, which retains the accuracy of an integer base by simply moving the decimal point (or comma) to a different position in the display conversion.
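The "3 remainder 1" idea can be sketched in a few lines of Python. The function names are hypothetical, and the currency split assumes a non-negative amount held as integer cents, which is just one possible convention.

```python
def divide_exact(dividend, divisor):
    """Return (quotient, remainder) using integer arithmetic only."""
    return divmod(dividend, divisor)


def split_amount(total_cents, parts):
    """Split an amount held as integer cents into `parts` shares.

    The remainder is handed out one cent at a time so that the shares
    always add back up to the original total.
    """
    share, remainder = divmod(total_cents, parts)
    return [share + 1 if i < remainder else share for i in range(parts)]


if __name__ == "__main__":
    print(divide_exact(10, 3))    # (3, 1)
    print(split_amount(1000, 3))  # [334, 333, 333]
    print(0.1 + 0.2 == 0.3)       # False: floating point rounding
```

Because the remainder is kept as an integer, nothing is lost: the three shares of 10.00 still total exactly 10.00, which a floating point 3.333... cannot promise.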
Returning to the "10 / 3" example: in the Firebird SQL maths handling, this always produces simply "3", as the numbers are assumed to be INTEGER and so produce an integer result. A simple addition to this provides a consistent conversion without any particular complication. "10/3.00" provides a result of 3.33 as a string, which can then simply be saved, or stored as a FLOAT or NUMERIC field depending on the application.
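To show how a fixed number of decimal places can fall out of pure integer maths, here is a minimal Python sketch. It is not how Firebird implements NUMERIC; `divide_fixed` is a made-up helper that truncates to a chosen number of places and assumes non-negative inputs.

```python
def divide_fixed(dividend, divisor, places):
    """Divide two integers and return the result as a decimal string
    with exactly `places` fractional digits, truncating any excess.

    Only integer arithmetic is used, so no floating point rounding
    can creep in. Assumes non-negative inputs for simplicity.
    """
    scale = 10 ** places
    scaled = dividend * scale // divisor  # e.g. 10 * 100 // 3 = 333
    whole, fraction = divmod(scaled, scale)
    if places == 0:
        return str(whole)
    return f"{whole}.{fraction:0{places}d}"


if __name__ == "__main__":
    print(divide_fixed(10, 3, 0))  # 3
    print(divide_fixed(10, 3, 2))  # 3.33
```

The stored value is just the integer 333. The decimal point only appears when the number is turned into a string for display.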
Everything is fairly simple so far, and all easy to define well. Where the problems start is with the basic definition of an 'integer'. The early microprocessors worked with 8bit numbers, 0 to 255, so working with larger values always required multiple elements. My first IBM machine was a nice 16bit AT, but at that time we were already playing with 32bit processors that were much better than the Intel ones. I have always wondered where we would be today if a decent processor had been used in the original IBM machines, but that is a different discussion. The 32bit standard has been around for some time, and while plenty of small embedded devices are still 16bit, the use of 32bit integers is generally assumed.
The switch to 64bit integers tends to be a little OS orientated, with many Windows installations still using a 32bit base, while Linux based desktop code has tended to be 64bit for some time. This becomes important because, until recently, Windows installs of PHP have tended to be 32bit, while Linux ones are 64bit. The result is that some cross platform processes fall over when the values coming from the Linux server exceed the 32bit limit. It is this area which prompted my original move to only use Linux for the PHP servers. Firebird happily handles BIGINT elements cross platform, so using 64bit numbers in the persistent data has not been a problem for some time. However, one does have to take care over how the PHP installation converts these numbers. It tends to use types other than integer when automatically loading them, so one has to take care of this manually if a clean integer value needs to be maintained.
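The size limits involved are easy to check for. The following Python sketch uses two hypothetical helpers: one tests whether a value fits a signed integer of a given width, and the other simulates the classic wrap-around of a fixed-width field. It illustrates the limits only and does not reproduce what any particular PHP build does with an oversized value.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def fits_signed(value, bits):
    """Return True if `value` fits in a signed integer of `bits` bits."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


def wrap_signed(value, bits):
    """Simulate storing `value` in a signed `bits`-bit integer that
    silently wraps around on overflow (two's complement)."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


if __name__ == "__main__":
    big = 3_000_000_000                # a plausible BIGINT key or counter
    print(fits_signed(big, 32))        # False
    print(fits_signed(big, 64))        # True
    print(wrap_signed(INT32_MAX + 1, 32))  # -2147483648
```

A check like `fits_signed` before handing a database value to 32bit code is cheap, and it turns a silent wrong answer into an explicit decision.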
The main area where the handling of 64bit integers becomes important is actually TIMESTAMP, but even this is due to a poor choice of base historically. The simple adoption of the second as the base for a Unix timestamp has already caused problems with the overflow of the 32bit limit, and so the timestamp generally defaults to a 64bit number these days. Firebird uses a tidier base of days, where the fractional part defines the time of day. Returning to numbers, the days are a simple 32bit integer which maps directly to a DATE field, while the fractional element provides TIME with 32bit accuracy. Little things like leap seconds can be handled a lot more easily than with the seconds base, and when handling diary jobs it provides a much better foundation. Only timezone information becomes a problem here, but again, simplifying the problem to only store UTC based time information creates a clean numeric solution. In many of the complicated areas, it is not the raw numbers that are the problem, but rather how they are displayed. The raw time only needs to be displayed with a timezone offset at the client location, and this has no bearing on the numeric value underlying it.
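The difference between the two bases can be sketched in Python. The 1970 epoch and whole-second resolution used here are assumptions chosen to line up with the Unix timestamp. Firebird has its own base date and finer time units, which this sketch does not attempt to reproduce.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_unix_seconds(seconds):
    """Split a Unix timestamp into (whole days, seconds into that day)."""
    return divmod(seconds, SECONDS_PER_DAY)


def join_days(days, seconds_of_day):
    """Rebuild a UTC datetime from the day count and the time of day."""
    return EPOCH + timedelta(days=days, seconds=seconds_of_day)


if __name__ == "__main__":
    last_32bit = 2**31 - 1             # largest signed 32bit second count
    days, secs = split_unix_seconds(last_32bit)
    print(days, secs)                  # 24855 11647
    print(join_days(days, secs).isoformat())  # 2038-01-19T03:14:07+00:00
```

At the moment the seconds counter runs out of 32bit room, the day count has only reached 24855, so a 32bit day number has an enormous amount of headroom left.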
Similarly, how a numeric value is displayed has no part in the calculations carried out on it, so the decoration of a string element is again simply a display function, like formatting the display of a date. The reverse process is the part which needs a little more care, much like the original discussion of "3 remainder 1". The client side settings dictate what should be considered as a number, and also what acts as the fractional indicator. That a string can be considered as either an integer or a fractional number is a given, but either the whole or the fractional element may be bigger than the current integer size, and so needs an alternate way of handling the resulting number. While it would be nice if this simply revolved around a 64bit integer, much of the hardware we are currently using may be restricted to 32bits. However, is there a logical reason to be running PHP on anything other than 64bit server hardware? Surely it should only be client software on mobile devices that needs to worry about integer sizes?
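A minimal Python sketch of that reverse process follows. The function `parse_number` and its parameter names are hypothetical, and the separators are passed in explicitly rather than read from any real locale setting. Python integers are unbounded, which stands in for the "alternate way of handling" oversized elements.

```python
def parse_number(text, decimal_mark=".", group_mark=","):
    """Split a displayed number into (sign, whole, fraction, places).

    `whole` and `fraction` are plain integers, and `places` records how
    many fractional digits were supplied so leading zeros are not lost.
    """
    text = text.strip().replace(group_mark, "")
    sign = -1 if text.startswith("-") else 1
    text = text.lstrip("+-")
    whole_part, _, fraction_part = text.partition(decimal_mark)
    if not (whole_part + fraction_part).isdecimal():
        raise ValueError(f"not a number: {text!r}")
    return sign, int(whole_part or "0"), int(fraction_part or "0"), len(fraction_part)


if __name__ == "__main__":
    # Continental style: full stop groups thousands, comma marks the fraction.
    print(parse_number("1.234,50", decimal_mark=",", group_mark="."))  # (1, 1234, 50, 2)
    # A whole part too large for a signed 64bit integer is still kept exactly.
    print(parse_number("12345678901234567890")[1] > 2**63 - 1)         # True
```

Keeping the number of fractional places alongside the digits is what allows "0.05" and "0.5" to be told apart once both parts are integers.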
