Use of simple number theory in code.

In a personal project recently, I came to a problem that ended up being solved via number theory. I’m a little rusty, but this isn’t actually the first time I’ve used number theory to solve a coding problem. I thought it ended up interesting, so I decided to write a short post about it.

The issue is that I wanted a way to represent dates as numbers. The typical unix timestamp was quite a bit more specific than I wanted for my purposes, and I wanted to avoid pitfalls associated with DST and timezones, so struct tm was going to be a bit of a pain to manage. I wanted to handle dates. Only this and nothing more.

Furthermore, I didn’t want to handle something like “Number of days since 1900” or something like that, because handling leap years and different lengths of months is a bit of a pain.

So, this is the process that went through my mind.

The most intuitive way to represent a single date as a number is with multiples of 10. For example, April 13, 2017 would be 20170413. All dates for my use case would be after the year 2000, so I could just chop off those first two digits, making it 170413.

But that’s decimal, and computers think in binary, so I shouldn’t bother setting it up like this, exactly. Instead, I should use base 2. Meaning that rather than \(17\times10^{4} + 4\times10^{2} + 13\), I should use \(17\times2^{10} + 4\times2^{5} + 13\), which would give me, in binary, 100010010001101, which can be decomposed to 17, 4, and 13 in binary with simple bit shifting. (Assuming I did everything right there. I’m not double-checking it, but it’s basically the same idea as in decimal, and it should be easy to do with bit shifting.)

But there’s still a lot of wasted space there. You can still have a lot of invalid dates like the 23rd month of the year.

So, I decided to have a simple rollover system. There are, at most, 31 days in a month, so I could just have that the 32nd day means we know we’re on the next month. And we may as well start at zero, so that means that 0 = first day of the first month (Jan 1), 30 = 31st day of the month (Jan 31), and 31 = first day of next month (Feb 1). So, 35 would mean the fourth day of the second month of the year (Feb 4), and 203 would be the July 18th because it’s crossed the 31 marker 6 times and makes it to 17 more days. It’s easy to find a particular day because the whole thing is mod 31. (It’s easy to get lost in the offsets here, though.)

This still has some invalid dates like April 31st, but it’s also over a much smaller codomain than either base ten or base two.

Then we also have twelve months in a year, so we can use the same concept for finding years. Once we cross 12 months (372 days), we’re in the next year.

With this, we can convert a date into an integer with this equation: \(i = d + 31m + 372y\), where \(d\) is the day of the month (0-31), \(m\) is the month (0-11), and \(y\) is the number of years since 2001. The 31 converts the months to days and the 372 converts the years to days (as \(31\times12\)). We have a one-to-one relationship between a date and the integer resulting from this function.

Resulting in a very easy function:

/**
 * Convert a Date object to an integer.
 *
 * Integer is not human-readable as a date, but it orders the same way.
 *
 * @param   dateObj
 */
int toInt(Date dateObj)
{
    return dateObj.day + DAYMOD * dateObj.month
        + (MONTHMOD * DAYMOD) * dateObj.year;
    // Can actually simplify this a little bit.
}

With the Date struct being defined as:

typedef struct date_obj {
    /** @var Number of complete years since the year 2001. */
    int year;

    /** @var Month number, 0 - 11. */
    int month;

    /** @var Day of month, 0 - 30. */
    int day;
} Date;

But the other direction is slightly more complicated, and that’s where the number theory comes in.

Getting the day of the month is easy. \(i \equiv d + 31m + 372y (\text{mod } 31)\), and the second and third terms on the right are zero mod 31, so if we have an integer x we can easily get back the number of days with x % 31.

Getting back the month is a little more difficult. But not that difficult. Initially I tried working with the \(i \equiv d + 31m + 372y (\text{mod } 31)\), but that was dumb, because we’re using the wrong base there. Plus, there is no inverse of 31 modulo 372, so the equation is unsolvable.

It’s a little easier to start with rearranging the equation.

\[
\frac{i – d}{31} = m + 12y
\]

And we know that \(i – d\) is divisible by 31 because of how we defined \(d\). (My professors would absolutely want a better description than that in a proof, but it’s a blog post while I’m sitting in a coffee shop, so this will work for now.)

We can now use basic number theory to find \(m\) the same way that we found \(d\).

\[\frac{i-d}{31} \equiv m + 12y (\text{mod }12)\] \[\frac{i-d}{31} \equiv m (\text{mod }12)\]

Thus, (x - d)/31 % 12 is our number of months.

And finding the number of years once we have the number of days and months is trivial.

With these three parts, we have a function that’s the inverse of toInt from before:

/**
 * Convert an integer back to a Date object.
 *
 * @param   dateInt
 */
Date toDate(int dateInt)
{
    Date dateObj = {};

    dateObj.day = dateInt % DAYMOD;
    dateObj.month = (dateInt - dateObj.day) / DAYMOD % MONTHMOD;
    dateObj.year = (dateInt - dateObj.day - DAYMOD * dateObj.month)
        / (MONTHMOD * DAYMOD);

    return dateObj;
}

And that’s that.