Category Archives: Code

Use of simple number theory in code.

Published / by Andrew

In a personal project recently, I came to a problem that ended up being solved via number theory. I’m a little rusty, but this isn’t actually the first time I’ve used number theory to solve a coding problem. I thought it ended up interesting, so I decided to write a short post about it.

The issue is that I wanted a way to represent dates as numbers. The typical unix timestamp was quite a bit more specific than I wanted for my purposes, and I wanted to avoid pitfalls associated with DST and timezones, so struct tm was going to be a bit of a pain to manage. I wanted to handle dates. Only this and nothing more.

Furthermore, I didn’t want to handle something like “Number of days since 1900” or something like that, because handling leap years and different lengths of months is a bit of a pain.

So, this is the process that went through my mind.

The most intuitive way to represent a single date as a number is with multiples of 10. For example, April 13, 2017 would be 20170413. All dates for my use case would be after the year 2000, so I could just chop off those first two digits, making it 170413.

But that’s decimal, and computers think in binary, so I shouldn’t bother setting it up like this, exactly. Instead, I should use base 2. Meaning that rather than \(17\times10^{4} + 4\times10^{2} + 13\), I should use \(17\times2^{10} + 4\times2^{5} + 13\), which would give me, in binary, 100010010001101, which can be decomposed to 17, 4, and 13 in binary with simple bit shifting. (Assuming I did everything right there. I’m not double-checking it, but it’s basically the same idea as in decimal, and it should be easy to do with bit shifting.)

But there’s still a lot of wasted space there. You can still have a lot of invalid dates like the 23rd month of the year.

So, I decided to have a simple rollover system. There are, at most, 31 days in a month, so I could just have that the 32nd day means we know we’re on the next month. And we may as well start at zero, so that means that 0 = first day of the first month (Jan 1), 30 = 31st day of the month (Jan 31), and 31 = first day of next month (Feb 1). So, 35 would mean the fourth day of the second month of the year (Feb 4), and 203 would be the July 18th because it’s crossed the 31 marker 6 times and makes it to 17 more days. It’s easy to find a particular day because the whole thing is mod 31. (It’s easy to get lost in the offsets here, though.)

This still has some invalid dates like April 31st, but it’s also over a much smaller codomain than either base ten or base two.

Then we also have twelve months in a year, so we can use the same concept for finding years. Once we cross 12 months (372 days), we’re in the next year.

With this, we can convert a date into an integer with this equation: \(i = d + 31m + 372y\), where \(d\) is the day of the month (0-31), \(m\) is the month (0-11), and \(y\) is the number of years since 2001. The 31 converts the months to days and the 372 converts the years to days (as \(31\times12\)). We have a one-to-one relationship between a date and the integer resulting from this function.

Resulting in a very easy function:

/**
 * Convert a Date object to an integer.
 *
 * Integer is not human-readable as a date, but it orders the same way.
 *
 * @param   dateObj
 */
int toInt(Date dateObj)
{
    return dateObj.day + DAYMOD * dateObj.month
        + (MONTHMOD * DAYMOD) * dateObj.year;
    // Can actually simplify this a little bit.
}

With the Date struct being defined as:

typedef struct date_obj {
    /** @var Number of complete years since the year 2001. */
    int year;

    /** @var Month number, 0 - 11. */
    int month;

    /** @var Day of month, 0 - 30. */
    int day;
} Date;

But the other direction is slightly more complicated, and that’s where the number theory comes in.

Getting the day of the month is easy. \(i \equiv d + 31m + 372y (\text{mod } 31)\), and the second and third terms on the right are zero mod 31, so if we have an integer x we can easily get back the number of days with x % 31.

Getting back the month is a little more difficult. But not that difficult. Initially I tried working with the \(i \equiv d + 31m + 372y (\text{mod } 31)\), but that was dumb, because we’re using the wrong base there. Plus, there is no inverse of 31 modulo 372, so the equation is unsolvable.

It’s a little easier to start with rearranging the equation.

\[
\frac{i – d}{31} = m + 12y
\]

And we know that \(i – d\) is divisible by 31 because of how we defined \(d\). (My professors would absolutely want a better description than that in a proof, but it’s a blog post while I’m sitting in a coffee shop, so this will work for now.)

We can now use basic number theory to find \(m\) the same way that we found \(d\).

\[\frac{i-d}{31} \equiv m + 12y (\text{mod }12)\] \[\frac{i-d}{31} \equiv m (\text{mod }12)\]

Thus, (x - d)/31 % 12 is our number of months.

And finding the number of years once we have the number of days and months is trivial.

With these three parts, we have a function that’s the inverse of toInt from before:

/**
 * Convert an integer back to a Date object.
 *
 * @param   dateInt
 */
Date toDate(int dateInt)
{
    Date dateObj = {};

    dateObj.day = dateInt % DAYMOD;
    dateObj.month = (dateInt - dateObj.day) / DAYMOD % MONTHMOD;
    dateObj.year = (dateInt - dateObj.day - DAYMOD * dateObj.month)
        / (MONTHMOD * DAYMOD);

    return dateObj;
}

And that’s that.

Programming Should Be For Everybody: An Example

Published / by Andrew

Regular Expressions

(This is a copy-paste of the README.md from allquestionsinthebible, where the scripts discussed herein can be found.)

I’ve long been a proponent (admittedly, not a particularly vocal one) that everyone should learn to code at least a little. I don’t mean in the sense of the “You lost your job, so learn to code” buffoonery, just because coding professionally isn’t for everyone. That said, even if you have zero intention of using code professionally, it can be very handy to learn, say, a little bit of Python to get a job done.

What I have to present to you today is an example of how code can be a useful tool in your belt. This is just one of many instances of my using Python to demonstrate achieving a goal that would be difficult to do manually, whether you’re a professional developer or not. (And I don’t actually write Python professionally– My professional work is PHP, which is very different.)

There’s a background to this that’s relevant here. One of my Sunday School teachers made a passing comment at the Christmas party that he’d like to see a list of every question asked in the Bible. I thought to myself, “That should be easy, since every question ends with a question mark”, and I said I could probably do that with a Python script. He said he’d pay me if I did it, and I said it wasn’t necessary since it would be only about five lines of Python.

Well, it turned out not to be five lines of Python, but it was a pretty fun project in any case.

More importantly, though, it’s a great example of something that I could do because I knew how to write code, and why I think it would be beneficial for anybody to learn to code.

Essentially, code does nothing more than automate a monotonous task. So, something like finding every question in the Bible could be done by taking out a Bible and a sheet of paper, and painstakingly finding every single question, as a months-long task that in ages past would be performed by monks with nothing better to do (or by Strong, whoever that is). Or you could write a Python script that does all of it for you in seconds.

So, I took on the task during the week between Christmas and New Years, and this is how I did it. (I didn’t spend the whole week doing it, ftr.)

First off, I needed to use the World English Bible translation (WEB). The reason for this is that it is a modern language translation that is public domain. I could hypothetically do some webscraping and get the entirety of the ESV, but that would be legally sketchy at best.

I was disappointed to find that the current organization of the WEB in html is not terribly programmer-friendly. It’s far from horrendous, but I was expecting each verse to be in its own span or div, with an appropriate class name. Instead, the chapter and verse numbers themselves are in these divs/spans, and interspersed throughout the text. It makes perfect sense if you’re reading it in a browser, but makes things slightly difficult from a programmatic perspective. So, the first thing I did was reorganize everything– All of the WEB (just the Protestant canon) as an XML file. This is the first script, reorganizeasxml.py, and was the bulk of the work done.

To do this, I used the archive of the html version of the WEB. To run the first script to generate the XML, unzip the contents of that archive into a folder, drop reorganizeasxml.py into the directory, and run it. It will create complete.xml— One giant XML file containing the entire WEB translation, in a programmer-friendly XML format.

It’s about five MB in size, so I do not recommend trying to open it in Notepad or any other text editor that’s not designed for large files (I used less to view the contents).

I could have made this easier, theoretically, by using the plaintext version, which seems to have one verse per line. I decided against that, though, because I’m not 100% sure that it’s always the case that it’s one verse per line– It never explicitly says so– and since the file names seem to indicate to me that these files exist for the purpose of reading out loud, so I expect they’re not necessarily going to be particularly careful about line breaks. I think, then, that it would be better to put in the extra effort in the clearer source material. (And, honestly, a big part of my motivation here is that doing it the hard way is a lot more fun.)

reorganizeasxml.py loops through every file, uses BeautifulSoup to identify where every verse begins and ends, and dumps every verse into the output file, in (Protestant) canonical order. This is mostly pretty easy– The only thing that was mildly difficult is getting the end of the final verse, because the text content of each chapter is not organized into one big div, so I had to find where the site navigation began instead. Apart from that, it’s pretty straightforward.

After running reorganizeasxml.py, we have complete.xml. We can now run findallquestions.py, which is significantly simpler overall, and doesn’t even define any functions. It doesn’t need to because of how reorganizeasxml.py built the xml structure.

It starts out by finding every vs element that contains a question mark. Then it creates a csv file, creates a header line, and then dumps all of the information into that csv file, one line at a time. Bam. Done.

What this script does not do is identify the one asking or the one being asked. I think I’d have to use some AI/ML for that, and I’ve never done anything with that. For that reason, those columns are blank, to be filled in manually. (I don’t actually expect that that would ever be 100% completed.)

Well, there you go. There’s a case study in a one-off use of Python code to accomplish a task that would be a pain to do manually. It’s just one example of how coding can be useful in day-to-day things, whether you’re a developer or not.

Another example is something that I made using matplotlib when the Covid panic started. The only historical data Oklahoma gave us at the time was the the cumulative case count, but I had a friend that wanted a graph of the daily case count. Converting the raw data from cumulative to daily was easy, so I made a graph in matplotlib, then made a script to scrape the website and update the chart on Github. I then made a cron job on my laptop to update it daily. It was a fun project, but only lasted about a week before the Oklahoma health dept changed the format of the website so it stopped working. At that point, they started giving us the chart exactly as we wanted it anyway, so there wasn’t much point in fixing it.

The Bible questions scripts probably seem overwhelming to someone new to coding, but I can assure you that, given time, writing code becomes as natural as breathing. What’s hard today is easy tomorrow (which I keep telling myself while learning C++). I’ve learned that lesson repeatedly a million times.

Handling the equivalent of FIRST() and LAST() in MySQL

Published / by Andrew

Something that’s been an enormous pain for years is that MySQL does not have aggregate functions for first() and last(), like every other SQL-based language. Why they don’t have it, I have no idea. It’s been requested and ignored for over 15 years, and doing a search for this online reveals many people frustrated with the lack of this feature. You’ll find hundreds of “solutions”, most of which either don’t work at all or are so convoluted that they’d be nearly impossible to implement.

To make this as abstract and possible, let’s say we have a table that looks like so:

MariaDB [test_db]> select * from test_table;
+----+----------+---------+
| id | ordering | groupid |
+----+----------+---------+
|  1 |        4 |       1 |
|  2 |        1 |       1 |
|  3 |        2 |       1 |
|  4 |        4 |       2 |
|  5 |        6 |       2 |
|  6 |        1 |       2 |
|  7 |        3 |       2 |
|  8 |        8 |       3 |
|  9 |        1 |       3 |
| 10 |        5 |       3 |
+----+----------+---------+

And we need to find the id of the greatest ordering for each groupid.

If MySQL were sane, we’d be able to do this with a relatively simple query similar to this:

    -- Reminder: This does not work in MySQL!  The correct solution is later in this post.
    SELECT
        last(id)
    FROM
        test_table
    ORDER BY
        ordering
    GROUP BY
        groupid

But, no, that’s not possible.

It’s easy to figure out what the solution should be with this query:

MariaDB [test_db]> SELECT * from test_table order by groupid,ordering;
+----+----------+---------+
| id | ordering | groupid |
+----+----------+---------+
|  2 |        1 |       1 |
|  3 |        2 |       1 |
|  1 |        4 |       1 |
|  6 |        1 |       2 |
|  7 |        3 |       2 |
|  4 |        4 |       2 |
|  5 |        6 |       2 |
|  9 |        1 |       3 |
| 10 |        5 |       3 |
|  8 |        8 |       3 |
+----+----------+---------+
10 rows in set (0.00 sec)

We can look at these results and say, “Hey, it’s obvious that the ids I need are 1, 5, and 8.” But that’s not going to do us much good if we’re doing a more complex query.

I’m pretty sure that at one point I had a moderately elegant solution to this using subqueries and LIMIT 1, but I haven’t been able to figure out what that is.

But, considering how aggravating this has been, I thought I’d post my most recent solution so I’d have a reference and will, hopefully, never have to figure it out all over again.

My solution is a join with a subquery (which I’m not crazy about, but it’s better than most of the other really convoluted solutions that I’ve found online).

    SELECT
        tbl1.id,
        tbl1.groupid
    FROM
        test_table as tbl1 INNER JOIN
        (SELECT groupid,max(ordering) as maxorder FROM test_table GROUP BY groupid) as tbl2 ON
            tbl1.groupid=tbl2.groupid AND 
            tbl1.ordering=tbl2.maxorder
    ;

Which outputs:

+----+---------+
| id | groupid |
+----+---------+
|  1 |       1 |
|  5 |       2 |
|  8 |       3 |
+----+---------+

One thing that really stinks about this is that it’s probably not going to be equally useful in all situations, but this will hopefully be adaptable to different needs. Say you wanted to modify all records except the most recent ones, you could use the above as a subquery in a WHERE clause.

I’m not the only person that uses this solution to this problem, but I sure wish there were a way to do this without a subqueries.

Update: It’s never quite as easy as you’d expect.

It turns out that to actually use the above query, you need to do yet another subquery. I actually don’t really understand why it’s necessary, but I was able (with some searching) to figure out how to do it.

Let’s say we want to actually do something with this query. Let’s add another column with tinyint(1) called “myvalue” and set them all to false.

+----+----------+---------+---------+
| id | ordering | groupid | myvalue |
+----+----------+---------+---------+
|  1 |        4 |       1 |       0 |
|  2 |        1 |       1 |       0 |
|  3 |        2 |       1 |       0 |
|  4 |        4 |       2 |       0 |
|  5 |        6 |       2 |       0 |
|  6 |        1 |       2 |       0 |
|  7 |        3 |       2 |       0 |
|  8 |        8 |       3 |       0 |
|  9 |        1 |       3 |       0 |
| 10 |        5 |       3 |       0 |
+----+----------+---------+---------+

Suppose we want to set all but the most recent value to true for each group id.

So, this does not work:

-- Reminder: This does not work in MySQL!  The correct solution is later in this post.
UPDATE
    test_table
SET
    myvalue = 1
WHERE
    id NOT IN (
        SELECT
            id
        FROM
            test_table as tbl1 INNER JOIN
            (SELECT groupid,max(ordering) as maxorder FROM test_table GROUP BY groupid) as tbl2 ON
                tbl1.groupid=tbl2.groupid AND
                tbl1.ordering=tbl2.maxorder
    )
;

This results in the following error in MariaDB:

Table 'test_table' is specified twice, both as a target for 'UPDATE' and as a
separate source for data

I think I got a different error in the AWS database based off MySQL, but the result is the same– This does not work.

Instead, we have to use yet another subquery. So, this does work:

UPDATE
    test_table
SET
    myvalue = 1
WHERE
    id NOT IN ( -- Starting the added subquery here!
        SELECT
            id
        FROM (
            SELECT
                id
            FROM
                test_table AS tbl1 INNER JOIN
                (SELECT groupid,max(ordering) AS maxorder FROM test_table GROUP BY groupid) AS tbl2 ON
                    tbl1.groupid=tbl2.groupid AND
                    tbl1.ordering=tbl2.maxorder
        ) as selecttbl
    )
;

When we run SELECT * FROM test_table ORDER BY groupid,ordering;, we can see that it was successful.

+----+----------+---------+---------+
| id | ordering | groupid | myvalue |
+----+----------+---------+---------+
|  2 |        1 |       1 |       1 |
|  3 |        2 |       1 |       1 |
|  1 |        4 |       1 |       0 |
|  6 |        1 |       2 |       1 |
|  7 |        3 |       2 |       1 |
|  4 |        4 |       2 |       1 |
|  5 |        6 |       2 |       0 |
|  9 |        1 |       3 |       1 |
| 10 |        5 |       3 |       1 |
|  8 |        8 |       3 |       0 |
+----+----------+---------+---------+

Yikes.

More FastMail programming woes

Published / by Andrew

The email on my Threadstr project has been broken for quite some time, but I’ve been too busy with other projects to get it working, since it’s not functional anyway. I finally got it working (but I haven’t gotten it pushed to the website yet), and I thought I’d share the fix here.

(Sidenote– When I did a search of “fastmail nodemailer” in Duckduckgo, my previous entry on the topic was the third result. That’s both kind of neat and kind of frustrating, because, apparently, nobody else was publically working on the problem.)

If you get an error message from Nodemailer that says “Sorry, the messagingengine.com server names are no longer available,” this is because FastMail made yet another breaking change. (I’m kind of getting to the point where I’m considering moving to another service if this keeps up.) Unfortunately, the most recent version of Nodemailer hasn’t been able to catch up yet. You could change ths source code, but that’s probably not a good idea, because it would be overwritten by the next npm update.

When creating the transport (i.e., when you use Nodemailer’s createTransport function), you can’t currently use the “service” : “fastmail” option, because it uses Fastmail’s old settings. Instead, you’ll want to enter the settings manually, which, for FastMail, is the following:

"domains":["fastmail.fm"],
"host":"smtp.fastmail.com",
"port":465,
"secure":true,
"auth":{"user":"user@example.com","pass":"somepassword"}

You’ll need to use your own username and password, of course.

I hope this helps anyone coming across this problem. It’s an easy fix, but it is just frustrating that it needs to be done in the first place.

A couple of headaches with CakePHP

Published / by Andrew

I thought I’d share a couple of headaches I had with CakePHP and the solutions that I found. I’ve tried running through this tutorial to get a jump on Cake, but quickly came up with a couple of problems while running it in Apache on Ubuntu 16.04.

If you’re new to running Apache on Linux, probably the first thing you’ll want to do is run these commands:

    sudo adduser $username www-data
    sudo chown -R www-data:www-data /var/www
    sudo chmod -R g=rwx /var/www

Replace $username with your own username. This will let you write to the /var/www/html directory without having to use superuser privileges, but you’ll need to log out and back in before the change takes effect.

That’s only related to CakePHP insofar as it involves Apache with PHP, though, and is something that you’ll want to do on any dev environment for PHP. These are the two headaches I came across:

bin/cake isn’t executable

If you get a “Permission denied” error when trying to start the terminal or run the bake command, this isn’t because you don’t have correct permissions (well, not quite). It’s because bin/cake isn’t an executable file. In my case, I’d been using Linux long enough to know that that was the problem, but anybody that’s more familiar with a Windows environment might be confused when they get to this step, because the tutorial doesn’t mention it.

It’s easily resolvable. While in the project directory, run this command: sudo chmod +x bin/cake.

URL rewriting is not working

This is a second one that’s not mentioned in the tutorial, and even the CakePHP’s article on this particular topic didn’t have the correct solution. This took me hours to track down, and the reason why is because there are actually two steps that need to be done, and no one webpage contained them both as part of a single solution.

This first step I got from a source that I don’t remember, unfortunately. Run this command in the terminal: sudo a2enmod rewrite

The second step is to do some file editing. I got this one from a DigitalOcean message board. You’ll need to open apache2.conf. If you’re using a GUI, you can open this up with any text editor you want as long as you have su privileges. If you’re on a terminal, I like to use Vim, so I open it with sudo vim /etc/apache2/apache2.conf. If you want something easier to use than Vim, I hear Nano is easy to use (though I’ve never used it myself), so this should open it: sudo nano /etc/apache2/apache2.conf.

You’ll then want to find an XML-ish looking block that looks like this:

<Directory /var/www/>
        Options Indexes FollowSymLinks
        AllowOverride None
        Require all granted
</Directory>

And you’ll want to change the AllowOverride None to AllowOverride All.

When that’s done, you’ll want to restart Apache with sudo service apache2 restart.

This was all that I needed to get CakePHP running smoothly on an Ubuntu Server 16.04 VM using Apache. I can’t really guarantee that it’ll work for you, but I can say that it worked for me.