Category Archives: Code

Programming Should Be For Everybody: An Example

Published / by Andrew

Regular Expressions

(This is a copy-paste of the README.md from allquestionsinthebible, where the scripts discussed herein can be found.)

I’ve long been a proponent (admittedly, not a particularly vocal one) that everyone should learn to code at least a little. I don’t mean in the sense of the “You lost your job, so learn to code” buffoonery, just because coding professionally isn’t for everyone. That said, even if you have zero intention of using code professionally, it can be very handy to learn, say, a little bit of Python to get a job done.

What I have to present to you today is an example of how code can be a useful tool in your belt. This is just one of many instances of my using Python to demonstrate achieving a goal that would be difficult to do manually, whether you’re a professional developer or not. (And I don’t actually write Python professionally– My professional work is PHP, which is very different.)

There’s a background to this that’s relevant here. One of my Sunday School teachers made a passing comment at the Christmas party that he’d like to see a list of every question asked in the Bible. I thought to myself, “That should be easy, since every question ends with a question mark”, and I said I could probably do that with a Python script. He said he’d pay me if I did it, and I said it wasn’t necessary since it would be only about five lines of Python.

Well, it turned out not to be five lines of Python, but it was a pretty fun project in any case.

More importantly, though, it’s a great example of something that I could do because I knew how to write code, and why I think it would be beneficial for anybody to learn to code.

Essentially, code does nothing more than automate a monotonous task. So, something like finding every question in the Bible could be done by taking out a Bible and a sheet of paper, and painstakingly finding every single question, as a months-long task that in ages past would be performed by monks with nothing better to do (or by Strong, whoever that is). Or you could write a Python script that does all of it for you in seconds.

So, I took on the task during the week between Christmas and New Years, and this is how I did it. (I didn’t spend the whole week doing it, ftr.)

First off, I needed to use the World English Bible translation (WEB). The reason for this is that it is a modern language translation that is public domain. I could hypothetically do some webscraping and get the entirety of the ESV, but that would be legally sketchy at best.

I was disappointed to find that the current organization of the WEB in html is not terribly programmer-friendly. It’s far from horrendous, but I was expecting each verse to be in its own span or div, with an appropriate class name. Instead, the chapter and verse numbers themselves are in these divs/spans, and interspersed throughout the text. It makes perfect sense if you’re reading it in a browser, but makes things slightly difficult from a programmatic perspective. So, the first thing I did was reorganize everything– All of the WEB (just the Protestant canon) as an XML file. This is the first script, reorganizeasxml.py, and was the bulk of the work done.

To do this, I used the archive of the html version of the WEB. To run the first script to generate the XML, unzip the contents of that archive into a folder, drop reorganizeasxml.py into the directory, and run it. It will create complete.xml— One giant XML file containing the entire WEB translation, in a programmer-friendly XML format.

It’s about five MB in size, so I do not recommend trying to open it in Notepad or any other text editor that’s not designed for large files (I used less to view the contents).

I could have made this easier, theoretically, by using the plaintext version, which seems to have one verse per line. I decided against that, though, because I’m not 100% sure that it’s always the case that it’s one verse per line– It never explicitly says so– and since the file names seem to indicate to me that these files exist for the purpose of reading out loud, so I expect they’re not necessarily going to be particularly careful about line breaks. I think, then, that it would be better to put in the extra effort in the clearer source material. (And, honestly, a big part of my motivation here is that doing it the hard way is a lot more fun.)

reorganizeasxml.py loops through every file, uses BeautifulSoup to identify where every verse begins and ends, and dumps every verse into the output file, in (Protestant) canonical order. This is mostly pretty easy– The only thing that was mildly difficult is getting the end of the final verse, because the text content of each chapter is not organized into one big div, so I had to find where the site navigation began instead. Apart from that, it’s pretty straightforward.

After running reorganizeasxml.py, we have complete.xml. We can now run findallquestions.py, which is significantly simpler overall, and doesn’t even define any functions. It doesn’t need to because of how reorganizeasxml.py built the xml structure.

It starts out by finding every vs element that contains a question mark. Then it creates a csv file, creates a header line, and then dumps all of the information into that csv file, one line at a time. Bam. Done.

What this script does not do is identify the one asking or the one being asked. I think I’d have to use some AI/ML for that, and I’ve never done anything with that. For that reason, those columns are blank, to be filled in manually. (I don’t actually expect that that would ever be 100% completed.)

Well, there you go. There’s a case study in a one-off use of Python code to accomplish a task that would be a pain to do manually. It’s just one example of how coding can be useful in day-to-day things, whether you’re a developer or not.

Another example is something that I made using matplotlib when the Covid panic started. The only historical data Oklahoma gave us at the time was the the cumulative case count, but I had a friend that wanted a graph of the daily case count. Converting the raw data from cumulative to daily was easy, so I made a graph in matplotlib, then made a script to scrape the website and update the chart on Github. I then made a cron job on my laptop to update it daily. It was a fun project, but only lasted about a week before the Oklahoma health dept changed the format of the website so it stopped working. At that point, they started giving us the chart exactly as we wanted it anyway, so there wasn’t much point in fixing it.

The Bible questions scripts probably seem overwhelming to someone new to coding, but I can assure you that, given time, writing code becomes as natural as breathing. What’s hard today is easy tomorrow (which I keep telling myself while learning C++). I’ve learned that lesson repeatedly a million times.

Handling the equivalent of FIRST() and LAST() in MySQL

Published / by Andrew

Something that’s been an enormous pain for years is that MySQL does not have aggregate functions for first() and last(), like every other SQL-based language. Why they don’t have it, I have no idea. It’s been requested and ignored for over 15 years, and doing a search for this online reveals many people frustrated with the lack of this feature. You’ll find hundreds of “solutions”, most of which either don’t work at all or are so convoluted that they’d be nearly impossible to implement.

To make this as abstract and possible, let’s say we have a table that looks like so:

MariaDB [test_db]> select * from test_table;
+----+----------+---------+
| id | ordering | groupid |
+----+----------+---------+
|  1 |        4 |       1 |
|  2 |        1 |       1 |
|  3 |        2 |       1 |
|  4 |        4 |       2 |
|  5 |        6 |       2 |
|  6 |        1 |       2 |
|  7 |        3 |       2 |
|  8 |        8 |       3 |
|  9 |        1 |       3 |
| 10 |        5 |       3 |
+----+----------+---------+

And we need to find the id of the greatest ordering for each groupid.

If MySQL were sane, we’d be able to do this with a relatively simple query similar to this:

    -- Reminder: This does not work in MySQL!  The correct solution is later in this post.
    SELECT
        last(id)
    FROM
        test_table
    ORDER BY
        ordering
    GROUP BY
        groupid

But, no, that’s not possible.

It’s easy to figure out what the solution should be with this query:

MariaDB [test_db]> SELECT * from test_table order by groupid,ordering;
+----+----------+---------+
| id | ordering | groupid |
+----+----------+---------+
|  2 |        1 |       1 |
|  3 |        2 |       1 |
|  1 |        4 |       1 |
|  6 |        1 |       2 |
|  7 |        3 |       2 |
|  4 |        4 |       2 |
|  5 |        6 |       2 |
|  9 |        1 |       3 |
| 10 |        5 |       3 |
|  8 |        8 |       3 |
+----+----------+---------+
10 rows in set (0.00 sec)

We can look at these results and say, “Hey, it’s obvious that the ids I need are 1, 5, and 8.” But that’s not going to do us much good if we’re doing a more complex query.

I’m pretty sure that at one point I had a moderately elegant solution to this using subqueries and LIMIT 1, but I haven’t been able to figure out what that is.

But, considering how aggravating this has been, I thought I’d post my most recent solution so I’d have a reference and will, hopefully, never have to figure it out all over again.

My solution is a join with a subquery (which I’m not crazy about, but it’s better than most of the other really convoluted solutions that I’ve found online).

    SELECT
        tbl1.id,
        tbl1.groupid
    FROM
        test_table as tbl1 INNER JOIN
        (SELECT groupid,max(ordering) as maxorder FROM test_table GROUP BY groupid) as tbl2 ON
            tbl1.groupid=tbl2.groupid AND 
            tbl1.ordering=tbl2.maxorder
    ;

Which outputs:

+----+---------+
| id | groupid |
+----+---------+
|  1 |       1 |
|  5 |       2 |
|  8 |       3 |
+----+---------+

One thing that really stinks about this is that it’s probably not going to be equally useful in all situations, but this will hopefully be adaptable to different needs. Say you wanted to modify all records except the most recent ones, you could use the above as a subquery in a WHERE clause.

I’m not the only person that uses this solution to this problem, but I sure wish there were a way to do this without a subqueries.

Update: It’s never quite as easy as you’d expect.

It turns out that to actually use the above query, you need to do yet another subquery. I actually don’t really understand why it’s necessary, but I was able (with some searching) to figure out how to do it.

Let’s say we want to actually do something with this query. Let’s add another column with tinyint(1) called “myvalue” and set them all to false.

+----+----------+---------+---------+
| id | ordering | groupid | myvalue |
+----+----------+---------+---------+
|  1 |        4 |       1 |       0 |
|  2 |        1 |       1 |       0 |
|  3 |        2 |       1 |       0 |
|  4 |        4 |       2 |       0 |
|  5 |        6 |       2 |       0 |
|  6 |        1 |       2 |       0 |
|  7 |        3 |       2 |       0 |
|  8 |        8 |       3 |       0 |
|  9 |        1 |       3 |       0 |
| 10 |        5 |       3 |       0 |
+----+----------+---------+---------+

Suppose we want to set all but the most recent value to true for each group id.

So, this does not work:

-- Reminder: This does not work in MySQL!  The correct solution is later in this post.
UPDATE
    test_table
SET
    myvalue = 1
WHERE
    id NOT IN (
        SELECT
            id
        FROM
            test_table as tbl1 INNER JOIN
            (SELECT groupid,max(ordering) as maxorder FROM test_table GROUP BY groupid) as tbl2 ON
                tbl1.groupid=tbl2.groupid AND
                tbl1.ordering=tbl2.maxorder
    )
;

This results in the following error in MariaDB:

Table 'test_table' is specified twice, both as a target for 'UPDATE' and as a
separate source for data

I think I got a different error in the AWS database based off MySQL, but the result is the same– This does not work.

Instead, we have to use yet another subquery. So, this does work:

UPDATE
    test_table
SET
    myvalue = 1
WHERE
    id NOT IN ( -- Starting the added subquery here!
        SELECT
            id
        FROM (
            SELECT
                id
            FROM
                test_table AS tbl1 INNER JOIN
                (SELECT groupid,max(ordering) AS maxorder FROM test_table GROUP BY groupid) AS tbl2 ON
                    tbl1.groupid=tbl2.groupid AND
                    tbl1.ordering=tbl2.maxorder
        ) as selecttbl
    )
;

When we run SELECT * FROM test_table ORDER BY groupid,ordering;, we can see that it was successful.

+----+----------+---------+---------+
| id | ordering | groupid | myvalue |
+----+----------+---------+---------+
|  2 |        1 |       1 |       1 |
|  3 |        2 |       1 |       1 |
|  1 |        4 |       1 |       0 |
|  6 |        1 |       2 |       1 |
|  7 |        3 |       2 |       1 |
|  4 |        4 |       2 |       1 |
|  5 |        6 |       2 |       0 |
|  9 |        1 |       3 |       1 |
| 10 |        5 |       3 |       1 |
|  8 |        8 |       3 |       0 |
+----+----------+---------+---------+

Yikes.

More FastMail programming woes

Published / by Andrew

The email on my Threadstr project has been broken for quite some time, but I’ve been too busy with other projects to get it working, since it’s not functional anyway. I finally got it working (but I haven’t gotten it pushed to the website yet), and I thought I’d share the fix here.

(Sidenote– When I did a search of “fastmail nodemailer” in Duckduckgo, my previous entry on the topic was the third result. That’s both kind of neat and kind of frustrating, because, apparently, nobody else was publically working on the problem.)

If you get an error message from Nodemailer that says “Sorry, the messagingengine.com server names are no longer available,” this is because FastMail made yet another breaking change. (I’m kind of getting to the point where I’m considering moving to another service if this keeps up.) Unfortunately, the most recent version of Nodemailer hasn’t been able to catch up yet. You could change ths source code, but that’s probably not a good idea, because it would be overwritten by the next npm update.

When creating the transport (i.e., when you use Nodemailer’s createTransport function), you can’t currently use the “service” : “fastmail” option, because it uses Fastmail’s old settings. Instead, you’ll want to enter the settings manually, which, for FastMail, is the following:

"domains":["fastmail.fm"],
"host":"smtp.fastmail.com",
"port":465,
"secure":true,
"auth":{"user":"user@example.com","pass":"somepassword"}

You’ll need to use your own username and password, of course.

I hope this helps anyone coming across this problem. It’s an easy fix, but it is just frustrating that it needs to be done in the first place.

A couple of headaches with CakePHP

Published / by Andrew

I thought I’d share a couple of headaches I had with CakePHP and the solutions that I found. I’ve tried running through this tutorial to get a jump on Cake, but quickly came up with a couple of problems while running it in Apache on Ubuntu 16.04.

If you’re new to running Apache on Linux, probably the first thing you’ll want to do is run these commands:

    sudo adduser $username www-data
    sudo chown -R www-data:www-data /var/www
    sudo chmod -R g=rwx /var/www

Replace $username with your own username. This will let you write to the /var/www/html directory without having to use superuser privileges, but you’ll need to log out and back in before the change takes effect.

That’s only related to CakePHP insofar as it involves Apache with PHP, though, and is something that you’ll want to do on any dev environment for PHP. These are the two headaches I came across:

bin/cake isn’t executable

If you get a “Permission denied” error when trying to start the terminal or run the bake command, this isn’t because you don’t have correct permissions (well, not quite). It’s because bin/cake isn’t an executable file. In my case, I’d been using Linux long enough to know that that was the problem, but anybody that’s more familiar with a Windows environment might be confused when they get to this step, because the tutorial doesn’t mention it.

It’s easily resolvable. While in the project directory, run this command: sudo chmod +x bin/cake.

URL rewriting is not working

This is a second one that’s not mentioned in the tutorial, and even the CakePHP’s article on this particular topic didn’t have the correct solution. This took me hours to track down, and the reason why is because there are actually two steps that need to be done, and no one webpage contained them both as part of a single solution.

This first step I got from a source that I don’t remember, unfortunately. Run this command in the terminal: sudo a2enmod rewrite

The second step is to do some file editing. I got this one from a DigitalOcean message board. You’ll need to open apache2.conf. If you’re using a GUI, you can open this up with any text editor you want as long as you have su privileges. If you’re on a terminal, I like to use Vim, so I open it with sudo vim /etc/apache2/apache2.conf. If you want something easier to use than Vim, I hear Nano is easy to use (though I’ve never used it myself), so this should open it: sudo nano /etc/apache2/apache2.conf.

You’ll then want to find an XML-ish looking block that looks like this:

<Directory /var/www/>
        Options Indexes FollowSymLinks
        AllowOverride None
        Require all granted
</Directory>

And you’ll want to change the AllowOverride None to AllowOverride All.

When that’s done, you’ll want to restart Apache with sudo service apache2 restart.

This was all that I needed to get CakePHP running smoothly on an Ubuntu Server 16.04 VM using Apache. I can’t really guarantee that it’ll work for you, but I can say that it worked for me.