Author Archives: Andrew

Programming Should Be For Everybody: An Example

Published / by Andrew

Regular Expressions

(This is a copy-paste of the README.md from allquestionsinthebible, where the scripts discussed herein can be found.)

I’ve long been a proponent (admittedly, not a particularly vocal one) of the idea that everyone should learn to code at least a little. I don’t mean that in the sense of the “You lost your job, so learn to code” buffoonery, because coding professionally isn’t for everyone. That said, even if you have zero intention of using code professionally, it can be very handy to learn, say, a little bit of Python to get a job done.

What I have to present to you today is an example of how code can be a useful tool in your belt. This is just one of many instances in which I’ve used Python to achieve a goal that would be difficult to do manually, whether you’re a professional developer or not. (And I don’t actually write Python professionally– My professional work is PHP, which is very different.)

There’s a background to this that’s relevant here. One of my Sunday School teachers made a passing comment at the Christmas party that he’d like to see a list of every question asked in the Bible. I thought to myself, “That should be easy, since every question ends with a question mark”, and I said I could probably do that with a Python script. He said he’d pay me if I did it, and I said it wasn’t necessary since it would be only about five lines of Python.

Well, it turned out not to be five lines of Python, but it was a pretty fun project in any case.

More importantly, though, it’s a great example of something that I could do because I knew how to write code, and why I think it would be beneficial for anybody to learn to code.

Essentially, code does nothing more than automate a monotonous task. So, something like finding every question in the Bible could be done by taking out a Bible and a sheet of paper, and painstakingly finding every single question, as a months-long task that in ages past would be performed by monks with nothing better to do (or by Strong, whoever that is). Or you could write a Python script that does all of it for you in seconds.

So, I took on the task during the week between Christmas and New Year’s, and this is how I did it. (I didn’t spend the whole week doing it, ftr.)

First off, I needed to use the World English Bible translation (WEB). The reason for this is that it is a modern language translation that is public domain. I could hypothetically do some webscraping and get the entirety of the ESV, but that would be legally sketchy at best.

I was disappointed to find that the current organization of the WEB in html is not terribly programmer-friendly. It’s far from horrendous, but I was expecting each verse to be in its own span or div, with an appropriate class name. Instead, the chapter and verse numbers themselves are in these divs/spans, and interspersed throughout the text. It makes perfect sense if you’re reading it in a browser, but makes things slightly difficult from a programmatic perspective. So, the first thing I did was reorganize everything– All of the WEB (just the Protestant canon) as an XML file. This is the first script, reorganizeasxml.py, and was the bulk of the work done.

To do this, I used the archive of the html version of the WEB. To run the first script to generate the XML, unzip the contents of that archive into a folder, drop reorganizeasxml.py into the directory, and run it. It will create complete.xml, one giant XML file containing the entire WEB translation in a programmer-friendly XML format.

It’s about five MB in size, so I do not recommend trying to open it in Notepad or any other text editor that’s not designed for large files (I used less to view the contents).

I could have made this easier, theoretically, by using the plaintext version, which seems to have one verse per line. I decided against that, though, because I’m not 100% sure that it’s always one verse per line– It never explicitly says so. The file names also seem to indicate that these files exist for the purpose of reading out loud, so I expect they’re not necessarily going to be particularly careful about line breaks. I think, then, that it would be better to put in the extra effort with the clearer source material. (And, honestly, a big part of my motivation here is that doing it the hard way is a lot more fun.)

reorganizeasxml.py loops through every file, uses BeautifulSoup to identify where every verse begins and ends, and dumps every verse into the output file, in (Protestant) canonical order. This is mostly pretty easy– The only thing that was mildly difficult is getting the end of the final verse, because the text content of each chapter is not organized into one big div, so I had to find where the site navigation began instead. Apart from that, it’s pretty straightforward.
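To give a rough idea of what “the verse numbers are interspersed throughout the text” means in practice, here is a minimal sketch. It is not the actual reorganizeasxml.py: it swaps BeautifulSoup for the standard library’s html.parser so it runs without installing anything, and the markup (a span with class "verse" holding the verse number) and the placeholder verse text are assumptions made up for illustration, not the real structure of the WEB html files.

```python
from html.parser import HTMLParser


class VerseSplitter(HTMLParser):
    """Collect the text that follows each verse-number marker."""

    def __init__(self):
        super().__init__()
        self.verses = {}        # verse number -> text collected so far
        self.current = None     # verse currently being collected
        self.in_marker = False  # True while inside a marker <span>

    def handle_starttag(self, tag, attrs):
        # Hypothetical marker: <span class="verse">2</span>
        if tag == "span" and dict(attrs).get("class") == "verse":
            self.in_marker = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_marker = False

    def handle_data(self, data):
        if self.in_marker:
            # The marker's own text is the verse number.
            self.current = int(data)
            self.verses[self.current] = ""
        elif self.current is not None:
            # Anything else belongs to the verse most recently started.
            self.verses[self.current] += data


def split_verses(html):
    parser = VerseSplitter()
    parser.feed(html)
    parser.close()
    # Collapse line breaks and repeated spaces inside each verse.
    return {num: " ".join(text.split()) for num, text in parser.verses.items()}


# Placeholder markup and text, not taken from the real files.
sample = (
    '<div class="p"><span class="verse">1</span> Placeholder text for verse one. '
    '<span class="verse">2</span> Is this a placeholder\nquestion?</div>'
)
verses = split_verses(sample)
print(verses)
```

The idea is simply that a verse runs from one marker to the next, so the last verse in a chapter needs some other stopping point, which is the mildly difficult part mentioned above.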

After running reorganizeasxml.py, we have complete.xml. We can now run findallquestions.py, which is significantly simpler overall, and doesn’t even define any functions. It doesn’t need to because of how reorganizeasxml.py built the xml structure.

It starts out by finding every vs element that contains a question mark. Then it creates a csv file, creates a header line, and then dumps all of the information into that csv file, one line at a time. Bam. Done.

What this script does not do is identify the one asking or the one being asked. I think I’d have to use some AI/ML for that, and I’ve never done anything with that. For that reason, those columns are blank, to be filled in manually. (I don’t actually expect that that would ever be 100% completed.)
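For anyone curious what that looks like, here is a minimal sketch of the same idea. It is not the real findallquestions.py. The XML layout (book, chapter, and vs elements with name and num attributes), the column names, and the sample text are all assumptions for illustration, and it writes to an in-memory buffer instead of a real csv file so that it runs anywhere.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical layout; the real complete.xml may be organized differently.
sample_xml = """
<bible>
  <book name="Sample">
    <chapter num="1">
      <vs num="1">A placeholder statement.</vs>
      <vs num="2">Is this a placeholder question?</vs>
    </chapter>
    <chapter num="2">
      <vs num="1">Another statement. And why is this here?</vs>
    </chapter>
  </book>
</bible>
"""


def find_questions(root):
    """Return one row per verse whose text contains a question mark."""
    rows = []
    for book in root.iter("book"):
        for chapter in book.iter("chapter"):
            for vs in chapter.iter("vs"):
                text = (vs.text or "").strip()
                if "?" in text:
                    # The last two columns are left blank to be filled in by hand.
                    rows.append([book.get("name"), chapter.get("num"),
                                 vs.get("num"), text, "", ""])
    return rows


root = ET.fromstring(sample_xml)
rows = find_questions(root)

out = io.StringIO()  # stands in for open("questions.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(["book", "chapter", "verse", "text", "asked_by", "asked_to"])
writer.writerows(rows)
print(out.getvalue())
```

Note that this picks up whole verses, so a verse with a statement and a question in it comes through in full.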

Well, there you go. There’s a case study in a one-off use of Python code to accomplish a task that would be a pain to do manually. It’s just one example of how coding can be useful in day-to-day things, whether you’re a developer or not.

Another example is something that I made using matplotlib when the Covid panic started. The only historical data Oklahoma gave us at the time was the cumulative case count, but I had a friend who wanted a graph of the daily case count. Converting the raw data from cumulative to daily was easy, so I made a graph in matplotlib, then made a script to scrape the website and update the chart on Github. I then made a cron job on my laptop to update it daily. It was a fun project, but only lasted about a week before the Oklahoma health dept changed the format of the website so it stopped working. At that point, they started giving us the chart exactly as we wanted it anyway, so there wasn’t much point in fixing it.
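The cumulative-to-daily conversion really is only a few lines. Here is a sketch with made-up numbers (not Oklahoma’s actual data), assuming the first entry is the first day of reporting, so its daily count is the same as its cumulative count:

```python
def to_daily(cumulative):
    """Turn a running total into per-day counts by subtracting the previous day."""
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


cumulative = [3, 7, 7, 12, 20]  # made-up running totals
daily = to_daily(cumulative)
print(daily)  # [3, 4, 0, 5, 8]
```

From there, the daily list can be handed to whatever plotting tool you like.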

The Bible questions scripts probably seem overwhelming to someone new to coding, but I can assure you that, given time, writing code becomes as natural as breathing. What’s hard today is easy tomorrow (which I keep telling myself while learning C++). I’ve learned that lesson repeatedly a million times.

Money or Respect.

Published / by Andrew

When this strip came out, I would have been almost ten years old. I’m not sure how old I was when I actually read it for the first time, but I probably wasn’t much older than that.

This strip offers two choices– Money or respect, but not both. Of course, the punchline is that Dilbert had neither, but this strip stuck with me as a kid. It’s something that I think about often to this day.

This was a no-brainer when I first read it. Obviously I’d rather have a high-paying job where I have little to no respect. Why wouldn’t I? I don’t care if I’m not respected, but money can go a long way.

As the saying goes, be careful what you wish for, because you just might get it.

As an adult, I’ve had jobs in which I was highly paid while having to endure frequent abuse. (I won’t go into detail publicly.) Now I understand the impact it has. I’ve switched from a high-paying job to a lower-paying job before, and I did so with little hesitation because I was miserable in the high-paying job. I would have no problem doing it again.

There’s an old saying that I’m sure anyone reading this would have heard before– “Money isn’t everything”. I would add to that “Wow, is it ever not.” If you’re unhappy in your high-paying job, you’re probably better off switching to another job. I’ve reached a point where I don’t really care much about how much money I make as long as I’m not destitute.

I am very much aware that this opinion is a luxury that many people cannot afford. I’ve never actually been forced to choose between disrespect and poverty as Phil presented the decision to Dilbert. I know that I’m greatly oversimplifying an extraordinarily difficult topic. Many people would object “What about this? What about that?” There are certainly going to be cases in which it’s necessary to endure painful situations, but that’s really outside the scope of what I’m saying here. What I am saying is simply that if you have the choice (and that’s really the key), choose a good life over wealth.

(As a sidenote, productivity is not merely a means to wealth, and if you read this post as if it were a statement from Diogenes as an excuse to do nothing, I would submit that if you are not industrious, you will die miserable. Productivity is a necessary component of life.  I know this because of how miserable I was during long breaks in college and grad school.)

I’m not strictly speaking on happiness, either, but on all of the greater things in life.  You can fill in the blank on what you would believe those greater things to be.  Wealth is not going to provide either happiness or a better life in and of itself. Wealth is a minor component in life. It is a means to an end and not an end in itself. If you are not progressing towards the end of a life that is whole, then you need to rebalance your life.

This is where it really matters– If you’re unable to seek the greater things in life, the things that make life worth living, because of the damage your job is doing to you, you should probably leave. For what does it profit a man to gain the whole world and forfeit his soul? (Mark 8:36).

Troubleshooting Kodi — “Couldn’t connect to network server.”

Published / by Andrew

I just had an aggravating experience with using Kodi over my home network. I thought I’d go ahead and post it here in case anyone else is having the same problem, and also to cement it into my brain in case I have this problem again.

I have my music hosted on my desktop computer (which is more and more of a server machine, as I only use it via SSH from my laptop in my living room), and I set up Kodi to connect to the directory through SSH. From two different devices, I was getting the same message: “Couldn’t connect to network server.”

These devices are a laptop and an Android phone. The cause of the problem turned out to be the same in both (at least, I think), but Kodi has a horrible problem in not explaining why it couldn’t connect to the network server, making debugging really irritating. The problem is that I had reinstalled the OS on the server machine, so the SSH verification keys were different. SSH was rejecting the connection as a security measure.

While the problem was the same for both devices, the solution was really different for each.

For the laptop, I only removed the offending key. It wasn’t a big deal– I just tried to SSH to my desktop from the laptop, and it gave me the instructions. Then I needed to restart Kodi, and everything worked fine.

My phone is a different story. The convenient tools for SSH just aren’t available here, and I don’t have any idea how to remove the bad key here. I actually just did the scorched earth option– I went to the app’s options, went to the Storage section and tapped “Clear storage”. That removed all settings, all add-ons, everything. Not too big a deal for me, personally, because I don’t use Kodi for much on my phone. But it removed the bad key.

There is also an additional problem with the Android version in that it does not support SSH/SFTP natively. I had to go to the Addons repository and install something to get SSH support.

From there, everything is finally back to normal.

On the off-chance that the Kodi team is reading this– Please give better error messages with network problems.

So, you accidentally dropped a Git stash.

Published / by Andrew

(Note: The information from this post comes from this SO question. I’m just collecting what I found to be the best parts of it.)

(Note 2: I’m using Linux with Bash, and the commands here probably won’t work in Windows. They may work in MacOS, though.)

They say that coding is 99% boredom and 1% pure terror.

I experienced that terror today when I had meant to type git stash pop, but accidentally typed git stash drop.

It turns out that the solution is much easier than you might expect, though finding it was a minor challenge. Upon dropping a stash, Git does not delete the information in the stash– It looks like it just deletes the reference. If you can find the hash, you’ll be able to still restore the stash.

If you only just now dropped the stash and still have the hash listed on your screen, skip ahead to the section involving actually applying the stash (just one command). Otherwise,

Finding the hash that matches the stash.

If, like me, you already tried some commands and no longer have the hash on the screen, the first thing you’ll need to do is figure out which hash is the one from the dropped stash.

The first thing you’ll want to do is echo out a whole bunch of information from git show. You can echo it into a file (which is what the link above does) or you can pipe it into less, which is what I’ll do here–

git show $( git fsck --no-reflogs | awk '/dangling commit/ {print $3}' ) | less

At this point, you’ll need to know something unique (or nearly unique) about your particular stash that you can search, and you’ll need to be able to recognize the changes well enough to identify which stash you need. Something unique like this could be the name of the stash, if you remember it and it’s unique, or the date (in format Fri Jun 28 15:09:54 2019 -0400), or maybe a function name that you wrote. Just anything that you can use to search via less to find the stash you need.

To search with less open, press the / key, and remember that search is case-sensitive. In my case, I had written a function that I distinctly remember writing, radval. (Also note that to move to the next match in a search, hit n, and to go to the previous match, hit Shift+n. Scroll with arrow keys, but look up more details for less if it’s a program you plan to use a lot, as I do.)

Once you find some code that you know is in the stash you need (or just found the title or something), scroll up, if needed, to the heading of the stash. This will be four lines starting with “commit”, “Merge”, “Author”, and “Date”. The long alphanumeric string following “commit” is the hash, and it’s what you’ll need.
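If scrolling through less isn’t your thing, the same search can be sketched in Python. This is only an illustration under a few assumptions: the function names are mine, find_candidates has to be run from inside the repository, and the sample fsck output below uses the example hash from this post plus one made-up hash. The first function does the same job as the awk part of the command above, and the second keeps only the dangling commits whose git show output mentions the text you remember.

```python
import subprocess


def dangling_commits(fsck_output):
    """Return the hashes from lines like 'dangling commit <hash>'."""
    hashes = []
    for line in fsck_output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[:2] == ["dangling", "commit"]:
            hashes.append(parts[2])
    return hashes


def find_candidates(needle):
    """Hashes of dangling commits whose `git show` output contains needle."""
    fsck = subprocess.run(["git", "fsck", "--no-reflogs"],
                          capture_output=True, text=True, check=True)
    matches = []
    for commit in dangling_commits(fsck.stdout):
        shown = subprocess.run(["git", "show", commit], capture_output=True,
                               text=True, errors="replace", check=True)
        if needle in shown.stdout:
            matches.append(commit)
    return matches


# Sample text in the shape that git fsck prints; the second hash is made up.
sample = """dangling commit 73d46febad13327963c7e0bf95ce2829fe35042d
dangling blob 0123456789abcdef0123456789abcdef01234567
"""
hashes = dangling_commits(sample)
print(hashes)
```

In my case, something like find_candidates("radval") would have narrowed it down. You would still want to look at the result with git show before applying it.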

The hard part is done.

Applying the orphan stash.

This part is easy. If you have the stash hash, and for this example I’ll use 73d46febad13327963c7e0bf95ce2829fe35042d, simply put this in the command prompt–

git stash apply 73d46febad13327963c7e0bf95ce2829fe35042d

It should be done now, so you can breathe out.

Some helpful VirtualBox command-line reference.

Published / by Andrew

A little bit of background for this post.

I love VirtualBox. Though, to be more specific, I love having a virtual machine to use for testing. Discovering bridged ip addresses was a revelation for me.

But I also love ssh. I regularly ssh from my laptop in my living room to my desktop in my office because I’m too lazy to get up. So, I want to be able to set up virtual machines via ssh.

Docker is pretty neat, but it’s not really designed for what I want it to do. I tried it, and decided it wasn’t really worth fidgeting with it to set up a bridged IP address, which I’m not 100% sure is possible in a Docker container anyway. I used Vagrant from Hashicorp for a while, which is pretty good, but I’ve encountered a versioning problem– My version of VirtualBox was too new for my version of Vagrant, so it wouldn’t start. So, why not cut out the middleman and just manage VirtualBox via CLI? The only feature from Vagrant I really care about is starting a VM, anyway.

I thought it would be difficult, but it’s actually surprisingly easy. To do this, you’ll need to install the VirtualBox extensions from their website. Make sure to get the extensions for the right version of VirtualBox– If you’re installing VBox from a repo, it’s probably in the “VirtualBox older builds” page. Also, I’m not bothering with actually creating virtual machines via CLI from scratch, but instead making a base VM to copy when needed. I don’t think creating from scratch can be done via CLI, though I could be wrong about that. So, I created a base VM while physically at the host machine, and can now clone that machine to create a new test server via CLI.

And these are the commands that I use to manage them, in no particular order. “Ubuntu Server” is just an example name, matching whatever the name is that you’ll see when you open the VirtualBox Manager.

IP addresses:
It doesn’t seem to be possible to get a VM’s IP address using VBoxManage if it uses a bridged adapter. (The listed solutions online don’t work.) The best I can find is “nmap -sP 192.168.1.*”, which will list all IP addresses on the network. If it’s run both before and after the VM is started, the IP should be found.

List all VMs:
VBoxManage list vms

List all running VMs:
VBoxManage list runningvms

Start VM:
VBoxManage startvm "Ubuntu Server" --type headless

Pause VM:
VBoxManage controlvm "Ubuntu Server" pause

Restart paused VM:
VBoxManage controlvm "Ubuntu Server" resume

Shut down VM (a hard power-off, like pulling the plug):
VBoxManage controlvm "Ubuntu Server" poweroff

Change network adaptor to bridged:
VBoxManage modifyvm "Ubuntu Server" --nic1 bridged

Clone vm:
VBoxManage clonevm "Ubuntu Server" --name "New Ubuntu Server" --register

  • If you forget the “register” option, the “registervm” command is supposed to fix that, but I can’t get it to work. In that case, it’s best to delete the newly-created folder in ~/VirtualBox\ VMs and start over.
  • If you forget the “name” option, it will just have the old name plus “clone”.

Delete vm:
VBoxManage unregistervm "Ubuntu Server" --delete
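For the before-and-after nmap trick in the IP addresses note above, comparing the two scans by eye gets old fast. Here is a small sketch that does the comparison, assuming the scan output has been saved as text. The function name and the sample addresses are made up, and it only relies on the “Nmap scan report for ...” lines that nmap prints for each host it finds.

```python
def hosts_in_scan(nmap_output):
    """Collect the addresses from the 'Nmap scan report for ...' lines."""
    hosts = set()
    for line in nmap_output.splitlines():
        if line.startswith("Nmap scan report for "):
            # The address is the last word; when a hostname was resolved,
            # the address is shown in parentheses after it.
            hosts.add(line.split()[-1].strip("()"))
    return hosts


# Made-up scan output from before and after starting the VM.
before = """Nmap scan report for router.lan (192.168.1.1)
Host is up (0.0020s latency).
Nmap scan report for 192.168.1.20
Host is up (0.00010s latency).
"""
after = before + """Nmap scan report for 192.168.1.37
Host is up (0.00050s latency).
"""

new_hosts = hosts_in_scan(after) - hosts_in_scan(before)
print(new_hosts)  # {'192.168.1.37'}
```

Anything else that joined the network between the two scans will show up too, so run them close together.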

The Lost Art of MIDI Music

Published / by Andrew

This one’s a little different, but code-related, so I thought I’d post about it here.

Without going too deeply into the details of how I became interested in this, not long ago I created a couple of Python 3 classes that act as wrappers for MIDIUtil, which builds MIDI files. Yesterday I went ahead and added those to my GitHub page– Lost Art.

It’s not exactly a professional tool, but it’s been a lot of fun to play with.

(Side note: I’ve also started work on my blogging CMS, now dubbed “Nomad“, but it’s in such early stages that there’s not a lot to talk about.)

New project announcement: Yet Another Blog CMS.

Published / by Andrew

When I made this blog, I remember thinking, “Hey, it’ll just be easiest to make a WordPress droplet.” Technically, I wasn’t wrong, but WordPress seems to get more and more counter-intuitive as time goes by, not to mention bloated.

After some frustrating experiences with WordPress (they seem to think I want to write blog posts in a textbox the size of a postage stamp?), I’ve decided on a new project: A basic CMS. (It’s hard to say exactly when I’ll be working on this, but I’m hoping to start this coming weekend, or maybe in small steps before then.)

This CMS will be using some of my unpopular opinions:

  • Simpler is better– Have few features so they don’t get in the way.
  • A blog CMS should be lightweight and quick.
  • Writing in a markup language is better than “rich text” editors.
  • JavaScript should be kept to a minimum.
  • RSS is better than social media.

I know that very few people will use this, but I will, and I suppose that’s all that matters.

That said, I also do plan on adding some interactivity features so that one blog may interact (minimally) with another.

It’s also worth noting that I also plan on bringing Threadstr back from the dead, this time in PHP. I had originally killed it because I was concerned that SESTA/FOSTA would make it impossible unless I have a large staff, but I have ideas on how to make that easier to manage.

Right now, though, I’m more interested in this blogging platform to replace WordPress.

Reducing size of video files via CLI.

Published / by Andrew

(TL;DR: Conversion script is at the end of this post.)

When it comes to video entertainment, I much prefer local media to streaming services, because I like being able to manage as much as I can using Kodi. (I used to use Kodi with an Android TV box, and now I have a laptop hooked up to my TV, but that’s a story for another day.)

But, using local media presents a problem– Space management. Unless I want to keep getting more and bigger hard drives (and I don’t), I have to figure out how to shrink down the video files without losing so much quality that I render them unwatchable. I also want to make these conversions via CLI so that I can do it over SSH.

I developed a small script that converts all files in a directory (not recursively, just because I didn’t want to do it that way) into smaller files. It uses BASH (with which I have a love/hate relationship) and FFMPEG (with which I have a hate/hate relationship). This changes the constant rate factor, changes the video and audio codecs, and reduces the resolution to 540 pixels wide, with the height scaled to keep the aspect ratio (540xWhatever; too lazy to math right now). I also toyed with changing the framerate, but reducing that made a pretty small change in filesize with a big change in video quality, though it may be worth it if you have a 60fps video.

As it currently stands, this reduces filesize by half for my current files, but that will depend on the source files.
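For the curious, the height that goes with a 540-pixel width is quick to work out. This is just arithmetic in Python, not something the script itself runs; the function name is mine, and rounding to an even number is there because libx264 wants even dimensions. FFMPEG’s own rounding may land on a slightly different number.

```python
def scaled_height(target_width, aspect_w, aspect_h):
    """Height that keeps the aspect ratio, rounded to an even number."""
    exact = target_width * aspect_h / aspect_w
    # Python rounds exact halves to the nearest even integer, so 202.5 -> 202.
    return 2 * round(exact / 2)


widescreen = scaled_height(540, 16, 9)  # 303.75 exactly
fullscreen = scaled_height(540, 4, 3)   # 405 exactly
print(widescreen, fullscreen)  # 304 404
```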

This uses the version of FFMPEG that comes with Ubuntu 16.04 (I haven’t upgraded to 18.04 yet).

Some useful commands when checking the results are:

du -sh filepath     # Shows amount of space a directory or file uses
mediainfo filepath  # Shows information about a media file, but requires
                    # installing mediainfo from repos.

I’d kind of like to source where I found all of this information, but the script below is a Frankenstein’s monster using organs scattered all across the web.

You’ll want to update the inputDir and newDir variables whenever you run this, or you can modify to use ${1} and ${2}, which I may do for myself in the near future. You do not need to escape spaces, though. (And be aware that this will take awhile if you’re running it on a large directory.) You’ll also want to put this script one directory above the directory that you want to convert, and make an empty directory for the target directory.

inputDir="Jontron"
newDir="Jontron Reduced" # Make sure this exists in the same directory as inputDir.

cd "${inputDir}" || exit 1 # Stop if the input directory doesn't exist.

# Loop over everything directly inside inputDir (not recursive).
for file in *; do
    [ -f "$file" ] || continue # Skip subdirectories.
    # scale=540:-2 keeps the aspect ratio and makes the height an even number, which libx264 needs.
    ffmpeg -i "$file" -c:v libx264 -crf 24 -b:v 1M -c:a aac -filter:v scale=540:-2 ../"${newDir}"/"${file}"
done

Handling the equivalent of FIRST() and LAST() in MySQL

Published / by Andrew

Something that’s been an enormous pain for years is that MySQL does not have aggregate functions for first() and last(), like every other SQL-based language. Why they don’t have it, I have no idea. It’s been requested and ignored for over 15 years, and doing a search for this online reveals many people frustrated with the lack of this feature. You’ll find hundreds of “solutions”, most of which either don’t work at all or are so convoluted that they’d be nearly impossible to implement.

To make this as abstract as possible, let’s say we have a table that looks like so:

MariaDB [test_db]> select * from test_table;
+----+----------+---------+
| id | ordering | groupid |
+----+----------+---------+
|  1 |        4 |       1 |
|  2 |        1 |       1 |
|  3 |        2 |       1 |
|  4 |        4 |       2 |
|  5 |        6 |       2 |
|  6 |        1 |       2 |
|  7 |        3 |       2 |
|  8 |        8 |       3 |
|  9 |        1 |       3 |
| 10 |        5 |       3 |
+----+----------+---------+

And we need to find the id of the greatest ordering for each groupid.

If MySQL were sane, we’d be able to do this with a relatively simple query similar to this:

    -- Reminder: This does not work in MySQL!  The correct solution is later in this post.
    SELECT
        last(id)
    FROM
        test_table
    ORDER BY
        ordering
    GROUP BY
        groupid

But, no, that’s not possible.

It’s easy to figure out what the solution should be with this query:

MariaDB [test_db]> SELECT * from test_table order by groupid,ordering;
+----+----------+---------+
| id | ordering | groupid |
+----+----------+---------+
|  2 |        1 |       1 |
|  3 |        2 |       1 |
|  1 |        4 |       1 |
|  6 |        1 |       2 |
|  7 |        3 |       2 |
|  4 |        4 |       2 |
|  5 |        6 |       2 |
|  9 |        1 |       3 |
| 10 |        5 |       3 |
|  8 |        8 |       3 |
+----+----------+---------+
10 rows in set (0.00 sec)

We can look at these results and say, “Hey, it’s obvious that the ids I need are 1, 5, and 8.” But that’s not going to do us much good if we’re doing a more complex query.

I’m pretty sure that at one point I had a moderately elegant solution to this using subqueries and LIMIT 1, but I haven’t been able to figure out what that is.

But, considering how aggravating this has been, I thought I’d post my most recent solution so I’d have a reference and will, hopefully, never have to figure it out all over again.

My solution is a join with a subquery (which I’m not crazy about, but it’s better than most of the other really convoluted solutions that I’ve found online).

    SELECT
        tbl1.id,
        tbl1.groupid
    FROM
        test_table as tbl1 INNER JOIN
        (SELECT groupid,max(ordering) as maxorder FROM test_table GROUP BY groupid) as tbl2 ON
            tbl1.groupid=tbl2.groupid AND 
            tbl1.ordering=tbl2.maxorder
    ;

Which outputs:

+----+---------+
| id | groupid |
+----+---------+
|  1 |       1 |
|  5 |       2 |
|  8 |       3 |
+----+---------+

One thing that really stinks about this is that it’s probably not going to be equally useful in all situations, but this will hopefully be adaptable to different needs. If, say, you wanted to modify all records except the most recent ones, you could use the above as a subquery in a WHERE clause.

I’m not the only person that uses this solution to this problem, but I sure wish there were a way to do this without subqueries.
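If you want to poke at the join without setting up a database server, here is a sketch that runs the same query against an in-memory SQLite database from Python. To be clear, SQLite is standing in for MySQL/MariaDB here only because it ships with Python; the table and rows are the ones from this post, and I added an ORDER BY so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (id INTEGER PRIMARY KEY, ordering INTEGER, groupid INTEGER)"
)
conn.executemany(
    "INSERT INTO test_table (id, ordering, groupid) VALUES (?, ?, ?)",
    [(1, 4, 1), (2, 1, 1), (3, 2, 1), (4, 4, 2), (5, 6, 2),
     (6, 1, 2), (7, 3, 2), (8, 8, 3), (9, 1, 3), (10, 5, 3)],
)

# Join each row to its group's highest ordering and keep only the matches.
query = """
    SELECT tbl1.id, tbl1.groupid
    FROM test_table AS tbl1
    INNER JOIN (
        SELECT groupid, max(ordering) AS maxorder
        FROM test_table
        GROUP BY groupid
    ) AS tbl2
        ON tbl1.groupid = tbl2.groupid AND tbl1.ordering = tbl2.maxorder
    ORDER BY tbl1.groupid
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1), (5, 2), (8, 3)]
```

One thing to keep in mind with this approach: if two rows in the same group tie for the highest ordering, both of them come back.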

Update: It’s never quite as easy as you’d expect.

It turns out that to actually use the above query, you need to do yet another subquery. I actually don’t really understand why it’s necessary, but I was able (with some searching) to figure out how to do it.

Let’s say we want to actually do something with this query. Let’s add another column with tinyint(1) called “myvalue” and set them all to false.

+----+----------+---------+---------+
| id | ordering | groupid | myvalue |
+----+----------+---------+---------+
|  1 |        4 |       1 |       0 |
|  2 |        1 |       1 |       0 |
|  3 |        2 |       1 |       0 |
|  4 |        4 |       2 |       0 |
|  5 |        6 |       2 |       0 |
|  6 |        1 |       2 |       0 |
|  7 |        3 |       2 |       0 |
|  8 |        8 |       3 |       0 |
|  9 |        1 |       3 |       0 |
| 10 |        5 |       3 |       0 |
+----+----------+---------+---------+

Suppose we want to set all but the most recent value to true for each group id.

So, this does not work:

-- Reminder: This does not work in MySQL!  The correct solution is later in this post.
UPDATE
    test_table
SET
    myvalue = 1
WHERE
    id NOT IN (
        SELECT
            id
        FROM
            test_table as tbl1 INNER JOIN
            (SELECT groupid,max(ordering) as maxorder FROM test_table GROUP BY groupid) as tbl2 ON
                tbl1.groupid=tbl2.groupid AND
                tbl1.ordering=tbl2.maxorder
    )
;

This results in the following error in MariaDB:

Table 'test_table' is specified twice, both as a target for 'UPDATE' and as a
separate source for data

I think I got a different error in the AWS database based off MySQL, but the result is the same– This does not work.

Instead, we have to use yet another subquery. So, this does work:

UPDATE
    test_table
SET
    myvalue = 1
WHERE
    id NOT IN ( -- Starting the added subquery here!
        SELECT
            id
        FROM (
            SELECT
                id
            FROM
                test_table AS tbl1 INNER JOIN
                (SELECT groupid,max(ordering) AS maxorder FROM test_table GROUP BY groupid) AS tbl2 ON
                    tbl1.groupid=tbl2.groupid AND
                    tbl1.ordering=tbl2.maxorder
        ) as selecttbl
    )
;

When we run SELECT * FROM test_table ORDER BY groupid,ordering;, we can see that it was successful.

+----+----------+---------+---------+
| id | ordering | groupid | myvalue |
+----+----------+---------+---------+
|  2 |        1 |       1 |       1 |
|  3 |        2 |       1 |       1 |
|  1 |        4 |       1 |       0 |
|  6 |        1 |       2 |       1 |
|  7 |        3 |       2 |       1 |
|  4 |        4 |       2 |       1 |
|  5 |        6 |       2 |       0 |
|  9 |        1 |       3 |       1 |
| 10 |        5 |       3 |       1 |
|  8 |        8 |       3 |       0 |
+----+----------+---------+---------+

Yikes.