Use of simple number theory in code.

Published / by Andrew

In a personal project recently, I came to a problem that ended up being solved via number theory. I’m a little rusty, but this isn’t actually the first time I’ve used number theory to solve a coding problem. I thought it ended up interesting, so I decided to write a short post about it.

The issue is that I wanted a way to represent dates as numbers. The typical unix timestamp was quite a bit more specific than I wanted for my purposes, and I wanted to avoid pitfalls associated with DST and timezones, so struct tm was going to be a bit of a pain to manage. I wanted to handle dates. Only this and nothing more.

Furthermore, I didn’t want to use something like “Number of days since 1900”, because handling leap years and different lengths of months is a bit of a pain.

So, this is the process that went through my mind.

The most intuitive way to represent a single date as a number is with powers of 10. For example, April 13, 2017 would be 20170413. All dates for my use case would be after the year 2000, so I could just chop off those first two digits, making it 170413.

But that’s decimal, and computers think in binary, so I shouldn’t bother setting it up like this, exactly. Instead, I should use base 2. Meaning that rather than \(17\times10^{4} + 4\times10^{2} + 13\), I should use \(17\times2^{10} + 4\times2^{5} + 13\), which would give me, in binary, 100010010001101, which can be decomposed to 17, 4, and 13 in binary with simple bit shifting. (Assuming I did everything right there. I’m not double-checking it, but it’s basically the same idea as in decimal, and it should be easy to do with bit shifting.)
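As a quick check of that arithmetic, here is a minimal sketch in Python (rather than C, just to keep it short). The names pack and unpack are made up for illustration, and the field widths simply follow the \(2^{10}\) and \(2^{5}\) factors above, so the month gets five bits even though four would do.

```python
# Hypothetical helpers; the widths mirror the 2^10 and 2^5 factors in the text.
DAY_BITS = 5     # day lives in bits 0-4
MONTH_BITS = 5   # month lives in bits 5-9 (one more bit than it strictly needs)


def pack(year, month, day):
    """Pack (years since 2000, month 1-12, day 1-31) into one integer."""
    return (year << (MONTH_BITS + DAY_BITS)) | (month << DAY_BITS) | day


def unpack(value):
    """Recover (year, month, day) using shifts and masks only."""
    day = value & ((1 << DAY_BITS) - 1)
    month = (value >> DAY_BITS) & ((1 << MONTH_BITS) - 1)
    year = value >> (MONTH_BITS + DAY_BITS)
    return year, month, day


packed = pack(17, 4, 13)
print(packed, bin(packed))  # 17549 0b100010010001101
print(unpack(packed))       # (17, 4, 13)
```

So the binary string above does check out for April 13, 2017.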

But there’s still a lot of wasted space there. You can still have a lot of invalid dates like the 23rd month of the year.

So, I decided to have a simple rollover system. There are, at most, 31 days in a month, so I could just have that the 32nd day means we know we’re on the next month. And we may as well start at zero, so that means that 0 = first day of the first month (Jan 1), 30 = 31st day of the month (Jan 31), and 31 = first day of next month (Feb 1). So, 35 would mean the fifth day of the second month of the year (Feb 5, since 35 = 31 + 4 and day 0 is the first), and 203 would be July 18th because it’s crossed the 31 marker 6 times and makes it 17 more days. It’s easy to find a particular day because the whole thing is mod 31. (It’s easy to get lost in the offsets here, though.)
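Since the offsets are the easy part to get wrong, here is a small Python sketch of the within-year rollover. The function name from_slot and the month-name table are only for illustration; the one thing taken from the scheme itself is the 31 slots per month.

```python
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def from_slot(slot):
    """Turn a zero-based slot within the year into (month name, day of month)."""
    month_index, day_index = divmod(slot, 31)  # 31 slots per month
    return MONTH_NAMES[month_index], day_index + 1  # back to a 1-based day


for slot in (0, 30, 31, 203):
    print(slot, from_slot(slot))
# 0 ('Jan', 1)
# 30 ('Jan', 31)
# 31 ('Feb', 1)
# 203 ('Jul', 18)
```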

This still has some invalid dates like April 31st, but it’s also over a much smaller codomain than either base ten or base two.

Then we also have twelve months in a year, so we can use the same concept for finding years. Once we cross 12 months (372 days), we’re in the next year.

With this, we can convert a date into an integer with this equation: \(i = d + 31m + 372y\), where \(d\) is the day of the month (0-30), \(m\) is the month (0-11), and \(y\) is the number of years since 2001. The 31 converts the months to days and the 372 converts the years to days (as \(31\times12\)). We have a one-to-one relationship between a date and the integer resulting from this function.

Resulting in a very easy function (DAYMOD is 31 and MONTHMOD is 12):

/**
 * Convert a Date object to an integer.
 *
 * Integer is not human-readable as a date, but it orders the same way.
 *
 * @param   dateObj
 */
int toInt(Date dateObj)
{
    return dateObj.day + DAYMOD * dateObj.month
        + (MONTHMOD * DAYMOD) * dateObj.year;
    // Can actually simplify this a little bit.
}

With the Date struct being defined as:

typedef struct date_obj {
    /** @var Number of complete years since the year 2001. */
    int year;

    /** @var Month number, 0 - 11. */
    int month;

    /** @var Day of month, 0 - 30. */
    int day;
} Date;

But the other direction is slightly more complicated, and that’s where the number theory comes in.

Getting the day of the month is easy. \(i \equiv d + 31m + 372y (\text{mod } 31)\), and the second and third terms on the right are zero mod 31, so if we have an integer x we can easily get back the number of days with x % 31.

Getting back the month is a little more difficult. But not that difficult. Initially I tried working with \(i \equiv d + 31m + 372y (\text{mod } 372)\), but that was dumb, because we’re using the wrong base there. Plus, there is no inverse of 31 modulo 372 (372 is \(31\times12\), so the two share a factor of 31), so the equation can’t be solved for \(m\) that way.

It’s a little easier to start with rearranging the equation.

\[
\frac{i - d}{31} = m + 12y
\]

And we know that \(i - d\) is divisible by 31 because of how we defined \(d\). (My professors would absolutely want a better description than that in a proof, but it’s a blog post while I’m sitting in a coffee shop, so this will work for now.)

We can now use basic number theory to find \(m\) the same way that we found \(d\).

\[\frac{i-d}{31} \equiv m + 12y (\text{mod }12)\] \[\frac{i-d}{31} \equiv m (\text{mod }12)\]

Thus, (x - d)/31 % 12 is our number of months.

And finding the number of years once we have the number of days and months is trivial.

With these three parts, we have a function that’s the inverse of toInt from before:

/**
 * Convert an integer back to a Date object.
 *
 * @param   dateInt
 */
Date toDate(int dateInt)
{
    Date dateObj = {0};  /* {0} zero-initializes portably; empty braces need C23 */

    dateObj.day = dateInt % DAYMOD;
    dateObj.month = (dateInt - dateObj.day) / DAYMOD % MONTHMOD;
    dateObj.year = (dateInt - dateObj.day - DAYMOD * dateObj.month)
        / (MONTHMOD * DAYMOD);

    return dateObj;
}
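To convince myself the two directions really are inverses, a brute-force check is cheap. This is a Python mirror of the two C functions, not the C itself; to_int and to_date are stand-in names, and the constant values 31 and 12 are assumed from the formula \(i = d + 31m + 372y\).

```python
from itertools import product

DAYMOD = 31    # day slots per month (assumed from the formula)
MONTHMOD = 12  # months per year (assumed from 372 = 31 * 12)


def to_int(year, month, day):
    """Same arithmetic as toInt: all three fields are zero-based."""
    return day + DAYMOD * month + MONTHMOD * DAYMOD * year


def to_date(date_int):
    """Same arithmetic as toDate, using integer division throughout."""
    day = date_int % DAYMOD
    month = (date_int - day) // DAYMOD % MONTHMOD
    year = (date_int - day - DAYMOD * month) // (MONTHMOD * DAYMOD)
    return year, month, day


# Every (year, month, day) triple for 100 years, in calendar order.
triples = list(product(range(100), range(MONTHMOD), range(DAYMOD)))

# Round trip: decoding an encoded date gives back the same date.
assert all(to_date(to_int(*t)) == t for t in triples)

# Ordering: encoded values sort the same way the dates do.
encoded = [to_int(*t) for t in triples]
assert encoded == sorted(encoded)

# April 13, 2017 -> 16 years after 2001, month index 3, day index 12.
print(to_int(16, 3, 12), to_date(to_int(16, 3, 12)))  # 6057 (16, 3, 12)
```

Both assertions pass, which covers the one-to-one claim and the "orders the same way" claim from the doc comment, at least for non-negative inputs.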

And that’s that.

An idea for anonymous blogging

Published / by Andrew

I’m trying to think of a way to have a completely anonymous “blog” of sorts that’s not in the dark web. I’m sure there are already many solutions to this, but my basic idea is using the copypasta format with encryption. The text itself contains what’s needed to maintain the anonymity and it can be copy-pasted anywhere.

And it’s certainly possible that this exact idea or a better version of it is already in use.

My current idea is chaining hashed messages. To be clear, I am not an expert on encryption– This goes by my memory of Cryptology in grad school.

As some background, a hashing algorithm is used to store passwords. (Usually salting and hashing, but we’re only going to be focusing on hashing here.)

A hash is what’s called “one way” encryption. That term can technically mean many things, but in our case we specifically mean that anything can be encrypted, and nothing can be decrypted. So, what’s the point, then? Validation. I can hash something you give me to make sure it matches a value that we already have.

Imagine Rumplestiltskin. He won’t tell me his name, but I can guess as many times as I want. As soon as I get it right, he’ll tell me that I got it right.

The hash value and algorithm are Rumplestiltskin. I can’t get the message out of the hashed value or the algorithm, but as soon as I guess right, they’ll tell me it’s right.

It’s validation, and it’s why we use it for passwords. (In practice, it’s more complicated than that, but that’s the idea.) It tells me “This password that this guy gave me matches the hash I have, so it must be right, even though I didn’t know what the password was.”
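A minimal sketch of that guessing game, in Python with the standard hashlib module. SHA-256 is just my pick for the example, and sha256_hex and is_correct_guess are made-up helper names. Real password storage would add a salt and a deliberately slow algorithm, which this skips.

```python
import hashlib


def sha256_hex(text):
    """Hash a string and return the digest as hex text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# This is all that gets published: the hash, not the name.
stored_hash = sha256_hex("Rumplestiltskin")


def is_correct_guess(guess):
    """Anyone can test a guess, but the hash alone does not reveal the name."""
    return sha256_hex(guess) == stored_hash


print(is_correct_guess("Tom"))              # False
print(is_correct_guess("Rumplestiltskin"))  # True
```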

So, my goal here is to make messages that can spread via copy-paste and we can validate that they all came from the same source. Using the copypasta format (if you can call that a format), anybody can copy-paste it to social media, blog it, email it, print it, whatever. Using a hash algorithm, anybody can validate that all messages came from one source, while the source remains anonymous.

With all the background out of the way, here’s the basic idea: Every message is chained to the previous message using a hash. Essentially, every message has the following format:

{Message content}
{Password that matches previous message’s hash.}
{Hash of next password, which will be secret until the next message is released.}

Of course the first message establishes the chain, so it would skip the password matching previous hash, since there is no previous hash.

Every individual message says “You’ll know the next message that comes from me because it’ll have the password that matches this hash.” Nobody will, in theory, be able to guess the password, but everyone will be able to validate it when it comes out.
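The chaining can be sketched in a few lines of Python. Everything here is an assumption made for illustration: SHA-256 as the hash, a dict as the message format, and the names make_message and verify_chain. The passwords are short readable strings so the example is easy to follow; in practice they would need to be long and random.

```python
import hashlib


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_message(content, previous_password, next_password):
    """Build one message. previous_password is None for the first message."""
    return {
        "content": content,
        "password": previous_password,           # reveals the last secret
        "next_hash": sha256_hex(next_password),  # commits to the next one
    }


def verify_chain(messages):
    """Check that each message reveals the password the previous one committed to."""
    for earlier, later in zip(messages, messages[1:]):
        if later["password"] is None:
            return False
        if sha256_hex(later["password"]) != earlier["next_hash"]:
            return False
    return True


passwords = ["first secret", "second secret", "third secret"]
chain = [
    make_message("Message one", None, passwords[0]),
    make_message("Message two", passwords[0], passwords[1]),
    make_message("Message three", passwords[1], passwords[2]),
]
print(verify_chain(chain))  # True

# Someone who has to guess the next password cannot extend the chain.
forged = make_message("Not from the author", "a guess", "anything")
print(verify_chain(chain[:2] + [forged]))  # False
```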

There is one problem, though. What’s to stop someone else from copying the password and hash, or just the password, for that matter, and falsely claiming the message is linked to that password? Then we’d have two messages that both claim to be from the same source.

Maybe some blockchainy thing could solve that problem, but the only solution that comes to my mind is using a single source of truth from a trusted third-party source. Say, all messages will always be first posted to Pastebin, which I use as an example because you can post to it without an account and via the Tor network. If all messages will always be posted first to pastebin, then the first message to appear with the password will necessarily be the one from the correct source. Failing that, old-fashioned internet sleuthing should be able to find the first message posted with the password.

It’s certainly possible that there are holes in this idea or that it’s already super common and I was just oblivious. (There was one time when I had an idea for a new encryption method and it turned out to be what HBO used in the 80s to encrypt their stream, or something like that.)

I will also almost certainly never use this, myself. I’m not interested in being the type of person that would need something like this. I am no Publius.

But I enjoy thinking through this kind of thing, so I thought I’d share it.

Programming Should Be For Everybody: An Example

Published / by Andrew

Regular Expressions

(This is a copy-paste of the README.md from allquestionsinthebible, where the scripts discussed herein can be found.)

I’ve long been a proponent (admittedly, not a particularly vocal one) that everyone should learn to code at least a little. I don’t mean in the sense of the “You lost your job, so learn to code” buffoonery, just because coding professionally isn’t for everyone. That said, even if you have zero intention of using code professionally, it can be very handy to learn, say, a little bit of Python to get a job done.

What I have to present to you today is an example of how code can be a useful tool in your belt. This is just one of many instances of my using Python to demonstrate achieving a goal that would be difficult to do manually, whether you’re a professional developer or not. (And I don’t actually write Python professionally– My professional work is PHP, which is very different.)

There’s a background to this that’s relevant here. One of my Sunday School teachers made a passing comment at the Christmas party that he’d like to see a list of every question asked in the Bible. I thought to myself, “That should be easy, since every question ends with a question mark”, and I said I could probably do that with a Python script. He said he’d pay me if I did it, and I said it wasn’t necessary since it would be only about five lines of Python.

Well, it turned out not to be five lines of Python, but it was a pretty fun project in any case.

More importantly, though, it’s a great example of something that I could do because I knew how to write code, and why I think it would be beneficial for anybody to learn to code.

Essentially, code does nothing more than automate a monotonous task. So, something like finding every question in the Bible could be done by taking out a Bible and a sheet of paper, and painstakingly finding every single question, as a months-long task that in ages past would be performed by monks with nothing better to do (or by Strong, whoever that is). Or you could write a Python script that does all of it for you in seconds.

So, I took on the task during the week between Christmas and New Year’s, and this is how I did it. (I didn’t spend the whole week doing it, ftr.)

First off, I needed to use the World English Bible translation (WEB). The reason for this is that it is a modern language translation that is public domain. I could hypothetically do some webscraping and get the entirety of the ESV, but that would be legally sketchy at best.

I was disappointed to find that the current organization of the WEB in html is not terribly programmer-friendly. It’s far from horrendous, but I was expecting each verse to be in its own span or div, with an appropriate class name. Instead, the chapter and verse numbers themselves are in these divs/spans, and interspersed throughout the text. It makes perfect sense if you’re reading it in a browser, but makes things slightly difficult from a programmatic perspective. So, the first thing I did was reorganize everything– All of the WEB (just the Protestant canon) as an XML file. This is the first script, reorganizeasxml.py, and was the bulk of the work done.

To do this, I used the archive of the html version of the WEB. To run the first script to generate the XML, unzip the contents of that archive into a folder, drop reorganizeasxml.py into the directory, and run it. It will create complete.xml— One giant XML file containing the entire WEB translation, in a programmer-friendly XML format.

It’s about five MB in size, so I do not recommend trying to open it in Notepad or any other text editor that’s not designed for large files (I used less to view the contents).

I could have made this easier, theoretically, by using the plaintext version, which seems to have one verse per line. I decided against that, though, because I’m not 100% sure that it’s always one verse per line– It never explicitly says so– and because the file names suggest that these files exist for the purpose of reading out loud, so I expect they’re not necessarily going to be particularly careful about line breaks. I think, then, that it would be better to put in the extra effort with the clearer source material. (And, honestly, a big part of my motivation here is that doing it the hard way is a lot more fun.)

reorganizeasxml.py loops through every file, uses BeautifulSoup to identify where every verse begins and ends, and dumps every verse into the output file, in (Protestant) canonical order. This is mostly pretty easy– The only thing that was mildly difficult is getting the end of the final verse, because the text content of each chapter is not organized into one big div, so I had to find where the site navigation began instead. Apart from that, it’s pretty straightforward.

After running reorganizeasxml.py, we have complete.xml. We can now run findallquestions.py, which is significantly simpler overall, and doesn’t even define any functions. It doesn’t need to because of how reorganizeasxml.py built the xml structure.

It starts out by finding every vs element that contains a question mark. Then it creates a csv file, creates a header line, and then dumps all of the information into that csv file, one line at a time. Bam. Done.

What this script does not do is identify the one asking or the one being asked. I think I’d have to use some AI/ML for that, and I’ve never done anything with that. For that reason, those columns are blank, to be filled in manually. (I don’t actually expect that that would ever be 100% completed.)
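For a feel of what that second step looks like, here is a rough sketch using only the standard library. It is not findallquestions.py itself. The vs element name comes from the description above, but the surrounding XML layout, the attribute names, the CSV column names, and the placeholder verse text are all assumptions made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for complete.xml; the real file's layout may differ.
SAMPLE_XML = """
<bible>
  <book name="Sample">
    <chapter num="1">
      <vs num="1">Placeholder verse with no question.</vs>
      <vs num="2">Placeholder verse that asks something?</vs>
    </chapter>
  </book>
</bible>
""".strip()


def find_questions(xml_text):
    """Return one row per vs element whose text contains a question mark."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.iter("book"):
        for chapter in book.iter("chapter"):
            for verse in chapter.iter("vs"):
                text = "".join(verse.itertext()).strip()
                if "?" in text:
                    # Last two columns stay blank, to be filled in by hand.
                    rows.append([book.get("name"), chapter.get("num"),
                                 verse.get("num"), text, "", ""])
    return rows


def write_csv(rows, handle):
    """Write a header line, then one line per question."""
    writer = csv.writer(handle)
    writer.writerow(["book", "chapter", "verse", "text", "asked_by", "asked_to"])
    writer.writerows(rows)


buffer = io.StringIO()
write_csv(find_questions(SAMPLE_XML), buffer)
print(buffer.getvalue())
```

Swapping the StringIO buffer for an open file handle would produce the CSV on disk.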

Well, there you go. There’s a case study in a one-off use of Python code to accomplish a task that would be a pain to do manually. It’s just one example of how coding can be useful in day-to-day things, whether you’re a developer or not.

Another example is something that I made using matplotlib when the Covid panic started. The only historical data Oklahoma gave us at the time was the cumulative case count, but I had a friend that wanted a graph of the daily case count. Converting the raw data from cumulative to daily was easy, so I made a graph in matplotlib, then made a script to scrape the website and update the chart on Github. I then made a cron job on my laptop to update it daily. It was a fun project, but only lasted about a week before the Oklahoma health dept changed the format of the website so it stopped working. At that point, they started giving us the chart exactly as we wanted it anyway, so there wasn’t much point in fixing it.
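The cumulative-to-daily conversion is just a running difference. A minimal sketch, with a made-up function name and made-up numbers (these are not Oklahoma’s figures), and assuming the series starts counting from zero:

```python
def daily_from_cumulative(cumulative):
    """Turn running totals into per-day counts by differencing neighbors."""
    daily = []
    previous = 0  # assumes nothing was counted before the first entry
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


sample_totals = [2, 5, 5, 9, 14]  # illustrative values only
print(daily_from_cumulative(sample_totals))  # [2, 3, 0, 4, 5]
```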

The Bible questions scripts probably seem overwhelming to someone new to coding, but I can assure you that, given time, writing code becomes as natural as breathing. What’s hard today is easy tomorrow (which I keep telling myself while learning C++). I’ve learned that lesson repeatedly a million times.

Money or Respect.

Published / by Andrew

When this strip came out, I would have been almost ten years old. I’m not sure how old I was when I actually read it for the first time, but I probably wasn’t much older than that.

This strip offers two choices– Money or respect, but not both. Of course, the punchline is that Dilbert had neither, but this strip stuck with me as a kid. It’s something that I think about often to this day.

This was a no-brainer when I first read it. Obviously I’d rather have a high-paying job where I have little to no respect. Why wouldn’t I? I don’t care if I’m not respected, but money can go a long way.

As the saying goes, be careful what you wish for, because you just might get it.

As an adult, I’ve had jobs in which I was highly paid while having to endure frequent abuse. (I won’t go into detail publicly.) Now I understand the impact it has. I’ve switched from a high-paying job to a lower-paying job before, and I did so with little hesitation because I was miserable in the high-paying job. I would have no problem doing it again.

There’s an old saying that I’m sure anyone reading this would have heard before– “Money isn’t everything”. I would add to that “Wow, is it ever not.” If you’re unhappy in your high-paying job, you’re probably better off switching to another job. I’ve reached a point where I don’t really care much about how much money I make as long as I’m not destitute.

I am very much aware that this opinion is a luxury that many people cannot afford. I’ve never actually been forced to choose between disrespect and poverty as Phil presented the decision to Dilbert. I know that I’m greatly oversimplifying an extraordinarily difficult topic. Many people would object “What about this? What about that?” There are certainly going to be cases in which it’s necessary to endure painful situations, but that’s really outside the scope of what I’m saying here. What I am saying is simply that if you have the choice (and that’s really the key), choose a good life over wealth.

(As a sidenote, productivity is not merely a means to wealth, and if you read this post as if it were a statement from Diogenes as an excuse to do nothing, I would submit that if you are not industrious, you will die miserable. Productivity is a necessary component of life.  I know this because of how miserable I was during long breaks in college and grad school.)

I’m not strictly speaking on happiness, either, but on all of the greater things in life.  You can fill in the blank on what you would believe those greater things to be.  Wealth is not going to provide either happiness or a better life in and of itself. Wealth is a minor component in life. It is a means to an end and not an end in itself. If you are not progressing towards the end of a life that is whole, then you need to rebalance your life.

This is where it really matters– If you’re unable to seek the greater things in life, the things that make life worth living, because of the damage your job is doing to you, you should probably leave. For what does it profit a man to gain the whole world and forfeit his soul? (Mark 8:36).

Troubleshooting Kodi — “Couldn’t connect to network server.”

Published / by Andrew

I just had an aggravating experience with using Kodi over my home network. I thought I’d go ahead and post it here in case anyone else is having the same problem, and also to cement it into my brain in case I have this problem again.

I have my music hosted on my desktop computer (which is increasingly a server machine, as I only use it via SSH from my laptop in my living room), and I set up Kodi to connect to the directory through SSH. From two different devices, I was getting the same message: “Couldn’t connect to network server.”

These devices are a laptop and an Android phone. The cause of the problem turned out to be the same in both (at least, I think), but Kodi has a horrible problem in not explaining why it couldn’t connect to the network server, making debugging really irritating. The problem is that I had reinstalled the OS on the server machine, so the SSH host keys were different. SSH was rejecting the connection as a security measure.

While the problem was the same for both devices, the solution was really different for each.

For the laptop, I only removed the offending key. It wasn’t a big deal– I just tried to SSH to my desktop from the laptop, and it gave me the instructions. Then I needed to restart Kodi, and everything worked fine.

My phone is a different story. The convenient tools for SSH just aren’t available here, and I don’t have any idea how to remove the bad key here. I actually just did the scorched earth option– I went to the app’s options, went to the Storage section and tapped “Clear storage”. That removed all settings, all add-ons, everything. Not too big a deal for me, personally, because I don’t use Kodi for much on my phone. But it removed the bad key.

There is also an additional problem with the Android version in that it does not support SSH/SFTP natively. I had to go to the Addons repository and install something to get SSH support.

From there, everything is finally back to normal.

On the off-chance that the Kodi team is reading this– Please give better error messages with network problems.

So, you accidentally dropped a Git stash.

Published / by Andrew

(Note: The information from this post comes from this SO question. I’m just collecting what I found to be the best parts of it.)

(Note 2: I’m using Linux with Bash, and the commands here probably won’t work in Windows. They may work in MacOS, though.)

They say that coding is 99% boredom and 1% pure terror.

I experienced that terror today when I had meant to type git stash pop, but accidentally typed git stash drop.

It turns out that the solution is much easier than you might expect, though finding it was a minor challenge. Upon dropping a stash, Git does not delete the information in the stash– It looks like it just deletes the reference. If you can find the hash, you’ll be able to still restore the stash.

If you only just now dropped the stash and still have the hash listed on your screen, skip ahead to the section involving actually applying the stash (just one command). Otherwise,

Finding the hash that matches the stash.

If, like me, you already tried some commands and no longer have the hash on the screen, the first thing you’ll need to do is figure out which hash is the one from the dropped stash.

The first thing you’ll want to do is dump a whole bunch of information from git show. You can redirect it into a file (which is what the link above does) or you can pipe it into less, which is what I’ll do here–

git show $( git fsck --no-reflog | awk '/dangling commit/ {print $3}' ) | less

At this point, you’ll need to know something unique (or nearly unique) about your particular stash that you can search, and you’ll need to be able to recognize the changes well enough to identify which stash you need. Something unique like this could be the name of the stash, if you remember it and it’s unique, or the date (in format Fri Jun 28 15:09:54 2019 -0400), or maybe a function name that you wrote. Just anything that you can use to search via less to find the stash you need.

To search with less open, press the / key, and remember that search is case-sensitive. In my case, I had written a function that I distinctly remember writing, radval. (Also note that to move to the next match in a search, hit n, and to go to the previous match, hit Shift+n. Scroll with arrow keys, but look up more details for less if it’s a program you plan to use a lot, as I do.)

Once you find some code that you know is in the stash you need (or just found the title or something), scroll up, if needed, to the heading of the stash. This will be four lines starting with “commit”, “Merge”, “Author”, and “Date”. The long alphanumeric string following “commit” is the hash, and it’s what you’ll need.
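If paging through full diffs is too much, a shorter list of candidates can help. The sketch below is a Python take on the same fsck idea, not something from the SO question. The function names are made up, the sample text only imitates the shape of git fsck output, and list_candidates has to be called from inside the repository.

```python
import subprocess


def dangling_commits(fsck_output):
    """Pull hashes out of lines that look like 'dangling commit <hash>'."""
    hashes = []
    for line in fsck_output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[:2] == ["dangling", "commit"]:
            hashes.append(parts[2])
    return hashes


def list_candidates():
    """Run git fsck, then print hash, date, and subject for each dangling commit."""
    fsck = subprocess.run(["git", "fsck", "--no-reflog"],
                          capture_output=True, text=True)
    for commit_hash in dangling_commits(fsck.stdout):
        shown = subprocess.run(
            ["git", "show", "--no-patch", "--format=%H  %cd  %s", commit_hash],
            capture_output=True, text=True)
        print(shown.stdout.strip())


# Parsing demo on sample text (the blob hash is a dummy value).
SAMPLE = (
    "dangling blob 0000000000000000000000000000000000000000\n"
    "dangling commit 73d46febad13327963c7e0bf95ce2829fe35042d\n"
)
print(dangling_commits(SAMPLE))
```

Stash commits usually have a subject starting with “WIP on” or “On” followed by the branch name, which makes them easier to spot in that list.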

The hard part is done.

Applying the orphan stash.

This part is easy. If you have the stash hash, and for this example I’ll use 73d46febad13327963c7e0bf95ce2829fe35042d, simply put this in the command prompt–

git stash apply 73d46febad13327963c7e0bf95ce2829fe35042d

It should be done now, so you can breathe out.

Some helpful VirtualBox command-line reference.

Published / by Andrew

A little bit of background for this post.

I love VirtualBox. Though, to be more specific, I love having a virtual machine to use for testing. Discovering bridged ip addresses was a revelation for me.

But I also love ssh. I regularly ssh from my laptop in my living room to my desktop in my office because I’m too lazy to get up. So, I want to be able to set up virtual machines via ssh.

Docker is pretty neat, but it’s not really designed for what I want it to do. I tried it, and decided it wasn’t really worth fidgeting with it to set up a bridged IP address, which I’m not 100% sure is possible in a Docker container anyway. I used Vagrant from Hashicorp for a while, which is pretty good, but I’ve encountered a versioning problem– My version of VirtualBox was too new for my version of Vagrant, so it wouldn’t start. So, why not cut out the middleman and just manage VirtualBox via CLI? The only feature from Vagrant I really care about is starting a VM, anyway.

I thought it would be difficult, but it’s actually surprisingly easy. To do this, you’ll need to install the VirtualBox extensions from their website. Make sure to get the extensions for the right version of VirtualBox– If you’re installing VBox from a repo, it’s probably in the “VirtualBox older builds” page. Also, I’m not bothering with actually creating virtual machines via CLI from scratch, but instead making a base VM to copy when needed. (Creating from scratch can be done with VBoxManage createvm plus a handful of follow-up commands, but cloning is less work.) So, I created a base VM while physically at the host machine, and can now clone that machine to create a new test server via CLI.

And these are the commands that I use to manage them, in no particular order. “Ubuntu Server” is just an example name, matching whatever the name is that you’ll see when you open the VirtualBox Manager.

IP addresses:
It doesn’t seem to be possible to get a VM’s IP address using VBoxManage if it uses a bridged adapter. (The listed solutions online don’t work.) The best I can find is “nmap -sP 192.168.1.*” (newer versions of nmap call this option -sn), which will list every host that responds on the network. If it’s run both before and after the VM is started, the IP that only shows up in the second run should be the VM’s.
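Comparing the two scans by eye gets old quickly, so here is a small Python sketch that does the before/after diff. It only parses text; the function names are made up, and the sample output is illustrative rather than copied from a real scan.

```python
def hosts_from_nmap(output):
    """Collect addresses from 'Nmap scan report for ...' lines."""
    hosts = set()
    for line in output.splitlines():
        if line.startswith("Nmap scan report for"):
            # The address is the last token, in parentheses when a hostname is shown.
            hosts.add(line.split()[-1].strip("()"))
    return hosts


def new_hosts(before_output, after_output):
    """Addresses present in the second scan but not in the first."""
    return sorted(hosts_from_nmap(after_output) - hosts_from_nmap(before_output))


# Illustrative sample output, trimmed to the lines that matter.
BEFORE = """Nmap scan report for 192.168.1.1
Nmap scan report for desktop.lan (192.168.1.20)
"""
AFTER = """Nmap scan report for 192.168.1.1
Nmap scan report for desktop.lan (192.168.1.20)
Nmap scan report for 192.168.1.37
"""
print(new_hosts(BEFORE, AFTER))  # ['192.168.1.37']
```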

List all VMs:
VBoxManage list vms

List all running VMs:
VBoxManage list runningvms

Start VM:
VBoxManage startvm "Ubuntu Server" --type headless

Pause VM:
VBoxManage controlvm "Ubuntu Server" pause

Resume paused VM:
VBoxManage controlvm "Ubuntu Server" resume

Shut down VM (hard power-off; acpipowerbutton asks the guest to shut down cleanly instead):
VBoxManage controlvm "Ubuntu Server" poweroff

Change network adaptor to bridged:
VBoxManage modifyvm "Ubuntu Server" --nic1 bridged

Clone vm:
VBoxManage clonevm "Ubuntu Server" --name "New Ubuntu Server" --register

  • If you forget the “register” option, the “registervm” command is supposed to fix that, but I can’t get it to work. In that case, it’s best to delete the newly-created folder in ~/VirtualBox\ VMs and start over.
  • If you forget the “name” option, it will just have the old name plus “clone”.

Delete vm:
VBoxManage unregistervm "Ubuntu Server" --delete

The Lost Art of MIDI Music

Published / by Andrew

This one’s a little different, but code-related, so I thought I’d post about it here.

Without going too deeply into the details of how I became interested in this, not long ago I created a couple of Python 3 classes that act as wrappers for MIDIUtil, which builds MIDI files. Yesterday I went ahead and added those to my GitHub page– Lost Art.

It’s not exactly a professional tool, but it’s been a lot of fun to play with.

(Side note: I’ve also started work on my blogging CMS, now dubbed “Nomad“, but it’s in such early stages that there’s not a lot to talk about.)

New project announcement: Yet Another Blog CMS.

Published / by Andrew

When I made this blog, I remember thinking, “Hey, it’ll just be easiest to make a WordPress droplet.” Technically, I wasn’t wrong, but WordPress seems to get more and more counter-intuitive as time goes by, not to mention bloated.

After some frustrating experiences with WordPress (they seem to think I want to write blog posts in a textbox the size of a postage stamp?), I’ve decided on a new project: A basic CMS. (It’s hard to say exactly when I’ll be working on this, but I’m hoping to start this coming weekend, or maybe in small steps before then.)

This CMS will be using some of my unpopular opinions:

  • Simpler is better– Have few features so they don’t get in the way.
  • A blog CMS should be lightweight and quick.
  • Writing in a markup language is better than “rich text” editors.
  • JavaScript should be kept to a minimum.
  • RSS is better than social media.

I know that very few people will use this, but I will, and I suppose that’s all that matters.

That said, I also do plan on adding some interactivity features so that one blog may interact (minimally) with another.

It’s also worth noting that I also plan on bringing Threadstr back from the dead, this time in PHP. I had originally killed it because I was concerned that SESTA/FOSTA would make it impossible unless I have a large staff, but I have ideas on how to make that easier to manage.

Right now, though, I’m more interested in this blogging platform to replace WordPress.