Hello transactions, goodbye Mongo

February 20, 2018

MongoDB announced that it will include transaction support in the future. Here’s my takeaway. (Disclaimer: I’m a long-time RDBMS/SQL developer who has learned to use NoSQL databases. I understand the use cases for both, but I still prefer SQL-based databases.)


This seems like a bad move in so many ways that I’m surprised MongoDB (the company) did it.

Categories: Articles

Use existing wheels, please

November 30, 2017

I often see software projects reimplement code they could have had “for free” if they had just searched for a library that does what they want. Back when I was a hardware design intern, we called that the “NIH Syndrome”: “Not Invented Here.”

Sometimes, software projects suffer because they build on foundations that don’t match their needs. Again, this requires custom code to do what another foundation would include “for free.”

If you’re stuck in a project with a lot of custom code, then you may just have to maintain it. But, if you’re starting fresh, perhaps this list will provide you some insight. You may be able to make some early choices and save yourself a lot of maintenance cost down the road.

Here we go.

NoSQL databases

NoSQL databases are all the rage. All the cool kids are using them, even when their data fits really well into a structured model.

There are two serious justifications to use a NoSQL database…but you probably don’t really have them. I’m going to tell you what they are, so that when you hear a programmer say them, you’ll be able to get to the truth.

We have Big Data!

How much data do you need to have to qualify for “Big Data”?

Big Data is thousands of records every day for every person on the planet.

Is this my own home-grown definition? Yep. But if your developers tell you that you have Big Data with fewer than about 7 trillion records per day (roughly a thousand records for each of 7 billion people), be careful.

Twitter has Big Data, because almost every online user tweets. A lot. (Gross exaggeration, maybe.)

Facebook has Big Data, because people post cat pictures and political posts (that garner a bazillion snarky comments).

Amazon has Big Data, because they sell everything to everyone.

Bottom Line: Unless you collect data from a global population or an army of robots, your company probably doesn’t have Big Data.


We have Unstructured Data!

Unstructured Data is not “any weird bits you can chuck in a bucket.” That’s a useless mess. If you ever try to capture that, congratulations: You’ve become a Trash Dump.

Unstructured Data is structured data from a variety of sources that may change its structure without necessarily notifying you about the change.

Is this my own definition? You bet.

Why do you care?

Structure must be imposed somewhere

Here’s the truth: Your programmers have to impose structure at some point, simply because computers are structured machines. They cannot look at a blob of random bits and say “that’s a fish.” (Artificial Intelligence image recognition notwithstanding.)

Pragmatically, this structure can be imposed in two places:

  1. Where the data comes into your system.
  2. Where the data is used in your system.

Some programmers champion a rosy picture that “you can store anything and then extract what you need when you need it.” This sounds good, but here’s the cost: The software has to extract what it needs every time it needs anything. Consequently, what should be a one-time cost happens hundreds/thousands/millions of times during the life of that data.

It would be better to parse the data when you first see it and be done. In other words, impose structure once and only once.
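
Here is a minimal sketch of what “impose structure once” can look like. It assumes a hypothetical feed of JSON sensor readings; the field names (device, temp_c, ts) and the ingest helper are invented for illustration, not taken from any real system. The raw payload is parsed and validated one time, where it enters the system, and everything downstream works with a typed record:

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    """Structured form of one incoming record (hypothetical fields)."""
    device: str
    temp_c: float
    taken_at: datetime


def ingest(raw: str) -> Reading:
    """Parse and validate a raw payload once, where it enters the system."""
    doc = json.loads(raw)
    return Reading(
        device=str(doc["device"]),
        temp_c=float(doc["temp_c"]),
        taken_at=datetime.fromisoformat(doc["ts"]),
    )


# Downstream code never re-parses the payload; it just reads typed fields.
reading = ingest('{"device": "pump-7", "temp_c": "21.5", "ts": "2017-11-30T08:00:00"}')
print(reading.temp_c + 1.0)
```

A payload that is missing a field fails loudly in ingest, at the door, instead of failing quietly in whatever report happens to read it six months later.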

A dirty little secret

The true reason programmers like “unstructured data” is that it’s a short-term lazy shortcut. Imposing structure requires forethought and design. A lot of programmers just want to dive in and “build stuff.” NoSQL databases give programmers a quick-and-dirty way to get early gains fast.

Notice, I said “early gains.” Later, when your system has a ton of data, you’re going to pay for the lack of imposed structure. That cost is paid in maintenance programming, broken systems, and customers leaving.

How do you know if your programmer is keeping this dirty little secret? Ask this question: “How do you plan to handle schema changes?” A thoughtful developer will say something like “I will version the data, so that I can apply a transformation later, if necessary.” If your developer says, “We’ll worry about that later,” be very afraid.
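
For the curious, one way to “version the data” is to store a version number in every record and upgrade old records when they are read. The sketch below is only an illustration under that assumption: the schema_version field, the record layout, and the upgrade helper are all hypothetical.

```python
CURRENT_VERSION = 2


def _v1_to_v2(record: dict) -> dict:
    """Version 2 splits the single 'name' field into first and last names."""
    first, _, last = record["name"].partition(" ")
    upgraded = {k: v for k, v in record.items() if k != "name"}
    upgraded.update(first_name=first, last_name=last, schema_version=2)
    return upgraded


# Each entry upgrades a record from that version to the next one.
MIGRATIONS = {1: _v1_to_v2}


def upgrade(record: dict) -> dict:
    """Apply migrations step by step until the record is current."""
    # Records written before versioning existed are treated as version 1.
    while record.get("schema_version", 1) < CURRENT_VERSION:
        record = MIGRATIONS[record.get("schema_version", 1)](record)
    return record


old = {"schema_version": 1, "name": "Ada Lovelace", "email": "ada@example.com"}
print(upgrade(old))
```

The point is that the transformation lives in one place and runs once per record, rather than every consumer guessing which shape it has been handed.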

More custom code you’ll probably have to pay for

If you use a NoSQL database, you likely will pay these costs time and time again:

  • Custom query executors. NoSQL databases are really stupid in what you can ask them to tell you, so you have to program custom code to ask a multitude of dumber questions to arrive at the same answer. (By comparison, SQL is very expressive. Also, it has been getting right answers for businesses for about your whole lifetime.)
  • Custom code that mimics transactions. (Transaction support is included in the core of most RDBMSes, such as MySQL and Oracle.)
  • Custom triggers in the application. When data changes in one place, sometimes other data should change elsewhere. For example, if you delete a customer, you might also want to delete his contact information stored elsewhere. (Many RDBMSes support triggers in the core, which are a ton faster than application code.)
  • Custom code to enforce uniqueness constraints. (Also a basic feature of RDBMSes.)
  • Indexes. (Ask your developer, “How can you index this data if it doesn’t have a structure?” Expect bluster.)
  • Text indexes for searching. (An entire industry exists for text indexing for search. You should just use a free product and not build your own.)
  • Training. All that custom code will have to be learned by some guy you hire. You can’t put “knowledge of all our custom code” in a job posting; however, you can ask for people who already know “SQL, triggers, indexing, search text indexes.”
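
To make a few of those bullets concrete, here is a small sketch using Python’s built-in sqlite3 module. I picked SQLite only because it needs no server, and the customer and contact tables are hypothetical. It shows a uniqueness constraint, a trigger, and a transaction rollback, all handled by the database rather than by application code:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE          -- uniqueness enforced by the database
    );
    CREATE TABLE contact (
        customer_id INTEGER NOT NULL,
        phone       TEXT NOT NULL
    );
    -- Trigger: deleting a customer also deletes the related contact rows.
    CREATE TRIGGER customer_cleanup AFTER DELETE ON customer
    BEGIN
        DELETE FROM contact WHERE customer_id = OLD.id;
    END;
""")

# Transaction: "with db" commits on success and rolls back on an exception.
with db:
    db.execute("INSERT INTO customer (id, email) VALUES (1, 'ann@example.com')")
    db.execute("INSERT INTO contact (customer_id, phone) VALUES (1, '555-0100')")

try:
    with db:
        db.execute("INSERT INTO customer (id, email) VALUES (2, 'bob@example.com')")
        # A duplicate email violates the UNIQUE constraint...
        db.execute("INSERT INTO customer (id, email) VALUES (3, 'ann@example.com')")
except sqlite3.IntegrityError:
    pass  # ...so the whole transaction, including customer 2, is rolled back.

customers_after_rollback = db.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

with db:
    db.execute("DELETE FROM customer WHERE id = 1")

contacts_after_delete = db.execute("SELECT COUNT(*) FROM contact").fetchone()[0]
print(customers_after_rollback, contacts_after_delete)
```

Nobody had to write the duplicate check, the cleanup, or the undo logic. That is the code you end up owning when the database does not provide it.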


Nuf Said for now

OK. I’m gonna stop there. Next time, I’ll talk about how to avoid reinventing your software framework.


PS-I welcome questions from business people. If you’re a developer who loves NoSQL, feel free to post your own article somewhere else.

Categories: Articles

Content Type Advertisement

October 18, 2017

How does an HTTP Client know what Representations for a Resource are available from an HTTP Server? The answer is simple: Use the HTTP Link Header, Luke!


Categories: Articles, Projects

Webpage Head/Tail bookmarklets

Some sites use AJAX to load content into a DIV. This is cool, but some (all?) browsers don’t automatically scroll when more content expands the DIV past the bottom of the window. It’s pretty poor UX to have to manually scroll a webpage when you want to follow along with the output as it goes.

In order to Not Lose My Mind, I wrote a quick JavaScript bookmarklet that scrolls to the bottom twice per second. Here’s the simple code:

“Tail” bookmarklet

javascript:window.tailer = window.setInterval(function () {
    window.scrollTo(0, document.body.scrollHeight);
}, 500);

NOTE: I have formatted this with new lines and spaces for this post. You can remove those.

You just create a new bookmark link in your bookmarks toolbar and dump that code in there. Name it something like “Tail” and away you go!

Now, once you’re tailing a webpage, how do you get back to the top? Well, you could just refresh the page. But, maybe you like symmetry and you’d like to do this with a bookmarklet. You’re in luck!

“Head” bookmarklet

javascript:window.clearInterval(window.tailer); window.scrollTo(0, 0);

This not only cancels the “tail” callback, it also scrolls to the top. Yay!

Categories: Projects

workhere: Quick environment setup for Linux command-lines, Python and Node

April 13, 2017

Python’s virtualenvwrapper is amazing. Use it whenever possible.

Still, there are times when you’re on a system over which you don’t control all the things. For example, I am working in a project that uses make in some odd ways and puts binaries in buried directories. It does, however, use Python’s virtualenv and the Node.js Package Manager yarn.

Per standard practice, the Python virtualenv is named venv. Consequently, the binaries for Python’s virtualenv get buried under venv/bin.

Also per standard practice, the Node.js binaries get buried under node_modules/.bin.

I wanted one command to activate my Python virtualenv and add all those binaries to the path. Here’s what I came up with:

alias workhere='export PATH=venv/bin:node_modules/.bin:$PATH && \
                source venv/bin/activate'

I put that in my .bashrc file, so that it is available to any shell I open.

That way, I simply have to cd /path/to/project and then type workhere.

Problem solved.

Categories: Projects

Why Git

December 15, 2016

A coworker asked me why my team uses Git. He was also interested in a comparison with Subversion, which his team uses. Here’s my response.


Categories: Projects

SSHFS: Hacking Linux files from a Mac

December 14, 2016

I’m a Linux fanboy. Used to love Microsoft, now…not so much. Used to hate Mac, now…well, I love their hardware, and OSX sure has a pretty face.

Anyway, these days, I spend a ton of time with OSX, even though I prefer developing on a Linux Desktop in a VM. But there are times it would be nice to spin up a new VM and mess with its files using some tools installed on the OSX host.

Today I found out how.

Install SSHFS

We’re just going to use the same trick that is available to Linux users, but we have to jump through some hoops to get it on Mac.

Helio Tejedor’s article has a ton of detail about installation, but I found the most relevant bit is this:


The easy way to install SSHFS is navigate to http://osxfuse.github.io and download two files:

  • OSXFUSE 2.7.3
  • SSHFS 2.5.0

Do that, but just pick the latest two versions listed right on their homepage.

Manually test it out

Digital Ocean’s article is, like most Digital Ocean articles, excellent documentation. The section “Mounting the Remote File System” has a detailed explanation about how to mount remote drives using SSHFS. I just needed these two lines for my test VM (at IP address

sudo mkdir /mnt/
sudo sshfs -o allow_other,defer_permissions user@ /mnt/

The observant reader will notice that:

  • The user is the name of the SSH account that can connect to the box.
  • I am “cheating” by using the IP address for the folder name…
  • …which may have unexpected results when my VM gets a new DHCP lease.

I’m good with all that. It works!

Script it

Mounting, unmounting, making directories…boring. I scripted this as mountvm, and now I have a simple one-liner to mount/unmount/etc. that mimics ssh.

Here’s the magic sauce, for your discerning palate.

# Assumes $user and $host are already set.
sudo umount "/mnt/$host"
sudo mkdir -p "/mnt/$host"
sudo sshfs -o allow_other,defer_permissions "$user@$host:/" "/mnt/$host"

Merry Christmas!

Categories: Projects