Technical Website Due Diligence

The following guest post is written by Mauro Chojrin. Mauro works as a consultant and technical trainer, and is the CEO of Leeway, an IT consulting company.


What exactly are you buying when you buy a website?

When you buy a website you are in fact taking ownership of a few things:

  • A domain name (Or perhaps several)
  • A brand
  • Some pieces of content (Photos, texts, etc…)
  • A user/visitor base
  • A piece of software powering this all

There are a number of checks that can be done from the outside (like looking at the WhoIs history, Establishment Claims and other concerns addressed in this post), but there’s a lot to be said about the last piece of the puzzle: the software powering it all.

If you can’t run some verifications on the software, it’s very hard to answer questions such as:

  • How costly will it be to update this website to fit my plans?
  • Will it even be possible to update?
  • How long will it take?
  • What kind of help will I need to pull this through?

It’s like buying a used car without even popping the hood and looking inside. Would you make such a purchase?

What’s to be checked?

First of all, you need to know the different ways in which a website can be built. The list is, of course, huge, but I’ll keep it simple to make my point. I’d say there are three classes of websites you can find out there:

  • Based on someone’s platform (e.g. Shopify stores)
  • Based on an established standard (e.g. WordPress sites)
  • Completely built from scratch

In the first case, there’s really nothing for you to do. Your technical risk here is pretty low (as is your room for maneuver, but that’s a different story). The only advice I’d give you here is not to get too comfortable playing in someone else’s backyard.

The second case presents a low to medium technical risk. What happens here is that there’s a possibility that the original tool has been customized in some weird way that could make it hard to improve upon (Not very common but can definitely happen).

One way to check for that would be to get a fresh copy of the version the site says it’s running and compare the files with the actual site code… it can be a little messy, but it will certainly save you from some trouble.
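As a rough illustration of that comparison, here is a minimal Python sketch. It assumes you have already unpacked a fresh copy of the same version into one folder and the site’s code into another; the function names and folder layout are hypothetical, not part of any WordPress tooling.

```python
import hashlib
from pathlib import Path


def file_hashes(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_trees(pristine_dir, site_dir):
    """Return files that were modified, added, or removed relative to the pristine copy."""
    pristine = file_hashes(pristine_dir)
    site = file_hashes(site_dir)
    return {
        "modified": sorted(p for p in pristine if p in site and pristine[p] != site[p]),
        "added": sorted(p for p in site if p not in pristine),
        "removed": sorted(p for p in pristine if p not in site),
    }
```

Calling `compare_trees("wordpress-fresh", "site-code")` (both folder names are placeholders) returns three lists. Added files are normal for themes, plugins and uploads; the "modified" list is the one that deserves a closer look, since it points at changes made to the original tool itself.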

Basically, if you can check that the used version is up to date you don’t have much to worry about.

If this is not the case though… things can get pretty messed up.

Ideally, you want to have your site run the latest version available. In order to do that you’ll have to run through the migrations that are in place (At least to get to a version that can be supported by the hosting you’re moving the site into).

Now the third scenario is where the fun is (at least for me :)).

When you find yourself in a position to buy a site like this (SaaS products usually fall into this category, but they are by no means the only type of website that can), you definitely want to pay special attention to a few traits. I’ll list them in descending order of importance and then give a few pointers about why I ordered them this way:

  1. What’s the basic technology behind it? (What programming language is the code written in)
    1. Is it using any kind of framework?
  2. How well is the code documented?
  3. What technology is being used for data storage?
  4. What dependencies with third-party services are in place?
    1. How well are those dependencies isolated from the core?
  5. How complex is the overall infrastructure?

Let’s dig a little deeper into each one.

What’s the basic technology behind the platform?

This is probably the most important technical check you can run. The same piece of software can be built using a wide array of programming languages. You’ve probably heard of PHP, Java, C# or Python, but that’s just the tip of a very big iceberg; the possibilities are almost limitless. As in every other industry, though, a few options dominate. There’s controversy around which language is THE most used, but the consensus puts PHP, Java, JavaScript, Python, .Net and Ruby at the top of the list. So… if the platform is built upon any of these technologies, it’s less likely that finding developers to support its growth will become a major bottleneck.

How do you tell which language the site is built upon?

There’s really little you can do from the outside (meaning, without reading the actual code). In the case of PHP (my personal field of expertise), one simple way to determine that is by looking at the URL. If you find something like http://www.domain.com/page.php, then there’s no doubt that PHP is involved (it may not be the only technology used, but it’s a start). Unfortunately, a URL that doesn’t show .php doesn’t automatically mean the site isn’t powered by it (as a matter of fact, it’s a very good security practice to hide this kind of information from the outside).

So… if you really want to make sure, you need to get your hands on the actual code.

Once you’ve got access to it, it’s really easy to tell what language is being used. If it’s PHP the file names will end with .php, if it’s Python they’ll end with .py, if it’s Ruby they’ll end with .rb, if you see file names ending with .java you’re looking at Java code, and so on.
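If the code base is large, counting file extensions by hand gets tedious. The following Python sketch does the counting for a folder of source code; the extension table is only an illustrative assumption and is far from complete.

```python
from collections import Counter
from pathlib import Path

# Illustrative mapping only; extend it for other languages as needed.
EXTENSION_TO_LANGUAGE = {
    ".php": "PHP",
    ".py": "Python",
    ".rb": "Ruby",
    ".java": "Java",
    ".cs": "C#",
    ".js": "JavaScript",
}


def count_languages(code_dir):
    """Count source files per language, based purely on file extension."""
    counts = Counter()
    for path in Path(code_dir).rglob("*"):
        language = EXTENSION_TO_LANGUAGE.get(path.suffix.lower())
        if path.is_file() and language:
            counts[language] += 1
    return counts
```

Something like `count_languages("site-code").most_common()` (the folder name is a placeholder) lists the languages found, most frequent first. An empty result means none of the file extensions matched the table.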

This is where your first red flag may show up: if you can’t map the source code to any of these popular languages, you may find it hard to evolve the software with any team other than the original one.

Then there’s the question of version. Bear in mind that a programming language is also a piece of software (technicalities aside) so, like any other piece of software, it evolves over time. New tools are made available to programmers to make their job more efficient but, at the same time, some tools are removed.

Ideally, you want to make sure the system is built using the latest stable version of the language, or at least not a really old one.

For instance, at the time of this writing PHP’s recommended version is 7.1 (7.2 is about to come out).

Many websites are still using 5.6 (Two versions behind) which is not terrible, but certainly is less than optimal.

Now the real trouble comes when you find a website built using PHP 4. This version is so old that even the language developers are not supporting it anymore (meaning it’s vulnerable to all sorts of threats) and, quite frankly, it’s hard to get developers on board to use such an outdated tool.

I’m not currently using any other language, so I’m not as up to date on how the landscape looks for them, but I’m fairly sure similar things happen there.

And now the big question… how can you tell which language version is being used?

Well… this is a tricky one. What you’re looking for here are pieces of code written using old methodologies, especially those that have since been removed from the language (Php.net has many articles on upgrading software written for an older version, and they can give you hints). Better yet, look for uses of language constructs that were only made available in a particular language version.
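To make this concrete, here is a small Python sketch that searches PHP source text for a handful of such clues. The patterns are a tiny, hand-picked sample: the mysql_* and ereg* functions were removed in PHP 7.0, while the ?? and <=> operators were added in PHP 7.0. It is a plain text search, so a match inside a comment or a string will produce a false positive.

```python
import re

# Hand-picked, non-exhaustive clues. Each label says what a match suggests.
VERSION_HINTS = {
    "mysql_* call (removed in PHP 7.0)": re.compile(r"\bmysql_\w+\s*\("),
    "ereg* call (removed in PHP 7.0)": re.compile(r"\beregi?(?:_replace)?\s*\("),
    "?? operator (added in PHP 7.0)": re.compile(r"\?\?"),
    "<=> operator (added in PHP 7.0)": re.compile(r"<=>"),
}


def find_version_hints(php_source):
    """Return the labels of every hint whose pattern appears in the PHP source text."""
    return [label for label, pattern in VERSION_HINTS.items() if pattern.search(php_source)]
```

Running `find_version_hints` over the contents of each .php file gives a first impression. Hits on removed functions suggest the code was written for PHP 5 or older, while hits on the newer operators mean it needs at least PHP 7.0 to run.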

Is it using any kind of framework?

A framework is a very handy tool for developers of any language. It’s a toolbox that sits on top of the language and provides out-of-box basic functionality.

So the first thing you want to check (after you know which language the system is built upon) is whether the code is written using a framework or not.

This is usually easy to do by looking at the main file, which typically contains some reference to the framework being used (if there’s one in place).

The risk of the application not using a framework (or using a custom one, for that matter) is that there’s probably little to no documentation available (meaning the learning curve will be steep for new developers joining the team) and it’s very likely that the code base quality is not too high (standard frameworks are developed and maintained by large developer communities, which usually means fewer bugs since there are significantly more eyes on the code).

Within the PHP scenario, there are plenty of candidates, but the strongest are:

  • Laravel
  • CodeIgniter
  • Symfony
  • ZendFramework
  • Yii

In order to actually identify the framework being used you need to read through the code (a couple of files will probably do) looking for a reference to a framework. If you find one, it’s a good idea to check what the community around the framework looks like. You definitely don’t want to be backed by a ghost community, as happened to me when I had to maintain an application built using Limonade PHP: notice how the latest update to the project was done over two years ago… bad sign.
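One shortcut, assuming the project manages its dependencies with Composer (not every PHP project does), is to look at the "require" section of its composer.json file. The Python sketch below does that. The package names in the table are the ones commonly associated with each framework and should be treated as an assumption to verify, not an authoritative list.

```python
import json

# Composer package names commonly associated with each framework.
# Treat this mapping as an assumption to verify, not an authoritative list.
FRAMEWORK_PACKAGES = {
    "laravel/framework": "Laravel",
    "codeigniter/framework": "CodeIgniter",
    "codeigniter4/framework": "CodeIgniter",
    "symfony/symfony": "Symfony",
    "symfony/framework-bundle": "Symfony",
    "zendframework/zendframework": "Zend Framework",
    "yiisoft/yii2": "Yii",
}


def detect_frameworks(composer_json_text):
    """Return the frameworks named in the "require" section of a composer.json document."""
    manifest = json.loads(composer_json_text)
    required = manifest.get("require", {})
    return sorted({FRAMEWORK_PACKAGES[name] for name in required if name in FRAMEWORK_PACKAGES})
```

An empty result doesn’t prove there is no framework; it only means none of the listed packages is declared, so reading through the code is still the fallback.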

How well is the code documented?

Code documentation is often the most undervalued asset in software development. Very few developers want to be involved in its production but, at the same time, they wish the code was better documented when they have to return to it…

But what exactly is it? There are basically two kinds of documentation:

  1. End-user (Like manuals, FAQs, etc…)
  2. Technical

The one that we’re looking for here is the latter. What you want to have is a set of documents that can guide a new developer through the implementation complexities and at the same time be a reference for existing team members for how to fix common problems quickly.

The length or the format of the documentation doesn’t really matter (I personally prefer something like a Wiki system where every developer can make contributions, but that’s totally personal).

What’s important is that the documentation is accurate and updated.

How do you check that? Again, this is more art than science, but you can get a very quick feeling of how good (or actually bad) the documentation is by simply reading through a couple of random documents and trying to follow them.

You will immediately notice whether the writing is clear and if it’s up to date with the actual code.

What technology is used for data storage?

Nowadays, almost every website stores some kind of data (Mostly information about their users or their content).

There are many different tools that can be used to accomplish that, and the selection is neither trivial nor simple.

The basic principle is the same as before: when in doubt, err in favor of the most popular technology.

I’ve been involved in several projects that used the cool new technology where the system requirements clearly dictated otherwise, and getting out of that situation was nothing like a walk in the park.

The problem with brand new technologies (Sometimes referred to as bleeding-edge) is that they’re usually poorly documented and, because they’re so new, not too many people have experience with them, so getting help when you’re stuck can be tricky.

This of course depends on the expectations you have for your site. It’s ok to try new things when the stakes are low (that’s the only way we could ever move forward, right?), but if you’re looking at the main software for your business you want to be a little more conservative.

What dependencies with third party services are in place?

Fairly complex websites usually depend on third-party services to perform certain tasks. The most common is email sending and tracking.

There are some excellent vendors out there that make it extremely easy for a small development team to leverage these high-quality services and build them into their own applications.

But the flip side of this is that by using these services you’re actually creating a dependency on a piece of software that you don’t control.

There’s nothing inherently wrong with that… if you know how to protect yourself from updates that could be game changers for your application, which brings me to the next question…

How well are those dependencies isolated from the core?

This is probably the most technical of the checks I’ve been talking about so far, and in order to assess this, you need a mid to high level of technical expertise in the particular tools upon which the website is built.

What you want to make sure is that no external service is used directly by the website.
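One common way to get that isolation is to have the rest of the application talk to a small interface of its own, with a single adapter class per vendor behind it. The Python sketch below illustrates the idea for email sending; all the names in it are hypothetical, and the recording class merely stands in for a real vendor adapter.

```python
from typing import Protocol


class EmailSender(Protocol):
    """The only email interface the rest of the application is allowed to see."""

    def send(self, to: str, subject: str, body: str) -> None: ...


class RecordingEmailSender:
    """Stand-in for a vendor adapter; a real one would call the vendor's API here."""

    def __init__(self) -> None:
        # Each entry is a (to, subject, body) tuple.
        self.sent = []

    def send(self, to: str, subject: str, body: str) -> None:
        self.sent.append((to, subject, body))


def send_welcome_email(sender: EmailSender, address: str) -> None:
    """Application code depends on EmailSender, never on a specific vendor."""
    sender.send(address, "Welcome!", "Thanks for signing up.")
```

When reviewing the code, the good sign is that the vendor’s library shows up in one place only (the adapter). If calls to it are scattered all over the code base, a breaking change on the vendor’s side means touching every one of those places.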

How complex is the overall infrastructure?

A regular website has a fairly simple infrastructure:

  • A web server
  • A database server
  • An application server

Depending on traffic and concurrency, all of these can well fit on a single computer (physical or virtual) and thus be easy to keep up with.

As the application complexity and usage grows, infrastructure tends to expand as well (There might be a need for extra servers, load balancers, etc…).

The more complex the infrastructure is, the higher the maintenance costs and the risk of failure.

As a general rule, I’d say that if the infrastructure is far from the standard, the help of an expert is useful (Not only in determining the cost of maintaining it as is, but also the possibility of reducing it without losing quality).

In conclusion

The whole idea behind these checks is to determine:

  • How flexible the software is
  • How experienced a technical team will need to be in order to take over
  • How hard it will be to bring in new technical resources who are ready to get their hands dirty and tweak or upgrade the site

That should give you an idea of what you’re getting into by investing in this website. The fact that all the red flags are raised doesn’t immediately mean it’s a bad investment, of course: if you can purchase it at a bargain price and have the technical resources to fix all those issues, it could be a great opportunity. But it’s important that you don’t go in with false expectations.

Is it worth going through all this trouble?

Going back to the used car analogy… if you don’t know your way around cars, you’d better have a trusted mechanic by your side when you’re evaluating that shiny used car.

Failing to do so can put you in a very uncomfortable situation. I should know… I’ve been there myself (With the car, not the website!).


Have any questions? Find me on LinkedIn or post a comment below!