In 1996, visionary and research aficionado Brewster Kahle founded The Internet Archive. It’s housed in an unassuming San Francisco building, but stores some incredible data: the Archive currently boasts over 281 billion web pages and digital content, with billions more added every month.
The Archive’s mission is simple and yet ridiculously complex: To chronicle the history of the internet. All of it. Lofty as it is, this goal is critical to far more than just those among us obsessed with research and history. Maintaining a bona fide chronicle of the online world is also crucial to business owners keen on understanding trends, and studying what has and has not worked throughout different stages of the dot com journey. Thanks to the Archive, it’s still possible to learn from the internet’s short yet rich history.
First Things First: Google is Not an Archive
You might assume that Google itself acts as an archive, since it’s essentially a dynamic homepage for the entire web. It doesn’t. Google’s algorithms are not publicly accessible (and they can be frustratingly complicated to guess at), and its data is proprietary, not preserved in the public’s interest. The days of Don’t Be Evil are long gone!
Compare that with the Internet Archive, which is wisely set up as a non-profit, and so maintains the interest of the masses, not of a corporation. Google wants to sell ads; the Archive just wants to preserve the incredible creativity and storytelling of every web page it can capture.
Yes, this sounds very hippie-ish, but the integrity of an archive is as important as that of a library, so it’s crucial that we don’t rely on a private company for either one.
How and What Data is Collected
Archive engineers are tasked with monitoring and crawling the top million web addresses in the world. Data is then captured and stored, and every three months, they start over again with a brand new list. Thanks to the ever-changing, wildly dynamic nature of the web, these top sites change regularly, so staying on top of current trends is critical.
In addition to the massive amount of web data, the Archive also contains over 750,000 books, with many more slated for future addition. Information is also collected from more than 60 TV stations and YouTube videos, which are often selected due to Twitter trending. The web collections manager at the Archive, Alexis Rossi, estimates that 10 billion URLs are saved every three months, which roughly equates to one-tenth of what is released across the entire internet. “It’s a Sisyphean task,” says Rossi. “We know we’ll never get it all. The web by its nature is infinite.”
How Business Owners Can Use the Archive
Taking the time to study the history of high-value homepages can give business owners insight into how navigation and page formatting have changed over the years to suit the needs of users. It’s also an excellent way to monitor the progress of your top competitors; you can track a URL or group of addresses over a selected time period, and take special note of the major changes made. This kind of insight can help you avoid mistakes your competitors have already made, and learn from the consistencies that helped them stay afloat.
The Archive is also a mecca for people engaged in web-related lawsuits. By accessing archived pages otherwise unavailable on the web, lawyers and those involved in suits can validate digital claims, like incomplete Terms and Conditions or deceitful advertising tactics. As a business owner, you can also use the archives to research active patents that may be relevant to your business.
Last but not least, studying past design trends will also reveal plenty of artistic insights; when you see what a page looked like in 2000 versus 2013, you’ll quickly realize A) how much we have evolved (we’re certainly a bit shocked at how much Flippa’s design has evolved over the years!) and B) how many dot coms have yet to actually modernize. Pro tip: you want to avoid being in that second category.
The Wayback Machine: The Archive’s Ingenious Interface
Want to fast forward to the most critical and crazy fun part of the Archive? Head to the Wayback Machine. The Wayback Machine, whose name references a segment from The Rocky and Bullwinkle Show, is the method by which we can interface with the gigantic Archive. The interface allows you to type in a URL, and essentially time travel as you view that site’s growth and changes (or utter demise) through the years.
This interface helps to navigate the behemoth Archive in a very intuitive manner. People use it to find old digital art pieces, lost articles or blogs, or to just take a fabulous trip down memory lane. As of the start of 2013, the Wayback Machine covered over 240 billion URLs. Yes, it’s obvious you can get lost in this machine! But what a way to spend some quality researching time.
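If you’d rather script your time travel than type each address by hand, here is a minimal sketch in Python. It relies on the Wayback Machine’s address pattern of web.archive.org/web/, then a timestamp, then the page address, which sends you to the capture closest to that date. The function names and the example.com address are placeholders invented for this illustration, not anything provided by the Archive.

```python
from datetime import date

# Address pattern used by the Wayback Machine: prefix, timestamp, original URL.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: date) -> str:
    """Build a Wayback Machine link for the capture closest to `when`."""
    timestamp = when.strftime("%Y%m%d")  # e.g. 20130101
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def yearly_snapshots(page_url: str, first_year: int, last_year: int) -> list[str]:
    """One link per year (dated 1 January), handy for comparing redesigns."""
    return [
        wayback_url(page_url, date(year, 1, 1))
        for year in range(first_year, last_year + 1)
    ]


if __name__ == "__main__":
    # Swap in your own site or a competitor's address.
    for link in yearly_snapshots("example.com", 2000, 2013):
        print(link)
```

Open the printed links in a browser to step through a site year by year. If the Archive has no capture on the exact date you asked for, it should take you to the nearest one it holds.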
Why Is It Important to Archive the Internet?
In addition to all the practical business applications of maintaining a comprehensive history of the web, the urgency to further develop the Archive continues to intensify. Part of this motivation comes from the fact that the data, in many ways, is in danger of disappearing. Government interests, as an example, continue to threaten the presence of much of the Archive, as they often have a strong desire to suppress and censor data.
The inevitable damage to hardware systems that just do not last forever (yet) means that precious terabytes are lost every day, never to be recovered – unless there’s an Archive that has already secured the information. Finally, there are many who view the huge application onslaught as a potential “web killer”, as more and more digital experiences happen in the confines of an app, and not in an open web environment. It’s all the more critical to capture the nuances of the web while it still exists in such a massively public forum.
Long Live the Archive
Collecting and housing this many billions of web pages takes the same amount of energy it would take to power 45 homes. You can imagine the amount of hardware, organizational skills, and just plain tenacity it takes to pull off the salvation of even 10% of the web; but it’s a tremendous service the Archive is providing to netizens across the globe.
The Archive is like a modern day Library of Alexandria, once housed in ancient Egypt. Within the Archive lies the digital history of a generation that has changed so much in so little time, in many ways thanks to the internet itself. The Archive allows business owners to witness the evolution of their particular industry and niche, and to have a million case studies across the lifecycle of the web at their fingertips. You can’t put a price on that kind of precious historical data — but if you have a few spare dollars, it’s worth making a donation. If you haven’t yet used the Archive to further your understanding of web usage and best practices, it’s time. Give it 5 minutes, and I suspect you’ll be hooked for life.
Photo credit: ecstaticist