It’s estimated that 80% of all content on the internet lies below the surface web, not indexed by most search engines. We believe this is one of the reasons the pendulum is swinging back toward the librarian as a digital guide for navigating this deep, rich ocean of information.
John DiGilio, LibSource Senior Director of Research and Intelligence, recently presented a webinar, “Deep Web, Deep Insights,” in which he discussed the opportunity and the need to delve below the surface web, the portion of the internet that is the purview of popular search engines like Google and Bing.
Content that basic internet business research misses
John included examples of the kinds of deep web content that are not usually available through basic web searches, even to people adept at creating search strings. As John explained, this content can be discovered only through specialty search or special access, and it includes the following:
Special collections
Special collections are known to librarians, researchers and archivists as materials (digital in this case, though they can be in any format) that are unique, possess artifactual or monetary value, and contribute to the fundamental mission of their owners. Special collections are available in virtually every discipline and industry, from law to science, often behind paywalls or other access restrictions.
Proprietary databases
Commercial, government, non-profit and academic organizations generate massive amounts of data through their missions and operations. Much of this content is held in proprietary databases and is not available outside the organization. In some sectors, however, especially government, the information is accessible to skilled researchers.
Social media
Facebook alone is closing in on two billion users worldwide, offering information and insights on individuals and organizations that can’t be found anywhere else. While more social media posts are being indexed, the control users exercise over their accounts keeps most of this content hidden.
Intranets and internal sites
Most organizations now manage and rely on their own intranets and other internal sites, such as networks on Yammer or Slack. While private, they would be considered part of the deep web.
John did touch on the ethics of deep web research, making it clear that ethical research does not include hacking! But in terms of sheer volume, these categories illustrate why the deep web very likely does contain the vast majority of web content.
Why librarians are the key to deep web research
If you are a librarian or other researcher/information professional who needs to present a case for your value, or if you are a manager questioning the value of investing in research and analysis capabilities, we offer the following considerations:
- Relying on surface web research does not give an organization a competitive advantage, because it yields the same popular information that everyone else is already using and referencing.
- Some of the most timely, relevant and reliable information sources are below the surface web, in what is commonly referred to as the deep web.
- Mining the deep web can be a costly, time-consuming undertaking. Trained library and information science professionals can do it more quickly, more efficiently and less expensively than most information users can themselves, whether those users are lawyers or managers in marketing, sales, business development, finance and other functional areas.
Along with time and money, John discussed other pitfalls that await deep web divers, such as information overload. Experience and training in library and information science make a difference, allowing these professionals to avoid the pitfalls and mitigate problems and potential liabilities.
We’re not saying that the surface web has no place in business research and intelligence. For many basic information needs, Google and Bing perform well. However, for organizations and managers that believe in the value of the information advantage, navigating the deep web must become a priority.