Gathering content from dark networks (anonymous networks such as Tor, I2P, or ZeroNet, or deep websites such as RaidForums or XSS) is a complicated challenge. Some companies develop their own crawling capabilities, but we've found this approach to be very high maintenance, and its quality and coverage are less than optimal compared with getting the information from an external provider that specializes in dark web data collection.
A few key players provide access to dark web data, essentially an API for data originating from the dark web. The main added value of a dark web feed is that you can easily plug it into your intelligence solution and focus on data analysis rather than data collection. The feed includes chatter among prominent hackers; trades of vulnerabilities, exploits, and ransomware; and sales of breached organizational data by malicious threat actors.
In this blog post I’ll focus on data feeds that can be plugged into virtually any system, whether it’s a web intelligence, OSINT, or SIEM platform. As long as the system can run a custom connector against an API and transform the data (ETL) into its entity model, the feed can be effective and provide valuable cyber intelligence.
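To make the connector-plus-ETL idea concrete, here is a minimal sketch in Python. The endpoint URL, token, and field names are hypothetical placeholders, not any specific vendor's real API; the point is the shape of the pipeline: extract raw JSON records from the feed, then transform each one into your platform's entity model.

```python
import json
import urllib.request

# Hypothetical feed endpoint and token -- substitute your vendor's real API.
FEED_URL = "https://api.example-darkfeed.com/v1/posts?since=2021-01-01"
API_TOKEN = "YOUR_API_TOKEN"

def fetch_feed(url: str, token: str) -> list:
    """Extract: pull raw JSON records from the vendor's feed API."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["posts"]

def to_entity(record: dict) -> dict:
    """Transform: map a raw feed record into the platform's entity model.

    The source field names ("site", "author", ...) are illustrative --
    each vendor's schema differs, so this mapping is where most of the
    integration work actually happens.
    """
    return {
        "source": record.get("site"),
        "author": record.get("author"),
        "published": record.get("published"),
        "text": record.get("text", ""),
        "threat_type": record.get("category", "unknown"),
    }
```

The load step is then whatever bulk-insert your platform exposes, e.g. `platform.ingest(to_entity(r) for r in fetch_feed(FEED_URL, API_TOKEN))`.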
The main companies I’ve found providing such data are:
- CyberSixGill — Sixgill was founded in 2014 and is located in Israel. Its main product is a cyber threat intelligence solution, but it also provides rich data feeds such as Darkfeed™. Current customers include enterprises, financial services, MSSPs, governments, and law enforcement entities.
- DarkOwl — DarkOwl was founded in 2015 and is located in Denver, US. The DarkOwl Vision platform provides DARKINT™ content along with the tools and services to efficiently find leaked or otherwise compromised sensitive data. It also provides a dark web feed as part of the solution. Typical customers are governments and law enforcement agencies, as well as SIEM and SOAR solutions.
- Webhose — Webhose was founded in 2015 and is located in Israel. Its main products are API-based, covering both the open web and the dark web (Cyber API). It provides mass data for news, blogs, and discussions, as well as dark web data focusing on cyber threats, vulnerabilities, and breached content.
An optimal solution for the dark web would be one that discovers new sources automatically. It should have robust site coverage as well as enriched and structured data to help detect threats more efficiently.
Below you can find a table with the main features extracted from the vendor product pages. Some of the vendors also provided a trial period, during which I experienced the products and evaluated the data.
* Percentage of valuable posts found in a keyword search for a given threat type, on a 0–100% scale; a higher value means better results.
** The customer success service level is based on how engaged the vendor was with the customer, as well as the success of the trial, demo, and customer process.
*** The estimated price for a basic monthly package. Low is $2,500 or below; Medium is between $2,500 and $6,000; High is $6,000 and above.
I’ve marked in green the important capabilities that are mandatory when consuming such a service.
Automatic Discovery of New Sources — Many of the dark net sources are anonymous and dynamic. Admins change domains frequently or change channels. In some cases, they shift between encrypted apps. In addition, new marketplaces or forums open every week. Manual tracking can lead to loss of critical threat content. The ability to continuously map new and relevant sources is the best way to guard against this.
Content Quality Level — If a user searches the vendor’s repository for a target domain, will they receive valuable content, or will they need to spend a lot of time filtering out non-relevant results? The content quality level is determined both by the relevance of the results and by the advanced filters that let the user quickly refine the query.
Customer Success Level — I’ve found that, particularly with darknet data, many customers do not fully understand how to search based on their needs. Requirements also differ from customer to customer: for example, some need a moderate number of different sources, while others need massive amounts of data from specific sources.
The quality of a vendor’s customer success should be measured by:
- Response time — How much time it takes the vendor to respond with a meaningful answer to the request.
- Professionalism — Whether the vendor’s representatives (customer success, sales) are professional and can properly address the customer’s needs.
- Flexibility — Even if a solution supports today’s specific set of capabilities, can the vendor adapt to dynamic customer needs?
In conclusion, I’d like to note that all major vendors provide a powerful and robust dark web data feed, and each of them has a lot to offer. The cost of each feed reflects the high maintenance costs of managing VPN networks, avatar management, and the engagement that needs to be conducted by a specialist.
At the end of the day, however, I found Webhose to be the most robust, with the broadest data coverage, both across different sources and in data depth (the quality of the data extracted from each site). The extracted data also included a rich set of entities that can be effective for cybersecurity solutions, such as IPs, domains, crypto addresses, and more.
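Good feeds ship those entities pre-extracted, which is a large part of their value. For a sense of what that enrichment involves, here is a deliberately simplified, regex-based sketch (my own illustration, not any vendor's method) that pulls IPs, domains, and Bitcoin-style addresses out of a post's text:

```python
import re

# Illustrative patterns only: real-world extraction needs far more care
# (IPv6, IDNs, other coin formats, false-positive filtering, etc.).
PATTERNS = {
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "domain": re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|onion)\b"),
    # Legacy Bitcoin addresses: base58, starting with 1 or 3.
    "btc_address": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
}

def extract_entities(text: str) -> dict:
    """Return every matching entity type found in a post's text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
```

A platform consuming a feed that already provides these fields can skip this step entirely and index the entities directly, which is exactly why data depth mattered in the comparison above.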
As a side note, since my company provides a cyber intelligence solution, two of the three vendors, SixGill and DarkOwl, can be viewed as competitors: although they both provide a data feed, they also offer solutions of their own. That was one of the reasons for choosing a data-only vendor such as Webhose rather than the other two.