The Dark Crawler

Primary Contributors:

Dr. Garth Davies
Dr. Richard Frank
Bryan Monk
Julianna Mitchell

Title: Lighting up the Dark Web: Mapping Tor in Search of Violent Extremist Content

Objectives: The objectives of this research were three-fold, all involving the development and use of a novel web crawler that we called The Dark Crawler. Our first objective was to extend our existing web-crawler technology so that it could access and navigate the dark web (i.e., Tor).

The second objective of this research was to automatically identify extremist content when it was encountered on Tor. This was done with the help of sentiment analysis software, which was configured to work in conjunction with the crawler. The aim was to automatically interpret the sentiment of the text found in order to locate and extract violent extremist and terrorist content. Since little is known regarding the activities, discussions and habits of extremists and terrorists within the Tor network, the results of this research provided novel insights into these groups. Finally, the third objective was to develop The Dark Crawler as a packaged, stand-alone product that end-users could use to search both the Tor network and the public internet for target criteria, while also storing and analyzing the captured data (if so desired).

Scope: Over the course of this project, adaptations to our existing web-crawler were made. First, we established a solid understanding of the state of the art through an exhaustive literature review and a manual survey of Tor. This was followed by a plan to modify our existing web-crawling system, after which the modifications that allow the crawler to access Tor were completed. We then tested the system through an initial trial data capture phase, focusing on breadth, which demonstrated this new capability of the software. Results were analyzed, after which modifications were implemented to allow the crawler to focus on extremist content. This was done through the use of sentiment analysis. The entire system was then tested and the results analyzed for accuracy. It was expected that modifications to the sentiment analysis would be required to produce an optimal configuration. Once the entire system was calibrated, a final data collection took place, focusing on the automatic identification of extremist content.
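The sentiment-based focusing step described above can be illustrated with a minimal sketch. It is not the sentiment analysis software used in the project, which is not named here. It assumes a simple lexicon-based approach: sentences that mention a term of interest are scored with a small polarity lexicon, and a page is flagged when its mean score falls below a tunable threshold. The names LEXICON, TOPIC_TERMS, page_score, flag and the threshold value are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical polarity lexicon (assumption): word -> weight, negative = hostile tone.
LEXICON = {"attack": -2.0, "destroy": -2.0, "hate": -1.5, "peace": 1.5, "help": 1.0}
# Hypothetical terms of interest (assumption): only sentences mentioning one are scored.
TOPIC_TERMS = {"regime", "government"}


def sentence_score(sentence):
    """Return (summed lexicon weight, list of lowercase words) for one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(w, 0.0) for w in words), words


def page_score(text):
    """Mean sentiment of the sentences that mention a topic term, or None if none do."""
    scores = []
    for sentence in re.split(r"[.!?]+", text):
        score, words = sentence_score(sentence)
        if TOPIC_TERMS.intersection(words):
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


def flag(text, threshold=-1.0):
    """Flag a page for review when its topic-related sentiment is at or below threshold."""
    score = page_score(text)
    return score is not None and score <= threshold


sample = "We hate the regime. Destroy the regime! The weather is nice."
print(page_score(sample), flag(sample))
```

In a sketch like this, calibration amounts to adjusting the lexicon, the terms of interest and the threshold until flagged pages agree with manual review, which is one plausible reading of the testing and reconfiguration cycle described above.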

PROJECT 1: Surfacing Collaborated Networks in Dark Web to Find Illicit and Criminal Content

The first part of this paper explores illicit and criminal content on the dark web that has been identified by prominent researchers. We previously developed a web crawler that automatically searches websites on the internet based on pre-defined keywords and follows their hyperlinks in order to create a map of the network. This crawler has previously succeeded in locating and extracting data on child exploitation images, videos, keywords and linkages on the public internet. However, because Tor functions differently at the TCP level and uses socket connections, crawling it poses further technical challenges. Other inherent challenges for advanced Tor crawling include scalability, content selection tradeoffs, and social obligation. We discuss these challenges and the measures taken to meet them.
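The keyword-driven, link-following crawl described above can be sketched as follows. This is a simplified illustration, not the Dark Crawler itself: it assumes a breadth-first traversal, and it takes a hypothetical fetch callable (URL to HTML string, or None on failure) so that the example runs against a toy in-memory set of pages with made-up addresses. In a real deployment, fetch would issue HTTP requests, and requests to .onion addresses would typically be routed through a local Tor SOCKS proxy (by default 127.0.0.1:9050).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(seed, fetch, keywords, max_pages=100):
    """Breadth-first crawl from seed; returns (edges, matches).

    edges   -- list of (source_url, target_url) hyperlinks, i.e. the network map
    matches -- {url: [keywords found in that page's text]}
    fetch   -- hypothetical callable: url -> HTML string, or None on failure
    """
    keywords = [k.lower() for k in keywords]
    queue, seen = deque([seed]), {seed}
    edges, matches = [], {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited += 1
        parser = LinkExtractor()
        parser.feed(html)
        text = " ".join(parser.text_parts).lower()
        hits = [k for k in keywords if k in text]
        if hits:
            matches[url] = hits
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            edges.append((url, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return edges, matches


# Toy in-memory "web" standing in for real pages (made-up addresses).
PAGES = {
    "http://a.example/": '<p>market listing</p><a href="http://b.example/">b</a><a href="/about">about</a>',
    "http://b.example/": '<p>forum</p><a href="http://a.example/">a</a>',
    "http://a.example/about": "<p>about this market</p>",
}

edges, matches = crawl("http://a.example/", PAGES.get, ["market"])
print(len(edges), sorted(matches))
```

Keeping the fetch step separate from the traversal logic means the same crawl loop could, in principle, serve both the public internet and Tor, with only the transport differing.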

Our modified web crawler for Tor, termed the “Dark Crawler,” has been able to access Tor while simultaneously accessing the public internet. We present initial findings regarding what extremist and terrorist content is present in Tor and how this content is interconnected in a mapped network that facilitates dark web crimes. Our results so far indicate that the most popular websites on the dark web act as catalysts for dark web expansion by providing the knowledge base, support and services necessary to build Tor hidden services and onion websites (see Figure 1).

Figure 1. Collaborated Network with Extracted Tor Websites

Relevant Publications:

Zulkarnine, A., Frank, R., Monk, B., Mitchell, J., & Davies, G. (2016). Surfacing Collaborated Networks in Dark Web to Find Illicit and Criminal Content. In Proceedings of the 2016 IEEE International Conference on Intelligence and Security Informatics (ISI).

PROJECT 2: Uncovering the Dark Web: Examining Tor Through Social Network Analysis

The dark web is a part of the internet that requires specialized software to access. Tor remains the most prominent dark web in existence. Qualitative analysis has previously focused on case studies in Tor such as the Silk Road, which is only one small piece of the network. Websites are connected through hyperlinks, which allow information to flow within Tor.

This study explores the ways that Tor nodes are using these pathways and identifies core sites (central and connecting hubs) through social network analysis. The core websites have a significant portion of all the connections within the network and are largely composed of directory sites which link users to specific content (see Figure 2).

The core is consistent with those of other dark networks, but its generalizability to online networks appears limited. Network topology may play a larger role in hyperlink formation than network typology. The core serves a critical function, which has implications for detecting how users and goods/services move through this network.
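The idea of core sites holding a significant portion of all connections can be illustrated with a short sketch. The specific social network analysis measures used in the study are not stated here, so this example assumes plain degree (inbound plus outbound hyperlinks) as the measure. The function names and the toy edge list are hypothetical.

```python
from collections import Counter


def degree_shares(edges):
    """Return {site: share of all link endpoints that touch that site}."""
    degree = Counter()
    for source, target in edges:
        if source == target:
            continue  # ignore self-links
        degree[source] += 1
        degree[target] += 1
    total = sum(degree.values())
    return {site: count / total for site, count in degree.items()} if total else {}


def core_sites(edges, top_n=1):
    """Return the top_n sites by degree share (ties broken alphabetically)."""
    shares = degree_shares(edges)
    return sorted(shares, key=lambda s: (-shares[s], s))[:top_n]


# Toy hyperlink network with made-up site names: one directory linking to the rest.
EDGES = [
    ("directory.example", "shop1.example"),
    ("directory.example", "shop2.example"),
    ("directory.example", "forum.example"),
    ("forum.example", "directory.example"),
    ("shop1.example", "shop2.example"),
]

print(core_sites(EDGES), degree_shares(EDGES)["directory.example"])
```

In this toy network the directory site touches four of the ten link endpoints, which mirrors, at a very small scale, the pattern of directory sites dominating the core.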


Figure 2. Central and Connecting Hubs of the Dark Web

Relevant Publications:

Monk, B., Mitchell, J., Frank, R., & Davies, G. (2018). Uncovering Tor: An Examination of the Network Structure. Security and Communication Networks.