Design Architecture

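Phishalytics contains 4 core systems: the Blacklist Analysis System (BAS), the Phishing & Malware Tweet Detection System (PMTDS), the Web Browser Testing Suite (WBTS), and the Twitter URL Shortener Investigation System (TURLSIS).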

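In addition to the 4 core systems, Phishalytics also contains a number of shared components, including BULS, RCES, and TCS, which the core systems draw on as described in the sections that follow.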

The overall architecture of Phishalytics is shown in the figure below. The mapping between each of the 4 core systems and its corresponding research project is summarised in the table below.

System    Research Project
BAS       Blacklist Analysis Study
PMTDS     Time-of-Post Twitter Study
WBTS      Web Browser Phishing Detection Study
TURLSIS   Time-of-Click Twitter Study

Key data sources that we use for our experiments include Twitter's API, Google's Safe Browsing (GSB) API, and the PhishTank (PT) and OpenPhish (OP) data feeds. For our measurement studies we build an infrastructure that is designed to work around some limitations of these data sources. The limitations are as follows: Twitter's data feed is a "small sample" of all global tweets; in some of our studies, the GSB API is limited to 10,000 lookups per day; and our Twitter URL Shortener Investigation System (TURLSIS) cannot send too many HTTP requests per second because this would flood Twitter's servers. We design Phishalytics to ensure that, despite these limitations, our measurement framework and the experiments we run on it produce accurate results that answer our research questions.
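To illustrate how a daily lookup budget and a per-second request limit might be enforced, the following is a minimal sketch and not the Phishalytics implementation. The class names DailyQuota and Throttle, and the idea of passing the current day in as a label, are assumptions made for illustration only.

```python
import time


class DailyQuota:
    """Tracks how many lookups have been used on the current day (hypothetical helper)."""

    def __init__(self, limit_per_day):
        self.limit_per_day = limit_per_day
        self.day = None
        self.used = 0

    def try_consume(self, day, n=1):
        """Reserve n lookups for the given day; return False if the budget is spent."""
        if day != self.day:  # a new day resets the counter
            self.day = day
            self.used = 0
        if self.used + n > self.limit_per_day:
            return False
        self.used += n
        return True


class Throttle:
    """Enforces a minimum interval between consecutive requests (hypothetical helper)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

A caller would check try_consume before each blacklist lookup and call wait before each HTTP request. The clock and sleep functions are injectable so that the behaviour can be checked without real delays.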

Blacklist Analysis System (BAS)

In our Blacklist Analysis Study we use BAS to analyse 3 key blacklists: GSB, OP, and PT. We investigate URL uptake, dropout, typical lifetimes, and overlap. These characteristics of the blacklists help us to evaluate the effectiveness of the blacklists. BAS interacts with the BULS component to retrieve information about each of the 3 blacklists. BAS then analyses the blacklists' information through a number of different experiments to determine the characteristics of the blacklists.
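As a rough illustration of the four characteristics named above, the sketch below computes them from periodic snapshots of a blacklist, each represented as a set of URLs. The snapshot representation, the function names, the use of snapshot counts as a proxy for lifetime, and the choice of Jaccard similarity for overlap are all assumptions; the source does not specify how BAS defines these measures.

```python
def uptake_and_dropout(previous, current):
    """Return (URLs added, URLs removed) between two snapshots of one blacklist."""
    return current - previous, previous - current


def lifetimes(snapshots):
    """Map each URL to the number of snapshots in which it appears.

    snapshots is a chronological list of sets of URLs. Counting snapshots
    only approximates lifetime: the resolution is the snapshot interval.
    """
    counts = {}
    for snap in snapshots:
        for url in snap:
            counts[url] = counts.get(url, 0) + 1
    return counts


def overlap(a, b):
    """Jaccard overlap between two blacklists: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Set operations keep each measure to a line or two, at the cost of holding whole snapshots in memory.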

Phishing & Malware Tweet Detection System (PMTDS)

In our Time-of-Post Twitter Study we use PMTDS to investigate how effective Twitter's use of blacklists is at protecting its users from phishing and malware attacks. Our study focuses on the delay between an attack URL first being tweeted and that URL appearing in one of the 3 blacklists. PMTDS interacts with three shared components: TCS, RCES, and BULS. It checks all publicly tweeted URLs held in the TCS component for blacklist membership using the BULS component, and the redirection chains for all URLs in the TCS component are extracted via the RCES component. PMTDS then carries out a number of measurements and experiments on these data to assess how well Twitter's use of blacklists protects its users.
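The delay measurement described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the PMTDS code: the function name blacklist_delay and the dictionary layout (blacklist name mapped to the time the URL was first seen there, or None) are hypothetical.

```python
from datetime import datetime, timedelta


def blacklist_delay(tweeted_at, first_listed):
    """Delay between a URL being tweeted and its earliest blacklist appearance.

    first_listed maps a blacklist name (e.g. "GSB", "OP", "PT") to the
    datetime the URL was first seen in that blacklist, or None if never seen.
    Returns None if the URL never appeared in any blacklist. A negative
    result means the URL was already blacklisted when it was tweeted.
    """
    seen = [t for t in first_listed.values() if t is not None]
    if not seen:
        return None
    return min(seen) - tweeted_at
```

Returning a timedelta rather than a number of seconds leaves the choice of unit to the analysis that consumes the result.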

Web Browser Testing Suite (WBTS)

In our Web Browser Phishing Detection Study we use WBTS to test the detection rates of popular web browsers across different operating systems. WBTS comprises 4 core components: the Master Controller (MC), Test Machines (TMs), a Monitoring System (MS), and the Test Suite Software (TSS). The architecture design for WBTS can be seen in the figure below.

One of the main problems encountered during the tests was that the system for testing a web browser would occasionally fail. Failures had a number of causes, such as the browser failing to load, a keyboard shortcut not working, or the application being unavailable for focusing. Detecting these failures was difficult because it was not always feasible to watch the TMs constantly. The Monitoring System (MS) therefore provided an effective way to alert the author to any problems.

Component                  Role
Master Controller (MC)     Extract phishing URLs from data source, orchestrate tests (on TMs), collate data, analyse data, produce statistics and reports.
Test Machines (TMs)        Run various operating systems and web browsers in a safe environment; execute TSS to determine web browser detection rates (when instructed by MC).
Monitoring System (MS)     Provide a monitoring and alert system to detect errors and improve recovery time from failed tests.
Test Suite Software (TSS)  Custom-built software tailored to each specific operating system and component; runs on MC, TMs, and MS.
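One common way for a monitoring component to detect a stalled test is a heartbeat timeout. The source does not state how the MS detects failures, so the sketch below is an assumption throughout: the function name, the heartbeat mechanism, and the use of plain numeric timestamps are hypothetical.

```python
def stalled_machines(last_heartbeat, now, timeout_seconds):
    """Return the sorted names of test machines whose last heartbeat is too old.

    last_heartbeat maps a machine name to the timestamp (in seconds) of the
    most recent heartbeat received from that machine.
    """
    return sorted(
        name
        for name, ts in last_heartbeat.items()
        if now - ts > timeout_seconds
    )
```

Under this scheme each TM would report periodically while a test runs, and the MS would raise an alert for every machine the function returns.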

Twitter URL Shortener Investigation System (TURLSIS)

In our Time-of-Click Twitter Study we use TURLSIS to investigate how effective Twitter's URL shortening service (t.co) is at protecting Twitter users from phishing and malware attacks. TURLSIS interacts with the same shared components as PMTDS, that is, BULS, RCES, and TCS. TURLSIS also interacts with PMTDS to provide additional functionality: it takes all tweets that PMTDS has identified as containing phishing URLs and checks which of those URLs Twitter blocks at time of click.