EmailTime Project
|| Home || On-Line CV || Master's Thesis || Projects || Publications || Contact Info ||
I completed my M.Sc. program in the School of Interactive Arts and Technology at Simon Fraser University, under the supervision of Prof. Christopher D. Shaw and Prof. John Dill. EmailTime (originally designed and implemented by Ji-Dong Yim and Will Chao) is a tool for visualizing email datasets over the course of time. Data storage is handled with txt and xml files. IDE: Eclipse. Language: Java.
Introduction
Although the discovery and analysis of communication patterns in large, complex email datasets is a difficult task, such patterns can be a valuable source of information. My thesis describes the design and visualization technique of EmailTime, a tool for visual analysis of email correspondence patterns over the course of time that interactively portrays personal and interpersonal networks. EmailTime helps email dataset explorers interpret archived messages by providing interactions, visualizing histograms and measuring centrality (Sent, To and Cc) and frequency (sent and received). We performed case studies on the Enron dataset to discover the impact of executive position on the email behaviour of organizational workers, using a series of metrics such as the number of sent and received emails as determined by the From:, To: and Cc: fields, and the recipient counts of sent emails. In addition, we evaluated the visualization through pilot and user studies to find out whether users were able to recognize the selected capabilities.
Main Contribution
The main contribution of this thesis is visualizing:
• Changes of activities over time,
• Correspondence patterns between email users over time and,
• Role of the owners of email addresses in an event.
Visualization Design
The left side is a Graph view of a small network. The right side is a Plot view of the same network for aaron@a.org, as drawn by EmailTime. A message can draw multiple circles in three different colours: black for a sent email as determined by the From: field, blue for a received email as determined by the To: field, and green for a received email as determined by the Cc: field. The size of a sent node represents the number of recipients (e.g. Message #2 is sent by Beth to Aaron and copied to Chris and David).
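The mapping from one message to its circles can be sketched as follows. This is a minimal illustration, not EmailTime's actual code: the `Message` and `Node` classes, the addresses other than aaron@a.org, the unit size for received nodes, and counting both To: and Cc: recipients towards the sent node's size are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class PlotNodes {
    // Node types and the colours the text assigns to them.
    enum NodeType { SENT /* black */, TO /* blue */, CC /* green */ }

    // Hypothetical message model: one sender plus To: and Cc: recipient lists.
    static final class Message {
        final String from;
        final List<String> to;
        final List<String> cc;
        Message(String from, List<String> to, List<String> cc) {
            this.from = from; this.to = to; this.cc = cc;
        }
    }

    // One circle in the Plot view, drawn on the row of 'address'.
    static final class Node {
        final String address;
        final NodeType type;
        final int size;
        Node(String address, NodeType type, int size) {
            this.address = address; this.type = type; this.size = size;
        }
    }

    static List<Node> nodesFor(Message m) {
        List<Node> nodes = new ArrayList<>();
        // Assumption: the sent node's size counts To: and Cc: recipients together.
        nodes.add(new Node(m.from, NodeType.SENT, m.to.size() + m.cc.size()));
        // Assumption: received nodes all have unit size.
        for (String a : m.to) nodes.add(new Node(a, NodeType.TO, 1));
        for (String a : m.cc) nodes.add(new Node(a, NodeType.CC, 1));
        return nodes;
    }

    public static void main(String[] args) {
        // Message #2 from the example: Beth writes to Aaron, copying Chris and David.
        Message m2 = new Message("beth@a.org", List.of("aaron@a.org"),
                                 List.of("chris@a.org", "david@a.org"));
        for (Node n : nodesFor(m2)) {
            System.out.println(n.address + " " + n.type + " size=" + n.size);
        }
    }
}
```

Under these assumptions, Message #2 yields four circles: one sent node on Beth's row, one To node on Aaron's row, and one Cc node each for Chris and David.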
EmailTime Visualization
This is a snapshot of the EmailTime visualization. The left side displays a collection of emails from the datasets of six Enron workers with different executive positions over two years (2000-2001). The activities of email addresses (Y-axis) are plotted over time (X-axis). On the right side is the control panel, which provides axis controls, keyword search, visibility filters, centrality and frequency analysis tabs, and further option tabs.
Functionality
System functionality includes basic interactions such as zooming, panning, highlighting, tooltips and viewing the content of a message; visibility filters, a node type selector applied to the three types of email node (Sent, To and Cc); search options; and statistical measurements, namely frequency (sent and received), centrality (Sent, To and Cc) and histogram views of sent and received emails. For example, here are the received emails as determined by the To: field for an Enron trader, with 40 intervals in 2001. There is a peak of received emails in October.
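A histogram view like the one described can be sketched as equal-width binning of message dates. This is only an illustration under stated assumptions: the class and method names are hypothetical, and equal-width intervals over a half-open date range are assumed, since the page does not say how EmailTime divides the time axis.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;

public class EmailHistogram {
    // Counts dates into 'intervals' equal-width bins over [start, end).
    // Dates outside the range are ignored.
    static int[] bin(List<LocalDate> dates, LocalDate start, LocalDate end, int intervals) {
        long totalDays = ChronoUnit.DAYS.between(start, end);
        if (intervals <= 0 || totalDays <= 0) {
            throw new IllegalArgumentException("need a positive interval count and date range");
        }
        int[] counts = new int[intervals];
        for (LocalDate d : dates) {
            long offset = ChronoUnit.DAYS.between(start, d);
            if (offset < 0 || offset >= totalDays) continue;
            // Integer arithmetic maps day offset 0..totalDays-1 onto bin 0..intervals-1.
            counts[(int) (offset * intervals / totalDays)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up dates, not taken from the Enron dataset.
        List<LocalDate> received = List.of(
                LocalDate.of(2001, 1, 1),
                LocalDate.of(2001, 10, 15),
                LocalDate.of(2001, 10, 16));
        int[] counts = bin(received, LocalDate.of(2001, 1, 1), LocalDate.of(2002, 1, 1), 40);
        int peak = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[peak]) peak = i;
        }
        System.out.println("peak interval index: " + peak);
    }
}
```

With 40 intervals over a 365-day year, each bin spans a little over nine days, so a month such as October spreads across three or four adjacent bins.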
System Capabilities
Time Comparison: Dataset of an Enron Employee in two different time periods.
Most Frequent Correspondents: Dataset of an Enron Vice President.
Email Address Comparison: Dataset of Enron President.
Find an Event by the Search Option: Dataset of Enron CEO.
Usability Study
In the usability study, we hypothesized that the EmailTime visualization enables users/analysts to find interpersonal social activity in email datasets, including:
• Changes of activities over time (e.g. switching from one email address to another),
• Correspondence patterns between email users over time (such as the most frequent correspondents and types of their correspondences; general or private messages) and,
• Role of the owners of email addresses in an event (e.g. secretary or leader in a biweekly meeting in an organization dataset).
These capabilities form a basis for interpreting data that is visualized by EmailTime, such as:
1. Time Comparison: Compare different time periods to each other and recognize their differences with respect to crowded periods, large gaps (no activity), sent emails with a Large number of recipients, etc.
2. Most Frequent Correspondents: Find the most frequent correspondents of a person and types of their correspondences (private or general messages based on the recipient count of sent emails).
3. Email Address Comparison: Compare different email addresses to each other with respect to the duration, activity level and role (sender, receiver or both) of each email address, and discover which email addresses were switched.
The focus was on investigating the datasets of individuals. Twenty-three graduate students from SIAT, SFU participated in the one- to two-hour testing sessions (four participants in Pilot Study I, six in Pilot Study II and thirteen in the User Study).
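As an aside on the "Most Frequent Correspondents" capability listed above, the underlying tally can be sketched in a few lines. This is a simplified illustration rather than EmailTime's implementation: it considers only the recipients of a person's sent emails, and the `maxPrivateRecipients` threshold separating private from general messages is a hypothetical parameter, since the page does not state the cut-off used.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrequentCorrespondents {
    // Counts how often each address appears among the recipients of sent emails.
    // Each inner list holds the recipients of one sent message.
    static Map<String, Integer> tally(List<List<String>> sentRecipientLists) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> recipients : sentRecipientLists) {
            for (String r : recipients) counts.merge(r, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the address with the highest count, or null if there are none.
    // Ties are broken arbitrarily.
    static String mostFrequent(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    // Assumption: a message counts as "private" when its recipient count
    // does not exceed a caller-chosen threshold; otherwise it is "general".
    static boolean isPrivate(List<String> recipients, int maxPrivateRecipients) {
        return recipients.size() <= maxPrivateRecipients;
    }

    public static void main(String[] args) {
        // Made-up addresses for illustration.
        List<List<String>> sent = List.of(
                List.of("a@a.org"),
                List.of("a@a.org", "b@a.org"),
                List.of("a@a.org", "b@a.org", "c@a.org"));
        System.out.println(mostFrequent(tally(sent)));
    }
}
```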
The scenarios in the user study contained inferential and deductive tasks, so the tasks were not easy: users needed to make inferences and draw conclusions. The majority of the participants were able to complete the tasks, but some were confused by the deductive part of the scenarios and asked the observer (me). We expected that the participants would accomplish the scenarios in 7 to 10 minutes. It appears that users accomplished easier tasks faster. Most of the participants mentioned that they would be able to recognize similar scenarios.
Case Study
I was interested in the impact of executive position (in this case, in the Enron email dataset) on the email behaviour of people in an organization, measured by metrics including the number of sent emails as determined by the From: field, the number of received emails as determined by the To: or Cc: fields, the number of email addresses, the number of created folders and the recipient count of sent emails.
Benchmark: Enron dataset.
Average number of sent and received (as determined by the Cc: and To: fields) emails for Enron organizational positions from 2000 to 2001.
Contribution Index (CI) for organizational positions. Average of CI, To-CI and Cc-CI.
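The page does not spell out how the Contribution Index is computed. A minimal sketch, assuming the definition commonly used in email network analysis, (sent - received) / (sent + received), is given here; the class and method names are hypothetical. It is also an assumption that To-CI and Cc-CI apply the same formula with the received count restricted to emails received via the To: or Cc: field respectively.

```java
public class ContributionIndex {
    // Assumed definition: (sent - received) / (sent + received).
    // The result lies in [-1, 1]: +1 for someone who only sends,
    // -1 for someone who only receives, 0 for a balanced exchange.
    static double ci(int sent, int received) {
        int total = sent + received;
        if (total == 0) {
            return 0.0; // assumption: no activity is treated as balanced
        }
        return (double) (sent - received) / total;
    }

    public static void main(String[] args) {
        // Made-up counts, not figures from the Enron case study.
        int sent = 30, receivedTo = 10, receivedCc = 20;
        System.out.println("CI    = " + ci(sent, receivedTo + receivedCc));
        System.out.println("To-CI = " + ci(sent, receivedTo));
        System.out.println("Cc-CI = " + ci(sent, receivedCc));
    }
}
```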
Average normalized number for sent emails with Small, Medium and Large number of recipients for each organizational position.
My Contribution and More Information on This Study
The EmailTime project was initiated by Prof. Christopher D. Shaw at SFU in 2008. As a lead researcher of the project in 2009 and 2010, I:
• Designed research questions, methods, and experiments,
• Collaborated with two researchers (Christopher D. Shaw and Ji-Dong Yim),
• Implemented the user interface and statistics features of the system,
• Investigated the Enron dataset by performing case studies and quantitative analyses, and
• Conducted usability studies which involved development of experiment tasks, questionnaires, and statistical analysis on the experiment data.
Main Collaborators: Ji-Dong Yim, Prof. Christopher D. Shaw.
See Slide.
See Thesis.
Read more.
All resources at this website are copyrighted by Minoo Erfani Joorabchi.