DINAMIC Lab Group Meetings

Fall 2014

Date/Location Speaker Talk Comment
9/19/2014
1-2pm
ITE 471
Cailing Dong Title: A Weakly Supervised Framework for Data Stream Filtering and Summarization
Abstract: In this paper we present a weakly supervised framework for relevant content filtering and summarization from social media platforms such as Twitter. Social media platforms are a rich source of information these days; however, only a small fraction of the available information is of general interest. We present a framework that separates topic-specific relevant information from irrelevant information in the stream of text provided by social media platforms, and that further provides a sequential summary or event story line for the given topic or event. Our framework does not depend on any labeled data; however, it is capable of using domain knowledge in the form of rules and guidelines provided by domain experts. It is therefore easily extensible to new topics and events. The proposed framework is built with the streaming nature of social media platforms in mind, i.e., it is able to discover the content relevant to a specific event as it evolves in the text stream, and on that basis it captures the most important and up-to-date information for summarization. We experiment on a dataset provided by TREC, and show that the framework not only filters relevant content for an event but also generates its story line effectively.
10/3/2014
1-2pm
ITE 471
10/17/2014
1-2pm
ITE 471
10/31/2014
1-2pm
ITE 471
11/14/2014
1-2pm
ITE 471
11/28/2014
1-2pm
ITE 471
12/12/2014
1-2pm
ITE 471

Spring 2013

Date/Location Speaker Talk Comment
4/12/2013
1-2pm
ITE 471
Fan Yang Title: A New Differential-Privacy Mechanism for Privacy-Preserving KNN
Abstract: Data mining presents many opportunities for enhanced services and products in diverse areas such as healthcare, banking, traffic planning, online search, and so on. However, its promise is hindered by concerns regarding the privacy of the individuals whose data are being mined. Therefore, there is great value in data mining solutions that preserve privacy without significantly compromising accuracy. This presentation covers our recent study of Differential Privacy models and the mechanisms implemented for the K-Nearest Neighbors (KNN) algorithm. We will outline some of the limitations of the existing models and compare them with our new method, which can achieve the same level of accuracy as the naive KNN algorithm while also guaranteeing reliable privacy.
4/19/2013
1:30-3:30pm
ITE 459
Ahmed AlEroud Title: Contextual Information Infusion to Detect Known and zero-day Cyber-attacks
Abstract: Over the past decades, Intrusion Detection Systems (IDSs) have been used against malicious computer attacks. In general, IDSs utilize machine learning techniques to discriminate among different types of network activities. Although modern IDSs keep improving, they still have difficulty analyzing ever-increasing amounts of data, generate a large number of false alarms, fail to identify unknown (zero-day) attacks, and exhibit a low degree of reliability. The focus of this research is a novel contextual framework for detecting known and, most importantly, zero-day cyber-attacks. Besides expanding or restricting IDS predictions, the framework utilizes several contexts extractable from IDS data to detect known and zero-day attacks while keeping both the false positive rate and the computational overhead as low as possible. We consider the contextual dimensions of relation, activity, individuality, time, and location in creating known and zero-day attack detection techniques. The components of our framework are implemented in a prototype system that works in two modes. The first mode is used to detect known attacks, and the second is used to detect zero-day attacks. We performed a series of experiments to demonstrate the effectiveness of our techniques. A comparison with other approaches shows that the components of our framework have superior results in terms of efficiency and attack detection rate.
Proposal Defense
4/26/2013
1-2pm
ITE 471
Ibrahim Toure Title: A Method for Analyzing Terrorist Attacks
Abstract: Analyzing terrorist attacks is important for homeland security. Analyses of past records can provide important information on those attacks and enable appropriate actions to prevent similar attacks in the future. In this research, we present a novel method based on Latent Dirichlet Allocation to analyze data collected by START (Study of Terrorism and Responses to Terrorism) from 1970 to 2010. The first step in our method consists of generating topic models from the data. We then identify the most frequent terms occurring across various topic distributions. Moreover, we study the evolution of different kinds of attacks that occurred over time. The results show that a distinct change in attack patterns emerges over the past four decades.
5/3/2013
1-2pm
ITE 471
Cailing Dong Title: Personalized/Ensemble Web Spam Detection
Abstract: Most existing studies of web spam detection explicitly or implicitly assume that detection is performed on the search engine's server side. We argue that in some scenarios it is preferable to conduct web spam detection on the client side (e.g., in intelligent web browsers). When a page is viewed in an intelligent web browser, the server-side global detector suggests a spamicity score, and the integrated personalized detector determines whether the page is spam according to the user's own judgments of spam. We propose a practical framework to implement the personalized web spam detector. The experimental results obtained from an empirical evaluation verify the effectiveness of the proposed personalized web spam detection solution. In addition, we split the global spamicity decision makers into peer users, who have a browsing history similar to the given user's, and the remaining users, who are called ordinary users. We regard spam detection as collaborative work and build an ensemble classifier based on the historical judgments of the three sets of users. Our experiment simulates the collaborative process and demonstrates the effectiveness of this model.
5/10/2013
1-2pm
ITE 471
Amir Karami Title: A Quick View to Topic Model Evaluation Methods
Abstract: Topic models are unsupervised techniques that extract likely topics from texts. Among topic models, Latent Dirichlet Allocation (LDA) is the best-known and most popular technique. One of LDA's challenges is its evaluation, and several methods exist for this problem. In this presentation, I categorize the evaluation methods into two major groups, namely internal and external methods. I will focus more on the first group. In the internal category, there are two approaches. The first is to measure performance on some secondary task, such as document classification. The second is to estimate the probability of unseen held-out documents given some training documents. At the end, I show some comparisons of the methods based on the literature.
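Note: the held-out probability approach mentioned in the abstract above is commonly reported as perplexity. The following is a sketch in notation that does not come from the abstract: $\mathbf{w}_d$ is assumed to denote the words of held-out document $d$, $N_d$ its length in words, and $M$ the number of held-out documents. Lower perplexity indicates that the model assigns higher probability to the unseen documents.

```latex
\mathrm{perplexity}(D_{\text{test}})
  = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```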

Fall 2012

Date/Location Speaker Talk Comment
10/5/2012
1-2pm
ITE 457
Gergely Kovacs Title: Making sense: accessing facts and patterns in textual information

Abstract:
There is an explosion of textual information in electronic forms that presents great opportunities in identifying fine-grained relations among concepts. The recent review article mentioned above serves as the basis for introducing established methods and highlighting intriguing research directions in text processing. Text processing will be approached from the starting point of interpreting limited length strings, such as a log entry or a query. The interpretation problem is defined as the task of finding relations that help expand the initial string to eliminate ambiguity.

The concepts of semantic / concept map, ontology, as well as part-of-speech tagging, term extraction and conventional measures of success are introduced. The examples and methods are likely to be familiar for an expert audience, but the talk serves the dual purpose of eliciting input and opening up my thought process for fellow first / second year graduate students.
Research Project
10/12/2012
1-2pm
ITE 457
Cailing Dong Title: Effectively Detecting Content Spam on the Web Using Topical Diversity Measures

Abstract:
Recent studies about web spam detection have utilized various content-based and link-based features to construct a spam classification model. In this project, we conduct a thorough analysis of content spam on the web using topic models and propose several novel topical diversity measures for content spam detection. We adopt the web spam benchmark data set WEBSPAM-UK2007 for evaluation, and the experimental results verify that by integrating our topical diversity measures the performance of the state-of-the-art web spam detection methods can be greatly improved.

In addition, compared to existing features for training spam classification models, our topical diversity measures can achieve high spam detection performance using a small set of training data. In personalized web spam detection, the training data (i.e., a user's spam labeling results) are typically small. Our finding makes personalized web spam detection highly achievable. We develop an efficient and effective regression model using topical diversity measures for personalized web spam detection, and present some promising results obtained from an empirical study.
Research Project
10/19/2012
11am-12pm
ITE 456
Department's Distinguished Information Technology Lecture
10/26/2012
1-2pm
ITE 457
Fan Yang Title: Survey of Differential Privacy

Abstract:
A great deal of individual information is currently collected and analyzed by a broad spectrum of organizations. While these data clearly hold great potential for analysis, they are commonly collected under the premise of privacy. Careless disclosures can cause harm to the data's subjects and jeopardize future access to such sensitive information.

This presentation covers our recent study of a privacy-preserving model: Differential Privacy. Existing privacy models depend on the background knowledge of attackers or on the data itself. Differential Privacy tries to ensure that the removal or addition of any record in the database does not change the outcome of any analysis by much. In other words, the presence of an individual is protected regardless of the attacker's knowledge. We will provide a general overview of the state of the art in Differential Privacy, and outline some of the limitations of the model and the various mechanisms that have been proposed to implement it. We will also list some practical challenges to the application of Differential Privacy.
Reading & Discussion on Differential Privacy
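Note: as a minimal illustration of the idea in the abstract above (adding or removing one record should not change an analysis outcome by much), the sketch below applies the widely used Laplace mechanism to a counting query. It is not necessarily one of the mechanisms covered in the talk; the function names `laplace_noise` and `private_count` and the sample records are hypothetical, chosen only for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample zero-mean Laplace noise with the given scale.

    The difference of two independent exponential variables with rate
    1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1: adding or removing a single record
    changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical records: ages of individuals in a small database.
    ages = [23, 35, 41, 52, 67, 29]
    rng = random.Random(0)
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng))
```

A smaller `epsilon` yields a larger noise scale, which means stronger privacy but a less accurate answer; a larger `epsilon` does the opposite.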
11/2/2012 Cancelled (due to Hurricane Sandy)
11/9/2012
1-2pm
ITE 406
(A joint meeting with Health/IT group)
Ali Azari Title: Predicting Hospital Length of Stay (PHLOS) : A Multi-Tiered Data Mining Approach

Abstract:
A model to predict the Length of Stay (LOS) for hospitalized patients can be an effective tool for health care providers. Such a model will enable early interventions to prevent complications and prolonged LOS, and also enable more efficient utilization of manpower and facilities in hospitals. In this paper, we propose an approach for Predicting Hospital Length of Stay (PHLOS) using a multi-tiered data mining approach. Our methodology employs clustering to create the training sets used to train different classification algorithms. We compared the performance of different classifiers along several performance measures and consistently found that using clustering as a precursor to form the training set gives better prediction results than non-clustering-based training sets. We have also found the accuracies to be consistently higher than some reported in the current literature for predicting individual patient LOS. The classification techniques used in this study are interpretable, enabling us to examine the details of the classification rules learned from the data. As a result, this study provides insight into the underlying factors that influence hospital length of stay. We also examine our results with domain expert insights.
Research Project
11/16/2012
1-2pm
ITE 457
Ahmed AlEroud Title: Context and Semantics for Detection of Cyber Attacks

Abstract:
In order to increase the accuracy of intrusion detection systems and reduce the false alarm rate for cyber security analysis, attack correlation has become an indispensable component in most intrusion detection systems. However, traditional intrusion detection techniques often fail to handle complex and uncertain network attack correlation tasks. We present a layered cyber-attack detection system with semantics and context capabilities. The described approach uses semantic information about related attacks to infer possible related suspicious network activities from connections between hosts. We use semantic networks, which represent relationships among network attacks and assist in automatically identifying and predicting related attacks. In addition, we detect probable attacks with increased precision by using context information on attack profiles and host contexts. We show that context information can be used to decrease the false positive rate. A prototype system has been implemented and evaluated on the NSL-KDD intrusion detection dataset, where the experimental results have shown competitive precision and recall values for the proposed system compared with previous approaches.
Research Project
11/23/2012 Cancelled (Thanksgiving Break)
11/30/2012 Cancelled
12/7/2012 Cancelled
12/14/2012
1-2pm
ITE 459
Karim Said and Amrita Anam Talk by Karim Said: Same Old Song

Music is a highly complex expression of culture, so it makes intuitive sense that music would evolve alongside culture. This is anecdotally supported by clashes between generations that relentlessly defend the music of their formative years while simultaneously decrying the music of their ancestors and progeny. But to what extent does music actually change in terms of the many complex characteristics that may be used to understand music in a mathematically decomposable way?

This study attempts to explore the composition of music across five decades (1960s-2000s) along the dimensions of loudness and tempo. A basic knowledge discovery lifecycle approach is used to analyze a collection of one-million songs, and results are presented that indicate distinct similarities in the way music is composed across the evaluated dataset. To conclude, a brief discussion of challenges related to the mathematical evaluation of music is presented, along with a potential roadmap for future investigations.

Talk by Amrita Anam: Crime in United States: Finding the Reasons and Outliers

Crime analysis can have a very useful impact on society. In order to prevent crime, it is necessary to know what causes crime. Though these underlying factors vary from place to place, most of them are fairly consistent globally. Data mining can play a crucial role in identifying these factors. In this project, I have proposed a method to find the main reasons behind crime in the United States and the general pattern throughout the country. The method also finds the trend in all factors in each state and in all states for each factor. From this I have detected some outliers and studied those outliers to observe the trend inside those states.
TBA