Research Interests
- Data Mining and Database Systems: data warehousing and OLAP; web data management and integration; graph mining; social network analysis.
- Machine Learning and Web Search: machine learning for ranking; keyword search; web spam detection.
- Privacy Protection and Cyber Security: data mining applications in cyber security; models and algorithms for privacy-preserving data publishing and data mining.
Recent Research Projects
- Data Analysis and Mining of Structured and Unstructured Data: The boom of Web 2.0 has given rise to an ever-increasing amount of text data associated with multiple structured attributes. Such data can be easily maintained in relational databases, where some text attributes are used to store text-rich information. What is needed is a more analytical view of such text-rich data and its associated multi-dimensional structured attributes, obtained through analysis, grouping, aggregation, and similar operations.
- Privacy Protection in Web Search: Web search engines track users' search activities so as to provide personalized search results. Information such as IP addresses, search queries, and click-through data is all maintained in search logs. Not surprisingly, detailed user profiles can be constructed from these logs. Search engine companies therefore have to be trusted not to abuse their privileges. In practice, however, this arrangement is not always desirable. The objective of this project is to investigate methods that provide privacy protection guarantees for Web users without substantially compromising Web search performance.
- Social Spam Detection: Online social networks such as Facebook and Twitter have become major information sources for millions of users. With this growth, however, come increasing amounts of spam spread across these networks. There is a great need for methods that can detect spam (e.g., spam content and spam users) effectively and in a timely manner. The objective of this project is to investigate methods that detect spam in online social networks using a unified model that integrates both structural and textual information.