Integrating biology databases

Current Projects

Secure and flexible Information Sharing in Coalition Environments

Privacy preserving data mining and data management

Data Exploration and Navigation

Adversarial learning

Efficient and scalable RDF store with support for federated search and reasoning

Past Projects

Semantic-based Search and Data Integration Using Semantic Networks

Database Compression

XML and substring selectivity estimation and XML indexing

Workload-aware mapping of XML to relational tables

Privacy preserving data mining and data management

As data collection and storage grow at an exponential rate, the privacy of sensitive individual information becomes increasingly important. Many organizations need to protect the privacy of their data while still allowing useful patterns to be discovered from it. I have been working on optimizing the tradeoff between utility and privacy in privacy-preserving data mining algorithms. This research has been funded by NSF, NIST, and MITRE.

Recent Publications:

Ahmed AlEroud, Fan Yang, Sai Chaithanya Pallaprolu, Zhiyuan Chen, and George Karabatis, Anonymization of Network Trace Data through Condensation-based Differential Privacy, ACM Digital Threats: Research and Practice, accepted in 2020.
Pooja Parameshwarappa, Zhiyuan Chen, Gunes Koru, Efficient Approach for Anonymizing Large-Scale Physical Activity Data: Multi-level Clustering Based Anonymization, International Journal of Information Security and Privacy, 14(3), 72-94, 2020.
Ohud Alqahtani, Zhiyuan Chen, Qiong Huang, Karthik Gottipati, Is Bigger Safer? Analyzing Factors Related to Data Breaches Using Publicly Available Information, in Fourth International Conference on Information Systems Security and Privacy (ICISSP), January 22-24, Funchal, Portugal, 2018.
Madhu Ahluwalia, Aryya Gangopadhyay, Zhiyuan Chen, and Yelena Yesha. Target-Based, Privacy Preserving, and Incremental Association Rule Mining, IEEE Transactions on Services Computing, 10(4), 2017.
Shaikha Al-Duaij, Zhiyuan Chen, and Aryya Gangopadhyay. Using Crowd Sourcing to Analyze Consumers’ Response to Privacy Policies of Online Social Network and Financial Institutions at Micro Level, International Journal of Information Security and Privacy, 10(2), 2016.
Ahmed Aleroud, Zhiyuan Chen, George Karabatis, Network Traffic Anonymization Using a Prefix-Preserving Condensation-based Technique, in Proc. of Cloud and Trusted Computing 2016, 24-26 Oct 2016, Rhodes, Greece.
Saul Ricardo Medrano Castillo and Zhiyuan Chen, Using Transfer Learning to Identify Privacy Leaks in Tweets. IEEE 2nd International Conference on Collaboration and Internet Computing (CIC), pp. 506-513, 2016.
Tamas S. Gal, Thomas C. Tucker, Aryya Gangopadhyay, and Zhiyuan Chen. A Data Recipient Centered De-identification Method to Retain Statistical Attributes. Journal of Biomedical Informatics, August 2014, 50: 32-45.
Madhushri Banerjee, Zhiyuan Chen, and Aryya Gangopadhyay. "A generic and distributed privacy preserving classification method with a worst-case privacy guarantee." Distributed and Parallel Databases, 32(1): 5-35, 2014.
Dongjin Kim, Zhiyuan Chen and Aryya Gangopadhyay, Optimizing Privacy-Accuracy Tradeoff for Privacy Preserving Distance-Based Classification, International Journal of Information Security and Privacy, 2012. 6(2): 16-33.
Yu Fu, Zhiyuan Chen, A. Gunes Koru, and Aryya Gangopadhyay. A Privacy Protection Technique for Publishing Data Mining Models and Research Data, ACM Transactions on Management Information Systems, 2010. 1(1): 1-20.
Shibnath Mukherjee, Aryya Gangopadhyay, and Zhiyuan Chen. A Partial Optimization Approach for Privacy Preserving Frequent Itemset Mining. International Journal of Computational Models and Algorithms in Medicine, 2010. 1(1): p. 19-33. IGI Global, Hershey, PA.
Madhu V. Ahluwalia, Aryya Gangopadhyay, and Zhiyuan Chen. Preserving Privacy in Mining Quantitative Association rules, International Journal of Information Security and Privacy, 2009. 3(4): p. 1-17. IGI Global, Hershey, PA.
Tamas Gal, Zhiyuan Chen, and Aryya Gangopadhyay, A Privacy Protection Model for Patient Data with Multiple Sensitive Attributes. International Journal of Information Security and Privacy, 2008. 2(3): p. 28-44.
Shibnath Mukherjee, Madhushri Banerjee, Zhiyuan Chen, and Aryya Gangopadhyay, A Privacy Preserving Technique for Distance-based Classification with Worst Case Privacy Guarantees. Data & Knowledge Engineering, 2008. 66(2): p. 264-288.
Shibnath Mukherjee, Zhiyuan Chen, and Aryya Gangopadhyay, A Fuzzy Programming Approach for Data Reduction and Privacy in Distance Based Mining. International Journal of Information and Computer Security, 2008. 2(1): p. 27-47.
Shibnath Mukherjee, Zhiyuan Chen, and Aryya Gangopadhyay, A Privacy Preserving Technique for Euclidean Distance-Based Mining Algorithms Using Fourier-Related Transforms. VLDB Journal, 2006. 15(4): p. 293–315.
Shibnath Mukherjee, Zhiyuan Chen, Aryya Gangopadhyay, and Stephen Russell, A Secure Face Recognition System for Mobile-devices without The Need of Decryption. Workshop on Secure Knowledge Management (SKM 2008), 2008, Dallas, Texas.
Madhu Ahluwalia, Zhiyuan Chen, Aryya Gangopadhyay, and Zhiling Guo, Preserving Privacy in Supply Chain Management: a Challenge for Next Generation Data Mining. NSF Symposium on next generation data mining, 2007.

Data Exploration and Navigation

Because large volumes of data are stored in many different databases, information overload has become one of the major obstacles for ordinary people searching for useful information in a database. My research on data exploration and navigation aims to use navigation and search techniques to help users quickly find useful information. The novelty of my approach is twofold: (1) it takes into account the diversity of user preferences, and (2) it uses probabilistic models and is robust to poor-quality data.

Recent Publications:

Liang Tang, Tao Li, Yexi Jiang, and Zhiyuan Chen. Dynamic Query Forms for Database Queries. IEEE Transactions on Knowledge and Data Engineering, 26(9): 2166-2178, 2014.
Zhiyuan Chen, Tao Li, and Yanan Sun, A Learning Approach to SQL Query Results Ranking Using Skyline and Users' Current Navigational Behavior. IEEE Transactions on Knowledge and Data Engineering, Volume 25 Issue 12, December 2013, Pages 2683-2693.
Shenghuo Zhu, Tao Li, Zhiyuan Chen, Dingding Wang, and Yihong Gong, Dynamic Active Probing of Helpdesk Databases. International Conference on Very Large Data Bases, 2008, Auckland, New Zealand: p. 748-760 (Acceptance rate: 16.7%).
Zhiyuan Chen and Tao Li, Addressing Diverse User Preferences in SQL-Query-Result Navigation. ACM SIGMOD Conference, 2007, Beijing, China: p. 641-652 (Acceptance rate: 14%).
Navin Kumar, Aryya Gangopadhyay, Sanjay Bapna, George Karabatis, and Zhiyuan Chen, Measuring interestingness of discovered skewed patterns in data cubes. Decision Support Systems, 2008. 46(1): p. 429-439.
Navin Kumar, Aryya Gangopadhyay, George Karabatis, Sanjay Bapna, and Zhiyuan Chen, Navigation Rules for Exploring Large Multidimensional Data Cubes. International Journal of Data Warehousing and Mining, 2006. 2(4): p. 27-48.
Dongsong Zhang, George Karabatis, Zhiyuan Chen, Boonlit Adipat, Liwei Dai, Tony Zhang, and Yu Wang, Personalization and Visualization on Handheld Devices. ACM Symposium on Applied Computing, 2006, Dijon, France: p. 1008-1012.

Adversarial Learning

With the arrival of the big data era, data mining techniques have been widely used to build detection models for cyber security applications such as spam filtering, virus or malware detection, and intrusion detection. At the same time, attackers may try to modify their attacks to evade detection. For example, an email spammer may drop certain words or symbols from spam emails to avoid detection by spam filtering software, and an attacker may use a variant of an attack to evade an intrusion detection system. My research on adversarial learning studies possible evasion attacks against cyber security protection techniques, as well as techniques to increase the robustness of those protection techniques. This research has been funded by IBM as part of the project "Accelerating Cognitive Cyber Security".

Recent Publications:

Fan Yang, Zhiyuan Chen, Aryya Gangopadhyay, Using Randomness to Improve Robustness of Tree-based Models Against Evasion Attacks, IEEE Transactions on Knowledge and Data Engineering, accepted in 2020.
Fan Yang, Zhiyuan Chen, Aryya Gangopadhyay. Using Randomness to Improve Robustness of Tree-based Models Against Evasion Attacks, 5th ACM International Workshop on Security and Privacy Analytics 2019, Dallas, Texas, March 27, 2019.
Ashwinkumar Ganesan, Pooja Parameshwarappa, Akshay Peshave, Zhiyuan Chen, Tim Oates, Extending Signature-based Intrusion Detection Systems With Bayesian Abductive Reasoning. DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security (DYNAMICS) Workshop, December 3-7, San Juan, Puerto Rico, USA, 2018.
Pooja Parameshwarappa, Zhiyuan Chen, Aryya Gangopadhyay, Analyzing Attack Strategies Against Rule Based Intrusion Detection Systems, International Workshop on Analytics for Security in Cyber Physical Systems, Varanasi, India, January 4, 2018.

Efficient and scalable RDF store with support for federated search and reasoning

RDF is a standard way to represent data and knowledge on the Semantic Web. One challenge in the big data era is to efficiently store and process large-scale RDF data in a distributed and sometimes resource-limited environment. I am collaborating with Dr. Adina Crainiceanu from the US Naval Academy on enhancing the capability of Rya (an RDF triple store initially developed at USNA) in such environments. The current focus is to develop algorithms for federated search and reasoning over multiple Rya instances. Compared to existing work in the literature, our solution addresses limited storage and network connectivity capabilities. This project is funded by USNA and the Navy.

Recent Publications:

Fan Yang, Adina Crainiceanu, Zhiyuan Chen, Don Needham, Cluster-Based Join for Geographically Distributed Big RDF Data, IEEE BigData Congress, accepted, 2019. (Acceptance rate 23%).

Past Projects

Semantic-based Search and Data Integration Using Semantic Networks

There is a great need to find relevant information from diverse data sources. Existing keyword-based search techniques fail to take into account implicit relationships between different data objects. For example, project managers and software developers often want to find the software modules that will be affected by a certain change. This information cannot be easily returned by a keyword search query.

My research develops a technique that uses a semantic network to capture relationships between data objects and to help users find related information. The semantic network can also be used in data integration to find relevant data sets to integrate.
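As a rough illustration of the idea (not the published method), a semantic network can be pictured as a graph whose nodes are data objects and whose edges are named relationships. The sketch below assumes a tiny hand-written network and a plain breadth-first traversal; all node names, relationship names, and the `related` helper are hypothetical.

```python
from collections import deque

# Hypothetical semantic network: each node maps to (relationship, neighbor) pairs.
network = {
    "change_request_17": [("modifies", "module_auth")],
    "module_auth": [("called_by", "module_login"),
                    ("documented_in", "design_doc_3")],
    "module_login": [("called_by", "module_portal")],
    "design_doc_3": [],
    "module_portal": [],
}

def related(start, max_hops):
    """Return {node: hops} for every node reachable within max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in network.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]  # the starting object is not "related" to itself
    return seen

# Objects within two relationship hops of a change request.
affected = related("change_request_17", 2)
```

A keyword query for "change_request_17" would not mention `module_login`, but the traversal reaches it through the `modifies` and `called_by` relationships.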

Recent Publications:

Mikael Lindvall, Raimund L. Feldmann, George Karabatis, Zhiyuan Chen, and Vandana P. Janeja, Searching for Relevant Software Change Artifacts using Semantic Networks. ACM Symposium on Applied Computing, 2009, Hawaii.
Zhiyuan Chen, Aryya Gangopadhyay, George Karabatis, Steve Holden, and Michael McGuire, Semantic Integration of Government Data for Water Quality Management. Government Information Quarterly, 2007. 24(4): p. 716-735. Elsevier, Cambridge, MA.
Zhiyuan Chen, Aryya Gangopadhyay, George Karabatis, Michael McGuire, and Claire Welty, Semantic Integration and Knowledge Discovery for Environmental Research. Journal of Database Management, 2007. 18(1): p. 43-68.

Database compression

Since CPU speed improves much faster than disk speed, it makes sense to compress data on disk to achieve better I/O performance. I have investigated novel compression techniques tailored to databases, which need to support fine-granularity access units such as rows or cells, and query optimization techniques that balance I/O savings against decompression overhead.
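The granularity tradeoff can be seen in a small sketch. It assumes Python's standard `zlib` module as a stand-in compressor and a made-up table; it is not the compression scheme from the publications below. Compressing the whole table as one unit gives a better ratio, but reading one row then requires decompressing everything, whereas compressing each row separately keeps row-level access cheap at the cost of a larger total size.

```python
import zlib

# Hypothetical table: each row is a tuple of strings.
rows = [("id%05d" % i, "Baltimore", "MD", "active") for i in range(1000)]

def encode(row):
    """Serialize one row to bytes (tab-separated, an assumed format)."""
    return "\t".join(row).encode("utf-8")

def decode(data):
    """Inverse of encode."""
    return tuple(data.decode("utf-8").split("\t"))

# Option A: compress the whole table as one unit (coarse access unit).
table_blob = zlib.compress(b"\n".join(encode(r) for r in rows))

# Option B: compress each row on its own (fine access unit).
row_blobs = [zlib.compress(encode(r)) for r in rows]

def read_row_from_table(i):
    # The entire table must be decompressed to reach a single row.
    lines = zlib.decompress(table_blob).split(b"\n")
    return decode(lines[i])

def read_row_from_rows(i):
    # Only the requested row is decompressed.
    return decode(zlib.decompress(row_blobs[i]))

table_size = len(table_blob)
per_row_size = sum(len(b) for b in row_blobs)
```

A query optimizer for a compressed database has to weigh exactly this kind of choice: the bytes saved on I/O against the work spent decompressing data the query does not need.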

Publications:

Zhiyuan Chen, Johannes Gehrke, and Flip Korn, Query Optimization in Compressed Database Systems. ACM SIGMOD International Conference on Management of Data, 2001: p. 271-282 (Acceptance rate: 15%).
Zhiyuan Chen and Praveen Seshadri, An Algebraic Compression Framework for Query Results. International Conference on Data Engineering (ICDE), 2000: p. 177-188 (Acceptance rate: 14%).

XML and substring selectivity estimation and XML indexing

This project investigates techniques to estimate the selectivity of XML and substring queries as well as XML indexing techniques.

Publications:

Zhiyuan Chen, Johannes Gehrke, Flip Korn, Nick Koudas, Jayavel Shanmugasundaram, and Divesh Srivastava, Index structures for matching XML twigs using relational query processors. Data & Knowledge Engineering, 2007. 60(2): p. 283-302.
Zhiyuan Chen, Johannes Gehrke, Flip Korn, Nick Koudas, Jayavel Shanmugasundaram, and Divesh Srivastava, Index Structures for Matching XML Twigs Using Relational Query Processors. International Workshop on XML Schema and Data Management (XSDM'05), 2005.
Zhiyuan Chen, Flip Korn, Nick Koudas, and S. Muthukrishnan, Generalized Substring Selectivity Estimation. Journal of Computer and System Sciences, 2003. 66(1): p. 98-132.
Zhiyuan Chen, H.V. Jagadish, Flip Korn, Nick Koudas, S. Muthukrishnan, Raymond Ng, and Divesh Srivastava, Counting Twig Matches in A Tree. International Conference on Data Engineering (ICDE), 2001: p. 595-604 (Acceptance rate: 17%).
Zhiyuan Chen, Flip Korn, Nick Koudas, and S. Muthukrishnan, Selectivity Estimation for Boolean Queries. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2000: p. 216-225 (Acceptance rate: 22%).

Workload-aware mapping of XML to relational tables

I studied the problem of storing XML data in relational storage such that the evaluation of queries over the XML data is optimized. Unlike existing work, this work takes into account the interplay of logical design (how to store XML in tables) and physical design (how to select indexes).

Publications:

Surajit Chaudhuri, Zhiyuan Chen, Kyuseok Shim, and Yuqing Wu, Storing XML (with XSD) in SQL Databases: Interplay of Logical and Physical Designs. IEEE Transactions on Knowledge and Data Engineering, 2005. 17(12): p. 1595-1609.
Surajit Chaudhuri, Zhiyuan Chen, Kyuseok Shim, and Yuqing Wu, Storing XML (with XSD) in SQL Databases: Interplay of Logical and Physical Designs. International Conference on Data Engineering (ICDE), 2004 (Acceptance rate: 20%).