Current Projects
Secure and flexible Information Sharing in Coalition Environments
Privacy-preserving data mining and data management
Data Exploration and Navigation
Adversarial learning
Efficient and scalable RDF store with support for federated search and reasoning
Past Projects
Semantic-based Search and Data Integration Using Semantic Networks
Database compression
XML and substring selectivity estimation and XML indexing
Workload-aware mapping of XML to relational tables
Privacy-preserving data mining and data management
As data collection and storage grow at an exponential rate, protecting the privacy of sensitive individual information has become a pressing concern. Many organizations need to protect the privacy of their data while still allowing useful patterns to be discovered from it. I have been working on optimizing the tradeoff between utility and privacy in privacy-preserving data mining algorithms.
This research has been funded by NSF, NIST and MITRE.
Recent Publications:
- Ahmed AlEroud, Fan Yang, Sai Chaithanya Pallaprolu,
Zhiyuan Chen, and George Karabatis, Anonymization of Network Trace Data
through Condensation-based Differential Privacy, ACM Digital Threats:
Research and Practice, accepted in 2020.
- Pooja Parameshwarappa, Zhiyuan Chen, Gunes Koru,
Efficient Approach for Anonymizing Large-Scale Physical Activity Data:
Multi-level Clustering Based Anonymization, International Journal of
Information Security and Privacy, 14(3), 72-94, 2020.
- Ohud Alqahtani, Zhiyuan Chen, Qiong Huang, Karthik
Gottipati, Is Bigger Safer? Analyzing Factors Related to Data Breaches Using
Publicly Available Information, in Fourth International Conference on
Information Systems Security and Privacy (ICISSP), January 22-24, Funchal,
Portugal, 2018.
- Madhu Ahluwalia, Aryya Gangopadhyay,
Zhiyuan Chen, and Yelena Yesha. Target-Based, Privacy Preserving,
and Incremental Association Rule Mining, IEEE Transactions on Services
Computing, 10(4), 2017.
- Shaikha Al-Duaij, Zhiyuan Chen, and Aryya Gangopadhyay.
Using Crowd Sourcing to Analyze Consumers' Response to Privacy Policies of
Online Social Network and Financial Institutions at Micro Level, International
Journal of Information Security and Privacy, 10(2), 2016.
- Ahmed Aleroud, Zhiyuan Chen,
George Karabatis, Network traffic Anonymization Using a Prefix-Preserving
Condensation-based Technique, in Proc. of Cloud and Trusted Computing 2016,
24-26 Oct 2016, Rhodes, Greece.
- Saul Ricardo Medrano Castillo and Zhiyuan Chen,
Using Transfer Learning to Identify Privacy Leaks in Tweets. 2016 IEEE 2nd
International Conference on Collaboration and Internet Computing (CIC), 2016:
p. 506-513.
- Tamas S. Gal, Thomas C. Tucker, Aryya Gangopadhyay, and Zhiyuan Chen. A Data Recipient Centered De-identification Method to Retain Statistical Attributes. Journal of Biomedical Informatics, August 2014, 50: 32-45.
- Madhushri Banerjee, Zhiyuan Chen, and Aryya Gangopadhyay. A generic and distributed privacy preserving classification method with a worst-case privacy guarantee. Distributed and Parallel Databases, 32(1): 5-35, 2014.
- Dongjin Kim, Zhiyuan Chen, and Aryya Gangopadhyay, Optimizing Privacy-Accuracy
Tradeoff for Privacy Preserving Distance-Based Classification, International
Journal of Information Security and Privacy, 2012. 6(2): 16-33.
- Yu Fu, Zhiyuan Chen, A. Gunes Koru, and Aryya Gangopadhyay. A Privacy
Protection Technique for Publishing Data Mining Models and Research Data, ACM
Transactions on Management Information Systems, 2010. 1(1): 1-20.
- Shibnath Mukherjee, Aryya Gangopadhyay, and Zhiyuan Chen. A Partial
Optimization Approach for Privacy Preserving Frequent Itemset Mining.
International Journal of Computational Models and Algorithms in Medicine,
2010. 1(1): p. 19-33. IGI Global, Hershey, PA.
- Madhu V. Ahluwalia, Aryya Gangopadhyay, and Zhiyuan Chen. Preserving Privacy
in Mining Quantitative Association rules, International Journal of Information
Security and Privacy, 2009. 3(4): p. 1-17. IGI Global, Hershey, PA.
- Tamas Gal, Zhiyuan Chen, and Aryya
Gangopadhyay, A Privacy Protection
Model for Patient Data with Multiple Sensitive Attributes. International
Journal of Information Security and Privacy, 2008. 2(3): p. 28-44.
- Shibnath Mukherjee, Madhushri Banerjee,
Zhiyuan Chen, and Aryya Gangopadhyay, A
Privacy Preserving Technique for Distance-based Classification with Worst Case
Privacy Guarantees. Data & Knowledge Engineering, 2008. 66(2): p.
264-288.
- Shibnath Mukherjee, Zhiyuan Chen, and
Aryya Gangopadhyay, A Fuzzy Programming
Approach for Data Reduction and Privacy in Distance Based Mining.
International Journal of Information and Computer Security, 2008. 2(1):
p. 27-47.
- Shibnath Mukherjee, Zhiyuan Chen, and
Aryya Gangopadhyay, A Privacy Preserving
Technique for Euclidean Distance-Based Mining Algorithms Using Fourier-Related
Transforms. VLDB Journal, 2006. 15(4): p. 293-315.
- Shibnath Mukherjee, Zhiyuan Chen, Aryya
Gangopadhyay, and Stephen Russell,
A Secure Face Recognition System
for Mobile-devices without The Need of Decryption. Workshop on Secure
Knowledge Management (SKM 2008), 2008, Dallas, Texas.
- Madhu Ahluwalia, Zhiyuan Chen, Aryya
Gangopadhyay, and Zhiling Guo, Preserving Privacy in Supply Chain Management:
a Challenge for Next Generation Data Mining. NSF Symposium on next
generation data mining, 2007.
Data Exploration and Navigation
Because large volumes of data are stored in many different databases,
information overload has become one of the major obstacles for ordinary users
searching for useful information in a database. My research on data exploration
and navigation uses navigation and search techniques to help users quickly find
useful information. My approach is novel in two respects: (1) it takes into
account the diversity of user preferences, and (2) it uses probabilistic models
and is robust to poor-quality data.
Recent Publications:
- Liang Tang, Tao Li, Yexi Jiang, and Zhiyuan Chen. Dynamic Query Forms for Database Queries. IEEE Transactions on Knowledge and Data Engineering, 26(9): 2166-2178, 2014.
- Zhiyuan Chen, Tao Li, and Yanan Sun, A Learning Approach to SQL Query Results
Ranking Using Skyline and Users' Current Navigational Behavior. IEEE Transactions
on Knowledge and Data Engineering, Volume 25 Issue 12, December 2013,
Pages 2683-2693.
- Shenghuo Zhu, Tao Li, Zhiyuan Chen,
Dingding Wang, and Yihong Gong,
Dynamic Active Probing of Helpdesk Databases. International Conference
on Very Large Data Bases, 2008, Auckland, New Zealand: p. 748-760 (Acceptance
rate: 16.7%).
- Zhiyuan Chen and Tao Li,
Addressing Diverse User
Preferences in SQL-Query-Result Navigation. ACM SIGMOD Conference, 2007,
Beijing, China: p. 641-652 (Acceptance rate: 14%).
- Navin Kumar, Aryya Gangopadhyay, Sanjay Bapna,
George Karabatis, and Zhiyuan Chen,
Measuring interestingness of discovered skewed patterns in data cubes.
Decision Support Systems, 2008. 46(1): p. 429-439.
- Navin Kumar, Aryya Gangopadhyay, George
Karabatis, Sanjay Bapna, and Zhiyuan Chen,
Navigation Rules for Exploring Large
Multidimensional Data Cubes. International Journal of Data Warehousing
and Mining, 2006. 2(4): p. 27-48.
- Dongsong Zhang, George Karabatis, Zhiyuan
Chen, Boonlit Adipat, Liwei Dai, Tony Zhang, and Yu Wang,
Personalization and Visualization on
Handheld Devices. ACM Symposium on Applied Computing, 2006, Dijon,
France: p. 1008-1012.
Adversarial Learning
In the era of big data, data mining techniques have been widely used to build
detection models for cyber security applications such as spam filtering, virus
and malware detection, and intrusion detection. At the same time, attackers may
modify their attacks to evade detection. For example, an email spammer may drop
certain words or symbols from spam emails to avoid detection by spam filtering
software, and an attacker may use a variant of an attack to evade an intrusion
detection system. My research on adversarial learning studies possible evasion
attacks against cyber security protection techniques, as well as techniques to
increase the robustness of those protection techniques. This research has been
funded by IBM as part of the project "Accelerating Cognitive Cyber Security".
Recent Publications:
- Fan Yang, Zhiyuan Chen, Aryya Gangopadhyay, Using Randomness to Improve
Robustness of Tree-based Models Against Evasion Attacks, IEEE Transactions on
Knowledge and Data Engineering, accepted in 2020.
- Fan Yang, Zhiyuan Chen, Aryya Gangopadhyay. Using Randomness to Improve
Robustness of Tree-based Models Against Evasion Attacks, 5th ACM International
Workshop on Security and Privacy Analytics 2019, Dallas, Texas, March 27, 2019.
- Ashwinkumar Ganesan, Pooja Parameshwarappa, Akshay Peshave, Zhiyuan Chen,
Tim Oates, Extending Signature-based Intrusion Detection Systems With Bayesian
Abductive Reasoning. DYnamic and Novel Advances in Machine Learning and
Intelligent Cyber Security (DYNAMICS) Workshop, December 3-7, San Juan,
Puerto Rico, USA, 2018.
- Pooja Parameshwarappa, Zhiyuan Chen, Aryya Gangopadhyay, Analyzing Attack
Strategies Against Rule Based Intrusion Detection Systems, International
Workshop on Analytics for Security in Cyber Physical Systems, Varanasi, India,
January 4, 2018.
Efficient and scalable RDF store with support for federated search and reasoning
RDF (Resource Description Framework) is the standard model for representing
data and knowledge on the Semantic Web. One challenge in the big data era is to
efficiently store and process large-scale RDF data in distributed and sometimes
resource-limited environments. I am collaborating with Dr. Adina Crainiceanu
from the US Naval Academy (USNA) on enhancing the capability of Rya (an RDF
triple store initially developed at USNA) in such environments. The current
focus is to develop algorithms for federated search and reasoning over multiple
Rya instances. Unlike existing work in the literature, our solution accounts for
limited storage and limited network connectivity. This project is funded by
USNA and the Navy.
Recent Publications:
- Fan Yang, Adina Crainiceanu, Zhiyuan Chen, Don Needham, Cluster-Based Join
for Geographically Distributed Big RDF Data, IEEE BigData Congress, accepted,
2019. (Acceptance rate 23%).
Past Projects
Semantic-based Search and Data Integration Using Semantic Networks
There is a great need to find relevant information across diverse data sources.
Existing keyword-based search techniques fail to take into account implicit
relationships between data objects. For example, project managers and software
developers often want to find the software modules that will be affected by a
certain change, and this information cannot easily be retrieved with a keyword
search query.
My research develops a technique that uses a semantic network to capture
relationships between data objects and help users find related information. The
semantic network can also be used in data integration to find relevant data sets
to integrate.
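The change-impact example can be illustrated with a minimal sketch, assuming a hypothetical semantic network stored as (subject, relationship, object) triples. The artifact names, the relationship types, and the choice of which relationships propagate a change (`PROPAGATING`) are all invented for illustration; the published technique may rank or weight relationships differently. A breadth-first traversal follows propagating relationships backwards from the changed artifact and records how many hops away each affected artifact is.

```python
from collections import deque

# Hypothetical semantic network: (subject, relationship, object) triples.
TRIPLES = [
    ("ReportUI", "calls", "ReportService"),
    ("ReportService", "calls", "DBAccess"),
    ("BillingService", "calls", "DBAccess"),
    ("DBAccess", "reads", "CustomerTable"),
    ("TestDBAccess", "tests", "DBAccess"),
    ("HelpPage", "mentions", "ReportUI"),
]

# Assumption for this sketch: only these relationship types propagate a change.
PROPAGATING = {"calls", "tests", "reads"}


def affected_by(changed, max_hops=3):
    """Return {artifact: hops} for artifacts that directly or indirectly depend on `changed`."""
    # Reverse index: object -> subjects that point at it via a propagating relationship.
    dependents = {}
    for subj, rel, obj in TRIPLES:
        if rel in PROPAGATING:
            dependents.setdefault(obj, []).append(subj)

    hops = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for dep in dependents.get(node, []):
            if dep not in hops:
                hops[dep] = hops[node] + 1
                queue.append(dep)
    del hops[changed]  # the changed artifact itself is not reported as affected
    return hops


print(affected_by("DBAccess"))
```

Here `affected_by("DBAccess")` returns `{'ReportService': 1, 'BillingService': 1, 'TestDBAccess': 1, 'ReportUI': 2}`. `ReportUI` never refers to `DBAccess` directly, so a keyword search for the changed module would miss it, while `HelpPage` is excluded because a "mentions" link is assumed not to propagate changes.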
Recent Publications:
- Mikael Lindvall, Raimund L. Feldmann, George Karabatis, Zhiyuan Chen, and
Vandana P. Janeja, Searching for Relevant Software Change Artifacts using
Semantic Networks. ACM Symposium on Applied Computing, 2009, Hawaii.
- Zhiyuan Chen, Aryya Gangopadhyay, George Karabatis, Steve Holden, and
Michael McGuire, Semantic Integration of Government Data for Water Quality
Management. Government Information Quarterly, 2007. 24(4): p. 716-735.
Elsevier, Cambridge, MA.
- Zhiyuan Chen, Aryya Gangopadhyay, George
Karabatis, Michael McGuire, and Claire Welty,
Semantic Integration and Knowledge Discovery for Environmental Research.
Journal of Database Management, 2007. 18(1): p. 43-68.
Database compression
Because CPU speed improves much faster than disk speed, compressing data on disk
can improve I/O performance. I have investigated novel compression techniques
tailored to databases (which must support fine-grained access units such as rows
or cells) and query optimization techniques that balance I/O savings against
decompression overhead.
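The balance between I/O savings and decompression overhead can be seen in a back-of-the-envelope cost model for a full table scan. This is a minimal sketch, not the optimizer from this research: it assumes I/O and decompression do not overlap, and the table size, disk bandwidth, compression ratio, and decompression throughput are hypothetical, as are the function names. With uncompressed size S, disk bandwidth B, compression ratio r, and decompression throughput D, the scan costs S/B without compression and S/(rB) + S/D with it, so compression pays off when D > Br/(r - 1).

```python
def scan_time_uncompressed(size_bytes, disk_bw):
    """Seconds to read `size_bytes` from disk at `disk_bw` bytes/s."""
    return size_bytes / disk_bw


def scan_time_compressed(size_bytes, disk_bw, ratio, decomp_bw):
    """Seconds to read the compressed data and then decompress it.

    Assumes I/O and decompression do not overlap (a conservative simplification);
    `decomp_bw` is measured in uncompressed bytes produced per second.
    """
    io = (size_bytes / ratio) / disk_bw
    cpu = size_bytes / decomp_bw
    return io + cpu


def break_even_decomp_bw(disk_bw, ratio):
    """Decompression throughput at which a compressed scan takes as long as an uncompressed one."""
    return disk_bw * ratio / (ratio - 1)


# Hypothetical numbers: 10 GB table, 200 MB/s disk, 4:1 compression, 1 GB/s decompression.
S, B, R, D = 10e9, 200e6, 4.0, 1e9
print(scan_time_uncompressed(S, B))                # 50.0
print(scan_time_compressed(S, B, R, D))            # 22.5
print(round(break_even_decomp_bw(B, R) / 1e6, 1))  # 266.7 (MB/s)
```

With these numbers compression cuts the scan from 50 s to 22.5 s, and it would still break even with decompression as slow as about 267 MB/s. Queries that fetch a single row or cell are a different case, since the decompression cost then depends on how much surrounding data must be decoded to reach the requested value.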
Publications:
XML and substring selectivity estimation and XML indexing
This project investigates techniques to estimate
the selectivity of XML and substring queries as well as XML indexing techniques.
Publications:
- Zhiyuan Chen, Johannes Gehrke, Flip Korn,
Nick Koudas, Jayavel Shanmugasundaram, and Divesh Srivastava,
Index structures for matching XML twigs
using relational query processors. Data & Knowledge Engineering, 2007.
60(2): p. 283-302.
- Zhiyuan Chen, Johannes Gehrke, Flip Korn,
Nick Koudas, Jayavel Shanmugasundaram, and Divesh Srivastava, Index
Structures for Matching XML Twigs Using Relational Query Processors.
International Workshop on XML Schema and Data Management (XSDM'05), 2005.
- Zhiyuan Chen,
Flip Korn, Nick Koudas, and S. Muthukrishnan,
Generalized Substring Selectivity Estimation.
Journal of Computer and System Sciences, 2003. 66(1): p. 98-132.
- Zhiyuan Chen, H.V. Jagadish, Flip Korn,
Nick Koudas, S. Muthukrishnan, Raymond Ng, and Divesh Srivastava,
Counting Twig Matches in A Tree.
International Conference on Data Engineering (ICDE), 2001: p. 595-604 (Acceptance
rate: 17%).
- Zhiyuan Chen, Flip Korn, Nick Koudas, and
S. Muthukrishnan, Selectivity Estimation for
Boolean Queries. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of
Database Systems (PODS), 2000: p. 216-225 (Acceptance rate: 22%).
Workload-aware mapping of XML to relational tables
I studied the problem of storing XML data in relational storage so that queries
over the XML data are evaluated efficiently. Unlike existing work, this work
takes into account the interplay of logical design (how to store XML in tables)
and physical design (how to select indexes).
Publications:
- Surajit Chaudhuri, Zhiyuan Chen, Kyuseok
Shim, and Yuqing Wu, Storing XML (with
XSD) in SQL Databases: Interplay of Logical and Physical Designs. IEEE
Transactions on Knowledge and Data Engineering, 2005. 17(12): p.
1595-1609.
- Surajit Chaudhuri, Zhiyuan Chen, Kyuseok
Shim, and Yuqing Wu, Storing XML (with XSD) in SQL Databases: Interplay of
Logical and Physical Designs. International Conference on Data Engineering (ICDE),
2004 (Acceptance rate: 20%).