IS 721: Semi-structured Data Management

Description: This course offers an understanding of the latest technologies for managing semi-structured data such as XML and provides hands-on experience with managing and querying XML data using Oracle DBMS. This course also introduces students to two important application areas of semi-structured data: data exchange and data privacy. Topics include, but are not limited to, basic concepts of XML, XML Schema (XSD), XML query languages such as XPath and XQuery, storing XML in databases, querying XML in databases, publishing XML from databases, privacy issues in data sharing, solutions to privacy issues including the Platform for Privacy Preferences and XML encryption, privacy-preserving data mining, and economic aspects of data privacy. Students will keep abreast of the latest research innovations in semi-structured data management, data exchange, and data privacy. There will be database programming assignments to familiarize students with the course topics. In addition, a group project will expose students to real-life applications of semi-structured data management technologies.

 

Prerequisites: IS 620 or equivalent, or permission of instructor.  Knowledge of PL/SQL is highly recommended.

 

Please contact Dr. Chen at zhchen@umbc.edu if you have any questions.

 

FAQ:

   

What is this course about?

 

This course covers a topic in high demand in both industry and research: XML data management. XML has become a widely used standard for data exchange over the Internet, and this course covers not only the fundamentals of XML but also how to use XML in databases, which is an important skill if you are looking for a database job.

 

More specifically, this course covers the basics of XML (XML documents, XML Schema, XPath, XQuery). It also covers two important application areas of XML: data exchange and data privacy. For data exchange, this course covers how to use XML to exchange data between databases. From the database industry's point of view, this is a major way XML will be used in database applications.
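As a rough illustration of the kind of XML querying described above, here is a minimal sketch. It is not taken from the course materials: the course itself uses Oracle DBMS, whereas this sketch uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The order document, its element and attribute names, and the helper functions customers_with_status and to_rows are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; element and attribute names are illustrative only.
ORDERS_XML = """\
<orders>
  <order id="1001" status="shipped">
    <customer>Alice</customer>
    <total>59.90</total>
  </order>
  <order id="1002" status="pending">
    <customer>Bob</customer>
    <total>120.00</total>
  </order>
</orders>
"""


def customers_with_status(xml_text, status):
    """Return customer names for orders whose status attribute matches."""
    root = ET.fromstring(xml_text)
    # ElementTree understands a small XPath subset, including attribute predicates.
    path = f"./order[@status='{status}']/customer"
    return [node.text for node in root.findall(path)]


def to_rows(xml_text):
    """Flatten each <order> into an (id, customer, total) tuple.

    This mimics, very loosely, the shape of data one might load into a
    relational table when exchanging data between databases via XML.
    """
    root = ET.fromstring(xml_text)
    return [
        (int(order.get("id")), order.findtext("customer"), float(order.findtext("total")))
        for order in root.findall("order")
    ]


if __name__ == "__main__":
    print(customers_with_status(ORDERS_XML, "shipped"))
    print(to_rows(ORDERS_XML))
```

The first function shows a path query with an attribute predicate; the second shows how a hierarchical document can be flattened into rows, which is the basic idea behind moving XML data into and out of relational tables.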

 

Data privacy has itself become a very active topic recently. As data sharing becomes ubiquitous, there are serious concerns about privacy leaks (e.g., revealing the identity of a patient or an online shopper). This course will cover some general issues in data privacy, as well as how XML has been used in some solutions to privacy problems. The instructor and Dr. Gangopadhyay are actively engaged in research in this area and were recently awarded a grant from NSF. This course is a great starting point if you are interested in research in this area.

 

Does this course cover new content?

 

This course has a small amount of overlap with IS 620 on PL/SQL and with IS 651 on the basics of XML, but the majority of the content is new (e.g., how to use XML data in databases and topics on data privacy).

 

What are the requirements for this course?  

 

This course will include assignments, a mid-term exam on XML basics (covering about one third of the course content), and a group project (smaller than the one you did in IS 620). In addition, students will read some research papers, and each student will be asked to present one of them in class.

 

Do I have to take IS 620 before taking this course? 

 

IS 620 is recommended. However, as long as you have some experience with databases, you will be able to take this course. You will need to use PL/SQL in some of the assignments and in the project, but you can pick it up very quickly if you have previous experience with databases.