Distributed Systems

##Chapter 4 - SOAP

###Introduction

SOAP was originally defined as the "simple object access protocol" in 1998 in a Microsoft project, but since it really has nothing to do with objects, it is now just a name and not an acronym. It is the messaging protocol for XML web services: in a service-oriented architecture (SOA), a requestor uses SOAP to send a message to the provider of a service. SOAP is a wire protocol that gets information from one place to another at the application layer in a distributed system. A SOAP message is a standard XML document that encodes that information.

SOAP is transport independent and can use any network protocol to carry the XML, but it is almost always used with HTTP over the regular web infrastructure. The SOAP document is put into the entity body of the HTTP message. This is very convenient since all information systems today connect to the web and already have firewalls configured to pass port 80 traffic. This was one of the big problems with CORBA (chapter 2). Since there was no such common infrastructure in those days, it had to be created from scratch. Recall IIOP, which was the wire protocol for CORBA.

Figure 4.1 repeats the general architecture of XML web services under SOA. The system works like this:

1. The requestor searches a registry database for the kind of web service that it needs. A standard XML protocol for this search is the universal description, discovery and integration protocol (UDDI). UDDI is not a very popular standard and we will discuss alternatives for the registry in chapter 5.
2. When it finds one, it requests a description of that service from the provider of the service. This description tells the requestor exactly how to invoke the service using SOAP. An XML standard for this is the web services description language (WSDL), which we learn in chapter 5.
3. Finally, the requestor invokes the service using SOAP and the provider returns the result with SOAP.

![SOA with XML web services protocols](is651-images/f4-1_opt.png)

Figure 4.1. SOA with XML web services protocols.

When SOAP was first developed, it was used for RPC-style messaging where, for example, the SOAP request made a procedure call and a SOAP response returned the result of the procedure. In those days, there was no XMLSchema and so the SOAP specification included a SOAP encoding that standardized how data structures and types were encoded in SOAP messages. We will see this distinction more specifically in the next chapter on WSDL, but since the adoption of XMLSchema, the SOAP encoding is deprecated and only XMLSchema is typically used to specify types in a SOAP message now. So we will learn XMLSchema and the related concept of namespaces before we turn back to learning SOAP XML.

###XMLSchema and Namespaces

Recall that XMLSchema is the alternative and more modern method of validating XML documents. We will use the on-line w3schools tutorial on XMLSchema (all sections of Learn Schema) for our examples in this chapter. See the on-line syllabus for further information.

XMLSchema is a schema language that was developed in 2001 and published by the W3C. It has become the dominant schema language today. An older schema language is the one we learned earlier, DTD. Other modern alternatives are RELAX NG and Schematron. A schema language describes a set of rules for a markup language so that documents in it can be validated automatically. XMLSchema has three major differences from DTDs:

- it has XML syntax
- it has datatypes
- it uses namespaces

XMLSchemas, as we will see below, have XML syntax, unlike DTDs. XMLSchema files are separate from the XML file(s) that they describe (they cannot be internal) and have the extension .xsd by convention. XMLSchemas also define datatypes, unlike DTDs. There are three kinds of types:

- built-in datatypes
- simple datatypes
- complex datatypes

Built-in datatypes correspond to basic fundamental types such as decimal, integer, and string.
Simple types can be created for XML elements that contain only text, while complex types can be made up of more complex combinations of elements. See the w3schools tutorial for examples of each. XMLSchemas also support restrictions that can define in detail the acceptable values for any elements or attributes in XML, such as only integers between 0 and 120. As you can see, XMLSchema is much more powerful than DTDs. It is also the one used with XML web service protocols.

An XMLSchema is very similar to a class in object-oriented programming, as illustrated in figure 4.2. A class is like a template for creating instances of objects; a schema serves the same purpose and also allows validating those instances.

![Classes are similar to schemas](is651-images/f4-2_opt.png)

Figure 4.2. Classes are similar to schemas.

Listing 4.1 shows the very simple schema of the basic note example at w3schools. We see that the example has XML syntax and datatypes, but it also has namespaces. Namespaces are a general concept from programming for avoiding name collision. Name collision happens when the same name is chosen mistakenly for two different objects. For example, if you and I were collaboratively programming, but in separate locations, we would each have to make up variable names in our program. It might happen that we would sometimes accidentally choose the same variable name for different logical things. To avoid this, we can use namespaces, where names need only be unique within the namespace. So if I used the namespace kip and the other programmer used the namespace sue, we could each choose any name we wanted. We might adopt a prefix convention where our names were prefixed with our namespace to make the name unique, as in kip:item and sue:item.
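
The kip:item and sue:item idea can be tried out directly in XML. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the two namespace URLs and the order document are made up for illustration and are not part of the chapter's examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace identifiers for the two programmers.
# Any unique strings would do; they do not need to exist on the web.
KIP_NS = "http://example.org/kip"
SUE_NS = "http://example.org/sue"

doc = f"""
<order xmlns:kip="{KIP_NS}" xmlns:sue="{SUE_NS}">
  <kip:item>widget</kip:item>
  <sue:item>gadget</sue:item>
</order>
"""

root = ET.fromstring(doc)

# The parser expands each prefix to its full namespace, so the two
# "item" elements end up with different qualified names.
tags = [child.tag for child in root]

# Looking up an element requires saying which namespace is meant.
kip_item = root.find("kip:item", {"kip": KIP_NS})
```

The parser treats the prefix only as shorthand: what makes the two names distinct is the namespace each prefix is bound to, not the prefix text itself.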

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="heading" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Listing 4.1. A simple XMLSchema file.

This is exactly what XML namespaces do. In listing 4.1, we see that the prefix xs is used for all the schema tags. The prefix is defined with the xmlns attribute, which points to the actual namespace, which always has the form of a URL. These namespace identifiers are arbitrary and only need to guarantee uniqueness. They do not need to point to anything on the web. But for standard namespaces like the one for XMLSchema, they usually do point to some documentation, as you can verify. The prefixes are also arbitrary and you can make them up, although for standard namespaces there are usually conventions, like the xs in the example (you will often see xsd instead).

A default namespace is declared with the xmlns attribute without any prefix. This means that all elements with no prefix are in that default namespace. An XMLSchema defines a namespace for the elements of an XML document, as we will see below, and so the default namespace is reserved for the content being validated.

We will use the same purchase order example that we used in chapter 2 for DTDs, but validate it with a schema this time (the example comes from the XSD Example section of the w3schools schema tutorial, but note that it does not have a namespace there as it does here). Listing 4.2 shows the XML document and listing 4.3 shows the schema. The bulleted lists describe what we need to understand for these examples. One of the most important things to understand is how the two separate documents, XML and XSD, reference each other.

    <?xml version="1.0" encoding="UTF-8"?>
    <shiporder orderid="889923"
        xmlns="http://userpages.umbc.edu/~canfield/po"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://userpages.umbc.edu/~canfield/po shiporder2.xsd">
      <orderperson>John Smith</orderperson>
      <shipto>
        <name>Ola Nordmann</name>
        <address>Langgt 23</address>
        <city>4000 Stavanger</city>
        <country>Norway</country>
      </shipto>
      <item prodid="1">
        <title>Empire Burlesque</title>
        <note>Special Edition</note>
        <quantity>1</quantity>
        <price>10.90</price>
      </item>
      <item prodid="2">
        <title>Hide your heart</title>
        <quantity>1</quantity>
        <price>9.90</price>
      </item>
    </shiporder>

Listing 4.2. XML document (shiporder2.xml) that uses XMLSchema.

For the XML document in listing 4.2:

- All the namespaces are defined inside the root shiporder tag.
- The default namespace xmlns="http://userpages.umbc.edu/~canfield/po" is one I made up for my purchase order (PO) document. Remember that namespaces only need to guarantee uniqueness, and so I used my gl account URL. I know that any unique name (like po) in my account is unique on the web, since the web would not work if URLs were not unique. Furthermore, that URL does not actually have to point to anything and indeed it does not.
- Note that none of the tags in the PO have a prefix, since they are all in the default namespace. There can be only one default namespace.
- The xsi prefix is used once, on the schemaLocation attribute, to declare the XML document an instance of a schema: the one with the namespace http://userpages.umbc.edu/~canfield/po, found in the schema file named in that attribute.
- Note that the namespace and the schema filename are separated with a space in the schemaLocation attribute.

For the XSD document in listing 4.3:

- The targetNamespace attribute defines this as the schema that defines my PO markup language. It defines all tags and structure for a PO document in my namespace.
- The default namespace is also set to my namespace, and so the schema tags must all have a prefix.
- The elementFormDefault value qualified just means that every element in an instance document must be namespace-qualified, either with a prefix or through the default namespace.
- It is best to start from the bottom of a schema that uses named types, like this one, in order to understand it better.
- The root shiporder element at the bottom is given a type whose name we made up, shipordertype, and that type is defined in the complexType block just above. This type ends up being a type for the entire XML document since it is for the root tag.
- The complex type shipordertype is defined as a sequence of orderperson, shipto and multiple items. Note that the sequence tag is equivalent to the parentheses in DTDs and the attribute maxOccurs="unbounded" is equivalent to the +. So items can be repeated.
- The itemtype is also defined as a sequence of tags. The note tag is optional due to the minOccurs="0" attribute. (Recall that the terms tag and element are used interchangeably.) Many of the tags are defined with built-in types such as xs:string.
- The inttype is not defined as a built-in type because we needed to put a restriction on it. The inttype is a simple type because it only has simple content with no attributes. In the XML document, we have <quantity>1</quantity> for that type.

    <?xml version="1.0" encoding="utf-8" ?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://userpages.umbc.edu/~canfield/po"
        xmlns="http://userpages.umbc.edu/~canfield/po"
        elementFormDefault="qualified">

      <xs:simpleType name="inttype">
        <xs:restriction base="xs:positiveInteger"/>
      </xs:simpleType>

      <xs:complexType name="shiptotype">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="address" type="xs:string"/>
          <xs:element name="city" type="xs:string"/>
          <xs:element name="country" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>

      <xs:complexType name="itemtype">
        <xs:sequence>
          <xs:element name="title" type="xs:string"/>
          <xs:element name="note" type="xs:string" minOccurs="0"/>
          <xs:element name="quantity" type="inttype"/>
          <xs:element name="price" type="xs:decimal"/>
        </xs:sequence>
      </xs:complexType>

      <xs:complexType name="shipordertype">
        <xs:sequence>
          <xs:element name="orderperson" type="xs:string"/>
          <xs:element name="shipto" type="shiptotype"/>
          <xs:element name="item" maxOccurs="unbounded" type="itemtype"/>
        </xs:sequence>
        <xs:attribute name="orderid" type="inttype" use="required"/>
      </xs:complexType>

      <xs:element name="shiporder" type="shipordertype"/>

    </xs:schema>

Listing 4.3. XMLSchema document (shiporder.xsd).

This is a short example, but there are a lot of new concepts here. Be sure to study this example carefully and understand all the details. Now that we understand namespaces and XMLSchema, we can go back to learning SOAP.

###SOAP structure

The SOAP XML structure is defined in this section. It is, of course, defined in an XMLSchema. In listings 4.4 and 4.5, we see that it is defined in the namespace http://www.w3.org/2001/12/soap-envelope and, since it is a standard namespace, the URL points to the actual schema for SOAP where all the tags for SOAP are defined. All SOAP documents consist of an envelope, and inside that tag are the optional header and the required body tags.
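
The envelope structure can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. This is only an illustration under some assumptions: the helper name build_envelope is made up, and the payload names are borrowed from the RPC-style request in listing 4.4.

```python
import xml.etree.ElementTree as ET

# SOAP namespace as used in this chapter's listings.
SOAP_NS = "http://www.w3.org/2001/12/soap-envelope"
# Payload namespace borrowed from listing 4.4.
PRICES_NS = "http://www.example.org/product-prices"


def build_envelope(payload, header_blocks=()):
    """Wrap a payload element in an Envelope (hypothetical helper).

    The Header is optional, so it is only created when header blocks
    are supplied. The Body is required, so it is always created.
    """
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    if header_blocks:
        header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
        header.extend(header_blocks)
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope


# Build the procedure call and wrap it.
call = ET.Element(f"{{{PRICES_NS}}}GetProductPrice")
ET.SubElement(call, f"{{{PRICES_NS}}}productId").text = "450R"

envelope = build_envelope(call)
xml_text = ET.tostring(envelope, encoding="unicode")
```

Printing xml_text shows the serialized message. ElementTree picks its own prefix for the payload namespace, which is fine because prefixes are arbitrary.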
Listing 4.4 shows an RPC-style request for the procedure GetProductPrice with the parameter productId. The header includes a requirement that whatever provider gets this message must understand the transaction ID.

    <?xml version="1.0" encoding="utf-8"?>
    <soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
      <soap:Header>
        <tx:Trans xmlns:tx="http://www.example.org/transaction/"
            soap:mustUnderstand="1">234</tx:Trans>
      </soap:Header>
      <soap:Body xmlns:m="http://www.example.org/product-prices">
        <m:GetProductPrice>
          <m:productId>450R</m:productId>
        </m:GetProductPrice>
      </soap:Body>
    </soap:Envelope>

Listing 4.4. An RPC-style SOAP document.

Anything in the header modifies or offers services to the main payload in the body. We will see many more examples of header elements in chapter 6.

There are two styles for SOAP messages. We have seen the RPC style; the other is the document style. The document style, shown in listing 4.5, sends a complete XML document rather than a procedure call. It is a much more loosely-coupled style than RPC and more congruent with SOA. Listing 4.5 is a document-style SOAP message that uses our PO. Note that all the PO tags have been prefixed and that we do not include the information about the XMLSchema location. This is because it is given in the WSDL, as we will see in chapter 5.

The Body element is a generic container in that it can contain any number of elements from any namespace. This is ultimately where the data goes that you're trying to send. Since there are unknown foreign namespaces in SOAP, the schema for SOAP must allow these unknown and unpredictable namespaces. It does this by using wildcard values that begin with ##. For example, ##any is for any namespace. We will take a look at the schema for SOAP in one of the exercises.
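
How a receiver might treat the header and the generic body can be sketched as follows, again with Python's standard xml.etree.ElementTree. The function name read_message is made up, and raising a ValueError is only a stand-in for the fault a real SOAP node would return when it does not understand a mandatory header.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2001/12/soap-envelope"

# The RPC-style message from listing 4.4.
MESSAGE = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
  <soap:Header>
    <tx:Trans xmlns:tx="http://www.example.org/transaction/"
              soap:mustUnderstand="1">234</tx:Trans>
  </soap:Header>
  <soap:Body xmlns:m="http://www.example.org/product-prices">
    <m:GetProductPrice>
      <m:productId>450R</m:productId>
    </m:GetProductPrice>
  </soap:Body>
</soap:Envelope>"""


def read_message(xml_text, understood_headers):
    """Return the body payload elements (hypothetical helper).

    understood_headers is a set of qualified header names that this
    receiver knows how to process.
    """
    root = ET.fromstring(xml_text)

    # The Header is optional, so it may be missing entirely.
    header = root.find(f"{{{SOAP_NS}}}Header")
    for block in (header if header is not None else []):
        mandatory = block.get(f"{{{SOAP_NS}}}mustUnderstand") == "1"
        if mandatory and block.tag not in understood_headers:
            # Stand-in for a SOAP fault.
            raise ValueError(f"mandatory header not understood: {block.tag}")

    # The Body is a generic container: return its children as they
    # are, whatever namespace they come from.
    body = root.find(f"{{{SOAP_NS}}}Body")
    return list(body)


payload = read_message(MESSAGE, {"{http://www.example.org/transaction/}Trans"})
```

Calling read_message with an empty set of understood headers raises the error instead, because the Trans header is marked mustUnderstand="1".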

    <?xml version="1.0" encoding="utf-8"?>
    <soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
      <soap:Header>
        <tx:Trans xmlns:tx="http://www.example.org/transaction/"
            soap:mustUnderstand="1">234</tx:Trans>
      </soap:Header>
      <soap:Body>
        <po:shiporder po:orderid="889923"
            xmlns:po="http://userpages.umbc.edu/~canfield/po">
          <po:orderperson>John Smith</po:orderperson>
          <po:shipto>
            <po:name>Ola Nordmann</po:name>
            <po:address>Langgt 23</po:address>
            <po:city>4000 Stavanger</po:city>
            <po:country>Norway</po:country>
          </po:shipto>
          <po:item>
            <po:title>Empire Burlesque</po:title>
            <po:note>Special Edition</po:note>
            <po:quantity>1</po:quantity>
            <po:price>10.90</po:price>
          </po:item>
          <po:item>
            <po:title>Hide your heart</po:title>
            <po:quantity>1</po:quantity>
            <po:price>9.90</po:price>
          </po:item>
        </po:shiporder>
      </soap:Body>
    </soap:Envelope>

Listing 4.5. A document-style SOAP document.

SOAP is now in version 1.2 and people usually refer to that as version 2. SOAP 1.1 is still the dominant version in use, however, and is called version 1. For XML web services, a requestor creates a SOAP message and typically sends it to a provider using an HTTP binding as in figure 4.3. The XML is standard for all web services participants, but of course, the actual program doing the web service is written in some programming language that interfaces with the XML - typically J2EE, .NET, or a web scripting framework.

![Distributed messaging with SOAP](is651-images/f4-8_opt.png)

Figure 4.3. Distributed messaging with SOAP.

A SOAP message may go to many different hosts in a distributed system in order to, for example, complete a workflow. There can be only one ultimate receiver, however, and only it can operate on the body. All the intermediaries can only operate on the header, as we will see in chapter 6. The intermediaries typically do things like encryption, authentication, and authorization.
If a service provider gets an error in processing a SOAP document, there is a special fault tag defined in the SOAP schema that goes in the body of the SOAP response and gives information on the error. An example is shown in listing 4.6. You can see the SOAP fault codes at w3schools.

    <?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope
        xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
        xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
        xmlns:xsd="http://www.w3.org/1999/XMLSchema">
      <soap:Body>
        <soap:Fault>
          <faultcode xsi:type="xsd:string">soap:Client</faultcode>
          <faultstring xsi:type="xsd:string">Failed to locate method.</faultstring>
        </soap:Fault>
      </soap:Body>
    </soap:Envelope>

Listing 4.6. SOAP fault.

Let's do a case study of a working SOAP web service available on-line at xmlme.com. Go to this URL and choose Web Services and then the Shakespeare web service. This shows a form where you can copy a line from a Shakespeare play (given in the form instructions) and submit it to the web service using SOAP. The SOAP response that comes back tells which play and which speech the line occurred in. For example, if I put in My kingdom for a horse, I get back:

![Shakespeare Service Response](is651-images/shakes_opt.png)

Figure 4.4. Shakespeare Service Response.

This is just a toy web service, but it will illustrate all the basics. The HTML form actually calls a server-side program that is the SOAP client, but this web application hides all that from us as the users. To see the details, choose the link More details about the Shakespeare Web Service can be found here…. from the left side of the page. This page shows a link to the service description, which is the WSDL, and we will cover that in the next chapter. We will concentrate on the SOAP details here. Choose the link GetSpeech, which is the procedure that will be called in the SOAP message. This page shows the SOAP request and response messages for SOAP versions 1.1 and 1.2.
We will use version 1.1, which shows the complete HTTP request and response. The SOAP is in the entity body. I repeat the request here for convenience in figure 4.5.

![SOAP request](is651-images/f4-10_opt.png)

Figure 4.5. The SOAP request.

Note:

- The POST HTTP method must be used since we transport the SOAP document in the entity body.
- The pathname to the web service file is given and the Host HTTP header is used, so we know the URL to the web service is http://www.xmlme.com/WSShakespeare.asmx. As an interesting aside, we know that this service is implemented in Microsoft .NET due to the .asmx extension.
- The last HTTP header is SOAPAction. This SOAP action gives a namespace for the called procedure GetSpeech. The SOAPAction HTTP request header field can be used to indicate the intent of the SOAP HTTP request to the web listener. The presence and content of the SOAPAction header field can be used by servers such as firewalls to appropriately filter SOAP request messages in HTTP. The SOAPAction value can be an empty string ("").
- The SOAP message itself is what we would expect. It makes a call to the GetSpeech procedure with a parameter of Request. Note the namespace for GetSpeech. Since the default namespace is defined in the GetSpeech tag, all the child tags (Request) also have it.

So what happens in the web form we used to access this web service is:

1. The HTML form uses a POST method and an action of some server-side program like the ones we learned in chapter 3. In this case, the action calls an ASP program since this is obviously a Microsoft shop.
2. The server-side ASP program is actually the SOAP client. It takes our input (the line string), creates the SOAP message in figure 4.5, and sends it to the web service at the relative URL /WSShakespeare.asmx, since the client and service programs are both at xmlme.com.
3. The client ASP program receives the SOAP result from the web service and formats it for a web page, which is returned to the user.
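
What the SOAP client does in step 2 can be sketched with Python's standard urllib.request module. This is an illustration only, not the service's actual client: the namespace and SOAPAction values below are placeholders (the real ones appear in figure 4.5), the helper name build_request is made up, and the request is only constructed, never sent.

```python
import urllib.request

# Placeholder values; substitute the ones shown in figure 4.5.
SERVICE_URL = "http://www.xmlme.com/WSShakespeare.asmx"
PLACEHOLDER_NS = "http://example.org/shakespeare"
PLACEHOLDER_ACTION = "http://example.org/shakespeare/GetSpeech"

# A SOAP 1.1 envelope calling GetSpeech with the parameter Request.
SOAP_BODY = f"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetSpeech xmlns="{PLACEHOLDER_NS}">
      <Request>My kingdom for a horse</Request>
    </GetSpeech>
  </soap:Body>
</soap:Envelope>"""


def build_request(url, soap_action, envelope_xml):
    """Build (but do not send) an HTTP POST carrying a SOAP 1.1 message."""
    return urllib.request.Request(
        url,
        # The SOAP document travels in the entity body.
        data=envelope_xml.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # The SOAPAction value is sent as a quoted string.
            "SOAPAction": f'"{soap_action}"',
        },
        method="POST",
    )


request = build_request(SERVICE_URL, PLACEHOLDER_ACTION, SOAP_BODY)
```

Passing the request to urllib.request.urlopen would actually send it; that step is left out here because it depends on the service being reachable and on the real namespace values.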

The user sees no SOAP. In an exercise for this chapter, we will use a different client that will show us more details. Figure 4.6 shows this process. Always remember, however, that SOAP is not designed just for user interaction but, more importantly, for interactions of distributed computer programs.

![Web application](is651-images/f4-11_opt.png)

Figure 4.6. The web application.

###Chapter 4 Exercises

Do the end-of-chapter exercises for each chapter of the book by following the link in the on-line syllabus.