XML is a W3C recommendation for creating special-purpose markup languages. It is a simplified subset of SGML, capable of describing many different kinds of data. Its primary purpose is to facilitate the sharing of structured text and information across the Internet. Languages based on XML (for example, RDF, RSS, MathML, XSIL and SVG) are themselves described in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their form.
Before the advent of generalised data description languages such as SGML and XML, software designers had to define special file formats or small languages to share data between programs. This required writing detailed specifications and special-purpose parsers and writers.
XML's regular structure and strict parsing rules allow software designers to leave parsing to standard tools. Because XML provides a general, data-model-oriented framework for developing application-specific languages, designers need only concentrate on developing schemas for their data, at a relatively high level of abstraction.
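As a minimal sketch of leaving parsing to a standard tool, the following Python example uses the standard library's `xml.etree.ElementTree` module. The `library`, `book` and `title` names and the helper functions are invented for illustration; the point is only that no format-specific parser has to be written, and that malformed input is rejected rather than guessed at.

```python
import xml.etree.ElementTree as ET

# Hypothetical application-specific document; the element and attribute
# names are invented for this example.
DOCUMENT = """\
<library>
  <book id="b1"><title>Dune</title></book>
  <book id="b2"><title>Emma</title></book>
</library>
"""


def book_titles(xml_text):
    """Return a mapping of book id to title, using the standard parser."""
    root = ET.fromstring(xml_text)
    return {book.get("id"): book.findtext("title") for book in root.iter("book")}


def is_well_formed(xml_text):
    """Report whether the text parses as XML at all."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Strict parsing rules: a malformed document is an error.
        return False
    return True


if __name__ == "__main__":
    print(book_titles(DOCUMENT))
    # The <book> start tag is never closed, so parsing fails.
    print(is_well_formed("<library><book></library>"))
```

The application code deals only with elements and attributes; tokenising, character escaping and error detection are handled by the generic parser.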
An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic constraints imposed by XML itself. A number of standard and proprietary XML schema languages have emerged for the purpose of formally expressing such schemas, and some of these languages are themselves XML-based.
Well-tested tools exist to validate XML files against a schema, automatically verifying whether a document conforms to the constraints expressed in the schema. Schemas have other uses as well: XML editors, for instance, can use them to support the editing process.
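To illustrate the difference between a document that is merely well-formed and one that satisfies a schema, the following Python sketch checks a document against a toy set of constraints. The `SCHEMA` dictionary and the `validate` function are hypothetical and far simpler than any real schema language such as DTD, XSD or RELAX NG; they only show the kind of structural rule a schema expresses.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema: for each element name, the child elements it may
# contain and the attributes it must carry. Not a real schema language.
SCHEMA = {
    "library": {"children": {"book"}, "required_attrs": set()},
    "book": {"children": {"title"}, "required_attrs": {"id"}},
    "title": {"children": set(), "required_attrs": set()},
}


def validate(xml_text, schema=SCHEMA):
    """Return a list of constraint violations; an empty list means valid."""
    errors = []

    def check(element):
        rule = schema.get(element.tag)
        if rule is None:
            errors.append(f"unexpected element <{element.tag}>")
            return
        # Every required attribute must be present on the element.
        for name in sorted(rule["required_attrs"] - set(element.attrib)):
            errors.append(f"<{element.tag}> is missing attribute '{name}'")
        # Every child must be one the parent is allowed to contain.
        for child in element:
            if child.tag not in rule["children"]:
                errors.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
            else:
                check(child)

    check(ET.fromstring(xml_text))
    return errors


if __name__ == "__main__":
    # Well-formed XML, but it breaks two of the toy schema's constraints.
    for problem in validate("<library><book><author/></book></library>"):
        print(problem)
```

Real validators work on the same principle but read the constraints from a schema document instead of a hard-coded table.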
The oldest XML schema format is the DTD (Document Type Definition), which is inherited from SGML. While DTD support is ubiquitous due to its inclusion in the XML 1.0 standard, it is seen as limited for the following reasons:
* No support for newer features of XML, most importantly namespaces.
* Lack of expressivity. Certain formal aspects of an XML document cannot be captured in a DTD.
* Custom non-XML syntax to describe the schema, inherited from SGML.
A newer XML schema language, described by the W3C as the successor of DTDs, is simply called XML Schema, also referred to as XML Schema Definition (XSD). XSD schemas are far more powerful than DTDs in describing XML languages. Additionally, XSD uses an XML-based format, which makes it possible to use the XML toolset to help process XML schemas. It also becomes possible to write a schema for the schema language itself. Criticisms of XSD are:
* The standard is very large, which makes it difficult to understand and implement.
* XML-based syntax leads to verbosity in schema description, which makes XSDs harder to read and write.
Another popular XML schema language is RELAX NG. Initially standardized by OASIS and now also an ISO international standard (as part of DSDL), RELAX NG comes in two formats: an XML-based syntax and a non-XML compact syntax. The compact syntax aims to increase readability and writability, but since there is a well-defined way to translate the compact syntax to the XML syntax and back again, the advantage of using standard XML tools is not lost. RELAX NG has a more compact definition than XSD, which makes it easier to implement.
Some schema languages not only describe the structure of a particular XML format but also offer limited facilities to influence the processing of individual XML files that conform to this format. DTDs and XSDs both have this ability; they can, for instance, provide attribute defaults. RELAX NG intentionally does not provide these facilities.
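The effect of an attribute default can be sketched in Python with a DTD declared in a document's internal subset. The `note` element, its `lang` attribute and the `note_language` helper are invented for this example, and the behaviour shown assumes a parser that processes the internal subset, as the Expat-based `xml.etree.ElementTree` in CPython is expected to do.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose internal DTD subset declares a default value
# for the "lang" attribute of <note>. The element itself carries no attributes.
DOCUMENT = """\
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note lang CDATA "en">
]>
<note>Hello</note>
"""


def note_language(xml_text):
    """Return the lang attribute as reported by the parser."""
    return ET.fromstring(xml_text).get("lang")


if __name__ == "__main__":
    # The value comes from the DTD declaration, not from the <note> tag.
    print(note_language(DOCUMENT))
```

The application therefore sees a value that was supplied by the schema and not written in the document instance, which is the kind of influence on processing that RELAX NG deliberately avoids.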
