What is XML?
An XML file is a text file with a particular structure.
XML is a notation, i.e. a way of writing information down.
It uses tags to define the information, for example:
<TITLE>20,000 Leagues Under the Sea</TITLE>
<AUTHOR>Jules Verne</AUTHOR>
As you can see, the <TITLE> and <AUTHOR> tags delimit the information corresponding to the title and the author.
<TITLE> marks the start of the title information.
</TITLE> marks the end of the title information (note the slash "/").
Those who already know HTML are in familiar territory: apart from a few differences, XML is very similar to HTML, but in XML you can define your own tags.
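To make the role of the tags concrete, here is a minimal sketch of how a program could read them, using Python's standard xml.etree.ElementTree module. The BOOK wrapper tag and the read_book function are assumptions added for this sketch: a well-formed XML document needs exactly one root element, so the two tags above have to be wrapped in something.

```python
import xml.etree.ElementTree as ET

# BOOK is a hypothetical wrapper tag: XML requires exactly one root element.
DOCUMENT = """
<BOOK>
  <TITLE>20,000 Leagues Under the Sea</TITLE>
  <AUTHOR>Jules Verne</AUTHOR>
</BOOK>
"""

def read_book(xml_text):
    """Return the text found between the TITLE and AUTHOR tags."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("TITLE"),
        "author": root.findtext("AUTHOR"),
    }

book = read_book(DOCUMENT)
print(book["title"], "by", book["author"])
```

The program never has to guess where the title ends: the closing tag tells it.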
Hierarchy
XML allows nested tags. That is, a tag can contain not only information but also other tags.
Take a small example: I will create a library.
<LIBRARY>
  <ROMAN>
    <TITLE>Imajica</TITLE>
    <AUTHOR>Clive Barker</AUTHOR>
    <PRICE>6</PRICE>
  </ROMAN>
  <ROMAN>
    <TITLE>Dune</TITLE>
    <AUTHOR>Frank Herbert</AUTHOR>
    <PRICE>7</PRICE>
  </ROMAN>
  <MAGAZINE>
    <TITLE>Science and Life</TITLE>
    <DATEPARUTION>2005-02-01</DATEPARUTION>
  </MAGAZINE>
  <ROMAN>
    <TITLE>Christine</TITLE>
    <AUTHOR>Stephen King</AUTHOR>
    <PRICE>5</PRICE>
  </ROMAN>
</LIBRARY>
My library contains various things: three novels (the ROMAN tags) and one magazine.
Each novel has a title, an author and a price.
Each magazine has a title and a release date (the DATEPARUTION tag).
You can thus nest information as deeply as you want.
XML is very well suited to the representation of hierarchical information.
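The nesting can be followed mechanically. As a small sketch (the outline function is a hypothetical helper, not part of any library), the following Python walks the library document with the standard xml.etree.ElementTree module and prints one indented line per tag:

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<LIBRARY>
  <ROMAN><TITLE>Imajica</TITLE><AUTHOR>Clive Barker</AUTHOR><PRICE>6</PRICE></ROMAN>
  <ROMAN><TITLE>Dune</TITLE><AUTHOR>Frank Herbert</AUTHOR><PRICE>7</PRICE></ROMAN>
  <MAGAZINE><TITLE>Science and Life</TITLE><DATEPARUTION>2005-02-01</DATEPARUTION></MAGAZINE>
  <ROMAN><TITLE>Christine</TITLE><AUTHOR>Stephen King</AUTHOR><PRICE>5</PRICE></ROMAN>
</LIBRARY>
"""

def outline(element, depth=0):
    """Return one indented line per element, following the nesting."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (": " + text if text else "")
    lines = [line]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

lines = outline(ET.fromstring(LIBRARY_XML))
print("\n".join(lines))
```

Each level of indentation in the output corresponds to one level of nesting in the XML.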
Why use XML?
You will have noticed that our example above is human-readable.
Thanks to the tags, the computer is also able to process the content (to separate the pieces of information).
This is one of the advantages of XML: it is one of the few formats that can be read by both a human and a computer.
Moreover, by agreeing on which tags to use, XML can be used to exchange information between different people and software.
XML, through its use of Unicode (UTF-8 is the default encoding), handles all the alphabets of the world very well.
On top of that, XML is surrounded by a whole set of tools for manipulating XML documents: XSD, XSLT, XQuery...
XSD: Checking
Above, I invented a specific XML format for my library.
I need to give a precise definition of its structure (which tags may contain which others, what kind of information a tag may contain (number, text...), which tags are mandatory, etc.).
This is what the XSD format is for.
I'll write an XSD file that contains the definition of my library structure.
Anyone can then use the XSD file to verify that any XML file has the same format as my library (even if it contains other novels and magazines). With the XSD file, the computer can tell whether or not an XML file conforms to the library structure.
If I receive an XML file and the computer (through the XSD) says it is in the right format, I know I will be able to understand its content (since its structure will not be unknown).
(Note: an XSD file is itself an XML file!)
Here is an example (do not focus too much on the details; it is purely illustrative):
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="LIBRARY">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="MAGAZINE">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="TITLE" type="xs:string"/>
              <xs:element name="DATEPARUTION" type="xs:date"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="ROMAN">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="TITLE" type="xs:string"/>
              <xs:element name="AUTHOR" type="xs:string"/>
              <xs:element name="PRICE" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
I specify that LIBRARY may contain a choice (xs:choice), repeated any number of times (maxOccurs="unbounded"), of ROMAN and MAGAZINE elements.
Each MAGAZINE must contain a sequence (xs:sequence) of two elements: a TITLE and a DATEPARUTION.
The DATEPARUTION must contain a date (type="xs:date", in year-month-day format).
etc.
This makes it possible to define precisely the format of any "Library" XML file.
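To show what such rules amount to in practice, here is a hand-written sketch of the same checks in Python. It is not an XSD validator (Python's standard library does not include one), and check_library, RULES and the three is_* helpers are hypothetical names invented for this sketch. The date and integer tests are only approximations of xs:date and xs:integer.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

def is_text(value):
    return True  # any text is accepted, like xs:string

def is_integer(value):
    # Approximation of xs:integer: optional sign followed by digits.
    return re.fullmatch(r"[+-]?\d+", value.strip()) is not None

def is_date(value):
    # Approximation of xs:date: year-month-day, no time zone support.
    value = value.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

# For each item type: the children expected, in order, and a check for each.
RULES = {
    "ROMAN": [("TITLE", is_text), ("AUTHOR", is_text), ("PRICE", is_integer)],
    "MAGAZINE": [("TITLE", is_text), ("DATEPARUTION", is_date)],
}

def check_library(xml_text):
    """Return a list of problems; an empty list means the file conforms."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return ["not well-formed XML: %s" % error]
    if root.tag != "LIBRARY":
        return ["root element must be LIBRARY, found " + root.tag]
    problems = []
    if len(root) == 0:
        problems.append("LIBRARY must contain at least one ROMAN or MAGAZINE")
    for item in root:
        expected = RULES.get(item.tag)
        if expected is None:
            problems.append("unexpected element " + item.tag)
            continue
        found = [child.tag for child in item]
        if found != [name for name, _ in expected]:
            problems.append("%s has children %s" % (item.tag, found))
            continue
        for child, (name, is_valid) in zip(item, expected):
            if not is_valid(child.text or ""):
                problems.append("bad value in %s: %r" % (name, child.text))
    return problems

good = "<LIBRARY><ROMAN><TITLE>Dune</TITLE><AUTHOR>Frank Herbert</AUTHOR><PRICE>7</PRICE></ROMAN></LIBRARY>"
bad = "<LIBRARY><MAGAZINE><TITLE>Science and Life</TITLE><DATEPARUTION>February 2005</DATEPARUTION></MAGAZINE></LIBRARY>"
print(check_library(good))
print(check_library(bad))
```

The advantage of a real XSD file over code like this is that the rules are declared once, in a standard format, and any XSD-aware tool can apply them.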
XSLT: Transforming
Imagine that I want to publish my library both on the Web (HTML) and as an Excel (CSV) file. I would have to create the HTML file and the CSV file by hand.
And whenever the content of my library changes, I would have to update the HTML file and the CSV file by hand.
It's tedious.
XSLT will automate this.
I'll write an XSLT file that describes the transformations to apply to my XML library to turn it into an HTML file.
Similarly, I will write a second XSLT file that describes the transformations to apply to my XML to turn it into a CSV file.
I start by writing the XSLT that produces the HTML.
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h2>My library</h2>
        <hr/>
        <h4>Magazines</h4>
        <table border="1">
          <tr bgcolor="#ffe4b5"><th align="left">Title</th><th align="left">Publication date</th></tr>
          <!-- One table row per MAGAZINE element of the library -->
          <xsl:for-each select="/LIBRARY/MAGAZINE">
            <tr><td><xsl:value-of select="TITLE"/></td><td><xsl:value-of select="DATEPARUTION"/></td></tr>
          </xsl:for-each>
        </table>
        <hr/>
        <h4>Novels</h4>
        <table border="1">
          <tr bgcolor="#ffe4b5"><th align="left">Title</th><th align="left">Author</th></tr>
          <!-- One table row per ROMAN element of the library -->
          <xsl:for-each select="/LIBRARY/ROMAN">
            <tr><td><xsl:value-of select="TITLE"/></td><td><xsl:value-of select="AUTHOR"/></td></tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
This will create an HTML page containing a table of magazines and a table of novels.
(I have deliberately left the price out of my XSLT file, because I do not want to publish it on my HTML page.)
I proceed in the same way to create an XSLT file that produces CSV (Excel):
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<!-- Text inside the template is copied as-is, so it is deliberately not indented. -->
<!-- xsl:text keeps the line break after each magazine; whitespace-only text would be stripped. -->
<xsl:template match="/">Type,Title,Author,Publication date
<xsl:for-each select="/LIBRARY/MAGAZINE">Magazine,<xsl:value-of select="TITLE"/>,,<xsl:value-of select="DATEPARUTION"/><xsl:text>&#10;</xsl:text>
</xsl:for-each>
<xsl:for-each select="/LIBRARY/ROMAN">Novel,<xsl:value-of select="TITLE"/>,<xsl:value-of select="AUTHOR"/>,
</xsl:for-each></xsl:template>
</xsl:stylesheet>
This will create a .CSV file that can then be opened directly in Excel or OpenOffice.
What's great is that if I change my library, I simply re-apply the XSLT files to regenerate the HTML and CSV files automatically!
I will have nothing to do by hand.
And if I give these XSLT files to everyone, everyone can publish their library in the same way.
XML makes it possible, in theory, to store information independently of its representation.
We can then, through XSLT, present this information in one way or another, as needed.
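The principle of regenerating every output from the single XML source can also be sketched without XSLT. The following Python is an equivalent of the CSV transformation written with the standard library, not the XSLT itself; library_to_csv is a hypothetical name, and the column titles are chosen for this sketch. Like the stylesheets, it lists magazines first, then novels, and leaves the price out.

```python
import csv
import io
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<LIBRARY>
  <ROMAN><TITLE>Imajica</TITLE><AUTHOR>Clive Barker</AUTHOR><PRICE>6</PRICE></ROMAN>
  <ROMAN><TITLE>Dune</TITLE><AUTHOR>Frank Herbert</AUTHOR><PRICE>7</PRICE></ROMAN>
  <MAGAZINE><TITLE>Science and Life</TITLE><DATEPARUTION>2005-02-01</DATEPARUTION></MAGAZINE>
  <ROMAN><TITLE>Christine</TITLE><AUTHOR>Stephen King</AUTHOR><PRICE>5</PRICE></ROMAN>
</LIBRARY>
"""

def library_to_csv(xml_text):
    """Build the CSV text from the library XML (PRICE is deliberately omitted)."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    # csv.writer quotes any value that itself contains a comma.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Type", "Title", "Author", "Publication date"])
    for magazine in root.findall("MAGAZINE"):
        writer.writerow(["Magazine", magazine.findtext("TITLE"), "",
                         magazine.findtext("DATEPARUTION")])
    for novel in root.findall("ROMAN"):
        writer.writerow(["Novel", novel.findtext("TITLE"),
                         novel.findtext("AUTHOR"), ""])
    return buffer.getvalue()

csv_text = library_to_csv(LIBRARY_XML)
print(csv_text)
```

Whenever the XML changes, running the function again regenerates the CSV, which is the same workflow as re-applying an XSLT file.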
XQuery and XPath
XQuery and XPath let you extract the information you are interested in from an XML document (a bit like SQL queries, for those who know them).
For example, to extract only the novels from our library, we would write:
/LIBRARY/ROMAN
which gives:
<ROMAN>
  <TITLE>Imajica</TITLE>
  <AUTHOR>Clive Barker</AUTHOR>
  <PRICE>6</PRICE>
</ROMAN>
<ROMAN>
  <TITLE>Dune</TITLE>
  <AUTHOR>Frank Herbert</AUTHOR>
  <PRICE>7</PRICE>
</ROMAN>
<ROMAN>
  <TITLE>Christine</TITLE>
  <AUTHOR>Stephen King</AUTHOR>
  <PRICE>5</PRICE>
</ROMAN>
Or, if you want all the authors whose novels cost more than 5 euros:
/LIBRARY/ROMAN[PRICE>5]/AUTHOR
which gives:
<AUTHOR>Clive Barker</AUTHOR>
<AUTHOR>Frank Herbert</AUTHOR>
This is exactly what we wanted: only the authors of novels whose price is greater than 5 euros.
XQuery and XPath therefore make it possible to extract just the information you want from any XML file.
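Such queries can also be run from a program. As a rough sketch, Python's standard xml.etree.ElementTree module understands a limited subset of XPath: the path to the novels works as a path expression, but numeric comparisons in a predicate are not supported, so the price filter is written in plain Python here. The sketch uses the tag names of the library document shown earlier.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<LIBRARY>
  <ROMAN><TITLE>Imajica</TITLE><AUTHOR>Clive Barker</AUTHOR><PRICE>6</PRICE></ROMAN>
  <ROMAN><TITLE>Dune</TITLE><AUTHOR>Frank Herbert</AUTHOR><PRICE>7</PRICE></ROMAN>
  <MAGAZINE><TITLE>Science and Life</TITLE><DATEPARUTION>2005-02-01</DATEPARUTION></MAGAZINE>
  <ROMAN><TITLE>Christine</TITLE><AUTHOR>Stephen King</AUTHOR><PRICE>5</PRICE></ROMAN>
</LIBRARY>
"""

root = ET.fromstring(LIBRARY_XML)  # root is the LIBRARY element

# All the novels: the ROMAN children of the root element.
novels = root.findall("./ROMAN")
titles = [novel.findtext("TITLE") for novel in novels]

# Authors of novels costing more than 5: the comparison is done in Python
# because ElementTree's XPath subset has no numeric "greater than" predicate.
authors = [
    novel.findtext("AUTHOR")
    for novel in novels
    if int(novel.findtext("PRICE")) > 5
]

print(titles)
print(authors)
```

A full XPath or XQuery engine evaluates the whole expression, predicate included, in a single query.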

