XML

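This lesson covers the basics of XML: what it is for, how it differs from HTML, and how documents, elements, and attributes are written.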

XML

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
  • XML stands for eXtensible Markup Language.
  • XML was designed to store and transport data.
  • XML was designed to be both human- and machine-readable.
  • XML is often used for distributing data over the Internet.
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>
    Two of our famous Belgian Waffles with plenty of real maple syrup
    </description>
    <calories>650</calories>
</food>
<food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>
    Light Belgian waffles covered with strawberries and whipped cream
    </description>
    <calories>900</calories>
</food>
<food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>
    Belgian waffles covered with assorted fresh berries and whipped cream
    </description>
    <calories>900</calories>
</food>
<food>
    <name>French Toast</name>
    <price>$4.50</price>
    <description>
    Thick slices made from our homemade sourdough bread
    </description>
    <calories>600</calories>
</food>
<food>
    <name>Homestyle Breakfast</name>
    <price>$6.95</price>
    <description>
    Two eggs, bacon or sausage, toast, and our ever-popular hash browns
    </description>
    <calories>950</calories>
</food>
</breakfast_menu>
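
Because the menu is plain, regularly structured text, a program can read it as easily as a person can. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the names MENU_XML and read_menu are made up for this illustration, and the embedded document is a shortened copy of the menu above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the breakfast menu above (two of the five <food> entries).
MENU_XML = """<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <calories>650</calories>
  </food>
  <food>
    <name>French Toast</name>
    <price>$4.50</price>
    <calories>600</calories>
  </food>
</breakfast_menu>"""


def read_menu(xml_text):
    """Return a list of (name, price, calories) tuples, one per <food>."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    items = []
    for food in root.findall("food"):
        name = food.findtext("name")
        price = float(food.findtext("price").lstrip("$"))
        calories = int(food.findtext("calories"))
        items.append((name, price, calories))
    return items


for name, price, calories in read_menu(MENU_XML):
    print(f"{name}: ${price:.2f}, {calories} calories")
```

The element names (food, name, price, calories) come from the document itself; nothing in the parser knows about menus.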

XML vs HTML

  • XML was designed to carry data - with focus on what data is
  • HTML was designed to display data - with focus on how data looks
  • XML tags are not predefined like HTML tags are
  • XML Does Not Use Predefined Tags
    • The XML language has no predefined tags.
    • The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
  • HTML works with predefined tags like <p>, <h1>, <table>, etc.
  • With XML, the author must define both the tags and the document structure.
  • XML is Extensible
    • Most XML applications will work as expected even if new data is added (or removed).
    • Imagine an application designed to display the original version of note.xml (<to>, <from>, <heading>, <body>).
    • Then imagine a newer version of note.xml with added <date> and <hour> elements, and a removed <heading>.
    • Because of the way XML is constructed, an older version of the application can still work.
  • XML Simplifies Things
    • It simplifies data sharing
    • It simplifies data transport
    • It simplifies platform changes
    • It simplifies data availability
  • Many computer systems contain data in incompatible formats. Exchanging data between incompatible systems (or upgraded systems) is a time-consuming task for web developers. Large amounts of data must be converted, and incompatible data is often lost.
  • XML stores data in plain text format. This provides a software- and hardware-independent way of storing, transporting, and sharing data.
  • XML also makes it easier to expand or upgrade to new operating systems, new applications, or new browsers, without losing data.
  • With XML, data can be available to all kinds of “reading machines” like people, computers, voice machines, news feeds, etc.
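
The extensibility point can be illustrated with a small Python sketch. It assumes an "old" application that only looks up the elements it knows by name; the function summarize and the contents of NEWER_NOTE (including the date and hour values) are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

ORIGINAL_NOTE = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Hypothetical newer version: two elements added, <heading> removed.
NEWER_NOTE = """<note>
  <date>2008-01-10</date>
  <hour>08:30</hour>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>"""


def summarize(note_xml):
    """Read only the fields the old application knows about."""
    note = ET.fromstring(note_xml)
    return {
        "to": note.findtext("to"),
        "from": note.findtext("from"),
        # A missing element falls back to an empty string instead of failing.
        "heading": note.findtext("heading", default=""),
        "body": note.findtext("body"),
    }


print(summarize(ORIGINAL_NOTE))
print(summarize(NEWER_NOTE))
```

The unknown elements are simply never asked for, and the removed one is handled by the default value, so the same code works on both versions.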

XML became a W3C Recommendation as early as February 1998.

XML Uses

  • XML Separates Data from Presentation
  • XML does not carry any information about how it should be displayed.
  • The same XML data can be used in many different presentation scenarios.
  • Because of this, with XML, there is a full separation between data and presentation.
  • In many HTML applications, XML is used to store or transport data, while HTML is used to format and display the same data.
  • When displaying data in HTML, you should not have to edit the HTML file when the data changes.
  • With XML, the data can be stored in separate XML files.
  • With a few lines of JavaScript code, you can read an XML file and update the data content of any HTML page.
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>

  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>

  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>

  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>

  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>

</bookstore>
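
To make the separation of data and presentation concrete, here is a rough sketch that produces two different presentations from the same XML. It is written in Python rather than the JavaScript the text mentions, the function names books_to_html and books_to_text are invented for this example, and only two of the four books are copied in.

```python
import html
import xml.etree.ElementTree as ET

BOOKS_XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""


def books_to_html(xml_text):
    """Presentation 1: an HTML list built from the <book> elements."""
    root = ET.fromstring(xml_text)
    lines = ["<ul>"]
    for book in root.findall("book"):
        title = html.escape(book.findtext("title"))
        price = html.escape(book.findtext("price"))
        lines.append(f"  <li>{title} ({price})</li>")
    lines.append("</ul>")
    return "\n".join(lines)


def books_to_text(xml_text):
    """Presentation 2: one plain-text line per book, from the same data."""
    root = ET.fromstring(xml_text)
    return [f"{b.findtext('title')} by {b.findtext('author')}"
            for b in root.findall("book")]


print(books_to_html(BOOKS_XML))
print(books_to_text(BOOKS_XML))
```

When a price changes, only the XML changes; neither presentation function needs to be edited.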

Example

XML News

<?xml version="1.0" encoding="UTF-8"?>
<nitf>
  <head>
    <title>Colombia Earthquake</title>
  </head>
  <body>
    <headline>
      <hl1>143 Dead in Colombia Earthquake</hl1>
    </headline>
    <byline>
      <bytag>By Jared Kotler, Associated Press Writer</bytag>
    </byline>
    <dateline>
      <location>Bogota, Colombia</location>
      <date>Monday January 25 1999 7:28 ET</date>
    </dateline>
  </body>
</nitf>

XML Weather Service

<?xml version="1.0" encoding="UTF-8"?>
<current_observation>

<credit>NOAA's National Weather Service</credit>
<credit_URL>http://weather.gov/</credit_URL>

<image>
  <url>http://weather.gov/images/xml_logo.gif</url>
  <title>NOAA's National Weather Service</title>
  <link>http://weather.gov</link>
</image>

<location>New York/John F. Kennedy Intl Airport, NY</location>
<station_id>KJFK</station_id>
<latitude>40.66</latitude>
<longitude>-73.78</longitude>
<observation_time_rfc822>Mon, 11 Feb 2008 06:51:00 -0500 EST
</observation_time_rfc822>

<weather>A Few Clouds</weather>
<temp_f>11</temp_f>
<temp_c>-12</temp_c>
<relative_humidity>36</relative_humidity>
<wind_dir>West</wind_dir>
<wind_degrees>280</wind_degrees>
<wind_mph>18.4</wind_mph>
<wind_gust_mph>29</wind_gust_mph>
<pressure_mb>1023.6</pressure_mb>
<pressure_in>30.23</pressure_in>
<dewpoint_f>-11</dewpoint_f>
<dewpoint_c>-24</dewpoint_c>
<windchill_f>-7</windchill_f>
<windchill_c>-22</windchill_c>
<visibility_mi>10.00</visibility_mi>

<icon_url_base>http://weather.gov/weather/images/fcicons/</icon_url_base>
<icon_url_name>nfew.jpg</icon_url_name>
<disclaimer_url>http://weather.gov/disclaimer.html</disclaimer_url>
<copyright_url>http://weather.gov/disclaimer.html</copyright_url>

</current_observation>
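
Every value in an observation like this one is stored as text, so a consumer has to convert the numeric fields itself. A small Python sketch of that step follows; read_observation is a hypothetical helper, and the embedded XML is an excerpt of the document above.

```python
import xml.etree.ElementTree as ET

# Excerpt of the observation above; only a few of its elements are copied.
OBSERVATION_XML = """<current_observation>
  <station_id>KJFK</station_id>
  <weather>A Few Clouds</weather>
  <temp_f>11</temp_f>
  <temp_c>-12</temp_c>
  <wind_mph>18.4</wind_mph>
</current_observation>"""


def read_observation(xml_text):
    """Element text is always a string, so numbers are converted explicitly."""
    obs = ET.fromstring(xml_text)
    return {
        "station": obs.findtext("station_id"),
        "weather": obs.findtext("weather"),
        "temp_f": float(obs.findtext("temp_f")),
        "temp_c": float(obs.findtext("temp_c")),
        "wind_mph": float(obs.findtext("wind_mph")),
    }


data = read_observation(OBSERVATION_XML)
# Cross-check the two temperature fields against each other.
converted_c = round((data["temp_f"] - 32) * 5 / 9)
print(data["station"], data["weather"], data["temp_c"], converted_c)
```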

XML Tree

  • XML documents are formed as element trees.

  • An XML tree starts at a root element and branches from the root to child elements.

  • All elements can have sub-elements (child elements).

  • The terms parent, child, and sibling are used to describe the relationships between elements.

  • Parents have children. Children have parents. Siblings are children on the same level (brothers and sisters).

  • All elements can have text content (Harry Potter) and attributes (category="cooking" or lang="en").

  • <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>
    
  • The XML Prolog

    • The XML prolog is optional. If it exists, it must come first in the document.

    • XML documents can contain international characters, like Norwegian øæå or French êèé.

    • To avoid errors, you should specify the encoding used, or save your XML files as UTF-8.

    • UTF-8 is the default character encoding for XML documents.

    • <?xml version="1.0" encoding="UTF-8"?>
      
  • All elements must have a closing tag

  • XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.

  • XML Elements Must be Properly Nested

  • XML Attribute Values Must Always be Quoted

    • <note date="12/11/2007">
        <to>Tove</to>
        <from>Jani</from>
      </note>
      
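  • Entity References: a "<" inside element text must be replaced with the entity reference &lt;, otherwise the parser reads it as the start of a new element.

    • Only the characters < and & are strictly illegal inside element content; > is allowed, but it is a good habit to replace it with &gt; as well.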

    • <message>salary &lt; 1000</message>
      
  • Comments in XML

    • <!-- This is a comment -->
      
  • White-space is Preserved in XML

    • XML does not collapse runs of white-space inside element content, whereas HTML collapses them to one single white-space:
      • XML: Hello           World (every space is kept)
      • HTML: Hello World
  • XML Stores New Line as LF

    • Windows applications store a new line as: carriage return and line feed (CR+LF).

      Unix and macOS use LF.

      Old Mac systems use CR.

      XML stores a new line as LF.
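
Most of the syntax rules above can be checked by handing small snippets to a parser. The sketch below uses Python's standard xml.etree.ElementTree and xml.sax.saxutils; the helper is_well_formed and the test snippets are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One snippet per rule; the labels are only used for printing.
snippets = {
    "closing tag present": "<note><to>Tove</to></note>",
    "closing tag missing": "<note><to>Tove</note>",
    "case mismatch": "<Message>hi</message>",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<note date=12/11/2007></note>",
    "raw < in text": "<message>salary < 1000</message>",
    "escaped < in text": "<message>salary &lt; 1000</message>",
}
for label, snippet in snippets.items():
    print(f"{label:22}{is_well_formed(snippet)}")

# escape() replaces &, < and > with entity references before text is embedded.
safe_text = escape("salary < 1000 & rising")
print(safe_text)

# Runs of spaces inside an element survive parsing unchanged.
spaced = ET.fromstring("<greeting>Hello     World</greeting>").text

# A Windows-style CR+LF line break reaches the application as a single LF.
two_lines = ET.fromstring("<text>first\r\nsecond</text>").text
print(repr(spaced), repr(two_lines))
```

Only the first and last snippets are accepted; each of the others breaks exactly one rule and is rejected with a parse error.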

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
Image: node tree of the bookstore document (https://www.w3schools.com/xml/nodetree.gif)
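
The parent, child, and sibling relationships can also be explored in code. Below is a minimal Python sketch that walks the tree from the root and prints one indented line per element; describe is a hypothetical helper, and only two of the books are copied in to keep the output short.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""


def describe(element, depth=0):
    """Return one indented line per element, walking from parent to children."""
    attrs = " ".join(f'{key}="{value}"' for key, value in element.attrib.items())
    text = (element.text or "").strip()
    parts = [element.tag] + [part for part in (attrs, text) if part]
    lines = ["  " * depth + " ".join(parts)]
    for child in element:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(BOOKSTORE_XML)
tree_lines = describe(root)
print("\n".join(tree_lines))

# The children of the first <book> are siblings of one another.
sibling_tags = [child.tag for child in root[0]]
print(sibling_tags)
```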

XML Element

  • An XML element is everything from (including) the element’s start tag to (including) the element’s end tag.

    • <price>29.99</price>
      
  • An element can contain:

    • text
    • attributes
    • other elements
    • or a mix of the above
  • <bookstore>
      <book category="children">
        <title>Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
      <book category="web">
        <title>Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
      </book>
    </bookstore>
    
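  • Text content: <title>, <author>, <year>, and <price> have text content because they contain text (like 29.99).

  • Element contents: <bookstore> and <book> have element contents because they contain other elements.

  • Attribute: <book> has an attribute (category="children").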

  • Empty XML Elements

    • <element></element>
          
      <element />
      
    • Empty elements can have attributes
  • XML Naming Rules

    • Element names are case-sensitive
    • Element names must start with a letter or underscore
    • Element names cannot start with the letters xml (or XML, or Xml, etc)
    • Element names can contain letters, digits, hyphens, underscores, and periods
    • Element names cannot contain spaces
    • Any name can be used, no words are reserved (except xml).
  • XML Elements are Extensible

    • <note>
        <to>Tove</to>
        <from>Jani</from>
        <body>Don't forget me this weekend!</body>
      </note>
      
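    • XML elements can be extended to carry more information. The note above can be improved later, for example with added <date> and <heading> elements, without breaking existing applications: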

    • <note>
        <date>2008-01-10</date>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
      </note>
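
Two points from this section lend themselves to a quick check in code: the naming rules and the two spellings of an empty element. The Python sketch below is a simplification, not a full implementation of the XML specification: the regular expression covers only the ASCII rules listed above, and is_valid_name is a name chosen for this example.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only reading of the naming rules above (an assumption:
# the XML specification also allows many non-ASCII letters in names).
NAME_RE = re.compile(r"(?!xml)[A-Za-z_][A-Za-z0-9._-]*", re.IGNORECASE)


def is_valid_name(name):
    """Check a candidate element name against the simplified rules."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["book", "_note", "first-name", "1st", "my name", "XMLdata"]:
    print(f"{candidate!r}: {is_valid_name(candidate)}")

# The long and short forms of an empty element parse to the same thing.
long_form = ET.fromstring("<element></element>")
short_form = ET.fromstring("<element />")
print(ET.tostring(long_form) == ET.tostring(short_form))

# An empty element can still carry attributes.
empty_with_attr = ET.fromstring('<book category="children" />')
print(empty_with_attr.get("category"), len(empty_with_attr))
```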
      

XML Attributes

  • XML elements can have attributes, just like HTML.

  • Attributes are designed to contain data related to a specific element.

  • XML Attributes Must be Quoted

    • Attribute values must always be quoted. Either single or double quotes can be used.

    • <person gender="female">
      <person gender='female'>
            
      <gangster name='George "Shotgun" Ziegler'>
      <gangster name="George &quot;Shotgun&quot; Ziegler">
      
  • XML Elements vs. Attributes

    • <note date="2008-01-10">
        <to>Tove</to>
        <from>Jani</from>
      </note>
          
      <note>
        <date>2008-01-10</date>
        <to>Tove</to>
        <from>Jani</from>
      </note>
          
          
      <note>
        <date>
          <year>2008</year>
          <month>01</month>
          <day>10</day>
        </date>
        <to>Tove</to>
        <from>Jani</from>
      </note>
      
  • Avoid XML Attributes?

    • attributes cannot contain multiple values (elements can)
    • attributes cannot contain tree structures (elements can)
    • attributes are not easily expandable (for future changes)
  • XML Attributes for Metadata

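    • Sometimes ID references are assigned to elements. These IDs can be used to identify XML elements in much the same way as the id attribute in HTML.

    • The id attributes in the example below identify the individual notes; they are not part of the note content. As a rule of thumb, metadata (data about data) should be stored as attributes, and the data itself should be stored as elements.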

    • <messages>
            
        <note id="501">
          <to>Tove</to>
          <from>Jani</from>
          <heading>Reminder</heading>
          <body>Don't forget me this weekend!</body>
        </note>
            
        <note id="502">
          <to>Jani</to>
          <from>Tove</from>
          <heading>Re: Reminder</heading>
          <body>I will not</body>
        </note>
      </messages>
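
The attribute rules can be tried out with a short Python sketch. It looks up a note by its id attribute, shows that single and double quoting give the same value, and reads the same date stored once as an attribute and once as an element. The helper find_note is hypothetical; quoteattr comes from the standard library's xml.sax.saxutils.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

MESSAGES_XML = """<messages>
  <note id="501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note id="502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not</body>
  </note>
</messages>"""


def find_note(root, note_id):
    """Return the <note> whose id attribute equals note_id, or None."""
    for note in root.findall("note"):
        if note.get("id") == note_id:
            return note
    return None


root = ET.fromstring(MESSAGES_XML)
reply = find_note(root, "502")
print(reply.get("id"), reply.findtext("heading"))

# Single quotes around the value, or double quotes with &quot; inside,
# describe the same attribute value.
single = ET.fromstring("""<gangster name='George "Shotgun" Ziegler' />""")
double = ET.fromstring('<gangster name="George &quot;Shotgun&quot; Ziegler" />')
print(single.get("name") == double.get("name"))

# quoteattr() picks a quoting style that is safe for the given value.
print(quoteattr('George "Shotgun" Ziegler'), quoteattr("female"))

# The same date, stored as an attribute and as a child element.
as_attribute = ET.fromstring('<note date="2008-01-10"><to>Tove</to></note>')
as_element = ET.fromstring('<note><date>2008-01-10</date><to>Tove</to></note>')
print(as_attribute.get("date") == as_element.findtext("date"))
```

Attributes are read with get(), element content with findtext(); which form to use is the design choice discussed above.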