7.1 XML documents, DTD
Definition
Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The design goals of XML focus on simplicity, generality, and usability across the Internet. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services.
∙ XML stands for Extensible Markup Language
∙ XML is a markup language like HTML
∙ XML is designed to store and transport data
∙ XML is designed to be self-descriptive
Significance/ Importance/ Benefits of XML in Web:
1. Ease
Simplicity is the biggest advantage of using XML. It is plain text, so any computer can process it, and it is simple to read and comprehend. XML is a W3C standard and is endorsed by the market leaders in the software industry, which makes it an open format.
2. No limitation of tags
XML is not limited to a fixed set of tags. New tags can be defined whenever they are needed.
3. Self-description
In conventional databases, the data administrator sets up schemas for maintaining data records. XML documents do not need such separate definitions, because the tags and attributes carry the metadata themselves. XML also provides a basis for author identification and versioning at the element level: any XML element can carry several attributes, such as version or author.
4. Highly readable context information
One of the biggest advantages of XML over plain text or HTML is its context information. Tags, attributes, and element structure provide context that can be used to interpret the meaning of the content, which opens up possibilities for data mining, software agents, and more capable search engines.
5. Content is important, not how it is presented
XML describes the meaning of the content rather than its presentation. If HTML stands for “how it appears”, then XML stands for “what it signifies”. Changing the look and feel of a document or website built with XML does not require altering the content of the document, and several presentations or views of the same content can be rendered easily. XML also supports Unicode and multilingual documents, which is essential for applications that follow international web standards.
6. Assists in data assessment and aggregation
The structure of an XML document is designed so that documents can be efficiently assessed and aggregated part by part. Another advantage of XML is its ability to carry or reference almost any type of data, from active components such as ActiveX controls and Java applets to multimedia data such as video, images, and sound.
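Several of the benefits above (custom tags, metadata held in attributes, aggregation part by part) can be illustrated with a minimal Python sketch using the standard xml.etree.ElementTree module. The tag names sales-report and sale, the attribute values, and the numbers are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Custom tags: "sales-report" and "sale" are names invented for this sketch.
# Metadata such as version and author is stored as attributes.
report = ET.Element("sales-report", {"version": "1.2", "author": "jdoe"})
for region, amount in [("north", 120), ("south", 80)]:
    sale = ET.SubElement(report, "sale", {"region": region})
    sale.text = str(amount)

# Serialize to text, then read it back as any other program could.
xml_text = ET.tostring(report, encoding="unicode")
parsed = ET.fromstring(xml_text)

# Aggregate part by part: visit every <sale> element and sum its content.
total = sum(int(sale.text) for sale in parsed.iter("sale"))

print(parsed.get("author"), parsed.get("version"))
print(total)
```

Because the tags describe the data, the reading side needs no separate schema to find the sale figures or the author of the report.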
Differences between XML and HTML
XML and HTML were designed with different goals:
∙ XML is designed to carry data, with the emphasis on what the data is.
∙ HTML is designed to display data, with the emphasis on how the data looks.
∙ XML tags are not predefined like HTML tags.
∙ HTML is a markup language, whereas XML provides a framework for defining markup languages.
∙ HTML is about displaying data, hence it is static, whereas XML is about carrying information, which makes it dynamic.
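The split between carrying data (XML) and displaying it (HTML) can be sketched in Python: the same XML content is rendered once as plain text and once as an HTML list, without changing the XML itself. The helper names as_text and as_html are hypothetical, and the sketch uses only the standard library.

```python
import xml.etree.ElementTree as ET
from html import escape

# The XML carries the data and says what each piece is.
CONTACT_XML = "<contact><name>Tanmay Patil</name><phone>(011) 123-4567</phone></contact>"

def as_text(root):
    """Render the child elements as 'tag: value' lines."""
    return "\n".join(f"{child.tag}: {child.text}" for child in root)

def as_html(root):
    """Render the same child elements as an HTML bullet list."""
    items = "".join(
        f"<li><b>{escape(child.tag)}</b>: {escape(child.text or '')}</li>"
        for child in root
    )
    return f"<ul>{items}</ul>"

root = ET.fromstring(CONTACT_XML)
print(as_text(root))
print(as_html(root))
```

Both views are produced from one unchanged document, which is the sense in which XML describes content and leaves presentation to something else.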
EXAMPLE :
XML code for a note is given below
XML documents
An XML document is a basic unit of XML information, composed of elements and other markup in an orderly package. An XML document can contain a wide variety of data, for example a database of numbers, numbers representing a molecular structure, or a mathematical equation.
XML Document Example
A simple document is shown in the following example −
<?xml version = "1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>
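As a minimal sketch of how a program might read this document, the following Python code parses it with the standard xml.etree.ElementTree module and collects the child elements into a dictionary. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

CONTACT_XML = """<?xml version="1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>
"""

# The root element is <contact-info>; its children hold the actual data.
root = ET.fromstring(CONTACT_XML)
contact = {child.tag: child.text for child in root}

print(root.tag)
print(contact["name"], contact["company"], contact["phone"])
```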
The following image depicts the parts of an XML document.
XML Elements
XML elements are the basic building blocks of an XML document. An element is used as a container for text, attributes, other elements, media objects, and so on. Every XML document contains at least one element, whose scope is delimited by start and end tags or, in the case of an empty element, by a single empty-element tag.
Syntax:
<element-name attributes> Contents...</element-name>
∙ element-name: It is the name of element.
∙ attributes: Attributes define properties of the XML element. Multiple attributes are separated by white space. Each attribute associates a name with a value, which is a string of characters enclosed in quotes.
Example:
name="Geeks"
Here, name is the attribute name and Geeks is its value.
Rules to define XML elements: There are some rules to create XML elements which are given below:
∙ An element name can contain alphanumeric characters. The only special characters allowed in names are the hyphen, underscore, and period, and a name must begin with a letter or an underscore.
∙ Names are case sensitive: lower-case and upper-case letters are treated as different characters. For example, address, Address, and aDDress are different names.
∙ The names in the start tag and the end tag of an element must match exactly.
∙ An element, which is a container, can contain text or elements
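The naming rules above can be checked with a small Python sketch. It relies on the fact that the standard xml.etree.ElementTree parser raises ParseError for a document that is not well formed. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hyphen, underscore, and period are accepted inside a name.
print(is_well_formed("<first-name_1.a>x</first-name_1.a>"))  # True
# Names are case sensitive, so <address> is not closed by </Address>.
print(is_well_formed("<address>x</Address>"))                # False
# A name cannot start with a digit or contain other special characters.
print(is_well_formed("<1name>x</1name>"))                    # False
print(is_well_formed("<na$me>x</na$me>"))                    # False
```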
Empty Elements: An element in an XML document that has no content is known as an empty element. The basic syntax of an empty element in XML is as follows:
<element-name attributes />
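A short Python sketch, assuming the standard xml.etree.ElementTree module, shows that the empty-element tag form and an immediately closed start/end tag pair are read as the same thing, and that an empty element can still carry attributes. The element names are borrowed from the examples in this section.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<mobile/>")          # empty-element tag
long_form = ET.fromstring("<mobile></mobile>")   # start tag directly followed by end tag
with_attr = ET.fromstring('<address category="college"/>')

# Both forms have no text content and no child elements.
for element in (short_form, long_form):
    print(element.tag, element.text, len(element))

# An empty element can still hold data in its attributes.
print(with_attr.get("category"))

# ElementTree writes an element without content back in the short form.
print(ET.tostring(long_form, encoding="unicode"))
```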
Example 1: The following XML document describes the address of a college student using XML elements.
<?xml version = "1.0"?>
<contactinfo>
   <address category = "college">
      <name>G4G</name>
      <College>Geeksforgeeks</College>
      <mobile>2345456767</mobile>
   </address>
</contactinfo>
Output:
G4G
Geeksforgeeks
2345456767
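The output lists only the text content of the elements. A minimal Python sketch, assuming the standard xml.etree.ElementTree module, shows one way to extract that text and the category attribute from the document in Example 1.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = """<?xml version="1.0"?>
<contactinfo>
   <address category="college">
      <name>G4G</name>
      <College>Geeksforgeeks</College>
      <mobile>2345456767</mobile>
   </address>
</contactinfo>
"""

root = ET.fromstring(ADDRESS_XML)
address = root.find("address")

# The attribute is stored on the element, separately from its content.
category = address.get("category")

# The text of each child element, in document order.
values = [child.text for child in address]

print(category)
for value in values:
    print(value)
```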
Example 2:
<?xml version = "1.0"?>
<student>
   <!-- "title" is an illustrative attribute name -->
   <personal_details title = "Personal Details">
      <name>xyz</name>
      <father_name>abc</father_name>
   </personal_details>
   <edu_details title = "Educational Details">
      <hsc_perc>80%</hsc_perc>
      <ssc_perc>98%</ssc_perc>
   </edu_details>
</student>
Output:
xyz
abc
80%
98%
DTD:
DTD stands for Document Type Definition. A DTD defines the structure of an XML document: it describes precisely which elements and attributes the document may contain. A DTD can be specified inside the document (internal DTD) or in a separate file outside it (external DTD). A DTD is mainly used to check the grammar and validity of an XML document, that is, whether the document has a valid structure.
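The two kinds of DTD can be sketched as follows. The first document carries its DTD internally, inside the DOCTYPE declaration; the second only points to an external file. The Python code uses the standard xml.parsers.expat module to report what the parser sees in each DOCTYPE. Note that expat reads the declarations but does not validate the document against them, so this is an inspection sketch, not a validator. The file name contact.dtd and the helper doctype_info are hypothetical.

```python
import xml.parsers.expat

# Internal DTD: the declarations sit between [ and ] in the DOCTYPE.
INTERNAL = """<?xml version="1.0"?>
<!DOCTYPE contact-info [
  <!ELEMENT contact-info (name, company, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<contact-info>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</contact-info>
"""

# External DTD: "contact.dtd" is a hypothetical file name; it is never opened here.
EXTERNAL = """<?xml version="1.0"?>
<!DOCTYPE contact-info SYSTEM "contact.dtd">
<contact-info>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</contact-info>
"""

def doctype_info(xml_text):
    """Collect the DOCTYPE name, system id, and declared element names."""
    info = {"name": None, "system_id": None,
            "has_internal_subset": False, "elements": []}

    def start_doctype(name, system_id, public_id, has_internal_subset):
        info["name"] = name
        info["system_id"] = system_id
        info["has_internal_subset"] = bool(has_internal_subset)

    def element_decl(name, model):
        info["elements"].append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.ElementDeclHandler = element_decl
    parser.Parse(xml_text, True)
    return info

print(doctype_info(INTERNAL))
print(doctype_info(EXTERNAL))
```

Actual validation against a DTD needs a validating parser, which the Python standard library does not provide.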
Characteristics
∙ It defines the compulsory and optional elements in the XML document.
∙ It validates the structure of the XML document.
∙ It checks the grammar of the XML document.
∙ It describes the order in which the element occurs.
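How a DTD expresses compulsory elements, optional elements, and order can be sketched with a single content model such as (name, company?, phone+): name is required, company is optional, phone must occur one or more times, and the commas fix the order. The Python code below is not a DTD validator. It is a simplified illustration that translates an element-only content model into a regular expression over the child tag names, and it assumes the model contains no #PCDATA. The helper names model_to_regex and children_match are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Translate an element-only DTD content model into a regular expression."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches "name " in the child list.
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    return pattern.replace(",", "")

def children_match(element, model):
    """Return True if the element's children satisfy the content model."""
    child_tags = "".join(child.tag + " " for child in element)
    return re.fullmatch(model_to_regex(model), child_tags) is not None

MODEL = "(name, company?, phone+)"

ok = ET.fromstring("<contact-info><name/><phone/><phone/></contact-info>")
wrong_order = ET.fromstring("<contact-info><phone/><name/></contact-info>")
missing_name = ET.fromstring("<contact-info><company/><phone/></contact-info>")

print(children_match(ok, MODEL))            # True
print(children_match(wrong_order, MODEL))   # False
print(children_match(missing_name, MODEL))  # False
```

The occurrence indicators ?, + and * and the choice bar | mean the same in a DTD content model as in a regular expression, which is why the translation only has to handle names and commas.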
Advantages
∙ We can define our own format for XML files with a DTD.
∙ It helps in validating XML files.
∙ It provides proper documentation of the document format.
∙ It enables us to describe an XML document efficiently.
Disadvantages
∙ DTDs are hard to read and maintain if they are large in size.
∙ It is not object oriented.
∙ The documentation support is limited.
∙ DTD doesn’t support namespaces.