7.1 XML documents, DTD 

Definition 

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding  documents in a format that is both human-readable and machine-readable. The design goals of  XML focus on simplicity, generality, and usability across the Internet. It is a textual data format  with strong support via Unicode for different human languages. Although the design of XML  focuses on documents, the language is widely used for the representation of arbitrary data  structures such as those used in web services. 

∙ XML stands for eXtensible Markup Language 

∙ XML is a markup language like HTML 

∙ XML is designed to store and transport data 

∙ XML is designed to be self-descriptive 

Significance/ Importance/ Benefits of XML in Web: 

1. Ease 

Simplicity is the biggest advantage of XML. It is plain text, so any computer can process it and people can read and understand it easily. XML is a W3C standard that is endorsed by the market leaders in the software industry, which makes it an open format. 

2. No limitation of tags 

XML is not limited to a fixed set of tags. New tags can be defined whenever they are needed. 

3. Self-description 

In a conventional database, the data administrator sets up schemas before data records can be maintained. XML documents need no such separate definitions, because the tags and attributes carry the metadata along with the data. XML also provides a basic foundation for author identification and versioning: any XML element can carry attributes such as version or author. 

4. Highly readable context information 

One of the biggest advantages of XML over HTML is the context information it carries. Tags, attributes, and element structure provide context that can be used to interpret the meaning of the content. This opens up possibilities for data mining, software agents, and more capable search engines.

5. Content is important, not how it is presented 

XML describes the meaning of content, not its presentation. If HTML stands for “how it appears”, then XML stands for “what it signifies”. The look and feel of a document or website created with XML can be changed without altering the content of the document, and several presentations or views of the same content can be rendered easily. XML also supports Unicode and multilingual documents, which is essential for applications that follow international standards of web development. 

6. Assists in data assessment and aggregation 

The structure of an XML document is designed so that documents can be assessed and aggregated efficiently, part by part. Another advantage of XML is that it can represent almost any type of data, from active components such as ActiveX controls and Java applets to multimedia data such as video, images, and sound. 

Differences between XML and HTML 

XML and HTML were designed with different goals: 

∙ XML is designed to carry data, with the emphasis on what the data is. 

∙ HTML is designed to display data, with the emphasis on how the data looks. 

∙ XML tags are not predefined like HTML tags. 

∙ HTML is a markup language, whereas XML provides a framework for defining markup languages. 

∙ HTML is about displaying data, hence it is static, whereas XML is about carrying information, which makes it dynamic. 

Example: 

XML code for a note is given below 
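As a minimal sketch of what such a note could look like, the Python snippet below embeds a small XML note and reads it back with the standard library's xml.etree.ElementTree module. The element names (to, from, heading, body) and their values are assumptions chosen for illustration; none of them is predefined by XML, which is exactly the point made in the comparison with HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical note: the tag names and the text are illustrative assumptions.
NOTE_XML = """<?xml version="1.0"?>
<note>
  <to>Asha</to>
  <from>Ravi</from>
  <heading>Reminder</heading>
  <body>Submit the assignment on Friday.</body>
</note>"""

# Parse the text into an element tree; "note" is the root element.
note = ET.fromstring(NOTE_XML)

# Each child element describes its own content through its tag name.
fields = {child.tag: child.text for child in note}

for tag, text in fields.items():
    print(f"{tag}: {text}")
```

The tags say what each piece of data is; nothing in the document says how it should be displayed.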

XML documents 

An XML document is a basic unit of XML information composed of elements and other markup in an orderly package. An XML document can contain a wide variety of data, for example a database of numbers, numbers representing a molecular structure, or a mathematical equation. 

XML Document Example 

A simple document is shown in the following example:

<?xml version = "1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>
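The same document can also be produced by a program rather than typed by hand. The sketch below rebuilds the contact-info document with Python's xml.etree.ElementTree; the helper name build_contact is an assumption made for this example, and the serialized result omits the XML declaration and the indentation.

```python
import xml.etree.ElementTree as ET

def build_contact(name, company, phone):
    """Build a <contact-info> element with three child elements (hypothetical helper)."""
    root = ET.Element("contact-info")
    for tag, text in (("name", name), ("company", company), ("phone", phone)):
        # SubElement creates the child and attaches it to the root in one step.
        ET.SubElement(root, tag).text = text
    return root

contact = build_contact("Tanmay Patil", "TutorialsPoint", "(011) 123-4567")

# encoding="unicode" returns a str instead of bytes.
xml_text = ET.tostring(contact, encoding="unicode")
print(xml_text)
```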

 

The following image depicts the parts of XML document. 

XML Elements 

XML elements are the basic building blocks of an XML document. An element is used as a container to store text, other elements, attributes, media objects, and so on. Every XML document contains at least one element, whose scope is delimited by a start tag and an end tag or, in the case of an empty element, by a single empty-element tag. 

Syntax: 

<element-name attributes> Contents...</element-name> 

element-name: The name of the element. 

attributes: Attributes define properties of the XML element. Each attribute associates a name with a value, which is a quoted string of characters, and multiple attributes are separated by white space. 

Example: 

name="Geeks"

Here, name is the attribute name and Geeks is its value. 
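To illustrate how a program sees attributes, the following sketch parses one element and reads its name/value pairs with xml.etree.ElementTree. The element name site and the second attribute category="tutorial" are assumptions added for the example; only name="Geeks" comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: "site" and "category" are made up for illustration.
element = ET.fromstring('<site name="Geeks" category="tutorial">Contents...</site>')

name_value = element.get("name")         # value of a single attribute
all_attributes = dict(element.attrib)    # every name/value pair of the element
missing = element.get("author", "n/a")   # fallback when the attribute is absent

print(name_value, all_attributes, missing)
```

Attribute values are always strings; any interpretation as a number or date is left to the application.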

Rules to define XML elements: The following rules apply when creating XML elements: 

∙ An element name can contain alphanumeric characters. Apart from letters and digits, only three special characters are allowed in names: hyphen, underscore, and period. A name must not begin with a digit, a hyphen, or a period. 

∙ Names are case sensitive: upper-case and lower-case letters are treated as different characters. For example, address, Address, and aDDress are three different names. 

∙ The names in the start tag and the end tag of an element must be the same. 

∙ An element, which is a container, can contain text, other elements, or both. 
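The rules above can be checked mechanically. The sketch below is a small well-formedness test built on xml.etree.ElementTree, which raises ParseError when a rule is broken; the function name is_well_formed and the sample strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = [
    "<address>x</address>",              # matching start and end tags
    "<address>x</Address>",              # end tag differs in case
    "<first-name_1.a>x</first-name_1.a>",  # hyphen, underscore and period in a name
    "<1name>x</1name>",                  # name starts with a digit
    "<my name>x</my name>",              # white space inside a name
]

for sample in samples:
    print(sample, is_well_formed(sample))
```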

Empty Elements: An element in an XML document that has no content is known as an empty element. The basic syntax of an empty element is as follows: 

<element-name attributes/>
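An empty element can be written either with the short empty-element tag or as a start tag immediately followed by an end tag. The sketch below, which uses a hypothetical photo element with a file attribute, suggests that a parser treats the two forms identically.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, used only for illustration.
short_form = ET.fromstring('<photo file="me.png"/>')
long_form = ET.fromstring('<photo file="me.png"></photo>')

def is_empty(element):
    """An element is empty when it has neither text nor child elements."""
    return element.text is None and len(element) == 0

print(is_empty(short_form), is_empty(long_form))
```

An empty element can still carry attributes, as the file attribute here shows.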

Example 1: Following is the example of an XML document describing the address of a college  student using XML elements.

<?xml version = "1.0"?>
<contactinfo>
   <address category = "college">
      <name>G4G</name>
      <College>Geeksforgeeks</College>
      <mobile>2345456767</mobile>
   </address>
</contactinfo>


 

Output: 

G4G
Geeksforgeeks
2345456767
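The output above is the text content of the document with all markup removed. As a minimal sketch of how that can be obtained, the snippet below walks the document from Example 1 with xml.etree.ElementTree and keeps only the non-blank character data; the helper name text_lines is an assumption for this example.

```python
import xml.etree.ElementTree as ET

CONTACT_XML = """<?xml version="1.0"?>
<contactinfo>
   <address category="college">
      <name>G4G</name>
      <College>Geeksforgeeks</College>
      <mobile>2345456767</mobile>
   </address>
</contactinfo>"""

def text_lines(xml_text):
    """Collect the non-blank character data of a document, in document order."""
    root = ET.fromstring(xml_text)
    # itertext() also yields the white space between tags, so blank chunks are dropped.
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]

for line in text_lines(CONTACT_XML):
    print(line)
```

The attribute value college does not appear, because attribute values are not part of an element's text content.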


 

Example 2: 

<?xml version = "1.0"?>
<student>
   <personal_details category = "Personal Details">
      <name>xyz</name>
      <father_name>abc</father_name>
   </personal_details>
   <edu_details category = "Educational Details">
      <hsc_perc>80%</hsc_perc>
      <ssc_perc>98%</ssc_perc>
   </edu_details>
</student>


 

Output: 

xyz
abc
80%
98%

 

DTD: 

DTD stands for Document Type Definition. A DTD defines the structure of an XML document: it precisely describes the elements and attributes that the document may use. DTDs are classified into two types: an internal DTD is specified inside the XML document itself, and an external DTD is kept outside the document in a separate file. A validating parser uses the DTD to check the grammar of an XML document, that is, whether the document has a valid structure.
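The two kinds of DTD can be sketched as follows. The element names reuse the contact example from earlier in this section, the file name contactinfo.dtd is hypothetical, and the declarations are one plausible DTD for that document rather than a definitive one. Note that the parser in Python's standard library reads a DOCTYPE declaration but does not validate against it, so the snippet only shows where the DTD goes; an actual validity check needs a validating parser.

```python
import xml.etree.ElementTree as ET

# Internal DTD: the declarations sit inside the document, between [ and ].
INTERNAL_DTD_DOC = """<?xml version="1.0"?>
<!DOCTYPE contactinfo [
  <!ELEMENT contactinfo (address)>
  <!ELEMENT address (name, college, mobile)>
  <!ATTLIST address category CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT college (#PCDATA)>
  <!ELEMENT mobile (#PCDATA)>
]>
<contactinfo>
  <address category="college">
    <name>G4G</name>
    <college>Geeksforgeeks</college>
    <mobile>2345456767</mobile>
  </address>
</contactinfo>"""

# External DTD: the document only names the file that holds the declarations.
# "contactinfo.dtd" is a hypothetical file and is never opened by this sketch.
EXTERNAL_DTD_DOC = """<?xml version="1.0"?>
<!DOCTYPE contactinfo SYSTEM "contactinfo.dtd">
<contactinfo>
  <address category="college">
    <name>G4G</name>
    <college>Geeksforgeeks</college>
    <mobile>2345456767</mobile>
  </address>
</contactinfo>"""

internal_root = ET.fromstring(INTERNAL_DTD_DOC)
external_root = ET.fromstring(EXTERNAL_DTD_DOC)
print(internal_root.tag, external_root.tag)
```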

Characteristics 

∙ It defines the compulsory and optional elements in the XML document. 

∙ It validates the structure of the XML document. 

∙ It checks the grammar of the XML document. 

∙ It describes the order in which the element occurs. 
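To make the ideas of compulsory elements, optional elements, and element order concrete, the sketch below imitates a single DTD content model with a regular expression over the child tag names. This is a hand-rolled illustration and not a DTD validator: the element names, the ADDRESS_MODEL pattern, and the matches_model helper are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# DTD declaration being imitated (hypothetical element names):
#   <!ELEMENT address (name, college?, mobile+)>
# name is compulsory, college is optional (?), mobile occurs one or more
# times (+), and the children must appear in exactly this order.
ADDRESS_MODEL = re.compile(r"(name )(college )?(mobile )+")

def matches_model(xml_text):
    """Check the child elements of the root against ADDRESS_MODEL."""
    root = ET.fromstring(xml_text)
    # Turn the children into a string such as "name college mobile ".
    child_sequence = "".join(child.tag + " " for child in root)
    return ADDRESS_MODEL.fullmatch(child_sequence) is not None

print(matches_model("<address><name>a</name><mobile>1</mobile></address>"))
print(matches_model("<address><mobile>1</mobile><name>a</name></address>"))
```

A real validating parser applies every declaration in the DTD to the whole document in the same spirit.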

Advantages 

∙ We can define our own format for the XML files by DTD. 

∙ It helps to validate XML files. 

∙ It serves as documentation of the XML format. 

∙ It enables us to describe an XML document efficiently. 

Disadvantages 

∙ DTDs are hard to read and maintain if they are large in size. 

∙ It is not object oriented. 

∙ The documentation support is limited. 

∙ DTD doesn’t support namespaces.