3/13/2015

What is the Document Object Model (DOM)



Introduction


The Document Object Model (DOM) is an application programming interface (API) for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated. In the DOM specification, the term "document" is used in a broad sense: XML is increasingly used to represent many kinds of information stored in diverse systems, much of which would traditionally be seen as data rather than as documents. Nevertheless, XML presents this data as documents, and the DOM can be used to manage it.

With the Document Object Model, programmers can build documents, navigate their structure, and add, modify, or delete elements and content. Anything found in an HTML or XML document can be accessed, changed, deleted, or added using the Document Object Model, with a few exceptions - in particular, the DOM interfaces for the XML internal and external subsets have not yet been specified.
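The operations listed above (build, navigate, add, change, delete) can be sketched with the Java binding. This is a minimal illustration, assuming the JAXP parser bundled with the JDK as the DOM implementation; the class name BuildAndEdit and the element names songs and song are hypothetical, chosen only for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BuildAndEdit {
    /** Builds a small document, then edits it using only DOM interface calls. */
    static Document buildAndEdit() {
        Document doc;
        try {
            // Assumption: the JDK's bundled JAXP implementation supplies the DOM.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        // Build: create elements and text through the Document.
        Element songs = doc.createElement("songs");
        doc.appendChild(songs);
        Element first = doc.createElement("song");
        first.appendChild(doc.createTextNode("Shady Grove"));
        songs.appendChild(first);

        // Add: a second child element.
        Element second = doc.createElement("song");
        second.appendChild(doc.createTextNode("Over the River, Charlie"));
        songs.appendChild(second);

        // Navigate and change: reach the first song's text node and replace its value.
        songs.getFirstChild().getFirstChild().setNodeValue("Shady Grove (Aeolian)");

        // Delete: remove the second song from its parent.
        songs.removeChild(second);
        return doc;
    }

    public static void main(String[] args) {
        Element root = buildAndEdit().getDocumentElement();
        System.out.println(root.getChildNodes().getLength());
        System.out.println(root.getFirstChild().getFirstChild().getNodeValue());
    }
}
```

The sketch uses only methods that the Level 1 Core interfaces define (createElement, createTextNode, appendChild, removeChild, setNodeValue), so it should carry over to other conforming implementations.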

As a W3C specification, one important objective of the Document Object Model is to provide a standard programming interface that can be used in a wide variety of environments and applications. The DOM is designed to be used with any programming language. To provide a precise, language-independent specification of the DOM interfaces, we have chosen to define them in OMG IDL, as defined in the CORBA 2.2 specification. In addition to the OMG IDL specification, we provide language bindings for Java and ECMAScript (an industry-standard scripting language based on JavaScript and JScript). Note: OMG IDL is used only as a language-independent and implementation-neutral way to specify interfaces. Other IDLs could have been used. In general, IDLs are designed for specific computing environments. The Document Object Model can be implemented in any computing environment, and does not require the object binding runtimes generally associated with such IDLs.


What the Document Object Model is


The DOM is a programming interface (API) for documents. It closely resembles the structure of the documents it models. For example, consider the following table, taken from an HTML document:


      <TABLE>
      <TBODY> 
      <TR> 
      <TD>Shady Grove</TD>
      <TD>Aeolian</TD> 
      </TR> 
      <TR>
      <TD>Over the River, Charlie</TD>        
      <TD>Dorian</TD> 
      </TR> 
      </TBODY>
      </TABLE>
    

The DOM represents this table like this:






DOM representation of a sample table
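The structure of the sample table can also be printed as an indented outline. The following is a sketch only: it assumes the JDK's bundled JAXP parser, and because the sample happens to be well-formed it uses an XML parser as a stand-in for an HTML one. The class name TableOutline and its methods are hypothetical, and the markup is written without whitespace between tags so that no whitespace-only text nodes appear.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TableOutline {
    // The sample table from the text, without whitespace between tags.
    static final String TABLE =
          "<TABLE><TBODY>"
        + "<TR><TD>Shady Grove</TD><TD>Aeolian</TD></TR>"
        + "<TR><TD>Over the River, Charlie</TD><TD>Dorian</TD></TR>"
        + "</TBODY></TABLE>";

    /** Parses the table and returns one line per node, indented by depth. */
    static String outline() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(TABLE)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Depth-first walk using only firstChild / nextSibling navigation. */
    static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append('"').append(node.getNodeValue()).append('"');
        } else {
            out.append('<').append(node.getNodeName()).append('>');
        }
        out.append('\n');
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline());
    }
}
```

The outline has TABLE at the top, TBODY beneath it, two TR children under TBODY, and two TD children under each TR, each TD holding a single text node.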



In the DOM, documents have a logical structure which is very much like a tree; to be more precise, it is like a "forest" or "grove", which can contain more than one tree. However, the DOM does not specify that documents must be implemented as a tree or a grove, nor does it specify how the relationships among objects are to be implemented. The DOM is a logical model that may be implemented in any convenient manner. In this specification, we use the term structure model to describe the tree-like representation of a document; we deliberately avoid terms like "tree" or "grove" in order to avoid implying a particular implementation. An important property of DOM structure models is structural isomorphism: if any two implementations of the Document Object Model are used to create a representation of the same document, they will create the same structure model, with precisely the same objects and the same relationships between those objects.

The name "Document Object Model" was chosen because it is an "object model" in the traditional object-oriented design sense: documents are modeled using objects, and the model encompasses not only the structure of a document but also the behavior of the document and of the objects of which it is composed. In other words, the nodes in the above diagram do not represent a data structure; they represent objects, which have functions and identity. As an object model, the DOM identifies:
  • the interfaces and objects used to represent and manipulate a document
  • the semantics of these interfaces and objects - including both behavior and attributes
  • the relationships and collaborations among these interfaces and objects

The structure of SGML documents has traditionally been represented by an abstract data model, not by an object model. In an abstract data model, the model is centered around the data. In object-oriented programming languages, the data itself is encapsulated in objects that hide the data, protecting it from direct external manipulation. The functions associated with these objects determine how the objects can be manipulated, and they are part of the object model.

The Document Object Model currently consists of two parts, DOM Core and DOM HTML. The DOM Core represents the functionality used for XML documents, and also serves as the basis for DOM HTML. A conforming implementation of the DOM must implement all of the fundamental interfaces in the Core chapter with the semantics as defined. In addition, it must implement at least one of the HTML DOM and the extended (XML) interfaces, with the semantics as defined.
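An application can ask an implementation which of these parts it supports through the hasFeature method of the DOMImplementation interface. The sketch below assumes the JDK's bundled JAXP parser as the implementation being queried; the class name FeatureCheck and the helper supports are hypothetical. The answers depend on the implementation in use, so no particular output is claimed here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

class FeatureCheck {
    /** Asks the default JAXP DOM implementation whether it supports a feature. */
    static boolean supports(String feature, String version) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            return impl.hasFeature(feature, version);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "XML" and "HTML" are the feature names for the two optional parts.
        System.out.println("XML 1.0:  " + supports("XML", "1.0"));
        System.out.println("HTML 1.0: " + supports("HTML", "1.0"));
    }
}
```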

What the Document Object Model is not


This section is designed to give a more precise understanding of the DOM by distinguishing it from other systems that may seem similar to it:
  • Although the Document Object Model was strongly influenced by "Dynamic HTML", in Level 1 it does not implement all of "Dynamic HTML". In particular, events have not yet been defined. Level 1 is designed to lay a firm foundation for this kind of functionality by providing a robust, flexible model of the document itself.
  • The Document Object Model is not a binary specification. DOM programs written in the same language will be source code compatible across platforms, but the DOM does not define any form of binary interoperability.
  • The Document Object Model is not a way of persisting objects to XML or HTML. Instead of specifying how objects may be represented in XML, the DOM specifies how XML and HTML documents are represented as objects, so that they can be used in object-oriented programs.
  • The Document Object Model is not a set of data structures; it is an object model that specifies interfaces. Although this document contains diagrams showing parent/child relationships, these are logical relationships defined by the programming interfaces, not representations of any particular internal data structure.
  • The Document Object Model does not define the "true inner semantics" of XML or HTML. The semantics of those languages are defined by the corresponding W3C Recommendations. The DOM is a programming model designed to respect these semantics. The DOM has no ramifications for the way XML and HTML documents are written; any document that can be written in these languages can be represented in the DOM.
  • Despite its name, the Document Object Model is not a competitor to the Component Object Model (COM). COM, like CORBA, is a language-independent way to specify interfaces and objects; the DOM is a set of interfaces and objects designed for managing HTML and XML documents. The DOM may be implemented using language-independent systems like COM or CORBA; it may also be implemented using language-specific bindings such as the Java or ECMAScript bindings specified in this document.

What is the origin of the Document Object Model


The DOM originated as a specification to allow JavaScript scripts and Java programs to be portable among web browsers. "Dynamic HTML" was the immediate ancestor of the Document Object Model, and it was originally thought of largely in terms of browsers. However, when the DOM Working Group was formed at W3C, it was also joined by vendors in other domains, including HTML or XML editors and document repositories. Several of these vendors had worked with SGML before XML was developed; as a result, the DOM has been influenced by SGML groves and the HyTime standard. Some of these vendors had also developed their own object models for documents in order to provide an API for SGML/XML editors or document repositories, and these object models have also influenced the DOM.

Entities and the DOM Core


In the fundamental DOM interfaces, there are no objects representing entities. Numeric character references, and references to the pre-defined entities in HTML and XML, are replaced by the single character that makes up the entity's replacement. For example, in:

        <p>This is a dog &amp; a cat</p>        
      

The "&amp;" will be replaced by the character "&", and the text of the P element will form a single continuous sequence of characters. Because numeric character references and pre-defined entities are not recognized as such in CDATA sections, or in the SCRIPT and STYLE elements in HTML, they are not replaced there by the single character they appear to refer to. If the example above were enclosed in a CDATA section, the "&amp;" would not be replaced by "&"; likewise, the <p> would not be recognized as a start tag. The representation of general entities, both internal and external, is defined in the extended (XML) interfaces of the Level 1 specification.
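The difference between ordinary text and a CDATA section can be observed by parsing both forms and reading back the character data. This is a sketch assuming the JDK's bundled JAXP parser with its default settings; the class name EntityDemo, the helper textOfRoot, and the wrapper element div around the CDATA section are hypothetical and serve only the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class EntityDemo {
    /** Parses the markup and concatenates the character data directly under the root. */
    static String textOfRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder text = new StringBuilder();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                // CDATASection extends Text, so this covers both kinds of node.
                if (n instanceof Text) {
                    text.append(n.getNodeValue());
                }
            }
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Pre-defined entity in ordinary text: replaced by the single character.
        System.out.println(textOfRoot("<p>This is a dog &amp; a cat</p>"));
        // Same markup inside a CDATA section: neither the entity nor the tag is recognized.
        System.out.println(textOfRoot("<div><![CDATA[<p>This is a dog &amp; a cat</p>]]></div>"));
    }
}
```

The first call yields the text with a plain "&" character; the second yields the enclosed markup verbatim, including the literal characters of the entity reference and of the P tags.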
Note: When a DOM representation of a document is serialized as XML or HTML text, applications need to check each character in text data to see whether it must be escaped using a numeric or pre-defined entity. Failing to do so can result in invalid HTML or XML. Implementations should also be aware that serialization into a character encoding ("charset") that does not fully cover ISO 10646 may fail if markup or CDATA sections contain characters that are not present in that encoding.
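The per-character check described in this note might look like the following for text data. This is a hypothetical helper, not part of any DOM interface: the class name TextEscaper and the method escapeText are assumptions, and the sketch covers only text content (inside markup or CDATA sections a numeric reference is not available, which is why serialization can fail there).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class TextEscaper {
    /**
     * Escapes text data for serialization into the given target encoding.
     * Markup-significant characters become pre-defined entities; characters
     * the encoding cannot represent become numeric character references.
     */
    static String escapeText(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            String s = new String(Character.toChars(cp));
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                // Not representable in the target charset: use a decimal reference.
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("dog & cat <\u00e9>", StandardCharsets.US_ASCII));
        System.out.println(escapeText("dog & cat <\u00e9>", StandardCharsets.UTF_8));
    }
}
```

With US-ASCII as the target, the accented letter is written as the reference &#233;; with UTF-8 it is written as is, because UTF-8 covers all of ISO 10646.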

Interfaces and DOM Implementations


The DOM specifies interfaces which may be used to manage XML or HTML documents. It is important to realize that these interfaces are an abstraction - much like "abstract base classes" in C++, they are a means of specifying a way to access and manipulate an application's internal representation of a document. Interfaces do not imply a particular concrete implementation. Each DOM application is free to maintain documents in any convenient representation, as long as the interfaces shown in this specification are supported. Some DOM implementations will be existing programs that use the DOM interfaces to access software written long before the DOM specification existed. Therefore, the DOM is designed to avoid implementation dependencies; in particular:

  • Attributes defined in the IDL do not imply concrete objects which must have specific data members - in the language bindings, they are translated to a pair of get()/set() functions, not to a data member. (Read-only attributes have only a get() function in the language bindings.)
  • DOM applications may provide additional interfaces and objects not found in this specification and still be considered DOM compliant.
  • Because we specify interfaces and not the actual objects that are to be created, the DOM cannot know what constructors to call for a given implementation. In general, DOM users call the createXXX() methods on the Document class to create document structures, and DOM implementations create their own internal representations of these structures in their implementations of the createXXX() functions.
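In the Java binding, these points show up as factory methods on Document and as accessor pairs on the node interfaces. The sketch below assumes the JDK's bundled JAXP parser as the implementation; the class name FactoryAndAccessors and the method makeParagraph are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class FactoryAndAccessors {
    /** Creates a P element holding the trimmed content, via the DOM factory methods. */
    static Element makeParagraph(String content) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        // The interfaces have no constructors, so nodes come from the
        // createXXX() factory methods on Document.
        Element p = doc.createElement("p");
        Text text = doc.createTextNode(content);
        p.appendChild(text);

        // The IDL attribute "data" is exposed as a getData()/setData() pair.
        text.setData(text.getData().trim());
        return p;
    }

    public static void main(String[] args) {
        Element p = makeParagraph("  Shady Grove  ");
        // The read-only IDL attribute "tagName" has a getter only.
        System.out.println(p.getTagName());
        System.out.println(p.getFirstChild().getNodeValue());
        // The concrete class is chosen by the implementation, not by the DOM.
        System.out.println(p.getClass().getName());
    }
}
```

The last line printed is the implementation's own class name, which varies between DOM implementations; code written against the org.w3c.dom interfaces does not need to know it.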