What Is External Data Representation and Marshalling?
In a programming language, you refer to data stored in data structures using the language’s own internal syntax.
For instance, you might have objects such as a “linked list”, “hash map”, “binary tree”, “string”, “integer”, or “set”.
Outside a computer program you have things like file systems and files.
The file format is independent of whatever your favorite programming language is.
Most files are just a big bunch of bytes or characters, and you can’t tell what’s in them without extra information (such as a filename extension or a “magic number” in the first few bytes of the file).
But in general you can come up with way more ways to represent data in a computer program than you can in a file on disk.
Because of heterogeneity (everybody is different), we have to account for different operating systems, different programming languages, and different hardware architectures.
For example, at the TCP/UDP level, data is communicated as “messages” or streams of bytes, so data structures must be converted to a sequence of bytes before they are sent. The representation of primitive values also differs between machines:
· Integers: big-endian and little-endian order
· Floating-point numbers: representation differs between architectures
· Character codes: ASCII, Unicode
To meet that need, either both machines agree on a format (indicated in the parameter list), or an intermediate external standard is used.
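The byte-order difference can be made concrete with a minimal Java sketch. It assumes nothing beyond the standard `java.nio.ByteBuffer` API; the class and method names (`ByteOrderDemo`, `toBytes`, `fromBytes`) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    // Encode a 32-bit integer as 4 bytes in the requested byte order.
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Decode 4 bytes back into an integer, assuming the given byte order.
    static int fromBytes(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    // Render bytes as space-separated hex pairs for printing.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        byte[] big = toBytes(value, ByteOrder.BIG_ENDIAN);
        byte[] little = toBytes(value, ByteOrder.LITTLE_ENDIAN);
        System.out.println("big-endian:    " + hex(big));    // 0A 0B 0C 0D
        System.out.println("little-endian: " + hex(little)); // 0D 0C 0B 0A

        // A receiver that assumes the wrong order silently gets a different number.
        int misread = fromBytes(big, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Integer.toHexString(misread));    // d0c0b0a
    }
}
```

The same four bytes decode to two different integers depending on the assumed order, which is why the two sides have to agree on a format or state which one was used.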
This is where the big picture comes in: an agreed standard for the representation of data structures and primitive values is called an external data representation.
Marshalling is the process of taking a collection of data items and assembling them into a form suitable for transmission in a message; unmarshalling is the reverse process at the destination. CORBA’s common data representation, Java’s object serialization, and XML (Extensible Markup Language) serve as examples.
CORBA Common Data Representation
CORBA CDR is the external data representation defined with CORBA 2.0. CORBA’s common data representation (CDR) can be used with a variety of programming technologies. CDR can represent all of the data types that can be used as arguments and return values in remote invocations in CORBA. These consist of 15 primitive types, which include short (16-bit), long (32-bit), unsigned short, unsigned long, float (32-bit), double (64-bit), char, boolean (TRUE, FALSE), octet (8-bit), and any (which can represent any basic or constructed type), together with a range of composite types. Both little-endian and big-endian orderings are supported; the sender indicates in which ordering the message is transmitted. Floating-point numbers use the IEEE standard. Characters are represented in a code set agreed between the sender and receiver. Data type information is NOT transmitted: each argument or result is represented by a sequence of bytes in the invocation or result message.
CORBA CDR’s Composite Types
Marshalling in CORBA
CORBA’s Interface Definition Language (IDL) is used to “automatically” produce marshalling and unmarshalling operations. The CORBA IDL compiler enables the generation of the required components.
For example,
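As a rough illustration, the sketch below hand-writes the kind of marshalling and unmarshalling operations an IDL compiler might generate for a hypothetical IDL declaration `struct Person { string name; string place; unsigned long year; };`. The byte layout (a 4-byte length, then the characters padded to a 4-byte boundary) is a simplification assumed for illustration and is not wire-compatible CDR; the names `CdrSketch`, `marshal` and `unmarshal` are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class CdrSketch {
    // Hypothetical struct with two strings and one unsigned long.
    static final class Person {
        final String name;
        final String place;
        final long year;

        Person(String name, String place, long year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Round a length up to the next multiple of 4.
    static int padded(int n) {
        return (n + 3) & ~3;
    }

    // Write a 4-byte length, then the characters, then zero padding.
    static void putString(ByteBuffer buf, byte[] chars) {
        buf.putInt(chars.length);
        buf.put(chars);
        for (int i = chars.length; i < padded(chars.length); i++) {
            buf.put((byte) 0);
        }
    }

    // Read a 4-byte length, then the characters, then skip the padding.
    static String getString(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] chars = new byte[length];
        buf.get(chars);
        buf.position(buf.position() + padded(length) - length);
        return new String(chars, StandardCharsets.US_ASCII);
    }

    // No type tags are written: only values, in declaration order.
    static byte[] marshal(Person p, ByteOrder order) {
        byte[] name = p.name.getBytes(StandardCharsets.US_ASCII);
        byte[] place = p.place.getBytes(StandardCharsets.US_ASCII);
        int size = 4 + padded(name.length) + 4 + padded(place.length) + 4;
        ByteBuffer buf = ByteBuffer.allocate(size).order(order);
        putString(buf, name);
        putString(buf, place);
        buf.putInt((int) p.year); // low 32 bits of the unsigned long
        return buf.array();
    }

    // The receiver must already know the field order, types and byte order.
    static Person unmarshal(byte[] bytes, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(order);
        String name = getString(buf);
        String place = getString(buf);
        long year = Integer.toUnsignedLong(buf.getInt());
        return new Person(name, place, year);
    }

    public static void main(String[] args) {
        byte[] message = marshal(new Person("Smith", "London", 1934), ByteOrder.BIG_ENDIAN);
        Person copy = unmarshal(message, ByteOrder.BIG_ENDIAN);
        System.out.println(message.length + " bytes: " + copy.name + ", " + copy.place + ", " + copy.year);
    }
}
```

Because the message carries no type information, `unmarshal` only works when it reads the fields in exactly the order and byte ordering that `marshal` wrote them, which is the shared knowledge the IDL definition provides to both sides.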
Java’s Object Serialization
It works only within the Java environment.
The term “serialization” refers to the activity of flattening an object or a connected set of objects into a serial form that is suitable for storing on disk or transmitting in a message. In Java RMI, both objects and primitive data values may be passed as arguments and results of method invocations. An object is an instance of a Java class.
To serialize an object:
(1) its class info is written out: name, version number
(2) types and names of instance variables
- If an instance variable belongs to a new class, then the new class info must be written out too, recursively
- Each class is given a handle
(3) values of instance variables
For example:
Person p = new Person("Smith", "London", 1934);
Serialization: flattening objects into a serial form for storing on disk or transmitting in a message.
Deserialization: restoring the state of objects from their serialized form. The process doing the deserializing is assumed to have no prior knowledge of the types of the objects in the serialized form, so some information about the class of each object is included in the serialized form.
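A minimal round-trip sketch of serialization and deserialization, using the standard `ObjectOutputStream` and `ObjectInputStream` classes. The `Person` class and its fields are assumptions based on the constructor call shown earlier, and the wrapper class `SerializationDemo` is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SerializationDemo {
    // Assumed shape of Person; it must implement Serializable to be written.
    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L; // the class version number
        final String name;
        final String place;
        final int year;

        Person(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Flatten an object (and everything it references) into a byte array.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object using only the class information stored in the bytes.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person p = new Person("Smith", "London", 1934);
        byte[] data = serialize(p);
        Person copy = (Person) deserialize(data);
        System.out.println(copy.name + ", " + copy.place + ", " + copy.year);

        // The class name and field names travel inside the serialized form.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(raw.contains("Person") && raw.contains("place"));
    }
}
```

The caller of `deserialize` never tells the stream which class to expect; the cast to `Person` happens only after the object has been rebuilt from the class information in the byte array.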
Java objects can contain references to other objects. When an object is serialized, all the objects it references are serialized with it, and the references themselves are serialized as handles.
· A handle is a reference to an object within the serialized form.
· Each object is written only once.
· Only the handle is written for subsequent occurrences.
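The effect of handles can be observed with a small sketch: two objects that share a reference still share it after a round trip, because the shared object is written once and referred to by handle afterwards. The classes `City` and `Person` and the helper `roundTrip` are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class HandleDemo {
    static final class City implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        City(String name) {
            this.name = name;
        }
    }

    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final City home; // a reference to another serializable object

        Person(String name, City home) {
            this.name = name;
            this.home = home;
        }
    }

    // Serialize the array to memory and read it straight back.
    static Person[] roundTrip(Person[] people) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(people);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Person[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        City london = new City("London");
        Person[] copy = roundTrip(new Person[] {
            new Person("Smith", london),
            new Person("Jones", london) // same City object as above
        });
        System.out.println(copy[0].home == copy[1].home); // true: sharing is preserved
        System.out.println(copy[0].home == london);       // false: it is a new copy
    }
}
```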
XML (Extensible Markup Language)
XML is a markup language that was defined by the World Wide Web Consortium (W3C) for general use on the Web. In general, the term markup language refers to a textual encoding that represents both a text and details as to its structure or its appearance.
It is a textual format for representing structured data that works with any programming technology.
HTML was designed to describe the appearance of web pages. XML was designed to describe structured documents and markup languages.
XML characteristics
· XML is self-describing.
· XML was intended to be used by multiple applications for different purposes.
· XML is extensible in the sense that users can define their own tags.
· XML is “textual”, so it can be easily read by humans and computers.
XML data items are tagged with ‘markup’ strings. The tags are used to describe the logical structure of the data and to associate attribute-value pairs with logical structures. XML is used to enable clients to communicate with web services and for defining the interfaces and other properties of web services. However, XML is also used in many other ways, including in archiving and retrieval systems — although an XML archive may be larger than a binary one, it has the advantage of being readable on any computer.
For example,
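A minimal sketch of a Person structure expressed in XML and read back with the DOM parser that ships with Java. The element names, the `id` attribute and its value are assumptions chosen for illustration, as are the class and helper names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class XmlPersonDemo {
    // Hypothetical XML definition of a Person; tags describe the structure,
    // and the id attribute attaches an attribute-value pair to the element.
    static final String PERSON_XML =
        "<person id=\"123456789\">"
        + "<name>Smith</name>"
        + "<place>London</place>"
        + "<year>1934</year>"
        + "</person>";

    // Parse an XML string and return its root element.
    static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Return the text inside the first descendant element with the given tag.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element person = parse(PERSON_XML);
        System.out.println(person.getAttribute("id"));  // 123456789
        System.out.println(childText(person, "name"));  // Smith
        System.out.println(childText(person, "place")); // London
        // Everything is text: the year must be converted by the application.
        System.out.println(Integer.parseInt(childText(person, "year")) + 1); // 1935
    }
}
```

Unlike the binary formats above, the message names each item with a tag, so a reader can find `place` without knowing the field order in advance, at the cost of a larger message.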
Conclusion
CORBA’s common data representation is concerned with an external representation for the structured and primitive types that can be passed as arguments and results of remote method invocations in CORBA.
Java’s object serialization is concerned with the flattening and external data representation of any single object that may need to be transmitted in a message. It is for use only by Java.
XML defines a textual format for representing structured data. It was originally intended for documents, but it is now also used to represent the data sent in messages exchanged by clients and servers in web services.
Marshalling and unmarshalling activities are intended to be carried out by a middleware layer without any involvement on the part of the application programmer. Even in the case of XML, which is textual and therefore more accessible to hand-encoding, software for marshalling and unmarshalling is available for all commonly used platforms and programming environments. Because marshalling requires the consideration of all the finest details of the representation of the primitive components of composite objects, the process is likely to be error-prone if carried out by hand. Compactness is another issue that can be addressed in the design of automatically generated marshalling procedures.