Saturday, October 31, 2015

Java's serialization algorithm



http://www.javaworld.com/article/2072752/the-java-serialization-algorithm-revealed.html

In general the serialization algorithm does the following:

It writes out the metadata of the class associated with an instance.
It recursively writes out the description of each superclass until it reaches java.lang.Object.
Once it finishes writing the metadata information, it then starts with the actual data associated with the instance. But this time, it starts from the topmost superclass.
It recursively writes the data associated with the instance, starting from the topmost superclass and ending with the most-derived class.
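
The steps above can be observed by serializing a small class hierarchy and looking at the bytes. The sketch below reconstructs the example classes from the field names and values used in the walkthrough (SerialTest, parent, contain); the helper class SerialDemo and its method names are assumptions added for illustration. The byte layout should match the dump discussed below, although the serialVersionUID bytes may differ because the reconstruction may not be identical to the original classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Class and field names follow the walkthrough; the lowercase class names are kept on purpose.
class parent implements Serializable {
    int parentVersion = 10; // matches 00 00 00 0A in the dump
}

class contain implements Serializable {
    int containVersion = 11;
}

class SerialTest extends parent implements Serializable {
    int version = 66;
    contain con = new contain();
}

// Hypothetical helper, not part of the original example.
class SerialDemo {
    /** Serializes an object into an in-memory byte array. */
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads a single object back from a byte array. */
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new SerialTest());
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Running main prints the stream as space-separated hex bytes, which can be compared field by field with the annotated dump.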

[Figure: An outline of the serialization algorithm]

The serialized form of the example SerialTest object, shown as a hex dump:

AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65
73 74 05 52 81 5A AC 66 02 F6 02 00 02 49 00 07
76 65 72 73 69 6F 6E 4C 00 03 63 6F 6E 74 00 09
4C 63 6F 6E 74 61 69 6E 3B 78 72 00 06 70 61 72
65 6E 74 0E DB D2 BD 85 EE 63 7A 02 00 01 49 00
0D 70 61 72 65 6E 74 56 65 72 73 69 6F 6E 78 70
00 00 00 0A 00 00 00 42 73 72 00 07 63 6F 6E 74
61 69 6E FC BB E6 0E FB CB 60 C7 02 00 01 49 00
0E 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E 78
70 00 00 00 0B

AC ED: STREAM_MAGIC. Identifies the data as a Java serialization stream.
00 05: STREAM_VERSION. The version of the serialization protocol.
0x73: TC_OBJECT. Specifies that this is a new Object.
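
These constants are exposed by java.io.ObjectStreamConstants, so the header can be checked without hard-coding the values. A minimal sketch, assuming a hypothetical helper class named StreamHeaderCheck:

```java
import java.io.ObjectStreamConstants;

// Hypothetical helper for illustration.
class StreamHeaderCheck {
    /** Returns true if the bytes start with the serialization magic number and version. */
    static boolean hasSerializationHeader(byte[] bytes) {
        if (bytes.length < 4) {
            return false;
        }
        // Both header values are big-endian 16-bit numbers.
        short magic = (short) (((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF));
        short version = (short) (((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF));
        return magic == ObjectStreamConstants.STREAM_MAGIC
            && version == ObjectStreamConstants.STREAM_VERSION;
    }

    public static void main(String[] args) {
        // First five bytes of the dump above.
        byte[] start = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x73};
        System.out.println(hasSerializationHeader(start));
        System.out.println(start[4] == ObjectStreamConstants.TC_OBJECT);
    }
}
```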
The first step of the serialization algorithm is to write the description of the class associated with an instance. The example serializes an object of type SerialTest, so the algorithm starts by writing the description of the SerialTest class.

0x72: TC_CLASSDESC. Specifies that this is a new class.
00 0A: Length of the class name.
53 65 72 69 61 6c 54 65 73 74: SerialTest, the name of the class.
05 52 81 5A AC 66 02 F6: SerialVersionUID, the serial version identifier of this class.
0x02: Class descriptor flags. The value 0x02 is SC_SERIALIZABLE, which says that the class supports serialization.
00 02: Number of fields in this class.
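
The fixed part of a class descriptor can be decoded with a plain DataInputStream, because the class name is written as a length-prefixed UTF string and the remaining values are big-endian numbers. The sketch below is an illustration only: ClassDescHeader and its members are hypothetical names, and the sample bytes are copied from the dump above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

// Hypothetical value class and parser, written for this walkthrough only.
class ClassDescHeader {
    /** TC_CLASSDESC through the field count of SerialTest, copied from the dump. */
    static final String SAMPLE =
        "72 00 0A 53 65 72 69 61 6C 54 65 73 74 05 52 81 5A AC 66 02 F6 02 00 02";

    final String name;
    final long serialVersionUID;
    final int flags;
    final int fieldCount;

    ClassDescHeader(String name, long serialVersionUID, int flags, int fieldCount) {
        this.name = name;
        this.serialVersionUID = serialVersionUID;
        this.flags = flags;
        this.fieldCount = fieldCount;
    }

    /** Parses space-separated hex bytes that start at a TC_CLASSDESC marker. */
    static ClassDescHeader parse(String hex) {
        String[] tokens = hex.trim().split("\\s+");
        byte[] bytes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            bytes[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            if (in.readByte() != ObjectStreamConstants.TC_CLASSDESC) {
                throw new IllegalArgumentException("not a class descriptor");
            }
            String name = in.readUTF();        // 2-byte length, then the name
            long suid = in.readLong();         // 8-byte serialVersionUID
            int flags = in.readUnsignedByte(); // 1-byte flags
            int fieldCount = in.readShort();   // 2-byte field count
            return new ClassDescHeader(name, suid, flags, fieldCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isSerializable() {
        return (flags & ObjectStreamConstants.SC_SERIALIZABLE) != 0;
    }

    public static void main(String[] args) {
        ClassDescHeader h = parse(SAMPLE);
        System.out.println(h.name + " fields=" + h.fieldCount
            + " serializable=" + h.isSerializable()
            + " suid=" + String.format("%016X", h.serialVersionUID));
    }
}
```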
Next, the algorithm writes the field int version = 66;.

0x49: Field type code. 49 represents "I", which stands for int.
00 07: Length of the field name.
76 65 72 73 69 6F 6E: version, the name of the field.

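And then the algorithm writes the next field, contain con = new contain();. Because this field is an object reference, the algorithm writes the canonical JVM signature of its type after the field name.

0x4C: Field type code. 4C represents "L", which marks an object reference.
00 03: Length of the field name.
63 6F 6E: con, the name of the field.
0x74: TC_STRING. Represents a new string.
00 09: Length of the string.
4C 63 6F 6E 74 61 69 6E 3B: Lcontain;, the canonical JVM signature.
0x78: TC_ENDBLOCKDATA, the end of the optional block data for an object.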
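
The difference between a primitive field and an object field shows up clearly when the two descriptors are decoded. The following sketch uses a hypothetical class, FieldDescParser, on bytes copied from the dump above, including the 4C 00 03 63 6F 6E that precede the TC_STRING marker. It assumes the type signature is always a new string, which holds for this dump; a real stream may instead contain a back-reference to a string written earlier.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the two field descriptors of SerialTest.
class FieldDescParser {
    static final String SAMPLE =
          "49 00 07 76 65 72 73 69 6F 6E "        // 'I', "version"
        + "4C 00 03 63 6F 6E "                    // 'L', "con"
        + "74 00 09 4C 63 6F 6E 74 61 69 6E 3B "  // TC_STRING, "Lcontain;"
        + "78";                                   // TC_ENDBLOCKDATA

    /** Returns one "signature name" entry per field descriptor. */
    static List<String> parse(String hex, int fieldCount) {
        String[] tokens = hex.trim().split("\\s+");
        byte[] bytes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            bytes[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        List<String> fields = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < fieldCount; i++) {
                char typeCode = (char) in.readUnsignedByte();
                String name = in.readUTF();
                String signature = String.valueOf(typeCode);
                // Object and array fields carry an extra string with the JVM type signature.
                if (typeCode == 'L' || typeCode == '[') {
                    if (in.readByte() != ObjectStreamConstants.TC_STRING) {
                        throw new IllegalArgumentException("expected TC_STRING");
                    }
                    signature = in.readUTF();
                }
                fields.add(signature + " " + name);
            }
            if (in.readByte() != ObjectStreamConstants.TC_ENDBLOCKDATA) {
                throw new IllegalArgumentException("expected TC_ENDBLOCKDATA");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE, 2));
    }
}
```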
The next step of the algorithm is to write the description of the parent class, which is the immediate superclass of SerialTest.

0x72: TC_CLASSDESC. Specifies that this is a new class.
00 06: Length of the class name.
70 61 72 65 6E 74: parent, the name of the class.
0E DB D2 BD 85 EE 63 7A: SerialVersionUID, the serial version identifier of this class.
0x02: Class descriptor flags. The value 0x02 is SC_SERIALIZABLE, which says that the class supports serialization.
00 01: Number of fields in this class.
Now the algorithm will write the field description for the parent class. parent has one field, int parentVersion = 10;.

0x49: Field type code. 49 represents "I", which stands for int.
00 0D: Length of the field name.
70 61 72 65 6E 74 56 65 72 73 69 6F 6E: parentVersion, the name of the field.
0x78: TC_ENDBLOCKDATA, the end of block data for this object.
0x70: TC_NULL, which represents the fact that there are no more superclasses because we have reached the top of the class hierarchy.
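
The same chain of descriptors can be inspected at runtime with java.io.ObjectStreamClass. Its lookup method returns null for a class that is not serializable, which is the case for java.lang.Object here and is the point at which the stream contains TC_NULL. The sketch below uses stand-in classes named Parent and Child rather than the original classes; those names and the DescriptorChain helper are assumptions.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration.
class DescriptorChain {
    // Stand-ins for the parent and SerialTest classes of the walkthrough.
    static class Parent implements Serializable {
        int parentVersion = 10;
    }

    static class Child extends Parent {
        int version = 66;
    }

    /** Lists class names in descriptor order: most-derived class first. */
    static List<String> descriptorOrder(Class<?> type) {
        List<String> names = new ArrayList<>();
        Class<?> current = type;
        // Stop at the first class that has no serialization descriptor.
        while (current != null && ObjectStreamClass.lookup(current) != null) {
            names.add(current.getSimpleName());
            current = current.getSuperclass();
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(descriptorOrder(Child.class));
        ObjectStreamClass desc = ObjectStreamClass.lookup(Child.class);
        System.out.println(desc.getName() + " serialVersionUID = " + desc.getSerialVersionUID());
        System.out.println(ObjectStreamClass.lookup(Object.class));
    }
}
```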
So far, the serialization algorithm has written the description of the class associated with the instance and all its superclasses. Next, it will write the actual data associated with the instance. It writes the parent class members first:

00 00 00 0A: 10, the value of parentVersion.
Then it moves on to SerialTest.

00 00 00 42: 66, the value of version.
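
Each int is written as four big-endian bytes, and the superclass value comes first. A small sketch that decodes these eight bytes, assuming a hypothetical helper class named InstanceDataOrder:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration.
class InstanceDataOrder {
    /** Reads two consecutive big-endian ints from the data. */
    static int[] readTwoInts(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int first = in.readInt();
            int second = in.readInt();
            return new int[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 00 00 00 0A 00 00 00 42, copied from the dump.
        byte[] data = {0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x42};
        int[] values = readTwoInts(data);
        System.out.println("parentVersion = " + values[0]); // superclass data comes first
        System.out.println("version = " + values[1]);
    }
}
```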
The next few bytes are interesting. The algorithm needs to write the information about the contain object, shown in Listing 8.

Listing 8. The contain object


contain con = new contain();

Remember, the serialization algorithm hasn't written the class description for the contain class yet. This is the opportunity to write this description.

0x73: TC_OBJECT, designating a new object.
0x72: TC_CLASSDESC.
00 07: Length of the class name.
63 6F 6E 74 61 69 6E: contain, the name of the class.
FC BB E6 0E FB CB 60 C7: SerialVersionUID, the serial version identifier of this class.
0x02: Class descriptor flags. The value 0x02 is SC_SERIALIZABLE, which indicates that this class supports serialization.
00 01: Number of fields in this class.

Next, the algorithm must write the description for contain's only field, int containVersion = 11;.

0x49: Field type code. 49 represents "I", which stands for int.
00 0E: Length of the field name.
63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E: containVersion, the name of the field.
0x78: TC_ENDBLOCKDATA.
Next, the serialization algorithm checks to see if contain has any serializable parent classes. If it did, the algorithm would start writing that class; but in this case contain has no serializable superclass, so the algorithm writes TC_NULL.

0x70: TC_NULL.
Finally, the algorithm writes the actual data associated with contain.

00 00 00 0B: 11, the value of containVersion.
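
The whole walkthrough can be replayed with a small recursive reader. The sketch below is not how ObjectInputStream is implemented; it is a hypothetical DumpWalker that understands only what appears in this dump (int fields, object fields, and the type codes discussed above) and ignores handles, arrays, and back-references. It reads the descriptors from the most-derived class upward, then reads the data from the topmost superclass downward, recursing when it meets the nested contain object.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical walker for the dump in this article only.
class DumpWalker {
    static final String DUMP =
          "AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65 "
        + "73 74 05 52 81 5A AC 66 02 F6 02 00 02 49 00 07 "
        + "76 65 72 73 69 6F 6E 4C 00 03 63 6F 6E 74 00 09 "
        + "4C 63 6F 6E 74 61 69 6E 3B 78 72 00 06 70 61 72 "
        + "65 6E 74 0E DB D2 BD 85 EE 63 7A 02 00 01 49 00 "
        + "0D 70 61 72 65 6E 74 56 65 72 73 69 6F 6E 78 70 "
        + "00 00 00 0A 00 00 00 42 73 72 00 07 63 6F 6E 74 "
        + "61 69 6E FC BB E6 0E FB CB 60 C7 02 00 01 49 00 "
        + "0E 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E 78 "
        + "70 00 00 00 0B";

    /** One class descriptor: name, {type, name} pairs, and the superclass descriptor. */
    static final class Desc {
        String name;
        List<String[]> fields = new ArrayList<>();
        Desc superDesc;
    }

    /** Decodes the dump into a map of "Class.field" to value. */
    static Map<String, Object> walk() {
        String[] tokens = DUMP.trim().split("\\s+");
        byte[] bytes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            bytes[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            expect(in.readUnsignedShort(), 0xACED); // STREAM_MAGIC
            expect(in.readUnsignedShort(), 5);      // STREAM_VERSION
            Map<String, Object> result = readObject(in);
            expect(in.available(), 0);              // every byte was consumed
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, Object> readObject(DataInputStream in) throws IOException {
        expect(in.readUnsignedByte(), 0x73); // TC_OBJECT
        Desc desc = readClassDesc(in);
        // Descriptors arrive most-derived first; data is read topmost superclass first.
        Deque<Desc> chain = new ArrayDeque<>();
        for (Desc d = desc; d != null; d = d.superDesc) {
            chain.push(d);
        }
        Map<String, Object> values = new LinkedHashMap<>();
        for (Desc d : chain) {
            for (String[] field : d.fields) {
                Object value;
                if (field[0].equals("I")) {
                    value = in.readInt();
                } else {
                    value = readObject(in); // nested object, e.g. the contain instance
                }
                values.put(d.name + "." + field[1], value);
            }
        }
        return values;
    }

    static Desc readClassDesc(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        if (tag == 0x70) {
            return null; // TC_NULL: no further serializable superclass
        }
        expect(tag, 0x72); // TC_CLASSDESC
        Desc d = new Desc();
        d.name = in.readUTF();
        in.readLong();         // serialVersionUID, not needed here
        in.readUnsignedByte(); // flags
        int count = in.readShort();
        for (int i = 0; i < count; i++) {
            String type = String.valueOf((char) in.readUnsignedByte());
            String fieldName = in.readUTF();
            if (type.equals("L")) {
                expect(in.readUnsignedByte(), 0x74); // TC_STRING
                in.readUTF();                        // JVM signature, e.g. Lcontain;
            }
            d.fields.add(new String[] {type, fieldName});
        }
        expect(in.readUnsignedByte(), 0x78); // TC_ENDBLOCKDATA
        d.superDesc = readClassDesc(in);
        return d;
    }

    static void expect(int actual, int expected) {
        if (actual != expected) {
            throw new IllegalStateException("expected " + expected + " but found " + actual);
        }
    }
}
```

Calling DumpWalker.walk() yields parentVersion = 10, version = 66, and a nested map for con with containVersion = 11, in that order.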
