...

  1. Use CompatibleFieldSerializer and get everything out of the box.
  2. Create custom serializers and handle versioning ourselves. This is much more work, but should yield a solution that is at least 20%-30% faster. It is very close to the Externalizable interface, but in my opinion much easier, because serializers for the Java classes already exist in the library.
Backward/forward compatibility problems with Kryo

As already mentioned, Kryo provides the CompatibleFieldSerializer class, which offers limited backward/forward compatibility. It is limited because the type of a field cannot be changed. However, the way Kryo handles compatibility is quite problematic.

What this serializer does is write the names of all fields to be serialized in front of the data. Thus, for every object there is first a list of field names, followed by the actual data. This is not a problem for huge objects, where this "schema" is just a small part of the whole serialized data. In our case, however, it would inflate the size of the serialized objects too much. As a simple example, TimerData has 12 double and 2 long fields: without names it can be at most approx. 56 bytes, while including the field names adds approx. 100+ bytes, an increase in size of about 200%.
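
To make the overhead concrete, here is a rough sketch that writes the same values once without and once with the field names in front. It uses plain java.io streams and does not reproduce Kryo's actual wire format; the class NameOverheadSketch and the four field names are hypothetical and are not the real TimerData fields.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class NameOverheadSketch {

    // Hypothetical field names, chosen only for illustration.
    static final String[] FIELD_NAMES = {"min", "max", "count", "duration"};

    // Writes only the values: 8 bytes per long, nothing else.
    static byte[] writeValuesOnly(long[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (long value : values) {
                out.writeLong(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Writes all field names first, then the same values.
    // writeUTF stores a 2-byte length followed by the encoded characters.
    static byte[] writeWithNames(long[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String name : FIELD_NAMES) {
                out.writeUTF(name);
            }
            for (long value : values) {
                out.writeLong(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With these four short names the values alone take 32 bytes and the named form takes 59 bytes, so even in this toy case the names nearly double the size of a small object.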

The solution can be to move the schema out of the serialized data. I can see two possible ways to do this:

  1. Save the schema of each class as part of the stored data. Then, when we need to de-serialize, we can refer to the schema that comes with the data. This is a huge constraint, but it can work in our case. Since we are serializing a large amount of data, adding the schema information would have practically no effect on the size of the data. On the other hand, a schema will be created for every "recording", even if nothing has changed.
  2. Create a general schema for a class and provide it as a "configuration file". Every class will then have a schema in a specific directory that can be loaded/created when the CMR is loading. The schema structure can be very simple, and we can reuse the idea from Protocol Buffers and Thrift of connecting each field name with a unique integer number:

- 1: someFieldName
- 2: someOtherFieldName

Then, when serializing, instead of adding the field names to the data, we can add a list of integers that identify the fields being serialized. Since only one byte is needed for such a small integer, this would not be a huge overhead. The only constraint is that the unique number given to a field can never be changed; for example, number 1 is forever reserved for "someFieldName". If this field is removed from the class, number 1 cannot be reused by any other field. When de-serializing, if we encounter a field number that is not present in the current schema, we simply skip it.
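
The numbered-field idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not a Kryo serializer: the class NumberedFieldSketch is hypothetical, the schema is passed in as a map from field number to field name, and every value is assumed to be an 8-byte long so that an unknown field can be skipped by its fixed width. A real format with mixed types would additionally need a type or length marker per field to know how many bytes to skip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class NumberedFieldSketch {

    // Writes: field count (1 byte), then per field its number (1 byte) and value (8 bytes).
    // Assumes at most 255 fields and field numbers in the range 0..255.
    static byte[] write(Map<Integer, String> schema, Map<String, Long> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(schema.size());
            for (Map.Entry<Integer, String> field : schema.entrySet()) {
                out.writeByte(field.getKey());
                // Missing values are written as 0 in this sketch.
                out.writeLong(values.getOrDefault(field.getValue(), 0L));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads with the current schema; field numbers it does not know are skipped.
    static Map<String, Long> read(Map<Integer, String> schema, byte[] data) {
        Map<String, Long> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readUnsignedByte();
            for (int i = 0; i < count; i++) {
                int number = in.readUnsignedByte();
                String name = schema.get(number);
                if (name == null) {
                    in.skipBytes(Long.BYTES); // removed or unknown field: skip its value
                } else {
                    result.put(name, in.readLong());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Data written with a schema containing numbers 1 and 2 can then be read with a newer schema in which number 1 was removed and number 3 was added: the value of field 1 is skipped, field 2 is restored, and field 3 is simply absent from the result.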

The pros and cons of these two possibilities are quite clear. The first one produces less data, because it does not also have to write which fields are serialized; on the other hand, de-serialization is not possible without the schema provided.
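
For comparison, the first possibility (schema stored once with the data) might look roughly like the sketch below. It is an assumption-laden illustration: the class RecordingSchemaSketch and the layout (a header with the field names, a record count, then the records as bare values) are hypothetical, and all values are again assumed to be longs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordingSchemaSketch {

    // Writes the field names once, then every record as bare values in schema order.
    static byte[] writeRecording(String[] fieldNames, long[][] records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(fieldNames.length);
            for (String name : fieldNames) {
                out.writeUTF(name);
            }
            out.writeInt(records.length);
            for (long[] record : records) {
                for (long value : record) {
                    out.writeLong(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the header first; without it the records could not be interpreted.
    static List<Map<String, Long>> readRecording(byte[] data) {
        List<Map<String, Long>> rows = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String[] names = new String[in.readUnsignedByte()];
            for (int i = 0; i < names.length; i++) {
                names[i] = in.readUTF();
            }
            int count = in.readInt();
            for (int r = 0; r < count; r++) {
                Map<String, Long> row = new LinkedHashMap<>();
                for (String name : names) {
                    row.put(name, in.readLong());
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

Here the per-record cost is only the values themselves, while the header is paid once per recording, which matches the trade-off described above.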

Externalizable Interfaces

...