Kryo serialization - usage and info
inspectIT uses Kryo to serialize objects that need to be stored on disk. The library was chosen after an analysis of several currently available Java serialization options. This page gives a short overview of how Kryo should be used and, more importantly, how to update to a newer version when necessary.
CustomCompatibleFieldSerializer
We have developed a new serializer for Kryo. This serializer reads a schema provided for each class. The schema assigns an integer number to each field that should be serialized:
    1: someFieldName
    2: someOtherFieldName
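To make the schema idea concrete, here is a minimal sketch of how such a schema could be parsed and queried. The class name FieldSchemaSketch and the one-entry-per-line file layout are assumptions made for illustration; only the getFieldMarker(String) lookup returning an Integer (or null for fields that are not in the schema) mirrors what our serializer actually calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the per-class schema; not the inspectIT class.
class FieldSchemaSketch {
    private final Map<String, Integer> markersByName = new LinkedHashMap<>();

    /** Parses lines of the assumed form "<marker>: <fieldName>". */
    static FieldSchemaSketch parse(String text) {
        FieldSchemaSketch schema = new FieldSchemaSketch();
        for (String line : text.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing ':' in schema line: " + line);
            }
            Integer marker = Integer.valueOf(line.substring(0, colon).trim());
            String fieldName = line.substring(colon + 1).trim();
            // A marker must identify exactly one field, otherwise old data becomes ambiguous.
            if (schema.markersByName.containsValue(marker)) {
                throw new IllegalArgumentException("Duplicate marker: " + marker);
            }
            schema.markersByName.put(fieldName, marker);
        }
        return schema;
    }

    /** Returns the marker of the field, or null if the field is not part of the schema. */
    Integer getFieldMarker(String fieldName) {
        return markersByName.get(fieldName);
    }
}
```

With the two entries shown above, getFieldMarker("someFieldName") returns 1, and a field that is missing from the schema yields null, which is the signal our serializer uses to drop the field.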
When serializing an object, we write the IDs of the fields instead of the field names. This has several advantages:
- Smaller size of serialized data
- Faster serialization
- Backward-forward compatibility
This serializer extends FieldSerializer and is, in addition, a mixture of TaggedFieldSerializer and CompatibleFieldSerializer. To define which fields need to be serialized we use an approach similar to the one in TaggedFieldSerializer, overriding the initializeCachedFields() method:
    /**
     * {@inheritDoc}
     */
    @Override
    protected void initializeCachedFields() {
        if (null != schema) {
            CachedField<?>[] fields = getFields();
            // Remove unwanted fields
            for (int i = 0, n = fields.length; i < n; i++) {
                Field field = fields[i].getField();
                if (null == schema.getFieldMarker(field.getName())) {
                    super.removeField(field.getName());
                }
            }
            // Cache markers
            fields = getFields();
            fieldMarkers = new int[fields.length];
            for (int i = 0, n = fields.length; i < n; i++) {
                fieldMarkers[i] = schema.getFieldMarker(fields[i].getField().getName()).intValue();
            }
        }
    }
From the CompatibleFieldSerializer we took the way data is read and written:
    /**
     * {@inheritDoc}
     */
    @Override
    public void write(Kryo kryo, Output output, T object) {
        CachedField[] fields = getFields();
        ObjectMap context = kryo.getGraphContext();
        if (!context.containsKey(this)) {
            context.put(this, null);
            if (TRACE) {
                trace("kryo", "Write " + fields.length + " field names.");
            }
            output.writeInt(fields.length, true);
            for (int i = 0, n = fields.length; i < n; i++) {
                // Changed by ISE
                output.writeInt(fieldMarkers[i], true);
            }
        }

        OutputChunked outputChunked = new OutputChunked(output, 1024);
        for (int i = 0, n = fields.length; i < n; i++) {
            fields[i].write(outputChunked, object);
            outputChunked.endChunks();
        }
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public T read(Kryo kryo, Input input, Class<T> type) {
        T object = kryo.newInstance(type);
        kryo.reference(object);
        ObjectMap context = kryo.getGraphContext();
        CachedField[] fields = (CachedField[]) context.get(this);
        if (fields == null) {
            int length = input.readInt(true);
            if (TRACE) {
                trace("kryo", "Read " + length + " field names.");
            }
            // Changed by ISE
            int[] markers = new int[length];
            for (int i = 0; i < length; i++) {
                markers[i] = input.readInt(true);
            }

            fields = new CachedField[length];
            CachedField[] allFields = getFields();
            outer: for (int i = 0, n = markers.length; i < n; i++) {
                int fieldMarker = markers[i];
                for (int ii = 0, nn = allFields.length; ii < nn; ii++) {
                    if (fieldMarkers[ii] == fieldMarker) {
                        fields[i] = allFields[ii];
                        continue outer;
                    }
                }
                if (TRACE) {
                    trace("kryo", "Ignoring obsolete field with marker: " + fieldMarker);
                }
            }
            context.put(this, fields);
        }

        InputChunked inputChunked = new InputChunked(input, 1024);
        for (int i = 0, n = fields.length; i < n; i++) {
            CachedField cachedField = fields[i];
            if (cachedField == null) {
                if (TRACE) {
                    trace("kryo", "Skip obsolete field.");
                }
                inputChunked.nextChunks();
                continue;
            }
            cachedField.read(inputChunked, object);
            inputChunked.nextChunks();
        }
        return object;
    }
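The marker matching in read() is what gives us backward-forward compatibility, so it may help to look at it in isolation. The following is a simplified sketch that works on plain int arrays instead of CachedField objects; the class and method names are made up for illustration and are not part of Kryo or inspectIT.

```java
import java.util.Arrays;

// Simplified model of the marker lookup done in read(); names are hypothetical.
class MarkerMatchingSketch {

    /**
     * For each marker found in the stream, returns the index of the local field
     * that carries the same marker, or -1 if the local class no longer has it.
     */
    static int[] matchMarkers(int[] streamMarkers, int[] localMarkers) {
        int[] localIndexes = new int[streamMarkers.length];
        Arrays.fill(localIndexes, -1);
        for (int i = 0; i < streamMarkers.length; i++) {
            for (int j = 0; j < localMarkers.length; j++) {
                if (localMarkers[j] == streamMarkers[i]) {
                    localIndexes[i] = j;
                    break;
                }
            }
        }
        return localIndexes;
    }
}
```

For example, if the data was written with markers {1, 2, 3} and the current class defines markers {1, 3, 4}, matchMarkers returns {0, -1, 1}. The -1 entry corresponds to an obsolete field whose chunk is skipped on read, and the local field with marker 4 is simply never read from the stream.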
How to update to a newer version
If you need to update to a new Kryo version, CustomCompatibleFieldSerializer must be checked against possible changes in the two serializers whose implementation we borrowed. Follow these instructions:
- Check whether initializeCachedFields() has changed in TaggedFieldSerializer and, if so, adapt our method to fit the changes
- Check whether the read() and write() methods of CompatibleFieldSerializer have changed. If so:
  - Copy the code of the new methods into our class, replacing the old methods
  - Alter the write() method so that it writes IDs instead of names:

        output.writeInt(fields.length, true);
        for (int i = 0, n = fields.length; i < n; i++) {
            // Changed by ISE
            output.writeInt(fieldMarkers[i], true);
        }

  - Change the read() method so that it reads int values for the fields instead of names:

        // Changed by ISE
        int[] markers = new int[length];
        for (int i = 0; i < length; i++) {
            markers[i] = input.readInt(true);
        }

        fields = new CachedField[length];
        CachedField[] allFields = getFields();
        outer: for (int i = 0, n = markers.length; i < n; i++) {
            int fieldMarker = markers[i];
            for (int ii = 0, nn = allFields.length; ii < nn; ii++) {
                if (fieldMarkers[ii] == fieldMarker) {
                    fields[i] = allFields[ii];
                    continue outer;
                }
            }
            if (TRACE) {
                trace("kryo", "Ignoring obsolete field with marker: " + fieldMarker);
            }
        }
        context.put(this, fields);

  - Run all tests to ensure that everything works properly
Bugs
Recent versions of Kryo have had a lot of bugs. Please test carefully and confirm that the new version does not contain a bug that affects inspectIT.
Registration of classes
Classes are registered in the SerializationManager class. Read the following instructions carefully:
ATTENTION!
Please do not change the order of the registered classes. If new classes need to be registered, add their registration at the end. Otherwise old data can no longer be de-serialized. If a class no longer needs to be registered, do not remove its registration. If the class is no longer available, register an arbitrary class in its position so that the order is maintained. Do not add unnecessary classes to the registration list.
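The reason for this rule is that each registered class gets its integer ID from its position in the registration sequence, and stored data contains only that ID. The following sketch models this with a plain list. It is a simplified illustration and not Kryo's actual API; the class RegistrationOrderSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of order-based ID assignment; not Kryo's actual API.
class RegistrationOrderSketch {
    private final List<Class<?>> registered = new ArrayList<>();

    /** Registers a class and returns its ID, which is its position in the list. */
    int register(Class<?> type) {
        registered.add(type);
        return registered.size() - 1;
    }

    /** Resolves an ID read from stored data back to a class. */
    Class<?> classForId(int id) {
        return registered.get(id);
    }
}
```

If old data was written when ArrayList was registered first (ID 0) and a later version registers HashSet before it, ID 0 in the old data now resolves to HashSet and de-serialization breaks. Appending HashSet at the end instead leaves ID 0 pointing at ArrayList.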
NOTE: By default, all primitives (including wrappers) and java.lang.String are registered. Any other class, including JDK classes like ArrayList and even arrays such as String[] or int[] must be registered.
NOTE: If it is known up front what classes need to be serialized, registering the classes is ideal. However, in some cases the classes to serialize are not known until it is time to perform the serialization. When setRegistrationOptional is true, registered classes are still written as an integer. However, unregistered classes are written as a String, using the name of the class. This is much less efficient, but can't always be avoided.
Reference Resolver
Kryo allows the ReferenceResolver implementation to be specified. The ReferenceResolver decides when the serialization should "use references" and when not.
By default, each appearance of an object in the graph after the first is stored as an integer ordinal. This allows multiple references to the same object and cyclic graphs to be serialized. This has a small amount of overhead and can be disabled to save space if it is not needed.
We have concluded that DefaultData objects (which are serialized the most) do not need references, with the exception of InvocationSequenceData. Thus, our extended MapReferenceResolver always disables the reference approach if the class to be serialized is a DefaultData. To solve the problem with the InvocationSequenceData parent relation, we have a special serializer for invocations: InvocationSequenceCustomCompatibleFieldSerializer. This serializer simply sets the correct parent relations after the objects have been de-serialized:
    /**
     * {@inheritDoc}
     */
    @Override
    public InvocationSequenceData read(Kryo kryo, Input input, Class<InvocationSequenceData> type) {
        InvocationSequenceData invocation = super.read(kryo, input, type);
        connectChildren(invocation);
        return invocation;
    }

    /**
     * Sets the parent to all nested sequences of the invocation to the correct one.
     *
     * @param parent
     *            Parent to start from.
     */
    private void connectChildren(InvocationSequenceData parent) {
        if (null != parent.getNestedSequences()) {
            for (InvocationSequenceData child : parent.getNestedSequences()) {
                child.setParentSequence(parent);
                connectChildren(child);
            }
        }
    }
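The resolver decision described above could look roughly like the sketch below. It is not the actual inspectIT resolver: DefaultData and InvocationSequenceData are empty stand-in classes so that the example compiles on its own, the subclass check via isAssignableFrom is an assumption about how "is DefaultData" is evaluated, and the handling of other types that the real MapReferenceResolver performs is left out.

```java
// Stand-in types for illustration only; the real classes live in inspectIT.
class DefaultData { }

class InvocationSequenceData extends DefaultData { }

// Hypothetical sketch of the "use references?" decision; not the real resolver.
class ReferenceDecisionSketch {

    /**
     * Returns false for DefaultData and (by assumption) all of its subclasses,
     * so that no reference bookkeeping is done for them.
     */
    static boolean useReferences(Class<?> type) {
        return !DefaultData.class.isAssignableFrom(type);
    }
}
```

Under this assumption InvocationSequenceData is written without references as well, which is why the parent relation has to be restored by the special serializer after reading.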