Indexing structure
Requirements
A supporting structure for indexing of buffer data needs to be created. This structure needs to be able to index elements and provide an easy querying system, that can support current use cases of inspectIT. Furthermore, the indexing structure needs to hold week references to the elements, so that these elements are garbage collected when they are evicted from the buffer. Structure also has to be able to clean it self, meaning that references to weak references have to be destroyed.
Approach
The indexing has a tree structure. It is consisted of several levels, where each level has several branches and indexes the element based on element's properties. On the last level the weak references are mapped, thus from the last level objects are access-able. This would allow fast searching for the elements, especially if a searching query has defined a property for each level. An example is shown in the picture below:
Realization
The model used for realization is same as the Composite design pattern (Composite pattern). ITreeComponent defines operations that are needed for the indexing structure, so that all tree components are able to perform them.
Branch and IBranchIndexer
Branch is a tree component that has references to other tree components (same as Composite in Composite pattern). This means that branch represents one level of indexing. To be able to correctly index elements and perform queries, branch uses functionality of IBranchIndexer. Different implementation of this interface index elements on different properties. Thus, the branch indexer is used when branch needs to determine right "next level" branch for the indexing element. The same stands for the searching of elements.
IBranchIndexer is also responsible to know which indexer has to be used on the next level. This is needed when tree is empty, and new elements should be indexed into branches/leafs that have not yet been created. Currently implemented indexers are:
- Platfrom indexer - creates one tree component for each agent (each platfrom ident)
- Sensor type indexer - creates one tree component for each sensor type (each sensor type ident)
- Method indexer - creates one tree component for each method (each method ident)
- Object type indexer - creates one tree component per class, meaning that all objects of the same class will be in one branch/leaf
- Timestamp indexer - creates one tree component for every minute (only when needed), and indexes the elements by their time stamp
Leaf
Leaf is the element on the deepest level in the tree. All objects that are referenced in one leaf have the same indexing path.
LeafingBranch
Removed
Leafing branch was removed due of the conclusions that memory space it takes are too expensive. The speed in query gained was very small on the other hand.
LeafingBranch is the special type of branch that has an extended functionality to hold references to all objects that are indexed in this branch. This means that same element can be referenced from one leaf, but also many leafing branches. The advantage is speeding up the queries that need to traverse through whole branch because query does not define property for lower level branching. In this case, leafing brunch can just return results from its own leaf, with no need to look into all tree components below it. The disadvantage is the memory size needed for the indexing structure, because it rises linearly depending on the number of levels that use leaf branch.
During tests it was concluded that the best trade-off between query speed and indexing structure size is achieved by following:
- Normal branches need to be used for branches in the upper indexing levels and in branches that have a small number of referring tree components. A good example would be branch that would use platform indexer.
- Leafing branches need to be used for branches in the lower indexing levels and in branches that have a big number of referring tree components. A good example would be branch that would use time stamp indexer.
Updates to the indexing structure from version 1.4
The indexing structure has heavily increased and is more complicated in version 1.4 of inspectIT. This was need to support the storing of the storage data in the same tree as we use for the buffer. The main difference is that the indexing for the buffer is indexing elements passed, while indexing for the storage indexes the information where will indexed elements be saved to the disk, like file, position and size.
The updates are following:
- The interface ITreeComponent defines general tree operations like put, get, etc. The buffer and storage specific operations are in the IBufferTreeComponent and IStorageTreeComponent respectively. The ITreeComponent define two generic values, one for the type of the element that is indexed in the tree, and the second is type that will be actually saved in the tree. IBufferTreeComponent can index DefaultData elements and returns also DefaultData elements, while IStorageTreeComponent can also index DefaultData objects, but returns the IStorageDescriptors as the result of indexing.
- The AbstractBranch class was introduced to hold the general branch implementation. Branch and StorageBranch extend the mention abstract class, and implement the proper IBufferTreeComponent and IStorageTreeComponent respectively.
- The buffer indexing side has only one leaf type - called Leaf. On the other hand we have two types of leafs. The ArrayBasedStorageLeaf is the one that creates one StorageDescriptor for every element indexed, keeping complete information about position and size of the element in the file. On the other hand the LeafWithNoDescriptors keeps only the track of the total size of indexed elements. The trade back is following: the ArrayBasedStorageLeaf can provide the precise information where the element of the given ID is saved, but the size of the leaf in memory is much bigger (on disk also) and grows with the number of indexed elements; the LeafWithNoDescriptors does not grow at all and has a constant small size, but can only provide information about total size of all elements in the leaf, meaning that for finding the element with given ID we need to de-serialize every object in the leaf and find the one with given ID.
- The class CombinedStorageBranch is utility class that actually delegates all operations to the set of IStorageTreeComponent. This class should be used for read-only purposes. When element is requested the first tree component that has the indexed element will be asked to provide the information. When query is executed, the result from all tree component will be combined and returned.
- The branch indexers have also been re-factored. Storage branches use IStorageTreeIndexer and old buffer Branches use IBufferTreeIndexer. Both types of indexers delegate the basic ITreeIndexer operations to the delegate indexer. Currently there is a set of 7 possible delegate indexers (SQL string indexer is not shown in the UML diagram) that can index objects based on the: class, time-stamp, platform ident, method ident, sensor type ident, has children property (only InvocationSequenceData), SQL string (only SQLStatementData).
Querying the indexed tree
Please refer to Querying the indexed tree for information and specification about indexed tree querying system.
Custom root branch proposal
This root branch proposal should provide the best results when buffer is responsible for holding all inspectIT default data object types.