This page provides general information about storages for developers, explaining in detail the structure of a storage and the processes of its creation and manipulation.
General overview
Storages were developed to allow performance data to be persisted to disk; thus, all data contained in a storage is placed on disk. Storages are always created on the CMR and can later be exported or downloaded via the User interface. By default, the root storage folder on both the CMR and the UI is [root]/storage. Note that the root storage folder can be changed by altering the storage configuration.
When a storage is created, it is given a unique ID that serves as the property by which storages are distinguished. Every storage is placed in a separate folder inside the root storage folder. The name of the storage's sub-folder is the same as the ID of the storage. For example, if we have a storage with the ID 17aa8c9, its folder will be [rootStorageFolder]/17aa8c9.
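For illustration, the folder derivation can be sketched as follows (a minimal sketch; the class and method names here are hypothetical, not the actual inspectIT API):

```java
import java.nio.file.Path;

public class StoragePathDemo {
    // Each storage lives in a sub-folder named after its unique ID.
    static Path storageFolder(Path rootStorageFolder, String storageId) {
        return rootStorageFolder.resolve(storageId);
    }

    public static void main(String[] args) {
        // e.g. a storage with ID 17aa8c9 under the default root folder
        System.out.println(storageFolder(java.nio.file.Paths.get("storage"), "17aa8c9"));
    }
}
```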
Storage Configuration
The Storage configuration page describes what every user can configure for storages.
Storage file types
There are several file types that exist for each storage. Here is an overview of those file types:
File type | Extension | One file per storage | Mandatory | Description |
---|---|---|---|---|
Storage info file | .storage | Yes | Yes (for storages on CMR) | This file contains the serialized StorageData object that exists for each storage. This object keeps the general information about the storage, like ID, name, size on disk, etc. The name of the file is random. |
Local storage info file | .local | Yes | Yes (for storages on UI) | This file contains the serialized LocalStorageData object that exists for every storage that is "mounted" or downloaded on the User interface. The local information file holds some additional information about the storage that relates only to the local machine. |
Agent file | .agent | No | No | Every piece of data written to the storage belongs to a specific agent. For each agent whose data is stored in a storage, one Agent file is created during the storage finalization. One complete Agent tree is serialized into each file. The file is named after the ID of the agent. If the storage is empty, no Agent files are created. |
Indexing file | .index | No | Yes | The indexing file contains one part of (or the complete) indexing tree that points to the data in the storage. Many indexing files can be created, but there will always be at least one, even if the storage is empty. The name of the file is random. |
Data file | .itdata | No | No | The data files contain the objects written to the storage. One file can contain many serialized objects, written one after another. The file is named after the ID of the indexing tree leaf that the data belongs to. |
States
The following picture describes the states a storage can be in at one moment in time:
Once a storage is finalized, it can no longer be put back into the writable state.
Data writing
When the CMR is started, one ExecutorService is specially created for the purpose of writing to disk. This executor service is called IOExecutorService and has the purpose of writing bytes to the disk via Java NIO2.
When a storage is put in the writing state, a StorageWriter for that storage is created. Since we want to do the serialization and writing in parallel, each StorageWriter creates its own ExecutorService that is used for the operations that need to be done before executing the actual write. Thus, if we have 3 storages in the writable state, there will be 3 StorageWriters, each having its own ExecutorService, in addition to the previously mentioned IOExecutorService that executes all IO operations for all storages.
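The executor layout described above can be sketched as follows (illustrative names and pool sizes, not the actual inspectIT classes):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorLayoutDemo {
    // Shared across all storages: performs the actual byte writing to disk.
    static final ExecutorService IO_EXECUTOR = Executors.newSingleThreadExecutor();

    // Each writer owns its executor for the pre-write work (serialization etc.).
    static class StorageWriter {
        final ExecutorService serializationExecutor = Executors.newFixedThreadPool(2);
    }
}
```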
Approach
For each piece of data written using the StorageWriter, a new WriteTask is created and given to the writer's executor service for execution. The write task holds the object to be written in a soft reference. This way we ensure that there is no OutOfMemory error if too many writing tasks are created and cannot be processed fast enough. When executed, the task performs the following steps:
- Check if the data to write is still available via the soft reference; if not, abort
- Inform the indexing tree that the write is starting. The indexing tree will return the ID of the channel where the data should be written. This ID is the same as the ID of the leaf in the indexing tree where the data to be written is indexed.
- Acquire a serializer from the queue. Wait if no serializer is available.
- Serialize the data with ExtendedByteBufferOutputStream. Note that this output stream can serialize data of any size. If the data is too large to fit into one ByteBuffer, the stream will acquire additional buffers until the data fits. Additional buffers are provided by the ByteBufferProvider.
- Return the serializer to the queue.
- Submit the IO writing task to the IOExecutorService passing the channel where write should be done and providing the output stream.
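The steps above can be sketched as a minimal write task; the SoftReference-based abort behavior comes from the description above, while the class and method names are illustrative, not the real inspectIT API:

```java
import java.lang.ref.SoftReference;

public class WriteTaskDemo implements Runnable {
    private final SoftReference<Object> dataRef;
    boolean processed; // true once the task actually saw its data

    public WriteTaskDemo(Object data) {
        // A soft reference lets the GC reclaim queued data under memory
        // pressure instead of causing an OutOfMemoryError.
        this.dataRef = new SoftReference<>(data);
    }

    @Override
    public void run() {
        Object data = dataRef.get();
        if (data == null) {
            return; // data was garbage-collected under memory pressure: abort
        }
        // Remaining steps (sketched as comments only):
        // 1. ask the indexing tree for the target channel ID
        // 2. take a serializer from the pool, waiting if none is free
        // 3. serialize the data, then return the serializer to the pool
        // 4. submit the serialized bytes to the shared IO executor
        processed = true;
    }
}
```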
There is no certainty about when a new IO writing task will be executed. Because of this, a WriteReadCompletionRunnable is also passed to the info.novatec.inspectit.storage.nio.write.WritingChannelManager.write(ExtendedByteBufferOutputStream, Path, WriteReadCompletionRunnable) method. The runnable is executed when the IO operation is done. If the write was successful, the indexing tree is informed about the position in the file and the size of the written object in bytes. Otherwise, the indexing tree is informed that the write failed, which effectively removes the information about the written data from the indexing tree.
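The completion-callback idea can be sketched as follows. This is a simplified stand-in for WriteReadCompletionRunnable: the interface and names are illustrative, and CompletableFuture is used (and joined) only to keep the demo deterministic:

```java
import java.util.concurrent.CompletableFuture;

public class CompletionDemo {
    // Callback informing the indexing tree about the outcome of the IO write.
    interface WriteCompletion {
        void markSuccess(long position, long size); // record position/size in the tree
        void markFailure();                         // drop the entry from the tree
    }

    static void asyncWrite(byte[] bytes, WriteCompletion completion) {
        CompletableFuture.runAsync(() -> {
            try {
                // ... the actual channel write would happen here ...
                completion.markSuccess(0L, bytes.length);
            } catch (RuntimeException e) {
                completion.markFailure();
            }
        }).join(); // joined only for the demo; the real write stays asynchronous
    }
}
```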
Data processors
In most cases, writing to the storage is not used directly. Instead, the data is passed to a set of data processors that perform additional operations on the data before it is written, filter out data, etc. The currently available processors are:
Processor | Works with | Description |
---|---|---|
DataSaverProcessor | All data | Simple processor that executes the write of the data. The processor can be instantiated with a list of classes to take into consideration. Any object whose class is not in this list will be ignored. |
InvocationClonerProcessor | Invocations | Processor that creates a clone of the invocation sequence root and writes the clone. |
DataAggregatorProcessor | TimerData and its sub-classes | Processor that aggregates TimerData based on the timestamp. The period used for aggregation can be defined. In addition, filtering based on class type can be used, as in the DataSaverProcessor. |
TimeFrameDataProcessor | All data | Filters out data that is not in the wanted time-frame and passes it to the chained processors. |
InvocationExtractorDataProcessor | Invocations | Extracts all children of invocation and passes them to the chained processors. |
AgentFilterDataProcessor | All data | Filters out data based on the agent ID and passes it to the chained processors. |
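The chaining pattern shared by the filtering processors in the table can be sketched as follows (a hypothetical interface, not the actual inspectIT data processor API):

```java
import java.util.List;
import java.util.function.Predicate;

public class ProcessorChainDemo {
    interface DataProcessor {
        void process(Object data);
    }

    // Passes only data accepted by the filter on to the chained processors,
    // mirroring how e.g. an agent- or time-frame-based filter would behave.
    static class FilterProcessor implements DataProcessor {
        private final Predicate<Object> filter;
        private final List<DataProcessor> chained;

        FilterProcessor(Predicate<Object> filter, List<DataProcessor> chained) {
            this.filter = filter;
            this.chained = chained;
        }

        @Override
        public void process(Object data) {
            if (filter.test(data)) {
                for (DataProcessor p : chained) {
                    p.process(data);
                }
            }
        }
    }
}
```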
Write status
Each storage writer can provide the status of the write based on the number of created and finished writing tasks since the writer was created. Currently, the status calculation is as follows:
Protection from low disk space
The storage manager regularly checks the remaining hard disk space and suspends any write if the space is critically low.
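Such a guard can be sketched with the standard java.io.File API; the shape of the check and the threshold handling are assumptions, not the actual inspectIT implementation:

```java
import java.io.File;

public class DiskSpaceGuardDemo {
    // Writing is allowed only while the usable space on the storage
    // partition stays at or above the configured threshold.
    static boolean writingAllowed(File storageRoot, long minFreeBytes) {
        return storageRoot.getUsableSpace() >= minFreeBytes;
    }
}
```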
Storage finalization
During the storage finalization, the following tasks are executed:
- Suspend the creation of any additional writing tasks
- Wait until all submitted tasks are done
- When no task is left, shut down the executor service
- Inform the indexing tree that the write is over and that the remaining part of the indexing tree should be saved.
In addition, the CmrStorageWriter will write the Agent files for all agents whose data has been saved to the storage.
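The shutdown part of this sequence maps directly onto the standard ExecutorService lifecycle; the sketch below is illustrative and the timeout value is an assumption:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class FinalizationDemo {
    static boolean finalizeWriter(ExecutorService serializationExecutor) {
        serializationExecutor.shutdown(); // 1. no new writing tasks are accepted
        try {
            // 2. wait until all already-submitted tasks are done
            return serializationExecutor.awaitTermination(30, TimeUnit.SECONDS);
            // 3. afterwards, the remaining part of the indexing tree would be saved
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```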
Indexing
Indexing is needed because we need to save the start position and the size of each object in a file. Later on, we can query the tree to retrieve the information about where the data we want is stored. This way, one file can hold many serialized objects, and we can have many data files for one storage.
The tree is very similar to the indexing tree we use for the CMR buffer. The difference is that here the tree has information about the file/position/size of the object, while the CMR buffer tree holds a direct reference to the object in memory.
There are two types of leaves in the storage indexing tree:
- ArrayBasedStorageLeaf - Holds the position and size of each object that is added to the leaf. The advantage of this leaf is that every object can be referenced later on (needed, for example, for Invocations). The disadvantage is the size of the leaf, which grows linearly with the number of indexed objects.
- LeafWithNoDescriptors - This leaf just holds the total number of bytes in the file it refers to. This means that when querying, only all objects from the leaf can be retrieved; there is no single-object picking. Of course, the size of the leaf is very small and constant. This leaf can be used for data types that do not have to be referenced individually.
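The trade-off between the two leaf types can be sketched as follows (illustrative implementations mirroring the description above, not the actual classes):

```java
import java.util.ArrayList;
import java.util.List;

public class LeafDemo {
    // Grows linearly: one (position, size) descriptor per indexed object,
    // so any single object can be located later.
    static class ArrayBasedLeaf {
        final List<long[]> descriptors = new ArrayList<>();
        void index(long position, long size) {
            descriptors.add(new long[] { position, size });
        }
    }

    // Constant size: only the total number of bytes is tracked, so the
    // leaf can only ever be read back as a whole (position is unused).
    static class NoDescriptorsLeaf {
        long totalBytes;
        void index(long position, long size) {
            totalBytes += size;
        }
    }
}
```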
The IndexingTreeHandler component is responsible for managing the tree while the writing is in progress. There are two main functions of the IndexingTreeHandler:
- Make sure that the tree does not get too big. If the size of the tree grows beyond the specified limit, the tree will be saved and a new tree will be created. In the end, we can end up having many trees, which will be reassembled on the User interface when querying is performed.
- Keep track of all writing tasks that have been started and wait until all tasks are done, so that the information about position/size is in the tree before the tree is saved to disk.
More about indexing and the structure of indexing tree can be found on page Indexing.
Storage on the UI
What do we need for exploring, what files are downloaded and when
Data querying and retrieving
Difference between remote and downloaded storage