Initial Storage Approach

Assuming the serialization technology has been chosen and implemented, the remaining problem is how to save these bytes to disk and, moreover, how to organize them in a meaningful way so that both writing and, especially, reading are efficient. But first we need to address some open questions.

Open questions

Static data

inspectIT has some data that we can call "static", namely the data created when the agent instruments the code: method, sensor and platform identification data. This data also has to be included in a recording, because I think we need to provide the data to the user without any dependency on the database. The database can be deleted, the "storage file" can be moved to another CMR, or something similar. Thus, I think the best way to go is to also save all this data when the recording is finished. The problem may be data loading. Here we have two solutions: load everything when the user opens some stored data, or load data only when it is necessary. Although loading only when necessary sounds "better", I think the problem then is how to locate the correct data.

Aggregated data or not

I believe that some inspectIT data can be aggregated before storing. Currently there is only one use case where we need every single instance of TimerData and SqlStatementData, and that is plotting. Saving only one aggregated object per method/SQL statement would improve performance so much (because of the number of objects that would otherwise need to be serialized) that dropping the plotting option for stored data can be considered.

Invocation details

It is possible that the details of each invocation have to be persisted separately, so that an overview of all invocations is possible without the need to de-serialize every single invocation.

Data transfer to UI

A quite interesting question is whether the data should be streamed from the CMR to the UI and de-serialized there, or de-serialized on the CMR. I think de-serialization on the UI is an easy yes, because otherwise reading the stored data would use a lot of CMR resources. But this has to be clarified so that we start in the right direction, because if the data is de-serialized on the CMR, then some caching techniques can be used, for example.

Reading of data without CMR

It is also possible to exclude the CMR from data reading entirely, so that the UI loads the data directly on its own.

Data storing

One file for normal data, one file for static data

My first idea was concretely this: one file for the static data described above, and one for the rest, that is, the actual performance data. Thus, simply pile up the bytes in one file and then think about what can be done for reading.
The solution for locating an object afterwards is quite simple. The only thing needed is the location where the object is stored in the file and its length in bytes (with the Kryo serialization library the length is not needed, only the location). And since we already have a developed indexing structure for objects that are in memory, we can use the same structure here. In the buffer the indexing tree points to memory locations via weak references; here it can simply point to locations in the file. The only additional requirement is to store the complete indexing tree as well, and then load it completely before loading any data. So this solution would actually consist of 3 files: the two already mentioned and one for the indexing tree.
However, this solution has an obvious problem with reading. Imagine the situation when we need all Exception data, for example for the grouped exception view. We can easily find the locations of these objects in the file, but then what? This file can grow to several GB, so locations that are spread all over the file will be slow to read, because we always have to first load a piece of bytes into a Java NIO buffer and then de-serialize the object. The problem is this constant loading of pieces into the buffer, because the data has to be transferred from disk to memory (although this is fairly fast these days).
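
The offset-based lookup described above can be sketched roughly as follows. This is only an illustration under assumptions: the class OffsetIndexSketch, its Location type and the plain HashMap standing in for the indexing tree are hypothetical names, and the payloads are assumed to be already-serialized bytes (the length is kept here, although with Kryo only the position would be needed).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: append serialized objects to one file and remember where each one starts. */
class OffsetIndexSketch implements AutoCloseable {

    /** Position and length of one serialized object inside the data file. */
    static final class Location {
        final long position;
        final int length;

        Location(long position, int length) {
            this.position = position;
            this.length = length;
        }
    }

    // Stand-in for the indexing tree (assumption): object id -> location in the file.
    private final Map<Long, Location> index = new HashMap<>();
    private final FileChannel channel;

    OffsetIndexSketch(Path dataFile) {
        try {
            channel = FileChannel.open(dataFile, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends the bytes at the end of the file and records their location under the given id. */
    Location write(long id, byte[] serialized) {
        try {
            long start = channel.size();
            ByteBuffer buf = ByteBuffer.wrap(serialized);
            long pos = start;
            while (buf.hasRemaining()) {
                pos += channel.write(buf, pos);
            }
            Location loc = new Location(start, serialized.length);
            index.put(id, loc);
            return loc;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads exactly the bytes of one object into an NIO buffer; de-serialization would follow. */
    byte[] read(long id) {
        Location loc = index.get(id);
        if (loc == null) {
            throw new IllegalArgumentException("Unknown id " + id);
        }
        try {
            ByteBuffer buf = ByteBuffer.allocate(loc.length);
            long pos = loc.position;
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos);
                if (n < 0) {
                    throw new IOException("Unexpected end of file");
                }
                pos += n;
            }
            return buf.array();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("storage", ".bin");
        try (OffsetIndexSketch store = new OffsetIndexSketch(file)) {
            store.write(1L, "timer".getBytes(StandardCharsets.UTF_8));
            store.write(2L, "sql-statement".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(store.read(2L), StandardCharsets.UTF_8));
        }
    }
}
```

Every call to read is a separate positioned read, which is exactly the jumping that becomes expensive when the requested objects are scattered across a file of several GB.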

Data spread across multiple files

When I presented the idea above to Stefan, he proposed making one file for each data type. Then at least not so much jumping is necessary, because in most of the use cases where many objects are needed, we are after objects of the same type. And this is absolutely true. But I would suggest expanding this to even more files. First of all, if data from several different agents can be stored in the same storage, then these files would contain the same data type but from different agents, and thus we again have the jumping.
I would suggest having one file per indexing tree leaf. When we search and reach a leaf in the indexing tree, there is a very high probability that all the objects in the leaf will be in the result set to be returned. Thus, I think in this situation the jumping is eliminated, because when we reach a leaf, all objects there need to be checked against the query criteria anyway. So we would simply load all objects in the file.
However, this solution brings some implementation overhead, because many files are in play and many NIO channels have to be managed, closed, and so on. But multi-threading could perhaps be used to manage these channels, so that several threads can write data to different channels.
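
A minimal single-threaded sketch of the file-per-leaf idea could look like the following. The class name LeafFileStoreSketch, the integer leaf ids, the file naming and the length-prefixed record layout are all assumptions made for illustration, not part of the existing implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one append-only file per indexing tree leaf. */
class LeafFileStoreSketch implements AutoCloseable {

    private final Path directory;
    // One open channel per leaf; these are the channels that have to be managed and closed.
    private final Map<Integer, FileChannel> channels = new HashMap<>();

    LeafFileStoreSketch(Path directory) {
        try {
            this.directory = Files.createDirectories(directory);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Path fileFor(int leafId) {
        return directory.resolve("leaf-" + leafId + ".bin");
    }

    /** Appends one serialized object to the file of the given leaf, prefixed with its length. */
    void append(int leafId, byte[] serialized) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + serialized.length);
        buf.putInt(serialized.length).put(serialized).flip();
        try {
            FileChannel ch = channels.get(leafId);
            if (ch == null) {
                ch = FileChannel.open(fileFor(leafId), StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE, StandardOpenOption.APPEND);
                channels.put(leafId, ch);
            }
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads the whole leaf file in one sequential read and splits it into records. */
    List<byte[]> readAll(int leafId) {
        List<byte[]> records = new ArrayList<>();
        Path file = fileFor(leafId);
        if (!Files.exists(file)) {
            return records;
        }
        try {
            ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(file));
            while (buf.remaining() >= Integer.BYTES) {
                byte[] record = new byte[buf.getInt()];
                buf.get(record);
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    @Override
    public void close() {
        try {
            for (FileChannel ch : channels.values()) {
                ch.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("leaf-store");
        try (LeafFileStoreSketch store = new LeafFileStoreSketch(dir)) {
            store.append(1, "exception A".getBytes(StandardCharsets.UTF_8));
            store.append(2, "timer".getBytes(StandardCharsets.UTF_8));
            store.append(1, "exception B".getBytes(StandardCharsets.UTF_8));
            for (byte[] record : store.readAll(1)) {
                System.out.println(new String(record, StandardCharsets.UTF_8));
            }
        }
    }
}
```

The sketch is not thread-safe. Writing from several threads would at least need a concurrent map and some per-channel coordination, which is part of the overhead mentioned above.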