Ivan Senic tried to create a small POC for using Neo4j to store the information about loaded classes and their relationships on disk. Neo4j, being a graph database, seemed like the perfect candidate for storing the information we extract from the byte code.

...

They say that these crazy queries occur when not all nodes are saved before saving the relationships. I will additionally check whether this is true. Their comment on this was: "The one you see in exampleDepth1Query is the not-so-optimal query and it's being used because what you're saving does not satisfy the conditions of only new relationships (no new nodes, no updated nodes, no relationship updates, no relationship entities). Unfortunately this means that the optimisation applies to one operation only and that is "create relationships when the nodes on either end are persisted". As I mentioned earlier, work is underway to optimise all the queries and then you should not have to worry about the manner in which you save entities.".
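
If that turns out to be true, a possible workaround could be to save in two passes: first all nodes with depth 0, then the relationships, so that only the "create relationships when the nodes on either end are persisted" case remains. A minimal sketch of the idea; the ClassType entity and the package name are made up for illustration:

  import org.neo4j.ogm.annotation.GraphId;
  import org.neo4j.ogm.annotation.NodeEntity;
  import org.neo4j.ogm.annotation.Relationship;
  import org.neo4j.ogm.session.Session;
  import org.neo4j.ogm.session.SessionFactory;

  import java.util.Collection;

  // Hypothetical entity; our real structure classes would look similar.
  @NodeEntity
  class ClassType {
      @GraphId
      Long id;
      String fqn;

      @Relationship(type = "EXTENDS")
      ClassType superClass;
  }

  class SaveOrderPoc {

      void saveAll(Collection<ClassType> classTypes) {
          // package name is made up
          SessionFactory sessionFactory = new SessionFactory("rocks.inspectit.classcache");
          Session session = sessionFactory.openSession();

          // Pass 1: persist every node without its relationships (depth 0),
          // so that no relationship is written while unsaved nodes exist.
          for (ClassType classType : classTypes) {
              session.save(classType, 0);
          }

          // Pass 2: save again with depth 1; by now only new relationships
          // remain, which is exactly the case the optimised query covers.
          for (ClassType classType : classTypes) {
              session.save(classType, 1);
          }
      }
  }

Whether the second pass really hits the optimised query path is exactly what needs to be verified; this only restates their description of the condition in code.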

Loading operations

A problem for us is also the time needed to load a node from the database in order to check whether the same one (FQN/hash) already exists. Here, too, the times are around 10ms per node and I don't understand why. I even set an index on the fqn property, which produces a really fast execution plan for the query, but the times still did not improve. I see two possibilities here:

  1. It's the server round-trip overhead.
  2. It's related to the fact that OGM specifies "resultDataContent" : "graph", which returns the node with its complete reachable graph. I asked them how this can be changed to just one row and have not received an answer yet. I also think the solution here might be to load data by the entity id (because then you can specify the depth of the loaded graph), but for this we need to internally map the FQNs/hashes to the ids (see the sketch after this list).
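
What I have in mind for the id-based loading is roughly the following, reusing the ClassType entity from the sketch above. The fqnToId cache and the query strings are made-up illustrations, not something OGM provides out of the box:

  import java.util.Collections;
  import java.util.HashMap;
  import java.util.Map;

  import org.neo4j.ogm.session.Session;

  class LoadByIdPoc {

      // FQN -> Neo4j node id cache; name is made up for the example.
      private final Map<String, Long> fqnToId = new HashMap<>();

      void createIndex(Session session) {
          // The schema index on fqn is what gives the fast execution plan.
          session.query("CREATE INDEX ON :ClassType(fqn)", Collections.emptyMap());
      }

      ClassType findByFqn(Session session, String fqn) {
          Long id = fqnToId.get(fqn);
          if (id == null) {
              // First contact: one lookup by fqn, then remember the node id.
              ClassType existing = session.queryForObject(ClassType.class,
                      "MATCH (c:ClassType {fqn: {fqn}}) RETURN c",
                      Collections.singletonMap("fqn", fqn));
              if (existing == null) {
                  return null;
              }
              fqnToId.put(fqn, existing.id);
              return existing;
          }
          // Later lookups go by id, where the depth of the loaded graph
          // can be controlled (0 = just the node, no reachable graph).
          return session.load(ClassType.class, id, 0);
      }
  }

This would at least avoid pulling the complete reachable graph on every existence check, at the price of keeping the FQN-to-id mapping in memory ourselves.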

Further steps

It's hard to define what our further steps should be. It seems that introducing Neo4j is not as easy as it sounds, given that OGM is not yet mature. This especially relates to saving, as no bulk saving is possible at the moment.

I also see a problem in that we are not able to test everything with the embedded database, where I would expect everything to be faster.

Still, I believe that this should be our goal, although we need to invest a lot of time to align everything with having the data in Neo4j (we only test saving/updating of the structure for now, but what about all the other things, like checking what should be instrumented, etc.). We also need to come up with some kind of better design and decide how we want to deal with situations where we have more agents (do we go with a database per agent, or do we include the agent information in the structure?).
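
For the second option, the agent information could, for example, be modelled directly in the graph. A rough sketch of what that might look like (again, the entity names are made up):

  import java.util.HashSet;
  import java.util.Set;

  import org.neo4j.ogm.annotation.GraphId;
  import org.neo4j.ogm.annotation.NodeEntity;
  import org.neo4j.ogm.annotation.Relationship;

  // Hypothetical agent node, shared by all structure entities.
  @NodeEntity
  class Agent {
      @GraphId
      Long id;
      String name;
  }

  // The structure entities would then point to the agent(s) that
  // loaded them, so one database can hold data for many agents.
  @NodeEntity
  class ClassTypeWithAgent {
      @GraphId
      Long id;
      String fqn;

      @Relationship(type = "LOADED_BY")
      Set<Agent> loadedBy = new HashSet<>();
  }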

As we have a working class cache in memory, I would advise continuing the work on the memory-based implementation at least until version 2.0 of neo4j-ogm is officially released. As Stefan Siegl said, memory should not be a problem these days, so we can clearly say: if you run the CMR, make sure you give it enough RAM, as we will store the complete class cache structure in memory.