Updated: Nov 18, 2022
Why this post?
This post is for geeks who want to know what's under the hood in a Neo4j database. I consider myself a limited expert, so don't take it as gospel truth. The references cited in this post offer more erudite discussions. There is also a free O'Reilly eBook on Neo4j.
Why use Neo4j?
A NoSQL database is not the same thing as a graph database. See this link for an explanation. More critical reviews and comparisons to other databases are also available. Document-style NoSQL databases have no tables and thus no relationships between them. Metadata therefore has to be embedded within each and every object, even when it is identical across numerous objects, which makes storage and queries inefficient. Neo4j can instead create a new node set and edges to reduce this redundancy. For a discussion of how Neo4j differs from a NoSQL database in managing metadata, see this link.
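To make the redundancy point concrete, here is a minimal Python sketch. The documents, the field names (`publisher`, `country`), the `PUBLISHED_BY` edge type, and the helper `normalize` are all invented for illustration. It only models the idea of pulling repeated metadata out into its own node set with edges pointing at it; it is not Neo4j code.

```python
# Hypothetical documents: the same publisher metadata is repeated in each one.
documents = [
    {"id": "book1", "title": "Graphs 101",
     "publisher": {"name": "Acme Press", "country": "US"}},
    {"id": "book2", "title": "Graphs 102",
     "publisher": {"name": "Acme Press", "country": "US"}},
    {"id": "book3", "title": "Trees",
     "publisher": {"name": "Birch Books", "country": "CA"}},
]


def normalize(docs):
    """Split embedded metadata into shared nodes plus edges that point at them."""
    item_nodes = {}
    publisher_nodes = {}
    edges = []
    for doc in docs:
        item_nodes[doc["id"]] = {"title": doc["title"]}
        pub = doc["publisher"]
        # Each distinct publisher is stored once, however many items share it.
        publisher_nodes.setdefault(pub["name"], {"country": pub["country"]})
        edges.append((doc["id"], "PUBLISHED_BY", pub["name"]))
    return item_nodes, publisher_nodes, edges


item_nodes, publisher_nodes, edges = normalize(documents)
print(len(documents), "embedded copies ->", len(publisher_nodes), "publisher nodes")
```

Changing a publisher's country now means updating one node rather than every document that mentions it.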
Neo4j uses an index to find a start node, then navigates the relationships (edges) between nodes. This traversal replaces the joins of a conventional SQL database; the strategy reduces query time and CPU load and greatly enhances scalability.
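The "index, then traverse" pattern can be sketched in a few lines of Python. This is a toy model under my own assumptions (a dict as the index, adjacency lists as the edges, made-up `KNOWS` relationships), not a description of Neo4j's actual storage engine.

```python
from collections import deque

# Toy graph: node id -> properties, and node id -> outgoing (relationship, target) pairs.
nodes = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
    3: {"name": "Carol"},
    4: {"name": "Dave"},
}
adjacency = {
    1: [("KNOWS", 2)],
    2: [("KNOWS", 3)],
    3: [],
    4: [("KNOWS", 1)],
}

# The "index": a lookup from a property value to a node id, built once.
name_index = {props["name"]: node_id for node_id, props in nodes.items()}


def reachable(start_name, rel_type, max_hops):
    """Find the start node via the index, then follow edges breadth-first."""
    start = name_index[start_name]
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node_id, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, target in adjacency[node_id]:
            if rel == rel_type and target not in seen:
                seen.add(target)
                found.append(nodes[target]["name"])
                queue.append((target, hops + 1))
    return found


print(reachable("Alice", "KNOWS", 2))
```

The index is consulted once, for the start node. Every later step only touches the edges of the node in hand, so the work depends on the neighborhood visited rather than on the total size of the data.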
The database lives in a directory with 26 folders and 89 files totaling 56 MB. Most of this seems to be “infrastructure”, as the actual user data appears to be in one directory of about 15 MB.
My install is local, so what follows about other setups is based on reading rather than experience. Neo4j can be installed on almost any platform. There are Python resources for connectivity through either the HTTP API or a binary interface. Another set of tools is available for .NET developers.
My database is relatively small, so it is easiest to rebuild it as things change. This is done using some VB.NET code that processes the following:
Items in a SQL Server 2016 database are read, converted to Neo4j-compatible JSON, and imported into Neo4j
Items in text files are read, converted to Neo4j-compatible JSON, and imported
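The post does not show what the generated JSON looks like, so the following is only a guess at one workable shape, written in Python rather than VB.NET. It assumes the `statements` payload format accepted by Neo4j's transactional HTTP endpoint; the rows, the `Item` label, the property names, and the helper `rows_to_payload` are hypothetical.

```python
import json

# Hypothetical rows, standing in for records read from SQL Server or a text file.
rows = [
    {"id": 1, "name": "Widget", "category": "Hardware"},
    {"id": 2, "name": "Gadget", "category": "Hardware"},
]


def rows_to_payload(rows, label="Item"):
    """Wrap all rows in one parameterized Cypher statement."""
    # Labels cannot be passed as query parameters, so the label is placed in
    # the query text; only use trusted values for it.
    statement = (
        "UNWIND $rows AS row "
        f"MERGE (n:{label} {{id: row.id}}) "
        "SET n += row"
    )
    return {"statements": [{"statement": statement, "parameters": {"rows": rows}}]}


payload = rows_to_payload(rows)
print(json.dumps(payload, indent=2))
```

Using `MERGE` on the `id` property rather than `CREATE` means a rerun of the import updates existing nodes instead of duplicating them, which suits a rebuild-often workflow.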
While this strategy works in development on a small scale, more robust systems will be required at scale. You want real-time updates and little downtime.
Neo4j's query language, Cypher, has specific clauses for adding, updating, and deleting both nodes and edges (relationships).
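As a rough illustration of those operations, here is a Python sketch that lists one Cypher statement per operation. The `Person` label, the `KNOWS` relationship type, the property names, and the `DryRunSession` stand-in are assumptions of mine; the stand-in only records the statements so the example runs without a database.

```python
# One Cypher statement per operation; labels and property names are hypothetical.
STATEMENTS = [
    ("add nodes",
     "MERGE (a:Person {name: $a}) MERGE (b:Person {name: $b})"),
    ("add relationship",
     "MATCH (a:Person {name: $a}), (b:Person {name: $b}) MERGE (a)-[:KNOWS]->(b)"),
    ("update node",
     "MATCH (a:Person {name: $a}) SET a.city = $city"),
    ("update relationship",
     "MATCH (:Person {name: $a})-[r:KNOWS]->(:Person {name: $b}) SET r.since = $since"),
    ("delete relationship",
     "MATCH (:Person {name: $a})-[r:KNOWS]->(:Person {name: $b}) DELETE r"),
    ("delete node",
     "MATCH (a:Person {name: $a}) DETACH DELETE a"),
]

PARAMS = {"a": "Alice", "b": "Bob", "city": "Oslo", "since": 2020}


class DryRunSession:
    """Stand-in that records calls instead of contacting a database."""

    def __init__(self):
        self.calls = []

    def run(self, query, parameters=None):
        self.calls.append((query, parameters))


def run_all(session):
    """Send every statement through anything that offers run(query, parameters)."""
    for _description, query in STATEMENTS:
        session.run(query, PARAMS)


dry = DryRunSession()
run_all(dry)
for query, _params in dry.calls:
    print(query)
```

With the official `neo4j` Python driver installed, the same `run_all` should accept a real session, for example one opened with `GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password")).session()`; the URI and credentials here are placeholders. `DETACH DELETE` is used for the node because a plain `DELETE` fails while the node still has relationships.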