Migrating from Neo4j to KGLite¶
This page is for a developer with an existing Neo4j database and/or
neo4j-driver code who wants to evaluate or adopt KGLite. It covers
where KGLite fits, the two migration paths (Bolt drop-in vs native
Python), how to lift your data across, and — the core of the guide —
where the Cypher dialect diverges.
KGLite ships a focused openCypher subset, not a Neo4j drop-in replacement. Most read queries port unchanged; the divergences below are the ones worth knowing before you commit.
When KGLite fits — and when it doesn’t¶
KGLite |
Neo4j |
|
|---|---|---|
Deployment |
Embedded, in-process ( |
Server (JVM) or embedded driver |
Query language |
Cypher (subset, see below) |
Cypher (full) |
Storage |
|
Server store directory |
Auth |
None in-process; basic only via Bolt server |
Full RBAC |
Multi-database |
No — one graph per process / per server |
Yes ( |
Clustering / routing |
No (single server) |
Causal cluster, routing |
Transactions |
Snapshot isolation + OCC |
Full ACID |
Data model |
One primary type per node + optional secondary labels |
Arbitrary label sets |
KGLite fits when you want Cypher + Python ergonomics in one wheel:
analytics over a graph that fits on one machine, embedding a graph in
a Python app or notebook, shipping a queryable .kgl artifact, or
serving a read-mostly graph to LLM agents (KGLite bundles an MCP
server and a describe() schema). See the
README comparison table
for the side-by-side against Kuzu / NetworkX / rustworkx / Neo4j
Embedded.
KGLite does not fit when you need server-mode RBAC, multiple databases per instance, a causal cluster with routing, or full ACID across long-lived multi-client write sessions. Those are Neo4j’s domain — KGLite is deliberately single-graph and embedded.
For positioning detail see Core Concepts and the concepts index.
Two migration paths¶
Path A — Bolt server (drop-in for driver code)¶
kglite-bolt-server is a pure-Rust binary that speaks the
Bolt v5 wire protocol. Any
Neo4j-aware client — the official Python/JS/Java/Go drivers, Cypher
Shell, Neo4j Browser, LangChain’s Neo4jGraph — connects with no
consumer-side code changes beyond the connection URL. See the
Bolt server operator guide.
cargo install kglite-bolt-server
kglite-bolt-server --graph my-graph.kgl --bind 127.0.0.1 --port 7687
Your driver code stays almost identical — just re-point the URI:
# Before — against Neo4j
from neo4j import GraphDatabase
driver = GraphDatabase.driver("neo4j://prod-db:7687", auth=("neo4j", "secret"))
# After — against kglite-bolt-server
from neo4j import GraphDatabase
driver = GraphDatabase.driver("bolt://127.0.0.1:7687", auth=None)
with driver.session() as session:
result = session.run(
"MATCH (p:Person)-[:KNOWS]->(f) WHERE p.age > $min RETURN f.name",
min=30,
)
for record in result:
print(record["f.name"])
The query path is the same Cypher engine the Python API uses;
differential tests confirm row-for-row equivalence
(tests/test_bolt_server_differential.py). The Python driver is the
only client with automated regression coverage — other drivers use
the same Bolt v5 protocol and should work, but exercise them manually
first.
What carries over, and what does not¶
Neo4j feature |
Bolt server status |
|---|---|
|
Supported — use this |
|
Single-server routing table only; set |
Auth |
|
TLS ( |
Supported via |
Read-only enforcement |
|
Auto-commit mutations |
Not supported — wrap |
OCC on writes |
Supported — stale-snapshot commits get |
Multi-database ( |
Not supported — single graph; |
Causal consistency / bookmarks |
Not supported — the |
Multi-statement queries ( |
Not supported — one statement per |
|
Yield |
The in-process Python API also exposes
kglite.to_neo4j(graph, uri, ...)if you want to push a KGLite graph into a real Neo4j instance (batchedUNWIND, optionalmerge=Trueupsert).
Path B — native Python (cypher() directly)¶
If you control the calling code, skip the wire protocol entirely and
call cypher() in-process. No server, no socket, no driver — the
result is a ResultView you iterate, index, or convert with
to_df=True. See Getting Started.
The same query, both ways:
# Bolt path — neo4j driver
with driver.session() as session:
rows = list(session.run(
"MATCH (p:Person) WHERE p.age > $min RETURN p.name AS name",
min=30,
))
# Native path — kglite in-process
import kglite
graph = kglite.load("my-graph.kgl")
rows = list(graph.cypher(
"MATCH (p:Person) WHERE p.age > $min RETURN p.name AS name",
params={"min": 30},
))
Note the parameter syntax difference: the driver takes **kwargs
(or a parameters= dict); cypher() takes a params= dict. The
$name placeholders in the query string are identical.
Getting data out of Neo4j into KGLite¶
The practical recipe: query Neo4j with the driver, pull rows into a
pandas DataFrame, and bulk-load with add_nodes / add_connections.
import pandas as pd
import kglite
from neo4j import GraphDatabase
src = GraphDatabase.driver("neo4j://prod-db:7687", auth=("neo4j", "secret"))
graph = kglite.KnowledgeGraph()
# Nodes
with src.session() as s:
people = pd.DataFrame([
dict(r["p"]) for r in s.run("MATCH (p:Person) RETURN p")
])
graph.add_nodes(people, node_type="Person", unique_id_field="id",
node_title_field="name")
# Relationships — return the endpoint ids, not the whole nodes
with src.session() as s:
knows = pd.DataFrame([
{"src": r["a"], "tgt": r["b"]}
for r in s.run("MATCH (a:Person)-[:KNOWS]->(b:Person) "
"RETURN a.id AS a, b.id AS b")
])
graph.add_connections(knows, connection_type="KNOWS",
source_type="Person", source_id_field="src",
target_type="Person", target_id_field="tgt")
graph.save("my-graph.kgl")
add_nodes auto-detects string vs integer ids from the column dtype
and supports a column_types= override for spatial/temporal columns;
add_connections can take a Cypher query= instead of a DataFrame.
See the data-loading guide.
Alternate route — APOC CSV export. For large graphs, export from
Neo4j with apoc.export.csv.all('graph.csv', {}) (or per-label
queries), then pd.read_csv → add_nodes / add_connections. KGLite
has no LOAD CSV — by design, pandas/csv give you better control
over typing and cleanup.
Cypher dialect divergence¶
The tables below are the heart of the guide. KGLite’s supported
surface is documented in full in
CYPHER.md;
this section lists only where it diverges from Neo4j. Conformance is
spot-checked against a live Neo4j via scripts/cypher_conformance.py
(see Cypher Conformance — On-Demand Check Against Neo4j).
Data model — labels and node identity¶
Neo4j form |
KGLite status |
Workaround / note |
|---|---|---|
Arbitrary label sets |
One primary type + optional secondary labels (since 0.10.5) |
|
Retype a node by swapping labels |
Primary type is immutable via label ops |
|
Per-row label assignment at load |
|
|
|
|
See identity note below |
Note: Neo4j docs and some older KGLite material describe KGLite as “single-label” with
labels(n)returning a string. That changed in 0.10.5 — multi-label is native andlabels(n)returns a list.
Node identity (id) — the 0.10.10 model¶
As of 0.10.10, n.id is the node’s unique identity and behaves
identically in every storage mode.
Aspect |
KGLite behaviour |
|---|---|
|
Honours |
Prefixed-id datasets (Wikidata |
Loader stores the integer as |
Lookup by string id |
|
Duplicate ids |
|
This is a breaking change from earlier releases for prefixed-id data — see the 0.10.10 CHANGELOG entry.
Missing language constructs¶
Verified absent against 0.10.14:
Neo4j construct |
KGLite status |
Workaround |
|---|---|---|
|
Not supported |
|
|
Not supported (v1) |
Do writes in a separate top-level clause; read subqueries are supported (see below) |
|
Not supported (v1) |
Top-level |
Unit |
Not supported (v1) |
Body must end in |
|
Not supported |
Server batching; no in-memory analogue |
Pattern comprehensions |
Not supported |
|
Quantified path patterns |
Not supported |
Variable-length paths |
|
Not supported |
|
|
Not supported (by design) |
pandas / |
|
Not supported |
|
|
Not supported as a |
|
|
Not supported |
Python |
Constructs that DO work (worth confirming)¶
These port unchanged from Neo4j and are easy to assume missing:
MERGE ... ON CREATE SET ... ON MATCH SET— match-or-create.Variable-length paths
-[:KNOWS*1..3]->,shortestPath(...).WHERE EXISTS { pattern WHERE ... }(pattern-existence), inline pattern predicates,any/all/none/single(x IN list WHERE ...).CALL { ... }read subqueries — both uncorrelated (CALL { MATCH ... RETURN ... }, cartesian-combined with the outer rows) and correlated (CALL { WITH p MATCH (p)-->... RETURN ... }, run per outer row). The importingWITHlists bare variables only. Aggregating bodies preserve the outer row with a zero value; non- aggregating bodies inner-join (zero matches drops the row). v1 caveats: no writes /UNION/ unit subqueries in the body, noIN TRANSACTIONS. See CYPHER.md →CALL { ... }Subqueries.List comprehensions
[x IN list WHERE p \| expr],reduce(...), list slicingxs[1..3], map projectionsn {.a, .b}, map literals.Map subscript
m['key']and dynamic property accessn[key]wherekeyis a variable.Window functions
row_number()/rank()/dense_rank() OVER (...),UNION/INTERSECT/EXCEPT,HAVING.
Recently added functions (new — verify your version ≥ 0.10.x)¶
These work today and may not appear in older comparison material:
Function |
Form |
|---|---|
Trig family |
|
|
Quadrant-aware arctangent |
|
RFC 4122 v4 UUID string |
|
Return ISO-8601 strings ( |
|
Map subscript |
|
Dynamic property access (variable key) |
localdatetime()/localtime()/time()return strings, not a temporal Value, because KGLite’sValue::DateTimecarries no time-of-day component. The 1-arg form validates/normalises a string and returnsNULLon bad input.
Function coverage¶
KGLite covers the common scalar / string / math / aggregation / temporal / spatial families. Rather than duplicate them, see the function tables in CYPHER.md (Built-in, String, Math, Spatial, Temporal, Timeseries, Text predicates, plus the openCypher compatibility matrix).
Notable absent functions a Neo4j user will miss (verified against 0.10.14):
Neo4j |
KGLite status |
Note / workaround |
|---|---|---|
|
Not supported |
No APOC library; use Python or built-in functions |
|
Map form not supported |
KGLite uses |
|
Use top-level |
Geodesic (WGS84); also |
|
Map form only |
|
|
Not supported |
|
|
Not supported |
|
Calendar-aware month diffs |
Approximated (months ≈ 30 days in |
Use literal dates for exact month arithmetic — see CYPHER.md “Duration semantics” |
KGLite also adds functions Neo4j lacks — semantic search
(text_score/vector_score), timeseries (ts_*), fuzzy text
predicates (text_edit_distance, text_jaccard), and graph-algorithm
procedures (CALL pagerank/louvain/...). See CYPHER.md.
EXPLAIN / PROFILE¶
Both are supported but the shape differs from Neo4j’s plan tree:
EXPLAIN <query>returns aResultViewwith rows[step, operation, estimated_rows]— a flat, ordered step list, not a nested operator tree.PROFILE <query>executes the query (you get the real results) and attaches per-clause stats onresult.profile([clause, rows_in, rows_out, elapsed_us]).
Every cypher() call also attaches lightweight result.diagnostics
(elapsed_ms, timed_out, timeout_ms) with no prefix required.
Operational differences¶
Concern |
Neo4j |
KGLite |
|---|---|---|
Persistence |
Live server store directory |
A |
Backup |
|
Copy the |
Concurrency |
Server-managed sessions, ACID |
Reads parallelize (GIL released via |
Cross-process access |
Native (server) |
Embedded — use the Bolt server as the coordination point for multi-process |
Schema DDL |
|
No DDL Cypher. Programmatic only: |
Migrations |
Versioned migration tools |
None — you own schema evolution in Python load code |
Indexes are maintained automatically across Cypher mutations
(CREATE/SET/REMOVE/DELETE/MERGE). On disk-backed graphs
property indexes are persisted next to the store; on in-memory graphs
they live in a HashMap. See the Indexes section of
CYPHER.md.
For the transaction model (snapshot isolation, OCC, last-writer-wins, per-call cost) see Transactions and Bolt; for the concurrency contract see Concurrency.
See also¶
Getting Started — install, build a graph, run Cypher.
Core Concepts — nodes, relationships, storage modes.
Transactions and Bolt —
begin()/commit()/ OCC.Cypher Conformance — On-Demand Check Against Neo4j — how the Neo4j oracle works.
CYPHER.md — full supported Cypher reference.