Migrating from Neo4j to KGLite

This page is for a developer with an existing Neo4j database and/or neo4j-driver code who wants to evaluate or adopt KGLite. It covers where KGLite fits, the two migration paths (Bolt drop-in vs native Python), how to lift your data across, and — the core of the guide — where the Cypher dialect diverges.

KGLite ships a focused openCypher subset, not a Neo4j drop-in replacement. Most read queries port unchanged; the divergences below are the ones worth knowing before you commit.

When KGLite fits — and when it doesn’t

	KGLite	Neo4j
Deployment	Embedded, in-process (`pip install kglite`)	Server (JVM) or embedded driver
Query language	Cypher (subset, see below)	Cypher (full)
Storage	`.kgl` file — in-mem · mmap · disk	Server store directory
Auth	None in-process; basic only via Bolt server	Full RBAC
Multi-database	No — one graph per process / per server	Yes (`USE db`)
Clustering / routing	No (single server)	Causal cluster, routing
Transactions	Snapshot isolation + OCC	Full ACID
Data model	One primary type per node + optional secondary labels	Arbitrary label sets

KGLite fits when you want Cypher + Python ergonomics in one wheel: analytics over a graph that fits on one machine, embedding a graph in a Python app or notebook, shipping a queryable .kgl artifact, or serving a read-mostly graph to LLM agents (KGLite bundles an MCP server and a describe() schema). See the README comparison table for the side-by-side against other embedded graph engines, NetworkX, rustworkx, and Neo4j Embedded.

KGLite does not fit when you need server-mode RBAC, multiple databases per instance, a causal cluster with routing, or full ACID across long-lived multi-client write sessions. Those are Neo4j’s domain — KGLite is deliberately single-graph and embedded.

For positioning detail see Core Concepts and the concepts index.

Two migration paths

Path A — Bolt server (drop-in for driver code)

kglite-bolt-server is a pure-Rust binary that speaks the Bolt v5 wire protocol. Any Neo4j-aware client — the official Python/JS/Java/Go drivers, Cypher Shell, Neo4j Browser, LangChain’s Neo4jGraph — connects with no consumer-side code changes beyond the connection URL. Clients on the Java driver additionally need the server started with --neo4j-compat (see the note below); the change is server-side, so their code is still untouched. See the Bolt server operator guide.

cargo install kglite-bolt-server
kglite-bolt-server --graph my-graph.kgl --bind 127.0.0.1 --port 7687

Note

JVM clients need --neo4j-compat. The official Java driver refuses to talk to a server whose handshake agent does not begin with Neo4j/, and fails at connect time with UntrustedServerException: Server does not identify as a genuine Neo4j instance — before any query runs. The Python and JavaScript drivers do not perform this check, so they work against the default identity.

Start the server with compatibility mode (or set KGLITE_BOLT_NEO4J_COMPAT=1) and the agent becomes Neo4j/5.26.0 (kglite-bolt-server/<version>), which the driver accepts while still naming the real product:

kglite-bolt-server --graph my-graph.kgl --neo4j-compat

It is opt-in because presenting as another product is the operator’s decision. If a driver that enforces the check connects while compatibility mode is off, the server log says so and names both ways to turn it on (the `--neo4j-compat` flag and `KGLITE_BOLT_NEO4J_COMPAT=1`). See “Driver identity” in the Bolt server operator guide.

Your driver code stays almost identical — just re-point the URI:

```python
# Before: against Neo4j
from neo4j import GraphDatabase
driver = GraphDatabase.driver("neo4j://prod-db:7687", auth=("neo4j", "secret"))

# After: against kglite-bolt-server
from neo4j import GraphDatabase
driver = GraphDatabase.driver("bolt://127.0.0.1:7687", auth=None)

with driver.session() as session:
    result = session.run(
        "MATCH (p:Person)-[:KNOWS]->(f) WHERE p.age > $min RETURN f.name",
        min=30,
    )
    for record in result:
        print(record["f.name"])
```

The query path is the same Cypher engine the Python API uses; differential tests confirm row-for-row equivalence (tests/test_bolt_server_differential.py). The official Python, JavaScript, and Java drivers all have automated regression coverage in CI — session and explicit-transaction lifecycle, managed executeWrite retry, PackStream type round-trips, Node/Relationship/Path values, Neo.* error codes, and OCC conflict detection (tests/conformance/). Go and .NET use the same Bolt v5 protocol and should work, but are untested — exercise them yourself first.

What carries over, and what does not

Neo4j feature	Bolt server status
`bolt://` direct connections	Supported — use this
`neo4j://` routing URIs	Single-server routing table only; set `--advertise-addr` for reverse-proxy deployments. No real cluster.
Auth	`--auth basic` with `--auth-user` / `--auth-pass`; default `--auth none` accepts any LOGON. No RBAC, users, or roles.
TLS (`bolt+s://` / `neo4j+s://`)	Supported via `--tls-cert` + `--tls-key`
Read-only enforcement	`--readonly` rejects all mutations
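Auto-commit mutations	Not supported. Wrap `CREATE`/`SET`/`DELETE`/`MERGE` in explicit `BEGIN`/`COMMIT`; auto-commit reads work. (Managed transactions such as the drivers' `execute_write` / `executeWrite` already do this; a bare `session.run` write is auto-commit and falls under this restriction.)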
OCC on writes	Supported — stale-snapshot commits get `Neo.ClientError.Transaction.ConflictDetected`; retry client-side
Multi-database (`USE db`)	Not supported — single graph; `USE` is accepted but ignored
Causal consistency / bookmarks	Not supported — the `bookmark` field is not returned on COMMIT
Multi-statement queries (`;`-separated)	Not supported — one statement per `session.run`, or group with `BEGIN`/`COMMIT`
`db.labels()` / `db.relationshipTypes()`	Yield `label` / `relationshipType` (Neo4j-conventional names) over Bolt
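
The OCC row above says a stale-snapshot commit fails with `Neo.ClientError.Transaction.ConflictDetected` and leaves the retry to the client. A minimal sketch of such a retry loop follows. It assumes only that the raised exception carries the status string on a `code` attribute, as the neo4j Python driver's `Neo4jError` does; `retry_on_conflict`, `attempts`, `base_delay`, and the backoff policy are illustrative choices, not something the KGLite docs prescribe.

```python
import random
import time

CONFLICT_CODE = "Neo.ClientError.Transaction.ConflictDetected"


def retry_on_conflict(work, attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call work() until it succeeds or the attempts run out.

    Only exceptions whose `code` attribute equals CONFLICT_CODE are retried;
    anything else propagates immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except Exception as exc:
            is_conflict = getattr(exc, "code", None) == CONFLICT_CODE
            if not is_conflict or attempt == attempts:
                raise
            # Exponential backoff with jitter so competing writers spread out.
            sleep(base_delay * (2 ** (attempt - 1)) * (1 + random.random()))


# Hypothetical usage with the neo4j driver (not executed here):
#
#   def mark_seen():
#       with driver.session() as session:
#           with session.begin_transaction() as tx:
#               tx.run("MATCH (p:Person {id: $id}) SET p.seen = true", id=1)
#               tx.commit()
#
#   retry_on_conflict(mark_seen)
```

Passing `sleep` in as an argument keeps the helper easy to test, and re-raising everything that is not a conflict avoids masking real errors.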

The in-process Python API also exposes kglite.to_neo4j(graph, uri, ...) if you want to push a KGLite graph into a real Neo4j instance (batched UNWIND, optional merge=True upsert).

Path B — native Python (`cypher()` directly)

If you control the calling code, skip the wire protocol entirely and call cypher() in-process. No server, no socket, no driver — the result is a ResultView you iterate, index, or convert with to_df=True. See Getting Started.

The same query, both ways:

# Bolt path — neo4j driver
with driver.session() as session:
    rows = list(session.run(
        "MATCH (p:Person) WHERE p.age > $min RETURN p.name AS name",
        min=30,
    ))

# Native path — kglite in-process
import kglite
graph = kglite.load("my-graph.kgl")
rows = list(graph.cypher(
    "MATCH (p:Person) WHERE p.age > $min RETURN p.name AS name",
    params={"min": 30},
))

Note the parameter syntax difference: the driver takes **kwargs (or a parameters= dict); cypher() takes a params= dict. The $name placeholders in the query string are identical.
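
If you are moving many call sites from the driver to the native path, a thin adapter can absorb that difference. The sketch below is one way to do it; `run_native` is a hypothetical helper, and the only KGLite call it relies on is `graph.cypher(query, params=...)` as shown above. Letting keyword arguments win on a name clash is a choice made here for illustration.

```python
def run_native(graph, query, parameters=None, **kwargs):
    """Accept driver-style arguments and forward them as cypher(params=...)."""
    params = dict(parameters or {})
    params.update(kwargs)  # keyword arguments win on a name clash
    return graph.cypher(query, params=params)


# Hypothetical usage (needs a loaded kglite graph, so not executed here):
#   rows = list(run_native(graph, "MATCH (p:Person) WHERE p.age > $min "
#                                 "RETURN p.name AS name", min=30))
```

Call sites then keep the `session.run(query, min=30)` shape while the query text and its `$min` placeholder stay untouched.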

Getting data out of Neo4j into KGLite

There are three routes; pick by what you already have.

Route 1 — query the source database directly, bulk-load via pandas

Query the source database with the neo4j driver, pull rows into a pandas DataFrame, and bulk-load with add_nodes / add_connections.

import pandas as pd
import kglite
from neo4j import GraphDatabase

src = GraphDatabase.driver("neo4j://prod-db:7687", auth=("neo4j", "secret"))
graph = kglite.KnowledgeGraph()

# Nodes
with src.session() as s:
    people = pd.DataFrame([
        dict(r["p"]) for r in s.run("MATCH (p:Person) RETURN p")
    ])
graph.add_nodes(people, node_type="Person", unique_id_field="id",
                node_title_field="name")

# Relationships — return the endpoint ids, not the whole nodes
with src.session() as s:
    knows = pd.DataFrame([
        {"src": r["a"], "tgt": r["b"]}
        for r in s.run("MATCH (a:Person)-[:KNOWS]->(b:Person) "
                       "RETURN a.id AS a, b.id AS b")
    ])
graph.add_connections(knows, connection_type="KNOWS",
                      source_type="Person", source_id_field="src",
                      target_type="Person", target_id_field="tgt")

graph.save("my-graph.kgl")

add_nodes auto-detects string vs integer ids from the column dtype and supports a column_types= override for spatial/temporal columns; add_connections can take a Cypher query= instead of a DataFrame. See the data-loading guide.

Route 2 — dump to CSV, then `LOAD CSV` (no pandas needed)

KGLite runs LOAD CSV, so the standard export/import pair ports unedited. Export with APOC (apoc.export.csv.all('graph.csv', {}), or per-label queries for a clean node/edge split), then load with the same Cypher you already have:

import kglite

graph = kglite.KnowledgeGraph()

# Nodes. Fields arrive as strings — CSV carries no types — so convert
# explicitly, as the source script already does.
graph.cypher(
    "LOAD CSV WITH HEADERS FROM 'file:///export/people.csv' AS row "
    "CREATE (:Person {id: toInteger(row.id), name: row.name})"
)

# Relationships, by endpoint id. Pattern properties take variables
# rather than function calls, so convert in a WITH first.
graph.cypher(
    "LOAD CSV WITH HEADERS FROM 'file:///export/knows.csv' AS row "
    "WITH toInteger(row.src) AS src, toInteger(row.dst) AS dst "
    "MATCH (a:Person {id: src}), (b:Person {id: dst}) "
    "CREATE (a)-[:KNOWS]->(b)"
)

graph.save("my-graph.kgl")

Prefer this route if you have no pandas (a Bolt client, a Rust or JVM consumer) or if you already have import scripts you would rather not rewrite. Four differences are worth knowing before you run it:

	KGLite	Neo4j
Sources	`file://` URLs and plain local paths	`file://` plus `http(s)://`
Batching	Automatic — 1000 rows at a time for row-local pipelines, so file size does not drive memory	`CALL { … } IN TRANSACTIONS` (formerly `USING PERIODIC COMMIT`), which you declare
Whole-result clauses	An aggregate, `ORDER BY`, `SKIP`/`LIMIT`, `DISTINCT`, `UNION`, or `CALL` after `LOAD CSV` cannot be batched, so the file is read in one capped pass and fails past 1,000,000 rows naming the clause responsible	Same memory characteristic, no explicit ceiling
Position	Must be the first clause	Anywhere in the pipeline

http(s):// is rejected with a message rather than a syntax error: the engine carries no HTTP client at all (network dependencies were removed in 0.14.x), so there is nothing to fetch a URL with. Download the file first, or fetch it in your own code and pass the rows in as a parameter.
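
The second option, fetching in your own code and passing the rows in as a parameter, might look like the sketch below. It uses only the standard library for the fetch and the parse. The `UNWIND $rows AS row` query in the trailing comment is an assumption: this page does not confirm that KGLite's subset accepts `UNWIND` over a list-of-maps parameter, so check CYPHER.md before relying on it. `fetch_text`, `rows_from_csv`, and the example URL are hypothetical.

```python
import csv
import io
import urllib.request


def fetch_text(url, timeout=30):
    """Download a CSV over HTTP(S) in the caller's process, not the engine's."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def rows_from_csv(text, int_fields=("id",)):
    """Parse CSV text with a header row into dicts, converting integer columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        for field in int_fields:
            # CSV carries no types, so convert the named columns explicitly.
            if row.get(field) not in (None, ""):
                row[field] = int(row[field])
        rows.append(row)
    return rows


# Hypothetical usage (not executed here; assumes UNWIND over a list parameter):
#   rows = rows_from_csv(fetch_text("https://example.com/people.csv"))
#   graph.cypher(
#       "UNWIND $rows AS row CREATE (:Person {id: row.id, name: row.name})",
#       params={"rows": rows},
#   )
```

Unlike `LOAD CSV`, this path holds the whole file in memory as Python objects, so for large exports the download-then-`LOAD CSV` option is likely the better fit.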

Over Bolt, LOAD CSV is off by default. file:// means the server’s filesystem, and a Bolt client is a remote caller, so serving it ungated would publish an arbitrary-file-read primitive. In-process callers (this Python API, the Rust library, the CLI) are allowed because they already have the host process’s filesystem access; a Bolt client gets nothing unless the server was started with `--allow-csv-import DIR`, which confines imports to `DIR` after symlink resolution. This is the same shape as the import-directory setting you will already have configured on the Neo4j side. See [CYPHER.md → LOAD CSV](https://github.com/kkollsga/kglite/blob/main/CYPHER.md#load-csv).
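
To make "confines imports to `DIR` after symlink resolution" concrete, here is a small Python illustration of that kind of containment check. It is not the server's code (the server is a Rust binary and its implementation is not shown on this page); `resolve_inside` is a hypothetical name, and the exact rules the server applies may differ.

```python
from pathlib import Path


def resolve_inside(import_dir, requested):
    """Return the resolved path if it stays inside import_dir, else raise.

    Both paths are resolved first, so a symlink that points outside the
    directory is judged by its target rather than by its name.
    """
    root = Path(import_dir).resolve()
    target = (root / requested).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{requested!r} resolves outside {root}")
    return target
```

Resolving before comparing is the important step: a plain string-prefix test on the unresolved path would let `../` segments and symlinks escape the directory.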

Route 3 — pandas between export and load

Still the best fit when you want typing control, column renaming, or cleanup in between: pd.read_csv → add_nodes / add_connections, using the same calls as Route 1.

Cypher dialect divergence

The tables below are the heart of the guide. KGLite’s supported surface is documented in full in CYPHER.md; this section lists only where it diverges from Neo4j. Conformance is spot-checked against a live Neo4j via scripts/cypher_conformance.py (see Cypher Compatibility — Independent Differential Checks).

Data model — labels and node identity

Neo4j form	KGLite status	Workaround / note
Arbitrary label sets	One primary type + optional secondary labels (since 0.10.5)	`CREATE (n:A:B)`, `SET n:B`, `REMOVE n:B`, `MATCH (n:A:B)` all work; `labels(n)` returns a list, primary first
Retype a node by swapping labels	Primary type is immutable via label ops	Recreate/migrate the node under the new primary type; `SET n.type` only writes a property
Per-row label assignment at load	`add_nodes(labels=[...])` applies uniform secondary labels to the batch	`g.add_label(node_type, ids, label)` for batches after load
`id(n)` returns an internal integer	`id(n)` / `n.id` returns the node’s identity	See identity note below

Note: Neo4j docs and some older KGLite material describe KGLite as “single-label” with labels(n) returning a string. That changed in 0.10.5 — multi-label is native and labels(n) returns a list.

Node identity (`id`) — the 0.10.10 model

As of 0.10.10, n.id is the node’s unique identity and behaves identically in every storage mode.

Aspect	KGLite behaviour
`CREATE (n {id: X})`	Honours `X` as the identity (string / int / float; survives save → load)
Prefixed-id datasets (Wikidata `Q42`)	Loader stores the integer as `id` (`n.id == 42`) and the string form as the `nid` property (`n.nid == 'Q42'`)
Lookup by string id	`{nid: 'Q42'}` (a plain indexed string-property lookup); `{id: 'Q42'}` does not match, because for prefixed-id data the stored `id` is the integer `42`
Duplicate ids	`MATCH (n {id: X})` returns one node per id; a rate-limited warning is emitted at index build. Use `MERGE` or dedupe input.

This is a breaking change from earlier releases for prefixed-id data — see the 0.10.10 CHANGELOG entry.
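
If your own code previously looked nodes up by the prefixed string, the mapping in the table can be mirrored on the Python side. The sketch below only reproduces the `Q42` example (`id == 42`, `nid == 'Q42'`); the loader's real parsing rules (accepted prefixes, leading zeros, and so on) are not described on this page, and `split_prefixed_id` is a hypothetical helper.

```python
import re

# One or more letters followed by one or more digits, e.g. "Q42".
_PREFIXED = re.compile(r"^([A-Za-z]+)(\d+)$")


def split_prefixed_id(raw):
    """Return (id, nid) for a prefixed identifier such as 'Q42'."""
    match = _PREFIXED.match(raw)
    if match is None:
        raise ValueError(f"not a prefixed id: {raw!r}")
    return int(match.group(2)), raw


# Hypothetical usage (not executed here):
#   node_id, nid = split_prefixed_id("Q42")
#   graph.cypher("MATCH (n {nid: 'Q42'}) RETURN n")   # string form
#   graph.cypher("MATCH (n {id: 42}) RETURN n")       # integer identity
```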

Missing language constructs

Status of each construct, verified against 0.10.14:

Neo4j construct	KGLite status	Workaround
`FOREACH (x IN list | ...)`	Supported	Updating bodies, including nested `FOREACH`
`CALL { ... CREATE/SET/DELETE ... }` (writes in body)	Not supported (v1)	Do writes in a separate top-level clause; read subqueries are supported (see below)
`CALL { ... UNION ... }` (UNION inside body)	Not supported (v1)	Top-level `UNION`, or combine separate `cypher()` results
Unit `CALL { ... }` (no terminal `RETURN`)	Not supported (v1)	Body must end in `RETURN`
`CALL { ... } IN TRANSACTIONS`	Not supported	Server batching; no in-memory analogue
Pattern comprehensions `[(n)-->(m) | m]`	Not supported	`MATCH`/`OPTIONAL MATCH` + `collect()`
Quantified path patterns `((a)-->(b))+`	Not supported	Variable-length paths `-[:R*1..3]->` (supported)
`allShortestPaths(...)`	Not supported	`shortestPath(...)` (supported) returns one path
`LOAD CSV`	Supported — `file://` and local paths, leading position only	`http(s)://` needs a prior download; off by default for Bolt clients (see above)
`exists(n.prop)` (property existence)	Not supported	`WHERE n.prop IS NOT NULL` / `IS NULL`
`exists((pattern))` in `RETURN`	Not supported as a `RETURN` expression	`EXISTS { pattern }` / inline pattern predicate in `WHERE`
`CREATE TEXT / FULLTEXT / POINT / VECTOR / LOOKUP INDEX`	Not supported — rejected with the route that applies	`CONTAINS`/`STARTS WITH` need no text index; `create_vector_index(...)` for ranked retrieval; label lookup is automatic
`CREATE CONSTRAINT ... IS :: TYPE` / `IS TYPED TYPE`	Parses, then rejected — there is no write-time property-type constraint, so accepting it would report success while enforcing nothing	`lock_schema()` rejects writes disagreeing with the recorded property type; `validate_schema()` audits existing data. (`IS UNIQUE` / `IS NOT NULL` / `IS NODE KEY` are supported — see below)
`CREATE CONSTRAINT ... FOR ()-[r:T]-() ...`	Not supported	KGLite constrains node properties only
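
Of the rows above, `exists(n.prop)` is the one most amenable to a mechanical rewrite. The sketch below does that with a regular expression. It is a text-level heuristic, not a Cypher parser: it would also rewrite a match inside a string literal, so review the result. `rewrite_exists_property` is a hypothetical helper.

```python
import re

# Matches exists(<variable>.<property>) only. It does not match the
# EXISTS { ... } subquery form or exists((pattern)).
_EXISTS_PROP = re.compile(
    r"\bexists\(\s*([A-Za-z_]\w*\.[A-Za-z_]\w*)\s*\)", re.IGNORECASE
)


def rewrite_exists_property(query):
    """Replace exists(n.prop) with (n.prop IS NOT NULL)."""
    # The parentheses keep a preceding NOT applied to the whole predicate.
    return _EXISTS_PROP.sub(r"(\1 IS NOT NULL)", query)
```

For example, `WHERE NOT exists(n.email)` becomes `WHERE NOT (n.email IS NOT NULL)`, while `WHERE EXISTS { (n)-->() }` is left unchanged.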

Constructs that DO work (worth confirming)

These port unchanged from Neo4j and are easy to assume missing:

MERGE ... ON CREATE SET ... ON MATCH SET — match-or-create.
Variable-length paths -[:KNOWS*1..3]->, shortestPath(...).
WHERE EXISTS { pattern WHERE ... } (pattern-existence), inline pattern predicates, any/all/none/single(x IN list WHERE ...).
CALL { ... } read subqueries: both uncorrelated (CALL { MATCH ... RETURN ... }, cartesian-combined with the outer rows) and correlated (CALL { WITH p MATCH (p)-->... RETURN ... }, run per outer row). The importing WITH lists bare variables only. Aggregating bodies preserve the outer row with a zero value; non-aggregating bodies inner-join (zero matches drops the row). v1 caveats: no writes / UNION / unit subqueries in the body, no IN TRANSACTIONS. See CYPHER.md → CALL { ... } Subqueries.
List comprehensions [x IN list WHERE p | expr], reduce(...), list slicing xs[1..3], map projections n {.a, .b}, map literals.
Map subscript m['key'] and dynamic property access n[key] where key is a variable.
Window functions row_number()/rank()/dense_rank() OVER (...), UNION/INTERSECT/EXCEPT, HAVING.

Recently added functions (new — verify your version ≥ 0.10.x)

These work today and may not appear in older comparison material:

Function	Form
Trig family	`sin`/`cos`/`tan`/`asin`/`acos`/`atan`/`cot`/`haversin`/`degrees`/`radians` (angles in radians)
`atan2(y, x)`	Quadrant-aware arctangent
`randomUUID()`	RFC 4122 v4 UUID string
`localdatetime()` / `localtime()` / `time()`	Return ISO-8601 strings (`Value::DateTime` is date-only — see note)
`m['key']`	Map subscript
`n[key]`	Dynamic property access (variable key)

localdatetime()/localtime()/time() return strings, not a temporal Value, because KGLite’s Value::DateTime carries no time-of-day component. The 1-arg form validates/normalises a string and returns NULL on bad input.
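
Because these come back as strings, any time-of-day arithmetic has to happen on the Python side. A minimal sketch of the conversion is below, assuming the strings are plain ISO-8601 values such as `2024-05-01T12:30:00` and `12:30:00`. The page does not spell out the exact shape (fractional seconds, offsets), and Python versions before 3.11 accept a narrower range of ISO-8601 forms in `fromisoformat`. Both helper names are hypothetical.

```python
from datetime import datetime, time


def parse_local_datetime(value):
    """Turn an ISO-8601 string such as '2024-05-01T12:30:00' into a datetime."""
    return None if value is None else datetime.fromisoformat(value)


def parse_time(value):
    """Turn an ISO-8601 time string such as '12:30:00' into a datetime.time."""
    return None if value is None else time.fromisoformat(value)
```

Both helpers pass `None` through, which matches the NULL the 1-arg form returns on bad input.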

Function coverage

KGLite covers the common scalar / string / math / aggregation / temporal / spatial families. Rather than duplicate them, see the function tables in CYPHER.md (Built-in, String, Math, Spatial, Temporal, Timeseries, Text predicates, plus the openCypher compatibility matrix).

Notable absent functions a Neo4j user will miss (verified against 0.10.14):

Neo4j	KGLite status	Note / workaround
`apoc.*` (all)	Not supported	No APOC library; use Python or built-in functions
`point({latitude, longitude})`	Map form not supported	KGLite uses `point(lat, lon)` (latitude-first); WKT strings are longitude-first per OGC
`point.distance(a, b)`	Use top-level `distance(a, b)`	Geodesic (WGS84); also `contains`, `intersects`, `centroid`, `area`, `perimeter`, geometry primitives (`geom_*`) — all present
`duration('P1Y2M')` (ISO-8601)	Map form only	`duration({years: 1, months: 2})`; `duration.between(d1, d2)` fills `days` only (date-only `DateTime`)
`timestamp()`	Not supported	`datetime()` (date-only); `localdatetime()` for a wall-clock string
`toBoolean(...)`	Not supported	`CASE` / Python-side coercion
Calendar-aware month diffs	Approximated (months ≈ 30 days in `DateTime ± Duration`)	Use literal dates for exact month arithmetic — see CYPHER.md “Duration semantics”
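
The last row is easiest to see with numbers. The sketch below compares a fixed 30-day month with a calendar-aware add that clamps the day to the target month's length. It assumes the approximation is exactly 30 days per month, which is how the "≈ 30 days" in the table is read here; CYPHER.md "Duration semantics" is the authority on what KGLite actually does. Both function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_months_approx(d, months):
    """Add months as fixed 30-day blocks."""
    return d + timedelta(days=30 * months)


def add_months_calendar(d, months):
    """Add calendar months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)
```

For 2024-01-31 plus one month, the 30-day version gives 2024-03-01 and the calendar version gives 2024-02-29. If a query depends on month boundaries, computing the target date in Python and passing it as a literal or parameter sidesteps the difference.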

KGLite also adds functions Neo4j lacks — semantic search (text_score/vector_score), timeseries (ts_*), fuzzy text predicates (text_edit_distance, text_jaccard), and graph-algorithm procedures (CALL pagerank/louvain/...). See CYPHER.md.
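
Before porting a codebase, it can help to scan its query strings for the absent constructs and functions listed in the tables above. The sketch below is a heuristic triage pass, not a parser: the regular expressions cover only a handful of rows and can produce false positives (for example inside string literals). `UNSUPPORTED` and `find_unsupported` are hypothetical names.

```python
import re

# Heuristic patterns for a few rows of the divergence tables.
UNSUPPORTED = {
    "apoc.*": re.compile(r"\bapoc\.", re.IGNORECASE),
    "allShortestPaths": re.compile(r"\ballShortestPaths\s*\(", re.IGNORECASE),
    "exists(n.prop)": re.compile(r"\bexists\(\s*\w+\.\w+\s*\)", re.IGNORECASE),
    "CALL { } IN TRANSACTIONS": re.compile(r"\}\s*IN\s+TRANSACTIONS\b", re.IGNORECASE),
    "point.distance": re.compile(r"\bpoint\.distance\s*\(", re.IGNORECASE),
    "timestamp()": re.compile(r"\btimestamp\s*\(", re.IGNORECASE),
    "toBoolean": re.compile(r"\btoBoolean\s*\(", re.IGNORECASE),
}


def find_unsupported(query):
    """Return the names of flagged constructs found in a Cypher string."""
    return [name for name, pattern in UNSUPPORTED.items() if pattern.search(query)]
```

Running it over every query in a repository gives a rough list of call sites to revisit; an empty result is not proof that a query ports cleanly.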

`EXPLAIN` / `PROFILE`

Both are supported but the shape differs from Neo4j’s plan tree:

EXPLAIN <query> returns a ResultView with rows [step, operation, estimated_rows] — a flat, ordered step list, not a nested operator tree.
PROFILE <query> executes the query (you get the real results) and attaches per-clause stats on result.profile ([clause, rows_in, rows_out, elapsed_us]).

Every cypher() call also attaches lightweight result.diagnostics (elapsed_ms, timed_out, timeout_ms) with no prefix required.
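
With per-clause stats available, finding the expensive step is a sort. The sketch below assumes each entry of `result.profile` is a mapping keyed by `clause`, `rows_in`, `rows_out`, and `elapsed_us`; the page lists the field names but not the container type, so treat that shape as an assumption and adapt the key access if entries turn out to be tuples or objects. `slowest_clauses` is a hypothetical helper.

```python
def slowest_clauses(profile_rows, top=3):
    """Return the `top` profile rows with the largest elapsed_us, slowest first."""
    return sorted(profile_rows, key=lambda row: row["elapsed_us"], reverse=True)[:top]


# Hypothetical usage (needs a loaded kglite graph, so not executed here):
#   result = graph.cypher("PROFILE MATCH (p:Person)-[:KNOWS]->(f) RETURN f.name")
#   for row in slowest_clauses(result.profile):
#       print(row["clause"], row["rows_in"], row["rows_out"], row["elapsed_us"])
```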

Operational differences

Concern	Neo4j	KGLite
Persistence	Live server store directory	A `.kgl` file; explicit `graph.save(path)` / `kglite.load(path)`
Backup	`neo4j-admin dump` / online backup	Copy the `.kgl` file (or `save_subset(path)` for a slice)
Concurrency	Server-managed sessions, ACID	Reads parallelize (GIL released via `py.detach()`); mutations serialize via copy-on-write; OCC on transactions
Cross-process access	Native (server)	Embedded — use the Bolt server as the coordination point for multi-process
Schema DDL	`CREATE INDEX` / `CREATE CONSTRAINT` Cypher	Index DDL is supported — `CREATE [RANGE] INDEX`, `DROP INDEX`, `SHOW INDEXES`. What each statement builds differs from Neo4j (KGLite has separate equality, composite, and range structures) and index names are canonical, not user-assigned: see CYPHER.md → Cypher index DDL. The equivalent APIs remain — `create_index(type, prop)`, `create_range_index(...)`, `list_indexes()`, `drop_index(...)`. Type indices are automatic.
Constraint DDL	`CREATE CONSTRAINT ... IS UNIQUE / IS NOT NULL / IS NODE KEY`	Supported and enforced on every write path, including the bulk loader. Composite tuples (`REQUIRE (n.a, n.b) IS UNIQUE`) work; `IS NODE KEY` is uniqueness plus presence, installed atomically. Unlike index names, constraint names are stored, so `DROP CONSTRAINT <name>` works as written in a Neo4j script; unnamed constraints are addressable by their canonical descriptor. Declaring a constraint the existing data already violates is rejected and changes nothing. `IS :: TYPE` is refused (see above). See CYPHER.md → Cypher constraint DDL. `define_schema({"nodes": {...}})` declares the same constraints from Python.
Migrations	Versioned migration tools	None — you own schema evolution in Python load code

Indexes are maintained automatically across Cypher mutations (CREATE/SET/REMOVE/DELETE/MERGE). On disk-backed graphs property indexes are persisted next to the store; on in-memory graphs they live in a HashMap. See the Indexes section of CYPHER.md.

For the transaction model (snapshot isolation, OCC, last-writer-wins, per-call cost) see Transactions and sessions; for the concurrency contract see Concurrency.