Mar 23, 2026

Graph Database Query Optimization: Patterns That Actually Matter

Graph databases promise to make relationship queries that are slow in relational databases fast and natural. This is true, but only when you write queries that align with how graph databases actually work. Bad graph queries can be slower than equivalent relational queries, and the patterns that kill performance aren’t always obvious.

The Cartesian Product Problem

The most common performance killer is accidentally creating Cartesian products in your traversals. This happens when you match multiple patterns without connecting them through shared nodes or relationships.

Consider finding products and customers for analysis:

MATCH (p:Product)
MATCH (c:Customer)
WHERE p.category = 'electronics'
  AND c.region = 'APAC'
RETURN p, c

This looks reasonable but is disastrous. You’re matching every electronics product against every APAC customer without any relationship between them. If you have 10,000 products and 50,000 customers, you’re generating 500 million rows.

The fix is ensuring your patterns connect:

MATCH (p:Product)<-[:PURCHASED]-(c:Customer)
WHERE p.category = 'electronics'
  AND c.region = 'APAC'
RETURN p, c

Now you’re only returning products actually purchased by APAC customers, which is probably what you meant in the first place.

Variable Length Path Explosions

Variable length paths are powerful but dangerous. They let you find paths of any length between nodes, which sounds great until you realize how many possible paths exist in a dense graph.

Finding all paths of any length between two nodes:

MATCH path = (a:Person {name: 'Alice'})-[*]-(b:Person {name: 'Bob'})
RETURN path

In a social network where people average 200 connections, this query might evaluate trillions of possible paths. The database will struggle or time out.

Limiting path length helps:

MATCH path = (a:Person {name: 'Alice'})-[*..4]-(b:Person {name: 'Bob'})
RETURN path

Limiting to paths of four hops or fewer dramatically reduces the search space. For many use cases, paths beyond a certain length aren’t useful anyway.

Being more specific about relationship types helps further:

MATCH path = (a:Person {name: 'Alice'})-[:KNOWS|:WORKS_WITH*..4]-(b:Person {name: 'Bob'})
RETURN path

Now you’re only traversing specific relationship types, eliminating large portions of the graph from consideration.

Missing or Ineffective Indexes

Graph databases still benefit enormously from indexes on frequently queried properties. Starting a query by scanning all nodes of a type without an index is slow.

If you frequently query by email:

MATCH (u:User {email: '[email protected]'})
RETURN u

You need an index on User.email:

CREATE INDEX user_email FOR (u:User) ON (u.email)

Composite indexes help when you filter on multiple properties:

CREATE INDEX user_email_status FOR (u:User) ON (u.email, u.status)

Full-text indexes are appropriate for text search needs that don’t match exact values. Regular indexes require exact matches; full-text indexes support tokenization and text search operators.

Starting Points Matter Enormously

Graph traversals should start from specific nodes when possible, not scan entire node labels. Starting from specific nodes and traversing outward is fast. Scanning all nodes of a type to find starting points is slow.

Bad approach:

MATCH (c:Customer)-[:PURCHASED]->(p:Product)
WHERE c.region = 'Europe'
  AND p.category = 'electronics'
RETURN c, p

This scans all customers, filters to Europe, then traverses to products and filters again.

Better approach with index on region:

MATCH (c:Customer {region: 'Europe'})-[:PURCHASED]->(p:Product {category: 'electronics'})
RETURN c, p

If you have indexes on both properties, the database can use them to find specific starting points and only traverse relationships between indexed nodes.

Best approach if you know specific customers:

MATCH (c:Customer)
WHERE c.id IN $customerIds
MATCH (c)-[:PURCHASED]->(p:Product {category: 'electronics'})
RETURN c, p

Starting from specific known nodes is always fastest when possible.

Aggregation Placement

Where you place aggregations in your query significantly impacts performance. Aggregating early reduces the amount of data processed in subsequent operations.

Inefficient aggregation:

MATCH (c:Customer)-[:PURCHASED]->(p:Product)
WITH c, p
MATCH (p)-[:IN_CATEGORY]->(cat:Category)
RETURN cat.name, count(DISTINCT c) AS customers

This carries individual customer-product-category rows through the entire query before aggregating.

More efficient:

MATCH (c:Customer)-[:PURCHASED]->(p:Product)-[:IN_CATEGORY]->(cat:Category)
RETURN cat.name, count(DISTINCT c) AS customers

Doing all matching before aggregating allows the database to optimize better.

Sometimes aggregating mid-query is beneficial:

MATCH (c:Customer)-[:PURCHASED]->(p:Product)
WITH c, count(p) AS purchases
WHERE purchases > 10
MATCH (c)-[:LIVES_IN]->(region:Region)
RETURN region.name, count(c)

Aggregating purchase counts before the second match reduces the number of customers carried forward.

Using OPTIONAL MATCH Carefully

OPTIONAL MATCH is like SQL’s LEFT JOIN—it returns null when the pattern doesn’t match. But like LEFT JOINs, it can significantly impact performance if used carelessly.

Multiple OPTIONAL MATCHes can create explosions:

MATCH (p:Product)
OPTIONAL MATCH (p)<-[:PURCHASED]-(c:Customer)
OPTIONAL MATCH (p)<-[:REVIEWED]-(r:Review)
RETURN p, collect(c), collect(r)

Each OPTIONAL MATCH potentially multiplies the number of rows. If a product has 100 purchasers and 50 reviews, you get 5,000 rows before aggregation.

Separating the optional patterns helps:

MATCH (p:Product)
WITH p
OPTIONAL MATCH (p)<-[:PURCHASED]-(c:Customer)
WITH p, collect(c) AS customers
OPTIONAL MATCH (p)<-[:REVIEWED]-(r:Review)
RETURN p, customers, collect(r) AS reviews

Aggregating after each OPTIONAL MATCH prevents row explosion.

Direction Matters

Relationships have direction, even if your data model treats them as bidirectional. Query performance can vary significantly based on whether you traverse with or against relationship direction.

Generally, traversing in the direction relationships were created is faster. If you create (:Person)-[:KNOWS]->(:Person) relationships, querying with that direction is typically faster than querying against it.

If you need to query both directions regularly, consider creating relationships in both directions or using undirected patterns:

MATCH (a:Person {name: 'Alice'})-[:KNOWS]-(b:Person)
RETURN b

The pattern without arrow direction matches relationships in either direction. This is convenient but potentially slower than directed traversal if indexes and database internals optimize for specific directions.

Limiting and Pagination

Always limit results during development to avoid accidentally returning massive result sets. Add LIMIT even to queries you think will return small results—you might be wrong, and limiting prevents disasters.

For pagination, use SKIP and LIMIT:

MATCH (p:Product)
RETURN p
ORDER BY p.name
SKIP 100
LIMIT 10

This gets the 11th page of 10 results. Be aware that large SKIP values can be slow because the database still has to process the skipped rows.

For better pagination on large datasets, use cursor-based pagination with indexed properties:

MATCH (p:Product)
WHERE p.name > $lastSeenName
RETURN p
ORDER BY p.name
LIMIT 10

This is more efficient for deep pagination than SKIP because it doesn’t require processing all skipped rows.

Profiling Queries

Use EXPLAIN and PROFILE to understand query execution:

EXPLAIN
MATCH (p:Product)<-[:PURCHASED]-(c:Customer)
WHERE p.category = 'electronics'
RETURN c.name, count(p)

EXPLAIN shows the execution plan without running the query. PROFILE actually executes and shows row counts at each step.

Look for:

Scans without index usage (NodeByLabelScan instead of NodeIndexSeek)
High db hits relative to rows returned
Cartesian products (CartesianProduct operator)
Large row counts early in the plan that get filtered later

These indicate optimization opportunities.

Write Query Performance

Write performance has different considerations. Batching writes is crucial—inserting one row at a time is slow.

Bad:

for row in data:
    session.run("CREATE (p:Product {name: $name})", name=row['name'])

Better:

session.run("UNWIND $rows AS row CREATE (p:Product {name: row.name})", rows=data)

Processing multiple rows in a single query reduces transaction overhead and allows better optimization.

MERGE operations are slower than CREATE because they must check for existence. If you know data doesn’t exist, use CREATE. If you’re unsure, use MERGE, but be aware of the performance cost.

Creating indexes before bulk loading speeds up the load because the database doesn’t need to maintain indexes during insertion. Drop indexes, load data, recreate indexes.

When to Use Relational Instead

Sometimes relational databases are better choices. If your queries don’t involve traversing relationships—you’re just filtering and aggregating on properties—a relational database might perform better.

Graph databases excel at:

Traversing relationships of unpredictable depth
Finding paths between nodes
Queries where relationship existence matters more than properties
Exploring connected data interactively

Relational databases often excel at:

Aggregating large amounts of data based on property values
Simple filtering without relationship traversal
Highly structured data with fixed schema
SQL-based reporting and BI tools

Real Performance Tuning

Most graph database performance issues resolve through:

Adding appropriate indexes
Fixing Cartesian products in queries
Limiting variable-length path traversals
Starting queries from specific nodes rather than label scans
Aggregating earlier in queries

These five patterns address 80% of performance problems. Focus here before exotic optimization techniques.

Actual query execution time should guide optimization. Profile queries with real data volumes. What performs well on test datasets might perform terribly at production scale, and vice versa.

Graph databases are powerful tools that enable queries impossible or impractical in relational databases. But that power requires understanding their execution model and writing queries that align with it. The patterns that make relational queries fast don’t always translate to graph queries, and that’s okay—you just need to learn the graph-specific patterns instead.