May 12, 2026

Lakehouse Table Formats in May 2026 — Where They Have Landed

The lakehouse table format race between Iceberg, Delta Lake, and Hudi has been one of the more interesting infrastructure stories of the last five years. By May 2026 the market has settled into a pattern that was not obvious in 2022, and it is worth setting out for anyone making lakehouse architecture decisions today.

The Apache Iceberg position:

Iceberg has emerged as the de facto standard for new lakehouse implementations in mid-2026. Broad adoption across the major cloud platforms (AWS, Azure, and Google Cloud) and by Snowflake has produced a network effect that is difficult to compete with. The Iceberg ecosystem has the broadest engine support (Trino, Spark, Flink, Snowflake, DuckDB, and many others), the most varied query optimisation work across those engines, and the largest community of contributors.

The 2024 announcement of Snowflake’s native Iceberg support, together with Databricks’ Iceberg integration via UniForm (Delta Lake Universal Format), effectively removed the major commercial uncertainty about Iceberg’s future. Implementations in 2025 and 2026 have proceeded with confidence that the format will be supported and developed for the foreseeable future.

The Delta Lake position:

Delta Lake remains the standard within the Databricks ecosystem and continues to be the format of choice for organisations that have built their data platform on Databricks. The UniForm work, which allows Delta tables to be read by Iceberg-aware engines, has reduced the platform lock-in concern for Databricks customers and has made moving between the two formats more workable.

The Delta Lake position is therefore strong within its core market but is no longer competing for the broader cross-platform standard role. The strategic position is “best-of-breed within the Databricks platform” rather than “open standard across all platforms.”

The Apache Hudi position:

Hudi has retained an active developer community and is the strongest of the three for specific workload patterns, particularly upsert-heavy workloads with incremental ingestion. It continues to excel at the streaming-heavy use cases it was originally designed for at Uber.
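To make "upsert-heavy with incremental ingestion" concrete, here is a toy sketch of the merge semantics such workloads depend on: an incoming record replaces the stored record with the same key only if its ordering value is at least as new. This is plain Python illustrating the idea, not Hudi's API; the function name `upsert` and the field names `id` and `ts` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Record = Dict[str, object]


def upsert(table: Dict[str, Record], batch: Iterable[Record],
           key: str = "id", ordering: str = "ts") -> Tuple[int, int]:
    """Merge a batch into `table` in place; return (inserted, updated) counts.

    Toy model only: `table` maps record key -> latest known record.
    """
    inserted = updated = 0
    for rec in batch:
        k = rec[key]
        current = table.get(k)
        if current is None:
            table[k] = rec
            inserted += 1
        elif rec[ordering] >= current[ordering]:
            # The newer (or equally new) version wins.
            table[k] = rec
            updated += 1
        # Otherwise the record is a stale late arrival and is dropped.
    return inserted, updated


table: Dict[str, Record] = {}
upsert(table, [{"id": "a", "ts": 1, "v": 10}, {"id": "b", "ts": 1, "v": 20}])
counts = upsert(table, [{"id": "a", "ts": 2, "v": 11},   # update
                        {"id": "b", "ts": 0, "v": 99},   # stale, ignored
                        {"id": "c", "ts": 5, "v": 30}])  # insert
```

A real table format has to do the same thing against immutable files on object storage, which is why key lookup and incremental merging are where the formats differ most.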

Hudi’s broader market position has narrowed relative to Iceberg over the last 18 months. New greenfield implementations are more likely to choose Iceberg unless the workload pattern is one of Hudi’s strengths. Existing Hudi implementations continue to operate, and the format remains well supported.

What the May 2026 implementations are doing:

New greenfield lakehouse implementations are mostly choosing Iceberg. The market signals are strong enough that the choice is increasingly a default rather than the outcome of an evaluation. The exceptions are organisations with significant Databricks commitments, who are mostly continuing with Delta Lake, and organisations with specific upsert-heavy patterns, who are continuing to evaluate Hudi.

Cross-format interoperability work has reduced the cost of the format choice. UniForm from Databricks and the equivalent work on the Iceberg side mean that, in most cases, a Delta table can be read by Iceberg-aware engines and an Iceberg table can be read by Delta-aware engines. The format choice is less of a lock-in commitment than it was in 2023.
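As an illustration of what enabling this interoperability looks like on the Delta side, the sketch below builds the `ALTER TABLE` statement that sets the UniForm table properties. The three property names are the ones documented for Delta Lake UniForm at the time of writing and should be checked against current documentation; the helper `enable_uniform_sql` and the table name `sales.orders` are hypothetical, and existing tables may need additional upgrade steps that are not shown here.

```python
# Table properties as documented for Delta Lake UniForm at the time of writing.
# Treat them as an assumption and verify against the current documentation.
UNIFORM_ICEBERG_PROPERTIES = {
    "delta.columnMapping.mode": "name",
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}


def enable_uniform_sql(table_name: str) -> str:
    """Build an ALTER TABLE statement that sets the UniForm properties."""
    props = ",\n  ".join(
        f"'{name}' = '{value}'"
        for name, value in UNIFORM_ICEBERG_PROPERTIES.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES (\n  {props}\n)"


statement = enable_uniform_sql("sales.orders")  # hypothetical table name
print(statement)
```

The statement would then be run through whatever SQL interface the platform provides, after which Iceberg metadata is generated alongside the Delta log for Iceberg-aware readers.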

Migrations from older formats, such as raw Parquet files with Hive-style partitioning or older proprietary formats, are mostly moving to Iceberg in 2026. The migration tooling has improved and the operational effort is manageable.

The operational maturity:

The operational considerations around lakehouse tables (file size optimisation, partition strategy, compaction scheduling, and metadata management) have matured significantly from 2024 into 2026. Engineering teams running serious lakehouse implementations now treat these disciplines as standard practice.
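Compaction scheduling largely comes down to deciding which small files to rewrite together. The following is a minimal sketch, assuming a simple first-fit-decreasing bin-packing toward a target file size; the thresholds and the function name `plan_compaction` are hypothetical, and real engines ship their own planners with more nuanced rules.

```python
from typing import List

MB = 1024 * 1024
TARGET_BYTES = 512 * MB      # assumed target size for rewritten files
SMALL_FILE_BYTES = 128 * MB  # assumed threshold: smaller files are candidates


def plan_compaction(file_sizes: List[int],
                    target: int = TARGET_BYTES,
                    small: int = SMALL_FILE_BYTES) -> List[List[int]]:
    """Group small files into rewrite bins of at most `target` bytes each.

    First-fit decreasing: each candidate goes into the first bin with room.
    Bins holding a single file are dropped, since rewriting one file alone
    gains nothing.
    """
    candidates = sorted((s for s in file_sizes if s < small), reverse=True)
    bins: List[List[int]] = []
    for size in candidates:
        for b in bins:
            if sum(b) + size <= target:
                b.append(size)
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([size])
    return [b for b in bins if len(b) > 1]


plan = plan_compaction([600 * MB, 100 * MB, 90 * MB, 64 * MB, 8 * MB, 300 * MB])
```

In this example the 600 MB and 300 MB files are left alone, and the four small files are grouped into one rewrite of 262 MB.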

Query performance for analytical workloads on lakehouse tables has continued to improve. Optimised file layouts, metadata caching, and query engine optimisations have closed much of the gap to traditional data warehouse performance for most analytical workloads.

The cost picture has continued to favour lakehouse approaches over traditional data warehouse approaches for large-volume analytical workloads. The separation of storage and compute, the open-format storage, and the choice of query engines produce a cost structure that has held up well.

The catalog question:

The catalog layer (the metadata about which tables exist and where, along with their schema and partition information) has seen the most competitive activity of any part of the stack in the last 12 months. The Iceberg REST catalog specification has been finalised and is being implemented broadly. The competing catalogs from the major platforms (AWS Glue, Snowflake’s Polaris, Databricks’ Unity Catalog) are all converging on similar capability profiles with different commercial models.

The choice of catalog is now one of the more consequential lakehouse architectural decisions. The catalog effectively defines which engines and which platforms can work with your data, and the long-term portability of the architecture is shaped by the catalog choice as much as by the table format choice.
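Part of what makes a REST-based catalog portable is that every engine addresses a table through the same HTTP routes. As a sketch, the helper below builds the path of the "load table" route as laid out in the Iceberg REST catalog OpenAPI specification, where multi-level namespaces are joined with the unit separator character and URL-encoded. The function name `load_table_path` and the example names are hypothetical, and a real client would also handle configuration discovery and authentication.

```python
from typing import Optional, Sequence
from urllib.parse import quote

# Unit separator used to join the levels of a multi-level namespace.
NAMESPACE_SEPARATOR = "\x1f"


def load_table_path(namespace: Sequence[str], table: str,
                    prefix: Optional[str] = None) -> str:
    """Return the path of the REST catalog 'load table' route."""
    encoded_ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    parts = ["v1"]
    if prefix:
        # Optional prefix handed out by the catalog's config endpoint,
        # used verbatim here.
        parts.append(prefix)
    parts += ["namespaces", encoded_ns, "tables", quote(table, safe="")]
    return "/" + "/".join(parts)


path = load_table_path(["analytics", "sales"], "orders")
prefixed = load_table_path(["analytics"], "orders", prefix="prod")
```

Because the route shape is fixed by the specification rather than by a vendor, switching catalogs is mostly a matter of changing the base URI and credentials, provided both sides implement the same parts of the spec.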

For data architects making lakehouse decisions in May 2026, the read is that the major architectural questions have clearer answers than they did even 18 months ago. Iceberg is the default for cross-platform implementations. Delta Lake remains strong inside Databricks. Hudi has a defensible niche. The catalog choice deserves careful evaluation. Interoperability has improved enough that the format choice is less consequential than it was in 2023.

The architectural pattern that is emerging is: an open table format (Iceberg in most cases), cloud-native object storage, decoupled compute via multiple query engines, and a strategic catalog choice that preserves the architecture’s portability over time. Teams that have built to this pattern are well positioned for the next decade of analytical workload growth.