Databricks Lakehouse Combines the Best of Data Lake and Data Warehouse in a Single Platform

Databricks users can now work with a network made up of Fivetran, Qlik, Infoworks, StreamSets, and Syncsort to automatically load data into the lakehouse. “Lakehouse” is a term coined by Databricks for an architecture that combines the best aspects of data warehouses and data lakes. This can be a significant value-add for an organization looking to support both business intelligence (BI) and machine learning (ML) use cases on a single platform.

Source: Databricks, 2020

Databricks’ lakehouse provides the following key features:

  • ACID transaction support
  • Schema enforcement and governance
  • BI support
  • Decoupled storage and compute
  • Open data, integration, and tools standards
  • Support for diverse data types ranging from unstructured to structured data
  • Support for SQL, ML, and other frameworks
  • End-to-end real-time streaming
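
To make the first two features concrete, here is a minimal, illustrative sketch of what schema enforcement and all-or-nothing (atomic) writes mean in practice. It is a toy in-memory model, not a Databricks or Delta Lake API; the names ToyTable, SchemaError, and append are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


class SchemaError(ValueError):
    """Raised when a row does not match the table schema."""


@dataclass
class ToyTable:
    # Hypothetical stand-in for a lakehouse table; not a real Databricks API.
    schema: dict                                  # column name -> expected Python type
    rows: list = field(default_factory=list)

    def _validate(self, row: dict) -> None:
        # Schema enforcement: column names and value types must match exactly.
        if set(row) != set(self.schema):
            raise SchemaError(f"columns {sorted(row)} do not match {sorted(self.schema)}")
        for name, expected in self.schema.items():
            if not isinstance(row[name], expected):
                raise SchemaError(f"column {name!r} expects {expected.__name__}")

    def append(self, batch: list) -> None:
        # Validate the whole batch first, then commit in a single step,
        # so one bad row leaves the table unchanged (all-or-nothing).
        for row in batch:
            self._validate(row)
        self.rows = self.rows + list(batch)


table = ToyTable(schema={"id": int, "amount": float})
table.append([{"id": 1, "amount": 9.5}])

try:
    # The second row has a string where an int is expected.
    table.append([{"id": 2, "amount": 3.0}, {"id": "three", "amount": 1.0}])
except SchemaError:
    rejected = True
else:
    rejected = False
```

In this sketch the second batch is rejected as a whole, so the valid row with id 2 is not written either. A real lakehouse implements these guarantees through a transaction log on shared storage rather than an in-memory list.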

Our Take

The data lakehouse, and the idea of providing a single unified data platform, is not new. Offerings such as Azure Synapse, Snowflake, and Amazon Redshift also try to innovate on the traditional data storage and processing platform. However, many of them are not fully functional, and each is a mix of strengths and weaknesses. An organization must carefully evaluate its core mandatory requirements before adopting such a broad platform, as some critical functions needed in ETL or SQL may be missing from these technologies for some time to come.


Want to Know More?

Architect Your Big Data Environment