
April 23, 2024

Microsoft Fabric vs. Databricks: Comparison of the Leading Data Platforms


Cloud computing, big data, and AI are dramatically transforming the data analytics field. Databricks has emerged as a leading player in recent years by offering solutions focused on big data processing and machine learning that run on all three major cloud platforms (Azure, GCP, and AWS). Microsoft joined the competition in 2023 by integrating many of its existing data tools into a unified data platform, Microsoft Fabric. While many organizations have already implemented Databricks, some, especially those on the Azure cloud, are beginning to look at Fabric. This article compares Microsoft Fabric and Databricks feature by feature.

Background 

Microsoft Fabric

Fabric is a unified data platform launched by Microsoft in 2023, designed to cover all parts of the data stack. It consolidates and redevelops many of Microsoft’s existing data products, such as Power BI, Azure Data Factory, and Synapse, and integrates them into the new platform. Fabric has two core concepts: (1) “Compute” and (2) “Storage”. “Storage” is where the data lives, across various Lakehouses and Warehouses in OneLake, while “Compute” covers the data engineering, analytics, and data science tools that work with that data.


Databricks 

Databricks is a cloud-based data platform that specializes in big data analytics, artificial intelligence, and distributed computing. The startup was founded in 2013 by the creators of Apache Spark, the popular open-source project for distributed computing. While Databricks’ core competency is processing and analyzing big data, it has developed (and acquired) many solutions to address other parts of the data lifecycle, such as data visualization, data sharing, and data governance.


Feature Comparison 

 

Each feature below is described first for Microsoft Fabric and then for Databricks.

Data Engineering 

Microsoft Fabric: A strong suite of tools for data engineering tasks, with several code-free options:

  • Data Factory – App with GUI components to execute, orchestrate, and schedule ETL/ELT pipelines. No code necessary. Comes with prebuilt data connectors.
  • Notebooks – Support Python, Spark, SQL, and more.
  • Dataflows – Low-code tool to connect different data sources, transform data, and load it into different database destinations.

Databricks: Working with big data using code and distributed computing is core to Databricks, which offers several tools, such as:

  • Notebooks – Support Python, Spark, R, Scala, SQL, and more.
  • Workflows – Orchestrate and schedule notebooks via Jobs.
  • Cluster Management – Powerful features to configure the Spark clusters that run notebooks and jobs.
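As a rough illustration of how notebooks are orchestrated and scheduled via Jobs, the sketch below builds a job definition that chains notebooks and runs them on a schedule. The field names (`tasks`, `task_key`, `notebook_task`, `depends_on`, `schedule`, `quartz_cron_expression`, `timezone_id`) are modeled on the Databricks Jobs API and should be verified against the current API reference. The job name and notebook paths are hypothetical, and cluster settings are omitted for brevity. Nothing is sent to Databricks; the code only prints the JSON.

```python
import json


def build_job_payload(name, notebook_paths, cron, timezone="UTC"):
    """Build a job definition that runs notebooks one after another on a schedule."""
    tasks = []
    previous_key = None
    for index, path in enumerate(notebook_paths):
        task_key = f"step_{index + 1}"
        task = {
            "task_key": task_key,
            "notebook_task": {"notebook_path": path},
        }
        # Every task after the first waits for the one before it.
        if previous_key is not None:
            task["depends_on"] = [{"task_key": previous_key}]
        tasks.append(task)
        previous_key = task_key
    return {
        "name": name,
        "tasks": tasks,
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
        },
    }


# Hypothetical job: two notebooks, run daily at 02:00 UTC (Quartz cron syntax).
payload = build_job_payload(
    name="nightly_sales_load",
    notebook_paths=["/Workspace/etl/ingest", "/Workspace/etl/transform"],
    cron="0 0 2 * * ?",
)
print(json.dumps(payload, indent=2))
```

In Fabric, the comparable orchestration would typically be configured in the Data Factory GUI rather than expressed as a payload like this.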

Data Warehouse 

Microsoft Fabric: Data is stored in Lakehouses (Spark-optimized) or Warehouses (T-SQL-optimized). Under the hood, all data is stored in OneLake as Delta tables backed by Parquet files.

Databricks: Data is stored in Lakehouses, with Databricks SQL “Pro” and “Serverless” options to query it. Under the hood, all data is stored as Delta tables backed by Parquet files.

Data Integrations 

Microsoft Fabric: Dozens of prebuilt connectors to other data warehouses (e.g., Snowflake) and source systems (e.g., Google Analytics and Smartsheet) are available in data pipelines and dataflows. Additionally, Dataverse provides a direct connection to Microsoft apps such as Dynamics CRM and F&O. Finally, Shortcuts connect OneLake directly to external cloud storage (e.g., AWS S3).

Databricks: Lakehouse Federation provides a direct connection to other data warehouses (e.g., Snowflake) from within Databricks Lakehouses.

Data Visualization 

Microsoft Fabric: Power BI is native to Fabric and is arguably the industry leader in data visualization, having overtaken Tableau in recent years.

Databricks: Two options, both of which are rudimentary, with basic charts and minimal customization:

  1. SQL Dashboards – Create visualizations using SQL.
  2. Lakeview Dashboards – Create visualizations using Python/Spark.

Data Science (AI, ML, and LLMs)

Microsoft Fabric: Create and deploy ML models using notebooks and SynapseML. Integration with the Azure OpenAI service for LLM tasks is also available.

Databricks: Create models using MLflow and deploy them as API endpoints with Model Serving. Popular open-source LLMs are available on the platform. GPUs are also available for clusters and Model Serving.

Copilots 

Microsoft Fabric: Copilot in Fabric is a chatbot powered by the Azure OpenAI service and is available in various parts of the Fabric ecosystem, including notebooks and Power BI. It is only available to customers on an F64 SKU or higher, which costs at least $5.5k per month.

Databricks: Databricks AI Assistant is a chatbot powered by DatabricksIQ that lives inside the Databricks workspace and is available in notebooks and the SQL query editor. It is available to all Databricks customers at no additional cost.
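The F64 threshold for Copilot in Fabric can be expressed as a simple check. This is a minimal sketch that assumes SKU names follow the `F<number>` pattern used in this article (F2, F64). The function names are hypothetical and not part of any Microsoft API.

```python
def sku_capacity_units(sku: str) -> int:
    """Return the numeric part of a Fabric SKU name, e.g. 'F64' -> 64."""
    name = sku.strip().upper()
    if not name.startswith("F") or not name[1:].isdigit():
        raise ValueError(f"Unrecognized Fabric SKU name: {sku!r}")
    return int(name[1:])


def copilot_available(sku: str, minimum_sku: str = "F64") -> bool:
    """Copilot in Fabric requires the minimum SKU (F64 per this article) or higher."""
    return sku_capacity_units(sku) >= sku_capacity_units(minimum_sku)


for sku in ("F2", "F64", "F128"):
    print(sku, copilot_available(sku))
```

No equivalent check is needed on the Databricks side, since its assistant is available to all customers.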

Marketplace 

Microsoft Fabric: n/a

Databricks: The Databricks Marketplace hosts data sets, LLM and ML models, notebooks, and other assets from third parties that organizations can bring into their platform. Free and paid products are available.

Source Control 

Microsoft Fabric: Git integration with Azure DevOps.

Databricks: Git integrations with GitHub, GitLab, Azure DevOps, Bitbucket, and others.

Setup 

Microsoft Fabric: Simple. Create a Fabric capacity in Azure, then a Workspace in Fabric, and then create individual resources (e.g., notebooks). Much of the underlying compute and storage work is abstracted away, so less setup is required.

Databricks: More complicated. You need to set up a Unity Catalog, which involves several steps, then create Workspaces, then Clusters, and only then can you create resources. Setup steps can also vary by cloud provider.

Cloud Environments 

Microsoft Fabric: Available on Azure only.

Databricks: Available on GCP, AWS, and Azure.

Pricing 

Microsoft Fabric: Three components:

  1. Fabric Capacity (SKUs) – Fixed monthly costs with a tiered SKU system based on compute. F2 is the cheapest at $170/month.
  2. Power BI – Pay per user, starting at $10/user/month for Pro licenses. On F64 or higher, only report creators need licenses.
  3. OneLake Storage – Storage is relatively cheap.

Databricks: Two components:

  1. Compute (DBUs) – Pay as you go. A DBU is a unit of compute used across all Databricks products (e.g., notebooks, queries, and jobs) and is charged by the hour. You only pay for what you use, with no minimums.
  2. Storage – Varies by cloud provider and is relatively cheap.
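To make the two pricing models concrete, here is a back-of-the-envelope sketch using the figures quoted in this article ($170/month for F2, about $5.5k/month for F64, $10/user/month for Power BI Pro). The DBU price, DBU consumption, and user counts are placeholders chosen for illustration, not published rates. Storage is left out on both sides because the article gives no figures for it.

```python
def fabric_monthly_cost(capacity_cost, sku_units, report_creators,
                        report_viewers, pro_license_price=10.0):
    """Fixed capacity cost plus Power BI Pro licenses.

    On F64 or higher, only report creators need a license (per this article).
    """
    if sku_units >= 64:
        licensed_users = report_creators
    else:
        licensed_users = report_creators + report_viewers
    return capacity_cost + licensed_users * pro_license_price


def databricks_monthly_cost(dbus_per_hour, hours_per_month, price_per_dbu):
    """Pay-as-you-go compute: DBUs consumed multiplied by the price per DBU."""
    return dbus_per_hour * hours_per_month * price_per_dbu


# Hypothetical team: 5 report creators and 45 report viewers.
f2_cost = fabric_monthly_cost(170.0, 2, report_creators=5, report_viewers=45)
f64_cost = fabric_monthly_cost(5500.0, 64, report_creators=5, report_viewers=45)

# Placeholder workload: 4 DBUs/hour for 120 hours at an assumed $0.50 per DBU.
dbx_cost = databricks_monthly_cost(4, 120, 0.50)

print(f"Fabric F2:  ${f2_cost:,.2f}")   # 170 + 50 users * 10
print(f"Fabric F64: ${f64_cost:,.2f}")  # 5500 + 5 creators * 10
print(f"Databricks: ${dbx_cost:,.2f}")  # 4 * 120 * 0.50
```

The structural difference is what matters here: the Fabric total is fixed regardless of usage, while the Databricks total scales with hours of compute consumed.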

 

For more information on these features and topics, see the official Microsoft Fabric and Databricks documentation.

Conclusion 

Overall, Fabric is simpler to use and manage, has better low-code ETL/ELT tools, more prebuilt data connectors, and a much better data visualization product. However, Databricks is multi-cloud, has more powerful tools for working with big data, and has a more comprehensive toolset for data science (ML/AI/LLMs). The best solution for your business ultimately depends on your use cases and requirements.

Schedule a call with Max Mershon, Principal Consultant in Data & Analytics at Pioneer, to discuss how your business can effectively utilize Microsoft Fabric and/or Databricks to meet your data needs.