Why Databricks Delta Sharing is the Future of Data Sharing

Mar 13, 2024 | BlogPosts, Databricks

Data sharing is a powerful way to collaborate with your customers, partners, and suppliers on data-driven insights and solutions. However, traditional data sharing methods often involve copying, moving, or replicating data across different platforms, which can be costly, complex, and insecure.

 

That’s why I’m excited to tell you about Databricks Delta Sharing, the first open protocol for secure data sharing across data, analytics, and AI.

Delta Sharing lets you share live data from your lakehouse with any computing platform, without replicating or moving the data. It also integrates seamlessly with Tableau & Power BI Desktop, allowing you to visualize and analyze the shared data with ease.

In this blog post, I’ll highlight three key benefits of Delta Sharing over other data sharing solutions, such as Snowflake. I’ll also show you how easy it is to set up Delta Sharing and connect it to Power BI Desktop.

Benefit #1: Easier to Set Up

One of the main advantages of Delta Sharing is that it is very easy to set up and use. You don’t need to install any software or configure any network settings. All you need is a Delta Sharing endpoint URL and a credential file that grants you access to the shared data.

To share data using Delta Sharing, you simply create a share in your Databricks workspace and add the tables or views that you want to share. You can also apply fine-grained access control policies to limit who can access the data and what they can do with it. Then, you generate a credential file for each recipient and send it to them securely.
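As a rough sketch of the provider-side steps above, the snippet below builds the SQL statements that I would expect to run in a Unity Catalog enabled Databricks workspace. The share, table, and recipient names are hypothetical, and the statement syntax reflects my understanding of Databricks SQL, so please check it against the current documentation before relying on it.

```python
def build_share_statements(share, tables, recipient):
    """Return the SQL statements for creating a share and granting it to a recipient.

    All names passed in are illustrative placeholders, not real objects.
    """
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Add each table to the share, one statement per table.
    statements += [f"ALTER SHARE {share} ADD TABLE {table}" for table in tables]
    # Create the recipient, then give it read access to the share.
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements


if __name__ == "__main__":
    # Hypothetical share, table, and recipient names.
    for statement in build_share_statements(
        "sales_share", ["main.retail.orders"], "partner_co"
    ):
        print(statement)
```

In a Databricks notebook, each statement could then be passed to `spark.sql(...)`. Building them as plain strings first makes it easy to review exactly what will be shared before anything runs.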

To access data shared with you using Delta Sharing, you just need to download the credential file and use it to connect to the Delta Sharing endpoint URL.

You can use any tool that supports the Delta Sharing protocol, such as Spark, Pandas, Tableau or Power BI Desktop.
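To make the recipient side concrete, here is a minimal Python sketch using the open source `delta-sharing` package with Pandas. It assumes a bearer-token credential (profile) file, which is a small JSON document containing the endpoint and token. The file name, share, schema, and table names are hypothetical, and the helper functions are my own illustration rather than part of the library.

```python
import json

# Keys I expect in a bearer-token Delta Sharing profile file.
REQUIRED_KEYS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def validate_profile(profile: dict) -> dict:
    """Raise ValueError if the profile is missing an expected key."""
    missing = [key for key in REQUIRED_KEYS if key not in profile]
    if missing:
        raise ValueError(f"profile is missing keys: {missing}")
    return profile


def load_profile(path: str) -> dict:
    """Read the credential file from disk and check its structure."""
    with open(path, encoding="utf-8") as handle:
        return validate_profile(json.load(handle))


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address of a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a Pandas DataFrame."""
    # Imported lazily so the helpers above work without the package installed.
    import delta_sharing  # pip install delta-sharing

    load_profile(profile_path)  # fail early on a malformed credential file
    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))
```

With a real credential file, a call such as `load_shared_table("config.share", "sales_share", "retail", "orders")` would return the shared table as a DataFrame, with no Databricks account needed on the recipient side.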

Benefit #2: Eliminates Compute Costs for Sharing Data

Another major benefit of Delta Sharing is that it eliminates the compute costs associated with data sharing. Unlike Snowflake, which charges you for the compute resources used by both the data provider and the data recipient, Delta Sharing only charges the data provider for the storage costs of the data. The data recipient can use their own compute resources to query the data, without incurring any additional charges from Databricks.

This means that you can avoid the Snowflake Compute Tax, which is the extra cost that Snowflake imposes on data share producers for every query that the data share consumers run. According to current Snowflake documentation, a 6X-Large warehouse consumes 512 credits per hour, which at $3 per credit is equivalent to $1,536 per hour. That’s a hefty price to pay for sharing data with others.

For discussion purposes, I have chosen the biggest warehouse available on Enterprise Edition. However, we can all agree that even an XS Snowflake warehouse on Standard Edition is certainly not zero compute cost. Beyond the Snowflake compute costs themselves, someone needs to think about sizing the Snowflake warehouse and then managing it going forward. This administrative burden should not be underestimated.
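The arithmetic behind that figure can be sketched in a few lines of Python. The $3 per credit rate is simply the one implied by the numbers above ($1,536 divided by 512 credits), and the doubling of credits with each warehouse size is my assumption about how Snowflake sizes scale. Actual rates vary by edition, cloud, and region.

```python
# Warehouse sizes in ascending order; credits per hour are assumed to
# double with each step, starting from 1 credit per hour for XS.
SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL", "5XL", "6XL"]


def credits_per_hour(size: str) -> int:
    """Credits consumed per hour by a warehouse of the given size."""
    return 2 ** SIZES.index(size)


def hourly_cost(size: str, price_per_credit: float = 3.0) -> float:
    """Hourly cost in dollars at an assumed price per credit."""
    return credits_per_hour(size) * price_per_credit


if __name__ == "__main__":
    for size in ("XS", "6XL"):
        print(size, credits_per_hour(size), hourly_cost(size))
```

Even at the small end of the scale the cost is not zero, which is the point of the comparison.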

Now I know all the Snowflake Advanced Architects (of which I am one 😇 ) will scream the words Apache Iceberg Tables to me! However, this little note (from Getting Started with Iceberg Tables) needs to be pointed out:

Cross-cloud and cross-region sharing of Iceberg Tables is not currently supported. The provider’s external volume, Snowflake account, and consumer’s Snowflake account must all be in the same cloud region.

I’m also overlooking the fact that Power BI does not have native Apache Iceberg functionality baked into it.

Delta Sharing enables you to share data with an unlimited number of recipients, without needing to be concerned about the specific cloud provider region, compute costs, or performance usage profile.

You only pay for the storage costs of the data, which are typically much lower than compute costs. For example, according to current Azure Databricks pricing, the storage cost for Databricks Data Lake is approximately $0.023 per GB per month, which works out to a tiny fraction of a cent per GB per hour.
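To put the storage figure on the same hourly footing as the compute figure, here is a small worked calculation in Python. It uses the $0.023 per GB per month rate quoted above and assumes roughly 730 hours in a month (365 days times 24 hours, divided by 12). The 1,000 GB dataset size is just an illustrative value.

```python
PRICE_PER_GB_MONTH = 0.023  # dollars, the rate quoted in the post
HOURS_PER_MONTH = 730       # assumption: 365 * 24 / 12


def storage_cost_per_hour(size_gb: float) -> float:
    """Approximate hourly storage cost in dollars for a dataset of size_gb."""
    return size_gb * PRICE_PER_GB_MONTH / HOURS_PER_MONTH


if __name__ == "__main__":
    # Illustrative sizes: a single GB and a hypothetical 1,000 GB dataset.
    for size_gb in (1, 1000):
        print(f"{size_gb} GB: ${storage_cost_per_hour(size_gb):.6f} per hour")
```

Under these assumptions, even a 1,000 GB dataset costs only a few cents per hour to store.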

Benefit #3: Compatible with Power BI Desktop

The third benefit of Delta Sharing is that it is compatible with Power BI Desktop, the popular business intelligence tool from Microsoft. Power BI Desktop allows you to create stunning reports and dashboards with interactive visualizations and analytics.

To connect Power BI Desktop to Delta Sharing, you just need to use the Delta Sharing connector that is available in the November 2021 release or later. Then, you can use the credential file and the Delta Sharing endpoint URL to access the shared data. You can also refresh the data source to get the latest updates from the data provider.

Once you have connected to the shared data, you can use all the features of Power BI Desktop to explore, transform, and visualize the data. You can also publish your reports and dashboards to Power BI Service and share them with others.

Conclusion

Delta Sharing is a game-changer for data sharing. 

It enables you to share live data from your lakehouse with any computing platform, without replicating or moving the data. It also reduces the compute costs of data sharing and integrates seamlessly with Power BI Desktop.

If you want to learn more about Delta Sharing, Databricks, Snowflake, and Power BI, please contact me!

You can also check out the official website, the documentation, or the demo, or sign up for a free trial of Databricks and start sharing data today.

This blog post was written by Sunny Sharma

Source(s)

  1. Delta Sharing connector — Power Query | Microsoft Learn
  2. Power BI November 2021 Feature Summary | Microsoft Power BI Blog
  3. Power Query Updates in Power BI Desktop November 2021 Release
  4. Power BI November 2022 Feature Summary

Disclaimer: Please note the opinions above are my own and not necessarily those of my current employer. This blog article is intended to generate discussion and dialogue with the audience. If I have inadvertently hurt your feelings in any way, then I’m sorry.
