OneLake: Microsoft Fabric’s Open Storage Architecture
- August 9, 2023
- 5 Min Read
- Venkataramana
Data practitioners encounter various challenges every day. Data is typically dispersed across several sources, in a variety of file types, and too often with questionable degrees of quality. It can take a lot of time to determine where data might be located and what the access rights are. Consequently, less time is spent on what really counts, namely using the data to make decisions.
To address the issues surrounding data fragmentation, Microsoft introduced a lake-centric and open architecture in their new Fabric offering. While each tool within Microsoft Fabric caters to specific requirements, they all share a common data foundation: OneLake, based on the increasingly popular “data lakehouse” paradigm.
The Data Lakehouse, A Pattern Aiming for the Best of Data Lakes and Warehouses
The lakehouse term and pattern were popularized by Databricks in recent years as a way to bring governance and structure to the vast amounts of raw data, much of it unstructured, typically loaded into data lakes. Lakehouses follow an Extract, Load, Transform (ELT) pattern, commonly associated with “schema on read”, as opposed to the Extract, Transform, Load (ETL) pattern associated with “schema on write” in traditional data warehouses.
This architectural shift has been driven by the explosion in the volume and variety of data and enabled by massive public cloud infrastructure investments. The lakehouse aims to combine cheap data lake storage with the sense of structure and the ability to run efficient aggregate queries associated with data warehouses.
Microsoft’s Take on the Lakehouse
“OneLake is the OneDrive for data and like OneDrive, OneLake is provisioned automatically with every Fabric tenant with no infrastructure to manage.” -Microsoft
Microsoft already started supporting lakehouses years ago with Azure Synapse Analytics. OneLake now underpins Microsoft’s expanded vision of the lakehouse architecture. This unified, logical data lake supports lakehouses, warehouses and all other workloads.
OneLake comes automatically provisioned for each tenant and is the home for all your data, moving away from silos and the need to copy data to every place where you need it. Under the hood, it is still Azure Data Lake Storage (ADLS) Gen2, where Parquet files are managed with Delta metadata.
Because the data is stored in an open format, Fabric reduces the risk of vendor lock-in, and the data in your OneLake can be used with a wide range of tools and technologies. Fabric’s compute engines all rely on the Delta/Parquet format, giving you a stable foundation and reducing the need for format conversions.
You can also easily bring data from the outside without copying it. Shortcuts are embedded references within OneLake that point to other storage locations. The embedded reference makes it appear as though the files and folders are stored locally but in reality, they exist only in their original storage. Shortcuts can be updated or removed, but these changes don’t affect the original data and its source.
OneLake supports existing ADLS Gen2 APIs and SDKs, so you can hook up existing applications and tools to a OneLake endpoint.
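As a rough sketch of what pointing an existing ADLS Gen2 tool at OneLake can look like, the helpers below build the two address forms such tools usually expect. The host name and the workspace/item.itemtype/path layout are assumed to match Microsoft's published OneLake addressing; the function names and the sample workspace and lakehouse names are hypothetical.

```python
from urllib.parse import quote

# Assumption: the global OneLake DFS endpoint; check the Fabric docs for your tenant.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_https_url(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an HTTPS URL for a file or folder in OneLake (ADLS Gen2 style)."""
    relative = f"{workspace}/{item}.{item_type}/{path.lstrip('/')}"
    # Percent-encode spaces and other unsafe characters, keep path separators.
    return f"https://{ONELAKE_HOST}/{quote(relative, safe='/')}"


def onelake_abfss_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an abfss:// URI, the form Spark and other ABFS drivers expect."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical workspace ("Sales WS") and lakehouse ("orders").
    print(onelake_https_url("Sales WS", "orders", "Lakehouse", "Files/raw/2023.csv"))
    print(onelake_abfss_uri("SalesWS", "orders", "Lakehouse", "Tables/orders"))
```

The resulting strings are only addresses. Authentication and the actual read or write calls are left to whichever ADLS Gen2 client or Spark session you already use.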
There are several ways to get your data into OneLake:
- Manually with the OneLake file explorer
- Data Factory pipelines
- Notebooks
- Shortcuts, which create a symbolic link to external storage locations and file paths (ADLS Gen2 and AWS S3 buckets are supported at the time of writing)
The Delta metadata format offers optimized storage for data engineering workflows, with versioning, schema enforcement, and ACID transactions that help guarantee data integrity. It also integrates well with Apache Spark, making it ideal for large-scale data processing applications.
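To make versioning and schema enforcement more tangible, here is a deliberately tiny, in-memory illustration of the two ideas. It is a toy model written for this article, not how Delta Lake is implemented, and the class and column names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy stand-in for a versioned table; NOT the Delta Lake implementation."""
    schema: dict                                  # column name -> expected Python type
    commits: list = field(default_factory=list)   # one list of rows per commit

    def append(self, rows: list) -> int:
        """Validate rows against the schema, then record them as a new version."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for name, expected in self.schema.items():
                if not isinstance(row[name], expected):
                    raise TypeError(f"column {name!r} expects {expected.__name__}")
        # The whole batch is committed only after every row passed validation.
        self.commits.append(list(rows))
        return len(self.commits) - 1  # version number of this commit

    def read(self, version=None) -> list:
        """Return all rows as of a version (latest by default)."""
        last = len(self.commits) - 1 if version is None else version
        return [row for commit in self.commits[: last + 1] for row in commit]


table = ToyVersionedTable(schema={"id": int, "amount": float})
table.append([{"id": 1, "amount": 9.5}])   # becomes version 0
table.append([{"id": 2, "amount": 3.0}])   # becomes version 1
```

Reading with `table.read(version=0)` returns the table as it looked after the first commit, while a batch with a wrongly typed column is rejected as a whole and leaves the existing versions untouched.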
From Unmanaged to Managed Data Through the Medallion Architecture… And the Lasting Need for Data Modeling & Integration
Not all the data needs to be managed or is ready to be managed yet. In fact, there are many cases where data is ingested in raw form and should be preserved in that state, if only for auditing and traceability purposes. For these cases, OneLake provides the Files section where you can store and access any file format. Data stored in those files is called unmanaged data.
However, in the lakehouse architecture, tables play a vital role in managing and organizing data. Once you set up tables you have several new options for browsing, querying, and analyzing them. Data stored in those tables is called managed data.
Figure: Lakehouse menu within Fabric, showing the Tables and Files sections in Explorer.
Both managed and unmanaged data can be stored and handled using a medallion architecture where data is corralled through three layers of progressive refinement, with the aim of turning low-grade raw ore into ready-for-analytics gold. That gold layer at the end of the medallion architecture should be ready for consumption in a semantic model – typically a Power BI dataset – without requiring further transformations.
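The progressive refinement can be sketched in a few lines of plain Python, using the conventional bronze, silver and gold names for the three layers. The column names and sample records are hypothetical, and in Fabric this logic would more likely live in a notebook or pipeline working on Delta tables.

```python
def to_silver(bronze_rows):
    """Clean raw records: drop rows without an id, normalize types and casing."""
    silver = []
    for row in bronze_rows:
        if not row.get("order_id"):
            continue  # unusable record; it still survives in the bronze layer
        silver.append({
            "order_id": int(row["order_id"]),
            "country": row["country"].strip().upper(),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned records into an analytics-ready summary per country."""
    totals = {}
    for row in silver_rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return [{"country": c, "total_amount": t} for c, t in sorted(totals.items())]


# Bronze: raw data kept exactly as ingested, flaws included.
bronze = [
    {"order_id": "1", "country": " us ", "amount": "10.50"},
    {"order_id": "",  "country": "FR",   "amount": "7.00"},
    {"order_id": "2", "country": "US",   "amount": "4.50"},
    {"order_id": "3", "country": "fr",   "amount": "2.00"},
]
gold = to_gold(to_silver(bronze))
```

Note that the bronze list is never modified: each layer is derived from the previous one, which is what keeps the raw data available for auditing and traceability.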
This brings up a debate that somehow never seems to get resolved in the data industry. Every few years, someone will loudly proclaim the end of dimensional modeling in favor of wide tables that blend granular facts with their descriptive attributes. “Hey, it worked in our startup and established data patterns are now obsolete!” Let’s be clear: this approach will just not perform well for large data models in Power BI, so it can be ruled out purely on technical grounds if Power BI is part of your data stack.
Perhaps more importantly, trying to dodge the analytical work required to reconcile various data sources will lead to analytical silos for lack of conformed dimensions. The integration mantra championed by Bill Inmon since the 1990s is more relevant than ever, and the long-established dimensional modeling techniques promoted by Ralph Kimball remain just as relevant today.
The star schema still rules for Power BI performance, and making sense of your business entities across the enterprise is still needed sooner or later. The good news is (and we’ll get back to it in future entries) Fabric has all the tools to ingest raw data and turn it into a data model that’s optimally stored physically and structured logically for Power BI consumption and other workloads, based on the OneLake foundation. The core pattern remains ELT, not just EL!
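For readers who want the wide-table versus star-schema contrast spelled out, the sketch below splits a small wide table into a customer dimension and a sales fact table linked by a surrogate key. It is a minimal illustration with hypothetical column names, not a recommended production pattern.

```python
def split_into_star(wide_rows):
    """Split wide rows into a customer dimension and a sales fact table."""
    dim_customer = {}   # natural key (email) -> dimension row with surrogate key
    fact_sales = []
    for row in wide_rows:
        key = row["customer_email"]
        if key not in dim_customer:
            dim_customer[key] = {
                "customer_key": len(dim_customer) + 1,  # surrogate key
                "customer_email": key,
                "customer_segment": row["customer_segment"],
            }
        # The fact row keeps only measures, dates and the key to the dimension.
        fact_sales.append({
            "customer_key": dim_customer[key]["customer_key"],
            "order_date": row["order_date"],
            "amount": row["amount"],
        })
    return list(dim_customer.values()), fact_sales


wide = [
    {"customer_email": "a@example.com", "customer_segment": "SMB",
     "order_date": "2023-08-01", "amount": 10.0},
    {"customer_email": "b@example.com", "customer_segment": "Enterprise",
     "order_date": "2023-08-02", "amount": 20.0},
    {"customer_email": "a@example.com", "customer_segment": "SMB",
     "order_date": "2023-08-03", "amount": 5.0},
]
dim_customer, fact_sales = split_into_star(wide)
```

Descriptive attributes such as the customer segment are stored once per customer instead of once per order, and the same dimension can then be shared by other fact tables, which is what conformed dimensions are about.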
Contact us for a free strategy briefing to discuss how Microsoft Fabric can be part of your data infrastructure roadmap.
About Author
With over 20 years of experience, Venkat is an accomplished senior professional who excels in managing teams and driving success in Data, Analytics, Cloud Computing, and Digital Transformation. He plays a pivotal role in shaping strategy and architecture, transforming ad-hoc assessments into scalable software solutions. His expertise lies in enhancing the effectiveness and usability of analytics platforms by collaborating closely with stakeholders and strategic partners.