Migration of Snowflake Data Warehouse to Microsoft Fabric Using ChatGPT
Introduction
The decision to migrate from a Snowflake Data Warehouse to Microsoft Fabric can be driven by several factors, including specific business needs, technical requirements, differences in platform capabilities, familiarity with the environment, or existing heavy investments in the Microsoft technology stack.
In this guide, we will walk through the intricacies of the migration process, leveraging ChatGPT to streamline and simplify each step and make the migration seamless and efficient.
The activity is broken into two parts:
- Migrate objects
- Move or copy data
Prerequisites
Before we begin, ensure you have the following in place:
- Snowflake account
- ChatGPT free subscription/account
- Microsoft Fabric account
Part 1: Object Migration
This entry will illustrate migrating Snowflake objects, including tables, views, user-defined functions (UDFs), and SQL stored procedures.
Note: Snowflake supports stored procedures in various languages such as Java, JavaScript, Python, Scala, and SQL. However, Microsoft Fabric only supports SQL stored procedures. Additional language support may be introduced as Fabric evolves over time.
For demonstration purposes, we will consider the Snowflake sample dataset “TPCDS_SF100TCL” under the shared database “SNOWFLAKE_SAMPLE_DATA”. Let’s focus on one simple fact table (Inventory) and three dimensions (Item, Warehouse, and Date_Dim).
Since these four objects are part of a Snowflake Data Share, we cannot directly view their Data Definition Language (DDL) statements. As a workaround, we will create a sample database, “POC_DB,” within Snowflake and copy both the data and the structure from the Snowflake shared database.
Migration Steps
Step 1: Create Sample Database and Copy Data
Execute the following SQL code within the Snowflake data warehouse using Snowsight to prepare the data:
--create DB
create database POC_DB;
--copy 4 tables from the share to POC_DB
create or replace table warehouse as select * from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.warehouse;
create or replace table inventory as select * from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.inventory;
create or replace table item as select * from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.item;
create or replace table date_dim as select * from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.date_dim;
--Since the sample database has only tables, we will also create one view on top of the base tables, plus one stored procedure and one UDF.
--Create view
create or replace view INVENTORY_VW as
SELECT
S_INVENTORY."INV_DATE_SK" AS INV_DATE_SK,
S_INVENTORY."INV_ITEM_SK" AS INV_ITEM_SK,
WAREHOUSE."W_WAREHOUSE_SK" AS INV_WAREHOUSE_SK,
S_INVENTORY."INV_QUANTITY_ON_HAND" AS INV_QUANTITY_ON_HAND
FROM
INVENTORY as S_INVENTORY
LEFT OUTER JOIN WAREHOUSE ON (S_INVENTORY."INV_WAREHOUSE_SK" = WAREHOUSE."W_WAREHOUSE_SK")
LEFT OUTER JOIN ITEM ON (S_INVENTORY."INV_ITEM_SK" = ITEM."I_ITEM_SK" AND ITEM."I_REC_END_DATE" IS NULL)
LEFT OUTER JOIN DATE_DIM ON (DATE_DIM."D_DATE_SK" = S_INVENTORY."INV_DATE_SK");
--Create PROCEDURE
CREATE OR REPLACE PROCEDURE output_message(message VARCHAR)
RETURNS VARCHAR NOT NULL
LANGUAGE SQL
AS
BEGIN
RETURN message;
END;
CALL output_message('Hello World');
--Create function (this sample UDF references a separate EMP_DB.PUBLIC.EMPLOYEES table and is included only to demonstrate UDF migration)
create or replace function EMP_DB.PUBLIC.salary_for_employee(EMP_ID varchar)
returns table (EMPLOYEE_NAME varchar, SALARY numeric(11, 2))
as
$$
select top 1 EMPLOYEE_NAME, SALARY
from EMP_DB.PUBLIC.EMPLOYEES
where EMPLOYEE_ID = EMP_ID
$$
;
select * from table(EMP_DB.PUBLIC.SALARY_FOR_EMPLOYEE('3'));
--End Of Object Creation
Step 2: Get Source Objects Structure
Use the following command to retrieve the DDL for all source objects; the output can be downloaded from Snowsight in CSV or TSV format:
SELECT GET_DDL('DATABASE','POC_DB');
Caution: Add the schema name “POC_SCHEMA” as a prefix to tables referenced inside views; the GET_DDL function does not add that prefix. Also, check for case-sensitive column and table names.
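If you prefer to review and fix objects one at a time, GET_DDL can also be scoped to a single object. A minimal example, assuming the sample objects were created in the default PUBLIC schema:
--Optionally pull the DDL for a single object instead of the whole database
SELECT GET_DDL('TABLE', 'POC_DB.PUBLIC.INVENTORY');
SELECT GET_DDL('VIEW', 'POC_DB.PUBLIC.INVENTORY_VW');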
Step 3: Use ChatGPT for Conversion
Now, use ChatGPT to convert the Snowflake DDLs into SQL Server-compatible (T-SQL) DDLs. If you’re using ChatGPT for the first time, sign up for a free account first.
Use the prompt “Convert the below Snowflake SQL code to T-SQL compatible SQL code” and copy-paste each DDL one at a time or all DDLs at once. You will receive a detailed output indicating the changes made in the resulting query.
Sample ChatGPT Output:
A sample ChatGPT output script is attached for reference.
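For illustration only (the script you receive will differ), a plausible T-SQL conversion of the output_message procedure above might look like the following; T-SQL procedures cannot return a VARCHAR directly, so the message is returned as a result set instead:
--Illustrative T-SQL conversion of the Snowflake output_message procedure (example only)
CREATE PROCEDURE dbo.output_message
@message VARCHAR(4000)
AS
BEGIN
SELECT @message AS message;
END;
You can then test it in a separate batch with EXEC dbo.output_message 'Hello World';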
Step 4: Create Objects in Fabric Data Warehouse
Open Microsoft Fabric and create a new warehouse or use an existing one. Create a new SQL query and run the converted SQL scripts to create the corresponding objects in the Fabric Data Warehouse.
All tables, views, and SQL stored procedures will be created using this approach.
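To quickly confirm that the objects were created in the Fabric warehouse, you can query the system catalog (a minimal sketch; adjust for your own object names):
--List user tables, views, and stored procedures in the Fabric warehouse
SELECT name, type_desc
FROM sys.objects
WHERE type IN ('U', 'V', 'P')
ORDER BY type_desc, name;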
Part 2: Copy Data
Additional Prerequisites:
- S3 bucket
- IAM User with Access Key/Secret Key.
Data Copying Steps:
Step 1: Unload Snowflake Tables to S3 Bucket
There are various ways to copy or move data from Snowflake to Microsoft Fabric. In this demo, we’ll follow the conventional approach of unloading data from Snowflake and then copying it into Fabric. Our Snowflake account is hosted on AWS, and unloading data from Snowflake to the cloud provider where it is hosted is seamless, cheap, and fast.
To unload Snowflake tables to an external stage, you’ll need an S3 bucket and IAM user credentials. Run the following unload commands in the Snowsight console within your Snowflake account to move the Snowflake data to the S3 bucket:
COPY INTO 's3://poc-snowflake-171003/files/warehouse/'
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.warehouse
CREDENTIALS = (AWS_KEY_ID='XXXXXXX' AWS_SECRET_KEY='XXXXXXX')
FILE_FORMAT = (TYPE = PARQUET) OVERWRITE = TRUE;
COPY INTO 's3://poc-snowflake-171003/files/date_dim/'
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.date_dim
CREDENTIALS = (AWS_KEY_ID='XXXXXXX' AWS_SECRET_KEY='XXXXXXX')
FILE_FORMAT = (TYPE = PARQUET);
COPY INTO 's3://poc-snowflake-171003/files/ITEM/'
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.ITEM
CREDENTIALS = (AWS_KEY_ID='XXXXXXX' AWS_SECRET_KEY='XXXXXXX')
FILE_FORMAT = (TYPE = PARQUET);
COPY INTO 's3://poc-snowflake-171003/files/INVENTORY/'
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.INVENTORY
CREDENTIALS = (AWS_KEY_ID='XXXXXXX' AWS_SECRET_KEY='XXXXXXX')
FILE_FORMAT = (TYPE = PARQUET);
Screenshot showing the S3 bucket where Snowflake data is unloaded.
Screenshot showing Parquet data files unloaded for the Inventory table.
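Optionally, capture the source row counts now so they can be compared with the Fabric warehouse tables after the load; a simple check in Snowsight:
--Optional: record source row counts for post-load validation
SELECT 'WAREHOUSE' AS TABLE_NAME, COUNT(*) AS ROW_CNT FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.WAREHOUSE
UNION ALL
SELECT 'DATE_DIM', COUNT(*) FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.DATE_DIM
UNION ALL
SELECT 'ITEM', COUNT(*) FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.ITEM
UNION ALL
SELECT 'INVENTORY', COUNT(*) FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.INVENTORY;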
Step 2: Create a Lakehouse & S3 Shortcut in Microsoft Fabric
Steps to create a lakehouse
- Sign in to Microsoft Fabric.
- Switch to the Data Engineering experience using the experience switcher icon at the lower left corner of your homepage.
- Select Workspaces from the left-hand menu.
- To open your workspace, enter its name in the search textbox located at the top and select it from the search results.
- In the upper left corner of the workspace home page, select New and then choose Lakehouse.
- Give your lakehouse a name and select Create.
- A new lakehouse is created and if this is your first OneLake item, OneLake is provisioned behind the scenes.
What are Fabric shortcuts?
Shortcuts in Microsoft OneLake allow you to unify your data across domains, clouds, and accounts by creating a single virtual data lake for your entire enterprise.
Create an Amazon S3 shortcut
Prerequisites
If you don’t have a lakehouse, create one by following the steps above.
Ensure your chosen S3 bucket and IAM user meet the access and authorization requirements for S3 shortcuts.
Steps:
- Open a lakehouse.
- Right-click on a directory within the Lake view of the lakehouse.
- Select New shortcut.
- Under External sources, select Amazon S3.
- Enter the connection settings, such as the Access Key and Secret Key.
- Enter a name for your shortcut.
- Select Create.
- The lakehouse automatically refreshes. The shortcut appears under Files in the Explorer pane.
Create an S3 shortcut within Fabric Lakehouse to access the S3 location where the files have been unloaded.
Step 3: Copy Data to Microsoft Fabric
Use a Fabric notebook or Fabric Data Factory pipeline copy activity to copy data from the S3 shortcut into the Fabric Warehouse tables.
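If you take the notebook route, a minimal Spark SQL sketch (run in a %%sql cell with the lakehouse attached as the default lakehouse) can materialize the Parquet files as a table; the shortcut name snowflake_unload and the path below are illustrative:
--Spark SQL sketch: load the unloaded Parquet files into a Lakehouse Delta table
--If the relative path does not resolve, use the full abfss:// path to the lakehouse Files folder
CREATE TABLE IF NOT EXISTS inventory
USING DELTA
AS SELECT * FROM parquet.`Files/snowflake_unload/INVENTORY/`;
This lands the data in a Lakehouse Delta table. The screenshots below show the pipeline Copy activity alternative, with one copy activity per table, loading the Fabric Warehouse tables directly.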
Screenshot showcasing Copy Activity Source Connections
Screenshot showcasing Copy Activity Destination Connections
Screenshot showcasing Copy Activity Execution details
Screenshot showcasing one copy activity for each table in sequence
Conclusion:
This demonstration serves as a foundational proof of concept, offering a glimpse into the preliminary stages of a comprehensive migration project. Should you require expert guidance or assistance in executing a full-scale migration, we invite you to reach out to our technical team for personalized support and solutions.
Get Started
Explore a seamless transition from Snowflake data warehouse to Microsoft Fabric with our expert assistance. Contact us to learn more about our comprehensive data migration solutions.
Your data transformation journey starts here!