Autonomous Data Catalog Enrichment

Introduction

In this guide, you’ll see how to supplement the Harvester’s catalog with additional data from Alation. This process involves uploading a CSV file containing Alation metadata, creating a new table in Snowflake to store that metadata, and then updating the Harvester’s data with the newly loaded information.

IMPORTANT: The CSV from Alation has headers that include colons in their names, and its values are delimited by semicolons. This format must be handled carefully when uploading and loading into Snowflake to avoid alignment issues.

Key Objectives

Upload Alation CSV
Upload the Alation CSV (which has unique header formatting) via Streamlit and save it into a Snowflake stage.
Create the ALATION_EXTRACT Table
Make a new table in Eve’s workspace schema to store the extracted Alation metadata, preserving column definitions tailored for wide text fields.
Load the CSV Data
Instruct Eve to load the staged CSV file into the new ALATION_EXTRACT table.
Verify Alignment
Confirm that each column from the CSV aligns properly with the new table’s columns, troubleshooting any issues if necessary.
Create a MAPPING_DATABASE_SCHEMA_VIEW
Join the newly loaded Alation data with existing Harvester results to enrich your catalog.
Update the Harvester
Update rows in the HARVEST_RESULTS table with the DESCRIPTION from Alation data and set catalog_supplement_loaded = 'FALSE' to trigger the re-harvest process.
Wait for Harvester Reprocessing
Allow time for the Harvester to reprocess and integrate the new supplement data, after which future queries will include this enriched information.

Step-by-Step Walkthrough

Upload the Alation CSV to Streamlit

Use your Streamlit interface to upload the Alation CSV file. Ensure you specify the special delimiters (columns separated by semicolons, headers with colons) so it is parsed correctly. Confirm that each column is recognized and that there are no formatting errors during upload.

Save the CSV in a Snowflake Stage

Once uploaded, ask Eve to save this CSV file into a Snowflake stage. This temporary stage location allows Snowflake to later reference the file for loading into a table.

Create the ALATION_EXTRACT Table in Eve's Workspace

Ask Eve to run the following SQL in her workspace schema:

CREATE OR REPLACE TABLE ALATION_EXTRACT (
    AL_DATADICT_ITEM_PROPERTIES VARCHAR(16777216),
    AL_DATADICT_ITEM_COLUMN_DATA_TYPE VARCHAR(16777216),
    KEY VARCHAR(16777216),
    TITLE VARCHAR(16777216),
    DESCRIPTION VARCHAR(16777216),
    CDE VARCHAR(16777216),
    CLASSIFICATION VARCHAR(16777216),
    DG_ENDORSEMENT_DATE VARCHAR(16777216),
    OWNERS_GROUPPROFILE VARCHAR(16777216),
    OWNERS_USER VARCHAR(16777216),
    PRESENTATION VARCHAR(16777216),
    PS VARCHAR(16777216),
    REGIONAL_OFFICE VARCHAR(16777216),
    SCHEMA_ROLE_NAME VARCHAR(16777216),
    SHARED_DATA VARCHAR(16777216),
    SMES_GROUPPROFILE VARCHAR(16777216),
    SMES_USER VARCHAR(16777216),
    TRANSFORMATION VARCHAR(16777216),
    UPDATE_FREQUENCY VARCHAR(16777216)
);

Load the CSV Into ALATION_EXTRACT

Instruct Eve to load the data from the Snowflake stage into the ALATION_EXTRACT table. Make sure the file format matches the CSV’s semicolon delimiters and handles quoted headers with colons. If any alignment issues arise, ask Eve to adjust the file format or column mapping accordingly.

Validate the Data

After loading, ask Eve to query a few rows, for example:

SELECT * 
FROM ALATION_EXTRACT 
LIMIT 5;

Confirm that each column from the CSV is in the correct corresponding column in ALATION_EXTRACT. If there are discrepancies, ask Eve to correct them and reload.

Create the Mapping Database Schema View

Next, create a view to merge the newly loaded Alation metadata with existing Harvester results. Ask Eve to run the following:

CREATE OR REPLACE VIEW GENESIS_TEST_JL.EVE_WORKSPACE.MAPPING_DATABASE_SCHEMA_VIEW AS
SELECT 
    h.SOURCE_NAME,
    h.DATABASE_NAME,
    h.SCHEMA_NAME,
    h.TABLE_NAME,
    ('"' || h.DATABASE_NAME || '"."' || h.SCHEMA_NAME || '"."' || h.TABLE_NAME || '"') AS FULLY_QUALIFIED_NAME,
    SPLIT_PART(a.AL_DATADICT_ITEM_PROPERTIES, ';', 1) AS OID,
    SPLIT_PART(a.AL_DATADICT_ITEM_PROPERTIES, ';', 2) AS OTYPE,
    a.AL_DATADICT_ITEM_PROPERTIES,
    a.AL_DATADICT_ITEM_COLUMN_DATA_TYPE,
    a.KEY,
    a.TITLE,
    a.DESCRIPTION,
    a.CDE,
    a.CLASSIFICATION,
    a.DG_ENDORSEMENT_DATE,
    a.OWNERS_GROUPPROFILE,
    a.OWNERS_USER,
    a.PRESENTATION,
    a.PS,
    a.REGIONAL_OFFICE,
    a.SCHEMA_ROLE_NAME,
    a.SHARED_DATA,
    a.SMES_GROUPPROFILE,
    a.SMES_USER,
    a.TRANSFORMATION,
    a.UPDATE_FREQUENCY
FROM GENESIS_TEST_JL.DENG_DEMO.HARVEST_RESULTS h
JOIN GENESIS_TEST_JL.EVE_WORKSPACE.ALATION_EXTRACT a
    ON a.KEY = ('"' || LOWER(h.DATABASE_NAME) || '.' || LOWER(h.SCHEMA_NAME) || '".' || LOWER(h.TABLE_NAME))
WHERE h.DATABASE_NAME = 'BRONZE'
  AND a.AL_DATADICT_ITEM_PROPERTIES LIKE '%otype=table%';

Update the HARVEST_RESULTS with Alation Descriptions

Instruct Eve to update the Harvester’s HARVEST_RESULTS table with the newly loaded Alation descriptions. For example:

UPDATE GENESIS_BOTA.APP1.HARVEST_RESULTS AS h
SET h.CATALOG_SUPPLEMENT = v.DESCRIPTION,
    h.CATALOG_SUPPLEMENT_LOADED = 'FALSE',
    h.INITIAL_CRAWL_COMPLETE = False
FROM GENESIS_TEST_JL.EVE_WORKSPACE.MAPPING_DATABASE_SCHEMA_VIEW AS v
WHERE h.QUALIFIED_TABLE_NAME = v.FULLY_QUALIFIED_NAME;

This step populates the CATALOG_SUPPLEMENT field with Alation’s DESCRIPTION, marking them to be reprocessed by the Harvester.

Wait for Harvester Reprocessing

Give the Harvester approximately 5 minutes to detect the changes and begin reprocessing the data. Once the new supplement data is integrated, the search_metadata and explore_data functions will include these enriched fields when querying tables.

NOTE: Why Wait 5 Minutes?

The Harvester regularly polls its data for updates. By waiting a few minutes, you give it time to detect and process the newly loaded Alation metadata, ensuring the updated catalog is available for further queries and exploration.

Outcome

By following these steps, you’ll have successfully:

Ingested Alation CSV data into Snowflake.
Created a dedicated table (ALATION_EXTRACT) to store Alation metadata.
Joined Alation’s table properties to the Harvester results.
Updated the Harvester’s catalog with supplemental descriptions.
Enabled the Harvester to refetch these new fields for better search and exploration.

Genesis Tip

This workflow showcases how external catalog data (in this case, from Alation) can be integrated into the Harvester. By leveraging Eve’s ability to create and manage tables, stages, and views, you can unify multiple sources of metadata, leading to richer search and exploration for your data environment.

On this page

Introduction
Key Objectives
Step-by-Step Walkthrough
Outcome

Introduction

Key Objectives

Upload Alation CSV
Upload the Alation CSV (which has unique header formatting) via Streamlit and save it into a Snowflake stage.
Create the ALATION_EXTRACT Table
Make a new table in Eve’s workspace schema to store the extracted Alation metadata, preserving column definitions tailored for wide text fields.
Load the CSV Data
Instruct Eve to load the staged CSV file into the new ALATION_EXTRACT table.
Verify Alignment
Confirm that each column from the CSV aligns properly with the new table’s columns, troubleshooting any issues if necessary.
Create a MAPPING_DATABASE_SCHEMA_VIEW
Join the newly loaded Alation data with existing Harvester results to enrich your catalog.
Update the Harvester
Update rows in the HARVEST_RESULTS table with the DESCRIPTION from Alation data and set catalog_supplement_loaded = 'FALSE' to trigger the re-harvest process.
Wait for Harvester Reprocessing
Allow time for the Harvester to reprocess and integrate the new supplement data, after which future queries will include this enriched information.

Step-by-Step Walkthrough

Upload the Alation CSV to Streamlit

Save the CSV in a Snowflake Stage

Once uploaded, ask Eve to save this CSV file into a Snowflake stage. This temporary stage location allows Snowflake to later reference the file for loading into a table.

Create the ALATION_EXTRACT Table in Eve's Workspace

Ask Eve to run the following SQL in her workspace schema:

CREATE OR REPLACE TABLE ALATION_EXTRACT (
    AL_DATADICT_ITEM_PROPERTIES VARCHAR(16777216),
    AL_DATADICT_ITEM_COLUMN_DATA_TYPE VARCHAR(16777216),
    KEY VARCHAR(16777216),
    TITLE VARCHAR(16777216),
    DESCRIPTION VARCHAR(16777216),
    CDE VARCHAR(16777216),
    CLASSIFICATION VARCHAR(16777216),
    DG_ENDORSEMENT_DATE VARCHAR(16777216),
    OWNERS_GROUPPROFILE VARCHAR(16777216),
    OWNERS_USER VARCHAR(16777216),
    PRESENTATION VARCHAR(16777216),
    PS VARCHAR(16777216),
    REGIONAL_OFFICE VARCHAR(16777216),
    SCHEMA_ROLE_NAME VARCHAR(16777216),
    SHARED_DATA VARCHAR(16777216),
    SMES_GROUPPROFILE VARCHAR(16777216),
    SMES_USER VARCHAR(16777216),
    TRANSFORMATION VARCHAR(16777216),
    UPDATE_FREQUENCY VARCHAR(16777216)
);

Load the CSV Into ALATION_EXTRACT

Validate the Data

After loading, ask Eve to query a few rows, for example:

SELECT * 
FROM ALATION_EXTRACT 
LIMIT 5;

Confirm that each column from the CSV is in the correct corresponding column in ALATION_EXTRACT. If there are discrepancies, ask Eve to correct them and reload.

Create the Mapping Database Schema View

Next, create a view to merge the newly loaded Alation metadata with existing Harvester results. Ask Eve to run the following:

CREATE OR REPLACE VIEW GENESIS_TEST_JL.EVE_WORKSPACE.MAPPING_DATABASE_SCHEMA_VIEW AS
SELECT 
    h.SOURCE_NAME,
    h.DATABASE_NAME,
    h.SCHEMA_NAME,
    h.TABLE_NAME,
    ('"' || h.DATABASE_NAME || '"."' || h.SCHEMA_NAME || '"."' || h.TABLE_NAME || '"') AS FULLY_QUALIFIED_NAME,
    SPLIT_PART(a.AL_DATADICT_ITEM_PROPERTIES, ';', 1) AS OID,
    SPLIT_PART(a.AL_DATADICT_ITEM_PROPERTIES, ';', 2) AS OTYPE,
    a.AL_DATADICT_ITEM_PROPERTIES,
    a.AL_DATADICT_ITEM_COLUMN_DATA_TYPE,
    a.KEY,
    a.TITLE,
    a.DESCRIPTION,
    a.CDE,
    a.CLASSIFICATION,
    a.DG_ENDORSEMENT_DATE,
    a.OWNERS_GROUPPROFILE,
    a.OWNERS_USER,
    a.PRESENTATION,
    a.PS,
    a.REGIONAL_OFFICE,
    a.SCHEMA_ROLE_NAME,
    a.SHARED_DATA,
    a.SMES_GROUPPROFILE,
    a.SMES_USER,
    a.TRANSFORMATION,
    a.UPDATE_FREQUENCY
FROM GENESIS_TEST_JL.DENG_DEMO.HARVEST_RESULTS h
JOIN GENESIS_TEST_JL.EVE_WORKSPACE.ALATION_EXTRACT a
    ON a.KEY = ('"' || LOWER(h.DATABASE_NAME) || '.' || LOWER(h.SCHEMA_NAME) || '".' || LOWER(h.TABLE_NAME))
WHERE h.DATABASE_NAME = 'BRONZE'
  AND a.AL_DATADICT_ITEM_PROPERTIES LIKE '%otype=table%';

Update the HARVEST_RESULTS with Alation Descriptions

Instruct Eve to update the Harvester’s HARVEST_RESULTS table with the newly loaded Alation descriptions. For example:

UPDATE GENESIS_BOTA.APP1.HARVEST_RESULTS AS h
SET h.CATALOG_SUPPLEMENT = v.DESCRIPTION,
    h.CATALOG_SUPPLEMENT_LOADED = 'FALSE',
    h.INITIAL_CRAWL_COMPLETE = False
FROM GENESIS_TEST_JL.EVE_WORKSPACE.MAPPING_DATABASE_SCHEMA_VIEW AS v
WHERE h.QUALIFIED_TABLE_NAME = v.FULLY_QUALIFIED_NAME;

This step populates the CATALOG_SUPPLEMENT field with Alation’s DESCRIPTION, marking them to be reprocessed by the Harvester.

Wait for Harvester Reprocessing

NOTE: Why Wait 5 Minutes?

Outcome

By following these steps, you’ll have successfully:

Ingested Alation CSV data into Snowflake.
Created a dedicated table (ALATION_EXTRACT) to store Alation metadata.
Joined Alation’s table properties to the Harvester results.
Updated the Harvester’s catalog with supplemental descriptions.
Enabled the Harvester to refetch these new fields for better search and exploration.

Genesis Tip

On this page

Introduction
Key Objectives
Step-by-Step Walkthrough
Outcome

Autonomous Data Catalog Enrichment

Introduction

Key Objectives

Step-by-Step Walkthrough

Outcome

Getting Started

Genbot's Toolkit

Setup

Slack and Teams

Data Connections

Deployment Options

Example Use Case Deep Dives

Autonomous Data Catalog Enrichment

Introduction

Key Objectives

Step-by-Step Walkthrough

Outcome

​Introduction

​Key Objectives

​Step-by-Step Walkthrough

​Outcome

Getting Started

Genbot's Toolkit

Setup

Slack and Teams

Data Connections

Deployment Options

Example Use Case Deep Dives

​Introduction

​Key Objectives

​Step-by-Step Walkthrough

​Outcome

Introduction

Key Objectives

Step-by-Step Walkthrough

Outcome

Introduction

Key Objectives

Step-by-Step Walkthrough

Outcome