Autonomous Data Catalog Enrichment
Learn how to load Alation’s data catalog supplement from CSV into Snowflake, create and manage supplemental metadata, and integrate it into the Harvester process.
Introduction
In this guide, you’ll see how to supplement the Harvester’s catalog with additional data from Alation. This process involves uploading a CSV file containing Alation metadata, creating a new table in Snowflake to store that metadata, and then updating the Harvester’s data with the newly loaded information.
IMPORTANT: The CSV from Alation has headers that include colons in their names, and its values are delimited by semicolons. This format must be handled carefully when uploading and loading into Snowflake to avoid alignment issues.
Key Objectives
- Upload Alation CSV
Upload the Alation CSV (which has unique header formatting) via Streamlit and save it into a Snowflake stage. - Create the ALATION_EXTRACT Table
Make a new table in Eve’s workspace schema to store the extracted Alation metadata, preserving column definitions tailored for wide text fields. - Load the CSV Data
Instruct Eve to load the staged CSV file into the new ALATION_EXTRACT table. - Verify Alignment
Confirm that each column from the CSV aligns properly with the new table’s columns, troubleshooting any issues if necessary. - Create a MAPPING_DATABASE_SCHEMA_VIEW
Join the newly loaded Alation data with existing Harvester results to enrich your catalog. - Update the Harvester
Update rows in the HARVEST_RESULTS table with the DESCRIPTION from Alation data and setcatalog_supplement_loaded = 'FALSE'
to trigger the re-harvest process. - Wait for Harvester Reprocessing
Allow time for the Harvester to reprocess and integrate the new supplement data, after which future queries will include this enriched information.
Step-by-Step Walkthrough
Upload the Alation CSV to Streamlit
Use your Streamlit interface to upload the Alation CSV file. Ensure you specify the special delimiters (columns separated by semicolons, headers with colons) so it is parsed correctly. Confirm that each column is recognized and that there are no formatting errors during upload.
Save the CSV in a Snowflake Stage
Once uploaded, ask Eve to save this CSV file into a Snowflake stage. This temporary stage location allows Snowflake to later reference the file for loading into a table.
Create the ALATION_EXTRACT Table in Eve's Workspace
Ask Eve to run the following SQL in her workspace schema:
Load the CSV Into ALATION_EXTRACT
Instruct Eve to load the data from the Snowflake stage into the ALATION_EXTRACT table. Make sure the file format matches the CSV’s semicolon delimiters and handles quoted headers with colons. If any alignment issues arise, ask Eve to adjust the file format or column mapping accordingly.
Validate the Data
After loading, ask Eve to query a few rows, for example:
Confirm that each column from the CSV is in the correct corresponding column in ALATION_EXTRACT. If there are discrepancies, ask Eve to correct them and reload.
Create the Mapping Database Schema View
Next, create a view to merge the newly loaded Alation metadata with existing Harvester results. Ask Eve to run the following:
Update the HARVEST_RESULTS with Alation Descriptions
Instruct Eve to update the Harvester’s HARVEST_RESULTS table with the newly loaded Alation descriptions. For example:
This step populates the CATALOG_SUPPLEMENT
field with Alation’s DESCRIPTION, marking them to be reprocessed by the Harvester.
Wait for Harvester Reprocessing
Give the Harvester approximately 5 minutes to detect the changes and begin reprocessing the data. Once the new supplement data is integrated, the search_metadata
and explore_data
functions will include these enriched fields when querying tables.
NOTE: Why Wait 5 Minutes?
The Harvester regularly polls its data for updates. By waiting a few minutes, you give it time to detect and process the newly loaded Alation metadata, ensuring the updated catalog is available for further queries and exploration.
Outcome
By following these steps, you’ll have successfully:
- Ingested Alation CSV data into Snowflake.
- Created a dedicated table (ALATION_EXTRACT) to store Alation metadata.
- Joined Alation’s table properties to the Harvester results.
- Updated the Harvester’s catalog with supplemental descriptions.
- Enabled the Harvester to refetch these new fields for better search and exploration.
Genesis Tip
This workflow showcases how external catalog data (in this case, from Alation) can be integrated into the Harvester. By leveraging Eve’s ability to create and manage tables, stages, and views, you can unify multiple sources of metadata, leading to richer search and exploration for your data environment.