Overview

Tool Name

snowflake_tools

Purpose

The snowflake_tools group provides functionality to manage Snowflake stages and staged files, run Snowpark Python code, and perform Cortex full-text searches. It lets users stage data for workflows, run in-database computation via Snowpark, and search indexed content within a Snowflake environment.

Key Features & Functions

  1. _list_stage_contents

    • Lists files and objects stored in a specific Snowflake stage, with an optional regex filter.
  2. _add_file_to_stage

    • Enables file uploads to a Snowflake stage for use in workflows or storage.
  3. _delete_file_from_stage

    • Deletes specific files from Snowflake stages, supporting cleanup and data management operations.
  4. _read_file_from_stage

    • Reads the contents of a file stored in a Snowflake stage.
  5. _cortex_search

    • Performs a full-text search against a specified Cortex search service within a Snowflake environment.
  6. _run_snowpark_python

    • Executes Python code leveraging Snowflake’s Snowpark environment for in-database computations and workflows.

Input Parameters for Each Function

1. _list_stage_contents

Parameters

| Name | Definition | Format |
|------|------------|--------|
| database | The database containing the stage (required). | String |
| schema | The schema in which the stage resides (required). | String |
| stage | The name of the stage to list files from (required). | String |
| pattern | (Optional) Regex pattern to filter matching files. | String |
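The pattern argument is a regular expression matched against file paths in the stage. A minimal pure-Python sketch of how such a filter behaves (the file names are illustrative; real listings come from _list_stage_contents):

```python
import re

# Illustrative stage listing; real results come from _list_stage_contents.
stage_files = [
    "daily/tx_2024_01_01.csv",
    "daily/tx_2024_01_02.csv",
    "logs/load_errors.txt",
]

def filter_by_pattern(files, pattern):
    """Keep only paths matching the regex, mirroring the optional `pattern` filter."""
    rx = re.compile(pattern)
    return [f for f in files if rx.search(f)]

print(filter_by_pattern(stage_files, r"\.csv$"))
# ['daily/tx_2024_01_01.csv', 'daily/tx_2024_01_02.csv']
```

Anchoring the expression (for example `\.csv$` rather than `csv`) avoids accidental matches in directory names.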

2. _add_file_to_stage

Parameters

| Name | Definition | Format |
|------|------------|--------|
| database | The database containing the stage (required). | String |
| schema | The schema in which the stage resides (required). | String |
| stage | The stage to which the file will be added (required). | String |
| file_name | The name/path of the file to upload (required). | String |

3. _delete_file_from_stage

Parameters

| Name | Definition | Format |
|------|------------|--------|
| database | The database containing the stage (required). | String |
| schema | The schema where the stage resides (required). | String |
| stage | The stage from which the file will be deleted (required). | String |
| file_name | The file to be removed (required). | String |

4. _read_file_from_stage

Parameters

| Name | Definition | Format |
|------|------------|--------|
| database | The database containing the stage (required). | String |
| schema | The schema where the stage resides (required). | String |
| stage | The name of the stage containing the file (required). | String |
| file_name | The filename to retrieve and read the content of (required). | String |
| return_contents | (Optional) Whether to return the file content (default: true). | Boolean |
| is_binary | (Optional) Whether to read the file as binary (default: false). | Boolean |

5. _cortex_search

Parameters

| Name | Definition | Format |
|------|------------|--------|
| query | A short string describing the search intent (required). | String |
| service_name | The name of the index service to search against (required). | String |
| top_n | (Optional) Number of top results to return (max: 25, default: 15). | Integer |

Genbot Tip: Narrow your Cortex search by providing specific keywords or a targeted index service to reduce irrelevant results.
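Putting the parameters above together, a hedged sketch of assembling a search request (the helper and the service name are illustrative, not part of the tool's actual API; note the documented cap of 25 on top_n):

```python
def build_cortex_search_request(query, service_name, top_n=15):
    """Assemble parameters for _cortex_search; top_n is capped at the documented max of 25."""
    return {
        "query": query,
        "service_name": service_name,
        "top_n": min(top_n, 25),
    }

# Illustrative service name; a real call would target one of your managed index services.
req = build_cortex_search_request("customer churn rate", "sales_docs_index", top_n=40)
print(req["top_n"])  # 25 (clamped from 40)
```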

6. _run_snowpark_python

Parameters

| Name | Definition | Format |
|------|------------|--------|
| purpose | (Optional) A detailed explanation in English of the reason for running the code. | String |
| code | Python code to be executed in the Snowflake Snowpark environment (required). | String |
| packages | (Optional) Non-default libraries to install for code execution (list of strings). | List |
| note_id | (Optional) Reference ID for pre-saved Python code notes. | String |

IMPORTANT: Ensure your Snowpark Python code adheres to Snowflake’s environment constraints and doesn’t rely on local system resources.
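One way to stay within those constraints is to keep the code parameter self-contained: standard-library only (or packages declared via the packages parameter), no local file or network access. A sketch with illustrative values, including a local sanity check of the snippet before submitting it:

```python
# Illustrative value for the `code` parameter of _run_snowpark_python:
# self-contained, standard library only, no local file or network access.
snowpark_code = """
import statistics

revenues = [1200.0, 950.5, 1875.25, 1010.0]
result = {
    "mean": statistics.mean(revenues),
    "stdev": round(statistics.stdev(revenues), 2),
}
"""

# Locally, the same snippet can be sanity-checked before submission:
namespace = {}
exec(snowpark_code, namespace)
print(namespace["result"]["mean"])  # 1258.9375
```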

Use Cases

  1. File Staging

    • Use _add_file_to_stage to upload datasets into Snowflake stages for ingestion or analysis.

    • Example: A company uploads daily transactional records as CSV files for downstream ETL.

  2. Content Retrieval & Validation

    • Use _read_file_from_stage to retrieve or validate files before processing.

    • Example: Checking exported logs prior to appending them to a central reporting table.

  3. Search Indexing

    • Use _cortex_search for metadata lookup or connection insights across managed indexes.

    • Example: Discovering relevant data sets by searching “customer churn rate” in an index.

  4. Dynamic Computation

    • Use _run_snowpark_python for Python-based logic within Snowflake’s environment.

    • Example: Running a clustering script on large sets directly stored in Snowflake.

Workflow/How It Works

  1. Step 1: Configure Stages

    • Identify or create the stage and assign necessary permissions.

    • Upload files using _add_file_to_stage.

  2. Step 2: Explore Stage Contents

    • Use _list_stage_contents to view all files in the stage.

    • Apply pattern filters for advanced file matching.

  3. Step 3: Read File Contents

    • Retrieve data with _read_file_from_stage to validate or further process content.
  4. Step 4: Advanced Searches

    • Utilize _cortex_search to discover relevant resources across indexes or metadata.
  5. Step 5: Execute Snowpark Python

    • Employ _run_snowpark_python for complex transformations or computations within Snowflake.
  6. Step 6: Cleanup

    • Remove files using _delete_file_from_stage to maintain tidy storage and reduce clutter.

IMPORTANT: Deleting files from a stage is irreversible—ensure backups exist before removal if data might be needed later.
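The six steps above can be sketched as an ordered sequence of tool calls. Here invoke_tool is a hypothetical dispatcher standing in for however the tool group is actually invoked, and the database/schema/stage names are illustrative:

```python
calls = []

def invoke_tool(name, **params):
    """Hypothetical dispatcher; records calls instead of contacting Snowflake."""
    calls.append(name)
    return {"tool": name, "params": params}

# Illustrative stage reference reused across calls.
stage_ref = dict(database="ANALYTICS", schema="RAW", stage="DAILY_LOADS")

invoke_tool("_add_file_to_stage", **stage_ref, file_name="tx_2024_01_01.csv")      # Step 1
invoke_tool("_list_stage_contents", **stage_ref, pattern=r"\.csv$")                # Step 2
invoke_tool("_read_file_from_stage", **stage_ref, file_name="tx_2024_01_01.csv",
            return_contents=True, is_binary=False)                                 # Step 3
invoke_tool("_cortex_search", query="daily transactions",
            service_name="metadata_index", top_n=5)                                # Step 4
invoke_tool("_run_snowpark_python", code="result = 1 + 1", purpose="smoke test")   # Step 5
invoke_tool("_delete_file_from_stage", **stage_ref, file_name="tx_2024_01_01.csv") # Step 6
```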

Integration Relevance

  • ETL Pipelines: Combine with data_connector_tools to load staged files into target tables for transformations.

  • Semantic Models: Integrates with other Snowflake semantic tools to create reusable datasets from staged files.

  • Data Validation: Collaborates with testing or QA tools to ensure consistency and correctness of staged files.

Configuration Details

  • Correct setup of stage names, schemas, and database references is crucial.

  • When using _run_snowpark_python, ensure Python code is compatible with Snowflake’s environment and resource limits.

  • Consider role-based access to manage who can upload, read, or delete from Snowflake stages.

Limitations or Notes

  1. Large File Operations

    • Handling many or very large files can slow stage interactions—use patterns or chunking strategies.
  2. Timeouts

    • Long-running computations using _run_snowpark_python may require extended timeouts.
  3. Binary File Handling

    • Files read as binary (is_binary=true) might need decoding or additional processing steps.
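A sketch of one such post-processing step, assuming the content arrives as raw bytes (the payload and helper are illustrative): decode to text when the bytes are valid in the chosen encoding, otherwise fall back to a hex preview for non-text formats such as images or Parquet.

```python
# Illustrative payload; real bytes come from _read_file_from_stage with is_binary=true.
raw = b"timestamp,amount\n2024-01-01,19.99\n"

def decode_stage_bytes(data, encoding="utf-8"):
    """Decode binary stage content to text, falling back to a hex preview
    of the first bytes when the content is not valid text."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data[:16].hex()

text = decode_stage_bytes(raw)
```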

Output

  • Stage Operations

    • Confirmations or error messages after listing, adding, deleting, or reading files from stages.
  • Cortex & Snowpark Execution

    • Search results from _cortex_search, or logs/output from _run_snowpark_python computations.
  • File Contents

    • When return_contents is true (the default), _read_file_from_stage returns the actual file contents for further processing.