# Snowflake Tools

Provides functionality for managing Snowflake stages, files, and advanced Snowpark Python operations.

## Overview

The `snowflake_tools` group provides functionality to manage Snowflake stages, files, and advanced Snowpark Python operations. It lets users interact efficiently with Snowflake resources for staging data, leveraging Snowpark for computation, and performing full-text searches.
## Key Features & Functions

- `_list_stage_contents`: Lists files and objects stored in a specific Snowflake stage, with optional regex filters.
- `_add_file_to_stage`: Uploads files to a Snowflake stage for use in workflows or storage.
- `_delete_file_from_stage`: Deletes specific files from Snowflake stages, supporting cleanup and data management operations.
- `_read_file_from_stage`: Reads the contents of a file stored in a Snowflake stage.
- `_cortex_search`: Performs a full-text search across specified indexes within a Snowflake environment.
- `_run_snowpark_python`: Executes Python code in Snowflake's Snowpark environment for in-database computations and workflows.
## Input Parameters for Each Function

### 1. `_list_stage_contents`

| Name | Definition | Format |
|---|---|---|
| database | The database containing the stage (required). | String |
| schema | The schema in which the stage resides (required). | String |
| stage | The name of the stage to list files from (required). | String |
| pattern | (Optional) Regex pattern to filter matching files. | String |
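The `pattern` parameter is a regular expression matched against file names in the listing. A minimal sketch of how such a filter behaves, in plain Python (the helper name and sample listing are illustrative, not part of the tool itself):

```python
import re

def filter_stage_listing(files, pattern=None):
    """Return only the file names whose path matches the regex pattern.
    With no pattern, the listing is returned unchanged."""
    if pattern is None:
        return list(files)
    regex = re.compile(pattern)
    return [f for f in files if regex.search(f)]

# Illustrative stage listing
listing = [
    "raw/orders_2024_01.csv",
    "raw/orders_2024_02.csv",
    "raw/notes.txt",
]

print(filter_stage_listing(listing, r"orders_.*\.csv$"))
# ['raw/orders_2024_01.csv', 'raw/orders_2024_02.csv']
```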
### 2. `_add_file_to_stage`

| Name | Definition | Format |
|---|---|---|
| database | The database containing the stage (required). | String |
| schema | The schema in which the stage resides (required). | String |
| stage | The stage to which the file will be added (required). | String |
| file_name | The name/path of the file to upload (required). | String |
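All four parameters are required, so a caller may want to validate a request before sending it. A hedged sketch of such a check (the helper and the example values like `ANALYTICS` are placeholders, not part of the tool's API):

```python
REQUIRED_PARAMS = {"database", "schema", "stage", "file_name"}

def validate_add_file_params(params):
    """Raise ValueError if any required parameter is missing or empty."""
    missing = [p for p in sorted(REQUIRED_PARAMS) if not params.get(p)]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return params

validate_add_file_params({
    "database": "ANALYTICS",
    "schema": "STAGING",
    "stage": "DAILY_LOADS",
    "file_name": "orders_2024_01.csv",
})
```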
### 3. `_delete_file_from_stage`

| Name | Definition | Format |
|---|---|---|
| database | The database containing the stage (required). | String |
| schema | The schema where the stage resides (required). | String |
| stage | The stage from which the file will be deleted (required). | String |
| file_name | The file to be removed (required). | String |
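The four parameters together identify one staged file. In Snowflake, a staged file is addressed by the fully qualified form `@DATABASE.SCHEMA.STAGE/path`; a small helper that builds that reference (the helper itself is illustrative):

```python
def stage_file_path(database, schema, stage, file_name):
    """Build the fully qualified Snowflake stage reference for a file,
    e.g. @MYDB.MYSCHEMA.MYSTAGE/data.csv."""
    return f"@{database}.{schema}.{stage}/{file_name}"

print(stage_file_path("ANALYTICS", "STAGING", "DAILY_LOADS", "old_export.csv"))
# @ANALYTICS.STAGING.DAILY_LOADS/old_export.csv
```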
### 4. `_read_file_from_stage`

| Name | Definition | Format |
|---|---|---|
| database | The database containing the stage (required). | String |
| schema | The schema where the stage resides (required). | String |
| stage | The name of the stage containing the file (required). | String |
| file_name | The filename to retrieve and read (required). | String |
| return_contents | (Optional) Whether to return the file content (default: `true`). | Boolean |
| is_binary | (Optional) Whether to read the file as binary (default: `false`). | Boolean |
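The `is_binary` flag decides whether you get raw bytes back or decoded text. A minimal sketch of that distinction in plain Python (this mimics the flag's effect; it is not the tool's internal implementation):

```python
def decode_stage_file(raw: bytes, is_binary: bool = False):
    """Mimic the is_binary flag: return raw bytes when binary,
    otherwise decode the payload as UTF-8 text."""
    return raw if is_binary else raw.decode("utf-8")

payload = "id,amount\n1,9.99\n".encode("utf-8")
assert decode_stage_file(payload) == "id,amount\n1,9.99\n"      # text mode
assert decode_stage_file(payload, is_binary=True) == payload    # raw bytes
```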
### 5. `_cortex_search`

| Name | Definition | Format |
|---|---|---|
| query | A short string describing the search intent (required). | String |
| service_name | The name of the index service to search against (required). | String |
| top_n | (Optional) Number of top results to return (max: 25, default: 15). | Integer |

Genbot Tip: Narrow your Cortex search by providing specific keywords or targeted index services to reduce irrelevant results.
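A hedged sketch of how the documented `top_n` default (15) and cap (25) could be applied on the caller's side before issuing a search (the helper is an assumption for illustration; the service may enforce these limits itself):

```python
MAX_TOP_N = 25      # documented maximum
DEFAULT_TOP_N = 15  # documented default

def normalize_top_n(top_n=None):
    """Apply the documented default and cap for top_n."""
    if top_n is None:
        return DEFAULT_TOP_N
    return min(top_n, MAX_TOP_N)

print(normalize_top_n())    # 15
print(normalize_top_n(40))  # 25
```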
### 6. `_run_snowpark_python`

| Name | Definition | Format |
|---|---|---|
| purpose | (Optional) A detailed explanation in English of why the code is being run. | String |
| code | Python code to execute in the Snowflake Snowpark environment (required). | String |
| packages | (Optional) Non-default libraries to install for code execution. | List of strings |
| note_id | (Optional) Reference ID for pre-saved Python code notes. | String |

IMPORTANT: Ensure your Snowpark Python code adheres to Snowflake's environment constraints and does not rely on local system resources.
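Because `code` is shipped as a string, syntax errors surface only at execution time unless you check locally first. A hedged sketch of assembling the request payload with a local parse check (`prepare_snowpark_request` is a hypothetical helper, not part of the tool's API):

```python
def prepare_snowpark_request(code, purpose=None, packages=None, note_id=None):
    """Assemble a request payload and verify the code at least parses,
    so syntax errors are caught before shipping to the warehouse."""
    compile(code, "<snowpark>", "exec")  # raises SyntaxError on bad code
    request = {"code": code}
    if purpose:
        request["purpose"] = purpose
    if packages:
        request["packages"] = list(packages)
    if note_id:
        request["note_id"] = note_id
    return request

req = prepare_snowpark_request(
    code="result = sum(range(10))",
    purpose="Demonstrate a trivial in-warehouse computation",
    packages=["pandas"],
)
print(sorted(req))  # ['code', 'packages', 'purpose']
```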
## Use Cases

- **File Staging**: Use `_add_file_to_stage` to upload datasets into Snowflake stages for ingestion or analysis.
  - Example: A company uploads daily transactional records as CSV files for downstream ETL.
- **Content Retrieval & Validation**: Use `_read_file_from_stage` to retrieve or validate files before processing.
  - Example: Checking exported logs prior to appending them to a central reporting table.
- **Search Indexing**: Use `_cortex_search` for metadata lookup or connection insights across managed indexes.
  - Example: Discovering relevant datasets by searching "customer churn rate" in an index.
- **Dynamic Computation**: Use `_run_snowpark_python` for Python-based logic within Snowflake's environment.
  - Example: Running a clustering script on large datasets stored directly in Snowflake.
## Workflow / How It Works

- **Step 1: Configure Stages**
  - Identify or create the stage and assign the necessary permissions.
  - Upload files using `_add_file_to_stage`.
- **Step 2: Explore Stage Contents**
  - Use `_list_stage_contents` to view all files in the stage.
  - Apply `pattern` filters for advanced file matching.
- **Step 3: Read File Contents**
  - Retrieve data with `_read_file_from_stage` to validate or further process content.
- **Step 4: Advanced Searches**
  - Use `_cortex_search` to discover relevant resources across indexes or metadata.
- **Step 5: Execute Snowpark Python**
  - Employ `_run_snowpark_python` for complex transformations or computations within Snowflake.
- **Step 6: Cleanup**
  - Remove files using `_delete_file_from_stage` to keep storage tidy and reduce clutter.

IMPORTANT: Deleting files from a stage is irreversible; ensure backups exist before removal if the data might be needed later.
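The six workflow steps above can be sketched end to end. The `StubSnowflakeTools` client below is a hypothetical stand-in that only records which tool was invoked; how these tools are actually dispatched depends on your environment:

```python
class StubSnowflakeTools:
    """Hypothetical stand-in that records tool invocations instead of
    talking to Snowflake."""

    def __init__(self):
        self.calls = []

    def call(self, tool, **params):
        self.calls.append(tool)
        return {"tool": tool, "params": params}

def run_workflow(client, database, schema, stage, file_name):
    loc = dict(database=database, schema=schema, stage=stage)
    client.call("_add_file_to_stage", file_name=file_name, **loc)       # Step 1
    client.call("_list_stage_contents", pattern=r"\.csv$", **loc)       # Step 2
    client.call("_read_file_from_stage", file_name=file_name, **loc)    # Step 3
    client.call("_cortex_search", query="customer churn rate",
                service_name="demo_index")                              # Step 4
    client.call("_run_snowpark_python", code="result = 1 + 1")          # Step 5
    client.call("_delete_file_from_stage", file_name=file_name, **loc)  # Step 6
    return client.calls

client = StubSnowflakeTools()
print(run_workflow(client, "ANALYTICS", "STAGING", "DAILY_LOADS", "orders.csv"))
```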
## Integration Relevance

- **ETL Pipelines**: Combine with `data_connector_tools` to load staged files into target tables for transformations.
- **Semantic Models**: Integrates with other Snowflake semantic tools to create reusable datasets from staged files.
- **Data Validation**: Works with testing or QA tools to ensure the consistency and correctness of staged files.
## Configuration Details

- Correct setup of stage names, schemas, and database references is crucial.
- When using `_run_snowpark_python`, ensure the Python code is compatible with Snowflake's environment and resource limits.
- Consider role-based access to control who can upload, read, or delete from Snowflake stages.
## Limitations or Notes

- **Large File Operations**: Handling many or very large files can slow stage interactions; use `pattern` filters or chunking strategies.
- **Timeouts**: Long-running computations using `_run_snowpark_python` may require extended timeouts.
- **Binary File Handling**: Files read as binary (`is_binary=true`) may need decoding or additional processing steps.
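One simple chunking strategy for large stage listings is to process files in fixed-size batches so each operation touches a bounded number of files. A minimal sketch (the helper is illustrative, not part of the tool set):

```python
def chunk_files(files, chunk_size):
    """Split a long stage listing into fixed-size batches."""
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]

batches = chunk_files([f"part_{i}.csv" for i in range(7)], chunk_size=3)
print([len(b) for b in batches])  # [3, 3, 1]
```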
## Output

- **Stage Operations**: Confirmations or error messages after listing, adding, deleting, or reading files from stages.
- **Cortex & Snowpark Execution**: Search results from `_cortex_search`, or logs/output from `_run_snowpark_python` computations.
- **File Contents**: If `return_contents=true` is set, `_read_file_from_stage` returns the actual file data for further processing.