Harvest Control Tools
Manage the harvesting, tracking, and organization of metadata across connected databases for streamlined and up-to-date insights.
Overview
Tool Name
Purpose
The harvest_control_tools manage the harvesting, tracking, and organization of metadata across connected databases. They streamline and automate metadata collection so that information about database structures, schemas, and available data assets stays up to date.
Functions Available
- _get_harvest_control_data: Retrieves all active harvest control configurations, including databases in scope, schemas included or excluded, refresh intervals, and crawl status.
- _set_harvest_control_data: Adds or updates harvest control configurations for a specific database or schema.
- _remove_harvest_control_data: Deletes a harvest control setup, stopping future metadata crawls for the specified database.
- _remove_metadata_for_database: Purges meta-harvest crawl results for a specific database, cleaning up the recorded metadata.
- _get_harvest_summary: Provides a summary of ongoing, completed, or failed metadata harvests, including statistics (e.g., number of tables, columns processed).
Key Features
Automated Metadata Harvesting
Configure and manage automatic metadata harvesting for connected databases.
Schema Inclusions & Exclusions
Include or exclude specific schemas to focus harvesting on the most relevant data.
Manual or Scheduled Crawls
Perform harvest crawls manually or set automatic refresh intervals.
Progress & Data Availability
Track harvesting progress and control data availability across systems.
Metadata Cleanup
Remove outdated metadata or harvest configurations when no longer required.
Input Parameters
_get_harvest_control_data | Retrieve All Active Harvest Control Configurations | |
---|---|---|
Input Parameters | Definition | Format |
(None Required) | Returns all active harvest control configurations, including database names and schema rules. | (None) |
_set_harvest_control_data | Add or Update Harvest Control Configurations | |
---|---|---|
Input Parameters | Definition | Format |
connection_id | (Optional): Database connection ID (e.g., “Snowflake”). | String |
database_name | Name of the database for harvesting (e.g., “CUSTOMER_DATA”). | String |
refresh_interval | Suggested refresh interval in minutes (e.g., 1440 for daily). | Integer |
initial_crawl_complete | (Optional): Indicates if the initial crawl is done (default: False). | Boolean |
schema_exclusions | (Optional): List of schemas to exclude from harvesting. | List |
schema_inclusions | (Optional): List of schemas to explicitly include. | List |
status | (Optional): Harvesting status; default is “Include”. | String |
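
The parameters above can be collected and checked before they are passed to _set_harvest_control_data. The following is a minimal sketch, not the tool's actual interface: build_harvest_config is a hypothetical helper, the dictionary shape is an assumption, and so is the rule that refresh_interval must be a positive integer. Only the parameter names and defaults come from the table.

```python
from typing import Any, Dict, List, Optional


def build_harvest_config(
    database_name: str,
    refresh_interval: int,
    connection_id: Optional[str] = None,
    initial_crawl_complete: bool = False,
    schema_exclusions: Optional[List[str]] = None,
    schema_inclusions: Optional[List[str]] = None,
    status: str = "Include",
) -> Dict[str, Any]:
    """Hypothetical helper: gather _set_harvest_control_data arguments in a dict."""
    if not database_name:
        raise ValueError("database_name is required")
    # Assumption: the interval is a positive whole number of minutes.
    if isinstance(refresh_interval, bool) or not isinstance(refresh_interval, int):
        raise TypeError("refresh_interval must be an integer number of minutes")
    if refresh_interval <= 0:
        raise ValueError("refresh_interval must be positive")

    config: Dict[str, Any] = {
        "database_name": database_name,
        "refresh_interval": refresh_interval,
        "initial_crawl_complete": initial_crawl_complete,
        "status": status,
    }
    # Optional parameters are added only when the caller supplied them.
    if connection_id is not None:
        config["connection_id"] = connection_id
    if schema_exclusions:
        config["schema_exclusions"] = list(schema_exclusions)
    if schema_inclusions:
        config["schema_inclusions"] = list(schema_inclusions)
    return config


# Example values taken from the parameter table above.
config = build_harvest_config(
    database_name="CUSTOMER_DATA",
    refresh_interval=1440,  # daily
    connection_id="Snowflake",
    schema_exclusions=["INFORMATION_SCHEMA"],
)
```

Validating up front keeps an invalid interval or a missing database name from reaching the harvest configuration at all.
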
_remove_harvest_control_data | Delete a Specific Harvest Control Setup | |
---|---|---|
Input Parameters | Definition | Format |
source_name | The source system for which harvesting is configured (e.g., “Snowflake”). | String |
database_name | The database whose control data should be removed. | String |
_remove_metadata_for_database | Purge Harvested Metadata for a Database | |
---|---|---|
Input Parameters | Definition | Format |
source_name | Name of the source system that hosts the database to clean (e.g., “Snowflake”). | String |
database_name | Name of the database for which metadata will be purged. | String |
_get_harvest_summary | Get a Summary of Harvest Progress & Statistics | |
---|---|---|
Input Parameters | Definition | Format |
(None Required) | Returns statistics for ongoing or completed metadata harvests (e.g., table counts). | (None) |
Output
- Harvest Data Retrieval: _get_harvest_control_data returns JSON detailing active harvest configurations, including the database names, refresh intervals, and schema inclusions/exclusions.
- Harvest Configuration/Update: _set_harvest_control_data confirms that harvesting rules were created or modified by returning a success message.
- Metadata Cleanup: _remove_harvest_control_data and _remove_metadata_for_database return confirmations that the control data or the harvested metadata was removed.
- Harvest Summary: _get_harvest_summary outputs progress and statistics, including the number of schemas crawled, tables indexed, or failures encountered.
Genbot Tip
- Use schema_exclusions to omit system-level schemas (like INFORMATION_SCHEMA) from your harvest for cleaner results.
- Align refresh_interval with your environment’s change frequency: shorter intervals for more rapidly changing data.
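
Because refresh_interval is given in minutes, it can help to derive it from a more readable duration. The sketch below is illustrative only: interval_minutes and SYSTEM_SCHEMAS are hypothetical names, and the six-hour interval is an arbitrary example rather than a recommended value.

```python
from datetime import timedelta


def interval_minutes(**duration: float) -> int:
    """Convert a duration such as hours=6 or days=1 into whole minutes."""
    minutes = int(timedelta(**duration).total_seconds() // 60)
    if minutes <= 0:
        raise ValueError("duration must be at least one minute")
    return minutes


# Assumption: INFORMATION_SCHEMA is the only system-level schema to skip here.
SYSTEM_SCHEMAS = ["INFORMATION_SCHEMA"]

tip_settings = {
    "schema_exclusions": SYSTEM_SCHEMAS,
    "refresh_interval": interval_minutes(hours=6),  # faster-changing data
}
```
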
How It Works
Users define a database connection with desired harvesting configurations, specifying schemas to include or exclude. Once set, the system performs scheduled metadata crawls or initiates immediate crawls, updating the metadata store. Users can retrieve harvesting progress and summary reports, track connected database configurations, or clean outdated metadata.
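
The scheduling step described above can be pictured as a check of elapsed time against refresh_interval. This is a sketch under stated assumptions, not the system's actual scheduler: is_crawl_due is a hypothetical function, and it assumes a database is crawled immediately when its initial crawl has not completed and otherwise once refresh_interval minutes have passed since the last crawl.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_crawl_due(
    last_crawl: Optional[datetime],
    refresh_interval: int,
    now: datetime,
    initial_crawl_complete: bool = True,
) -> bool:
    """Return True when a database should be crawled again (assumed rule)."""
    # A database that has never been fully crawled is always due.
    if not initial_crawl_complete or last_crawl is None:
        return True
    # Otherwise compare the elapsed time with the configured interval.
    return now - last_crawl >= timedelta(minutes=refresh_interval)


now = datetime(2024, 1, 2, 12, 0)
# Last crawled 25 hours ago with a daily interval: due.
due = is_crawl_due(datetime(2024, 1, 1, 11, 0), 1440, now)
# Last crawled 6 hours ago with a daily interval: not due yet.
not_due = is_crawl_due(datetime(2024, 1, 2, 6, 0), 1440, now)
```
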
IMPORTANT NOTE
- By default, all schemas are included unless explicitly excluded, which may gather irrelevant data.
- Large databases may extend crawling times or require optimized scheduling to avoid performance bottlenecks.
- Combine harvest crawls with metadata audits to maintain compliance with data governance standards.
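
The default-inclusion rule in the first note can be sketched as a small filter. The function schemas_to_crawl is hypothetical, and two of its behaviors are assumptions the source does not confirm: names are matched case-insensitively, and an exclusion takes precedence when a schema appears in both lists.

```python
from typing import Iterable, List, Optional


def schemas_to_crawl(
    available: Iterable[str],
    schema_inclusions: Optional[Iterable[str]] = None,
    schema_exclusions: Optional[Iterable[str]] = None,
) -> List[str]:
    """Pick the schemas a crawl would visit under the assumed rules."""
    included = {s.upper() for s in schema_inclusions or []}
    excluded = {s.upper() for s in schema_exclusions or []}
    selected = []
    for schema in available:
        key = schema.upper()
        # With no inclusion list, every schema is a candidate by default.
        if included and key not in included:
            continue
        # Assumption: exclusions win over inclusions.
        if key in excluded:
            continue
        selected.append(schema)
    return selected


all_schemas = ["PUBLIC", "SALES", "INFORMATION_SCHEMA"]
default_scope = schemas_to_crawl(all_schemas)
trimmed_scope = schemas_to_crawl(all_schemas, schema_exclusions=["INFORMATION_SCHEMA"])
```

Without any rules, default_scope contains all three schemas, including INFORMATION_SCHEMA, which is the irrelevant data the note warns about.
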