Overview

Tool Name

harvester_tools

Purpose

The harvester_tools group manages automated collection of database metadata used by search and discovery. Configure what to harvest, how often to refresh, and where to focus. Track status across sources and remove configurations or metadata when they are no longer required.

Key Features & Functions

Configure Sources

Register which connections and databases should be harvested and at what cadence.

Schema Scoping

Include or exclude schemas to focus on relevant areas and reduce noise.

Scheduling

Set refresh intervals in minutes and trigger first crawls on demand.

Monitor Coverage

Review control rows and high-level summaries to see progress and health.

Cleanup

Remove control rows or purge stored metadata for decommissioned databases.

Input Parameters for Each Function

_get_harvest_control_data

Parameters (No parameters. Returns all current harvest control rows.)
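
The rows returned mirror the fields accepted by _set_harvest_control_data below. An illustrative row, with field names assumed rather than confirmed, might look like this:

```python
# Illustrative harvest control row (a sketch; field names and values are
# assumptions based on the parameter and Output descriptions on this page).
example_control_row = {
    "connection_id": "snowflake_prod",        # hypothetical connection id
    "database_name": "ANALYTICS_DB",          # hypothetical database
    "status": "Include",                      # Include enables harvesting
    "refresh_interval": 5,                    # minutes between refreshes
    "initial_crawl_complete": True,           # first crawl already finished
    "schema_inclusions": [],                  # empty list means all schemas
    "schema_exclusions": ["INFORMATION_SCHEMA"],
}
```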

_set_harvest_control_data

Parameters
  • connection_id (String): Connection id for the source to harvest.
  • database_name (String): Database name to harvest. For BigQuery, use the project id.
  • refresh_interval (Integer): Refresh cadence in minutes. Default is 5.
  • initial_crawl_complete (Boolean): Set to false to trigger an immediate initial crawl.
  • status (String): Control row status. Use Include to enable or Exclude to disable.
  • schema_inclusions (Array): List of schemas to include. An empty list means include all.
  • schema_exclusions (Array): Schemas to exclude from harvest.
To force a fresh crawl after structural changes, call _set_harvest_control_data with initial_crawl_complete: false.
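
For example, a payload that registers a new database, scopes the harvest, and requests an immediate first crawl might look like the sketch below (the connection and database names are hypothetical):

```python
# Illustrative _set_harvest_control_data payload (all values hypothetical).
set_params = {
    "connection_id": "snowflake_prod",         # id from data_connector_tools
    "database_name": "ANALYTICS_DB",
    "refresh_interval": 15,                    # minutes between refreshes
    "initial_crawl_complete": False,           # False triggers the first crawl
    "status": "Include",
    "schema_inclusions": ["ANALYTICS", "MARTS"],
    "schema_exclusions": ["INFORMATION_SCHEMA"],
}
```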

_remove_harvest_control_data

Parameters
  • source_name (String): Source or connection identifier of the control row to remove.
  • database_name (String): Target database for the control row removal.
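
A minimal payload sketch, using hypothetical names:

```python
# Illustrative _remove_harvest_control_data payload (hypothetical values).
# Removing the control row stops future crawls but keeps stored metadata.
remove_control_params = {
    "source_name": "snowflake_prod",
    "database_name": "ANALYTICS_DB",
}
```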

_remove_metadata_for_database

Parameters
  • source_name (String): Source or connection identifier for the metadata purge.
  • database_name (String): Database whose harvested metadata will be deleted.
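
The purge call takes the same two identifiers but deletes the stored metadata itself; a hedged sketch:

```python
# Illustrative _remove_metadata_for_database payload (hypothetical values).
# Unlike control row removal, this permanently deletes harvested metadata.
purge_params = {
    "source_name": "snowflake_prod",
    "database_name": "ANALYTICS_DB",
}
```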

_get_harvest_summary

Parameters (No parameters. Returns coverage, last run times, counts, and errors.)
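
An illustrative summary entry, with field names assumed from the coverage, timestamp, count, and error information described on this page:

```python
# Illustrative _get_harvest_summary entry (a sketch; field names are
# assumptions and actual responses may differ).
example_summary_entry = {
    "source_name": "snowflake_prod",          # hypothetical source
    "database_name": "ANALYTICS_DB",
    "schemas_covered": 2,                     # coverage count
    "objects_harvested": 418,                 # object total
    "last_run": "2025-01-01T00:00:00Z",       # last run timestamp
    "errors": [],                             # error snapshot
}
```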

Use Cases

  1. Stand up harvesting for a new source
    • Add a control row for a new Snowflake database with a 5-minute interval.
  2. Tightly scope collections
    • Include only ANALYTICS and MARTS while excluding INFORMATION_SCHEMA.
  3. Monitor rollout
    • Track initial crawl completion and spot errors across environments.
  4. Tune cadences
    • Increase intervals on slow-changing warehouses to reduce compute load.
  5. Decommission cleanly
    • Remove control rows and purge stored metadata for retired systems.
Purging with _remove_metadata_for_database permanently deletes the stored metadata for the database, so confirm it is no longer needed before proceeding.

Workflow/How It Works

  1. Inspect current state
    • Call _get_harvest_control_data and _get_harvest_summary to see what is active.
  2. Plan scope
    • Decide on inclusions or exclusions. Default behavior is to harvest all schemas except typical system schemas.
  3. Configure
    • Use _set_harvest_control_data to upsert the row with status, interval, and filters. Set initial_crawl_complete: false to kick off the first crawl.
  4. Monitor
    • Watch _get_harvest_summary during rollout. Adjust interval or scope if load is higher than expected.
  5. Cleanup
    • Use _remove_harvest_control_data to stop future crawls. If needed, use _remove_metadata_for_database to clear stored metadata.
Schema filters do not apply to some engines, such as MySQL and SQLite, where scoping is at the database level.
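
A compact end-to-end sketch of this workflow, assuming a hypothetical call_tool dispatcher that routes to the harvester_tools functions (replace it with however your client actually invokes tools):

```python
# Hypothetical dispatcher: stand-in for your client's real tool-call mechanism.
def call_tool(name: str, **params):
    print(f"would call {name} with {params}")
    return []

# 1. Inspect current state.
control_rows = call_tool("_get_harvest_control_data")
summary = call_tool("_get_harvest_summary")

# 2-3. Plan scope and configure: upsert the control row and request the
#      first crawl by passing initial_crawl_complete=False.
call_tool(
    "_set_harvest_control_data",
    connection_id="snowflake_prod",            # hypothetical connection id
    database_name="ANALYTICS_DB",
    refresh_interval=15,
    initial_crawl_complete=False,
    status="Include",
    schema_inclusions=["ANALYTICS", "MARTS"],
    schema_exclusions=["INFORMATION_SCHEMA"],
)

# 4. Monitor rollout and adjust interval or scope if load is too high.
summary = call_tool("_get_harvest_summary")

# 5. Cleanup when the source is retired: stop crawls, then purge if needed.
call_tool("_remove_harvest_control_data",
          source_name="snowflake_prod", database_name="ANALYTICS_DB")
call_tool("_remove_metadata_for_database",
          source_name="snowflake_prod", database_name="ANALYTICS_DB")
```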

Integration Relevance

  • data_connector_tools for connection ids and downstream metadata search.
  • genesis_job_tools to observe the background harvester jobs and follow logs.
  • project_manager_tools to track catalog rollout as tasks and milestones.
  • document_index_tools in parallel when you also index full text documents.
  • system_stats_tools for monitoring resource impact during large crawls.

Configuration Details

  • Use exact case for database and schema names to match engine reporting.
  • Start with a refresh_interval of 5 minutes for active environments and relax to 15 to 60 minutes where change is rare.
  • Exclude noisy system schemas like INFORMATION_SCHEMA unless needed.
  • Ensure the service role has privileges to run SHOW- and DESCRIBE-style operations.
  • One control row per source and database pair keeps intent explicit.

Limitations or Notes

  1. Harvesting relies on valid connectivity and credentials.
  2. Large catalogs may take time to crawl, and summaries can lag until complete.
  3. Aggressive refresh cadences can increase compute costs on warehouses.
  4. Schema scoping does not apply to all engines.
  5. Removing a control row stops future crawls but does not delete stored metadata.
  6. Network failures and permission changes can interrupt or reduce coverage.
  7. Concurrent large harvests may be throttled by available resources.

Output

  • Harvest Control Data
    • Current rows with connection, database, status, interval, and filters.
  • Configuration Updates
    • Upsert confirmation and whether an initial crawl was triggered.
  • Harvest Summary
    • Coverage counts, last run timestamps, error snapshots, and object totals.
  • Removal Confirmations
    • Success messages for control row removal and for metadata purge actions.
  • Status Information
    • Progress indicators and completion timestamps per source and database.