The Field Mapping Agent automates and assists field mapping between data sources and target models. It proposes candidate mappings, captures transformation logic, records lineage and rationale, and exports reviewable deliverables.
Human-in-the-loop: Assign a human Mission Owner for decisions and reviews. The agent escalates low-confidence mappings, conflicts, and open questions to the owner and stakeholders.

Overview

Purpose

Automate discovery and proposal of source→target mappings, capture transformations and lineage, and produce reviewable outputs.

Scope

Column alignment (1:1, 1:n, n:1); transformation rules; data types and constraints; glossary alignment; confidence scoring; lineage notes.
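
To make the alignment cardinalities concrete, the following is a minimal sketch that labels each proposed pair as 1:1, 1:n, or n:1 by counting how many targets a source feeds and how many sources feed a target. The function name `classify_alignment` and the column names are hypothetical, not part of the agent. Pairs that fan out and fan in at the same time are labeled n:m, which falls outside the scope listed above and would presumably need manual review.

```python
from collections import defaultdict

def classify_alignment(pairs):
    """Label each (source_column, target_column) pair as 1:1, 1:n, n:1, or n:m."""
    targets_per_source = defaultdict(set)
    sources_per_target = defaultdict(set)
    for source, target in pairs:
        targets_per_source[source].add(target)
        sources_per_target[target].add(source)

    labels = {}
    for source, target in pairs:
        fan_out = len(targets_per_source[source]) > 1  # one source feeds several targets
        fan_in = len(sources_per_target[target]) > 1   # several sources feed one target
        if fan_out and fan_in:
            labels[(source, target)] = "n:m"  # outside the listed scope
        elif fan_out:
            labels[(source, target)] = "1:n"
        elif fan_in:
            labels[(source, target)] = "n:1"
        else:
            labels[(source, target)] = "1:1"
    return labels

# Hypothetical column names, for illustration only.
pairs = [
    ("patient_id", "patient_key"),
    ("full_name", "first_name"),
    ("full_name", "last_name"),
    ("addr_line1", "address"),
    ("addr_line2", "address"),
]
labels = classify_alignment(pairs)
```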

Design Goals

Deterministic, reproducible, and reviewable outputs with safe, read-first defaults on production data.

Typical use cases

  • New source onboarding to an existing canonical/analytics model
  • Legacy→modern migration mapping (warehouse/lakehouse)
  • Cross-system harmonization for MDM/golden-record initiatives
  • Rapid discovery and scoping (first-pass mapping proposals)
  • Lineage and documentation updates for existing pipelines

Inputs and prerequisites

  • Sources & targets: source schemas (connections and/or data dictionaries); target model specs or canonical model (dims/facts, naming conventions)
  • Context: business glossary, synonym lists, code sets (optional but helpful)
  • Access posture: read access to metadata and tables; optional workspace for intermediate outputs (e.g., GENESIS.EVE_WORKSPACE)

Core workflows

1. Discover metadata

Inspect source/target schemas (names, types, keys, constraints). Pull column stats/samples with safe limits.

2. Generate candidate mappings

Use similarity signals (names, types, semantics, glossary hits, value patterns). Produce pairs with confidence and rationale.

3. Apply rules & constraints

Enforce naming/typing standards; incorporate joins/lookups and code-set normalization; capture transformation logic with notes.

4. Validate & refine

Surface conflicts, low-confidence areas, and unmapped fields. Suggest alternatives and log open questions as mission tasks.

5. Export deliverables

Export mapping table/workbook with lineage and rationale; optional SQL/dbt snippets and YAML stubs; gap/assumption summary.
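
The candidate-generation step can be sketched as a weighted blend of similarity signals. This is an illustration under stated assumptions, not the agent's actual scoring: the weights, the function names, and the sample columns are all hypothetical, name similarity is approximated with the standard library's `difflib.SequenceMatcher`, and the semantic and value-pattern signals are left out for brevity.

```python
from difflib import SequenceMatcher

# Hypothetical weights; the document names the signals but not their values.
WEIGHTS = {"name": 0.5, "type": 0.25, "glossary": 0.25}

def normalize(name):
    """Lowercase and split on underscores so 'PAT_ID' and 'pat id' compare equal."""
    return name.lower().replace("_", " ").strip()

def score_pair(source, target, glossary):
    """Return (confidence, rationale) for one source/target column pair."""
    name_sim = SequenceMatcher(
        None, normalize(source["name"]), normalize(target["name"])
    ).ratio()
    type_match = 1.0 if source["type"] == target["type"] else 0.0
    # Glossary hit: both columns resolve to the same business term.
    source_term = glossary.get(source["name"])
    glossary_hit = 1.0 if source_term and source_term == glossary.get(target["name"]) else 0.0
    confidence = (
        WEIGHTS["name"] * name_sim
        + WEIGHTS["type"] * type_match
        + WEIGHTS["glossary"] * glossary_hit
    )
    rationale = f"name={name_sim:.2f}, type={type_match:.0f}, glossary={glossary_hit:.0f}"
    return round(confidence, 3), rationale

def propose_mappings(source_cols, target_cols, glossary):
    """For each source column, propose the best-scoring target column."""
    proposals = []
    for source in source_cols:
        scored = [(score_pair(source, target, glossary), target) for target in target_cols]
        (confidence, rationale), best = max(scored, key=lambda item: item[0][0])
        proposals.append({
            "source_column": source["name"],
            "target_column": best["name"],
            "confidence_score": confidence,
            "rationale": rationale,
        })
    return proposals

# Hypothetical metadata, for illustration only.
source_cols = [{"name": "pat_id", "type": "NUMBER"}, {"name": "dob", "type": "DATE"}]
target_cols = [{"name": "patient_id", "type": "NUMBER"}, {"name": "birth_date", "type": "DATE"}]
glossary = {
    "pat_id": "Patient Identifier", "patient_id": "Patient Identifier",
    "dob": "Date of Birth", "birth_date": "Date of Birth",
}
proposals = propose_mappings(source_cols, target_cols, glossary)
```

Keeping the per-signal values in the rationale string means a reviewer can see why a pair scored as it did, which matters for pairs such as dob and birth_date where the names barely overlap and the glossary carries the match.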

Default outputs

  • Mapping table/workbook (with lineage notes, confidence, and rationale)
  • Optional snippets (SQL/dbt) and YAML stubs (docs/tests)
  • Summary report: gaps, assumptions, next actions

Mapping table schema (columns)

source_system
source_table
source_column
source_datatype
target_model
target_table
target_column
target_datatype
transform_rule
join_or_lookup_rule
default_value_or_null_handling
business_term
code_set_or_domain
constraints
lineage_notes
confidence_score
rationale
status
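
One way to work with this schema in code is a typed record plus a CSV writer, sketched below. The field names follow the column list above; the field types, the defaults, the `"proposed"` status value, and the sample row are assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class MappingRow:
    source_system: str
    source_table: str
    source_column: str
    source_datatype: str
    target_model: str
    target_table: str
    target_column: str
    target_datatype: str
    transform_rule: str = ""
    join_or_lookup_rule: str = ""
    default_value_or_null_handling: str = ""
    business_term: str = ""
    code_set_or_domain: str = ""
    constraints: str = ""
    lineage_notes: str = ""
    confidence_score: float = 0.0
    rationale: str = ""
    status: str = "proposed"  # assumed status value; the document does not list them

def to_csv(rows):
    """Serialize mapping rows to CSV text with the schema columns as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(MappingRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()

# Hypothetical row, for illustration only.
row = MappingRow(
    source_system="ehr", source_table="patients",
    source_column="dob", source_datatype="VARCHAR",
    target_model="analytics", target_table="dim_patient",
    target_column="birth_date", target_datatype="DATE",
    transform_rule="parse dob as a date",
    business_term="Date of Birth",
    confidence_score=0.82,
    rationale="glossary hit: Date of Birth",
)
csv_text = to_csv([row])
```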

Tools and permissions

  • Common: project_manager_tools, data_connector_tools, document_index_tools, file_manager_tools, artifact_manager_tools, google_drive_tools, git_action, slack_tools, delegate_work
  • Optional: snowflake_tools, web_access_tools
  • Posture: read-first on production; workspace writes only when requested; no external sharing without approval

Safety and operational notes

  • Use safe LIMITs and narrow scans; avoid cross joins without justification.
  • Mask PII by default; sample carefully and only when necessary.
  • Confirm before creating/writing persistent objects; prefer GENESIS.EVE_WORKSPACE.
  • Record assumptions, caveats, and decisions in the exported report.
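
The sampling guidance above can be sketched as a small query builder that always names its columns, masks PII columns, and caps the LIMIT. This is an assumption-laden illustration rather than the agent's implementation: the function name, the row ceiling, the `'***'` mask literal, and the `patients` table with its columns are all hypothetical.

```python
import re

MAX_SAMPLE_ROWS = 100  # hypothetical ceiling; tune to your environment

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")

def _check_identifier(name, allow_dots=False):
    """Reject anything that is not a plain (optionally dotted) identifier."""
    parts = name.split(".") if allow_dots else [name]
    if not all(_IDENT.fullmatch(part) for part in parts):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name

def build_sample_query(table, columns, pii_columns=(), limit=20):
    """Build a read-only SELECT that masks PII columns and always carries a LIMIT."""
    _check_identifier(table, allow_dots=True)
    if not columns:
        raise ValueError("name the columns explicitly instead of selecting everything")
    pii = set(pii_columns)
    select_list = []
    for column in columns:
        _check_identifier(column)
        if column in pii:
            # Replace the value with a constant so raw PII never leaves the source.
            select_list.append(f"'***' AS {column}")
        else:
            select_list.append(column)
    safe_limit = max(1, min(int(limit), MAX_SAMPLE_ROWS))
    return f"SELECT {', '.join(select_list)} FROM {table} LIMIT {safe_limit}"

# Hypothetical table and columns, for illustration only.
query = build_sample_query(
    "hcls_demo_1_sources.main.patients",
    ["patient_id", "ssn"],
    pii_columns=["ssn"],
    limit=5000,  # clamped down to MAX_SAMPLE_ROWS
)
```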

Configuration

  • Discovery: include/exclude lists; sampling thresholds
  • Scoring: weights for name/type/glossary/value signals; auto-accept thresholds
  • Rules: naming/type coercion standards; synonyms/stopwords; code-set sources
  • Outputs: table (e.g., Snowflake), CSV/XLSX, Sheets, dbt seed/YAML stubs; stage/paths
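
As a rough sketch of how these settings might fit together, the configuration below groups the four areas and uses the scoring thresholds to decide what happens to each proposal. Every key name, value, and status label here is hypothetical; the document names the configuration areas but not their concrete format.

```python
import math

# Hypothetical configuration layout; keys and values are illustrative only.
CONFIG = {
    "discovery": {"include": ["main.*"], "exclude": ["main.tmp_*"], "sample_rows": 20},
    "scoring": {
        "weights": {"name": 0.5, "type": 0.2, "glossary": 0.2, "value": 0.1},
        "auto_accept_threshold": 0.9,
        "escalate_below": 0.5,
    },
    "rules": {"naming": "snake_case", "synonyms": {"dob": "birth_date"}, "stopwords": ["tbl"]},
    "outputs": {"format": "csv", "path": "mappings.csv"},
}

def validate_config(config):
    """Check that weights sum to 1 and the thresholds are ordered sensibly."""
    scoring = config["scoring"]
    if not math.isclose(sum(scoring["weights"].values()), 1.0):
        raise ValueError("scoring weights must sum to 1.0")
    if not 0.0 <= scoring["escalate_below"] <= scoring["auto_accept_threshold"] <= 1.0:
        raise ValueError("expected 0 <= escalate_below <= auto_accept_threshold <= 1")
    return config

def triage(confidence, config):
    """Route a proposal by confidence: accept, queue for review, or escalate."""
    scoring = config["scoring"]
    if confidence >= scoring["auto_accept_threshold"]:
        return "auto_accepted"
    if confidence < scoring["escalate_below"]:
        return "escalate_to_owner"
    return "needs_review"

validate_config(CONFIG)
decisions = [triage(score, CONFIG) for score in (0.95, 0.7, 0.3)]
```

Validating the weights and thresholds up front keeps a mistyped setting from silently auto-accepting low-confidence mappings.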

Example workflow (Healthcare Data Pipeline)

This example uses the hcls_demo_1_sources.main sources. Adapt it to your domain as needed.

[Figure: ERD]