Field Mapping Agent

The Field Mapping Agent automates and assists with field mapping between data sources and target models. It proposes candidate mappings, captures transformation logic, records lineage and rationale, and exports reviewable deliverables.

Human-in-the-loop: Assign a Mission Owner (human) for decisions and reviews. The data agent escalates low-confidence mappings, conflicts, and open questions to the owner and stakeholders.

Overview

Purpose

Automate discovery and proposal of source→target mappings, capture transformations and lineage, and produce reviewable outputs.

Scope

Column alignment (1:1, 1:n, n:1); transformation rules; data types and constraints; glossary alignment; confidence scoring; lineage notes.

Design Goals

Deterministic, reproducible, and reviewable outputs with safe, read-first defaults on production data.
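One way to make "reproducible" checkable is to fingerprint each export, so that two runs over the same inputs can be compared with a single digest. This is a minimal sketch that assumes mapping rows are plain dictionaries. `mapping_fingerprint` is a hypothetical helper and not part of the agent's tools.

```python
import hashlib
import json


def mapping_fingerprint(rows: list) -> str:
    """Hypothetical helper: order-independent SHA-256 digest of mapping rows.

    Each row is serialized with sorted keys and fixed separators, and the
    serialized rows are sorted. Row order and key order therefore do not
    affect the digest, but any change to a value does.
    """
    canonical = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":")) for row in rows
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    run_1 = [
        {"source_column": "CLAIM_ID", "target_column": "CLAIM_ID", "confidence_score": 0.99},
        {"source_column": "CLAIM_DATE", "target_column": "CLAIM_DATE", "confidence_score": 0.95},
    ]
    run_2 = list(reversed(run_1))  # same content, different order
    print(mapping_fingerprint(run_1) == mapping_fingerprint(run_2))  # prints True
```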

Typical use cases

New source onboarding to an existing canonical/analytics model
Legacy→modern migration mapping (warehouse/lakehouse)
Cross-system harmonization for MDM/golden-record initiatives
Rapid discovery and scoping (first-pass mapping proposals)
Lineage and documentation updates for existing pipelines

Inputs and prerequisites

Sources & targets: source schemas (connections and/or data dictionaries); target model specs or canonical model (dims/facts, naming conventions)
Context: business glossary, synonym lists, code sets (optional but helpful)
Access posture: read access to metadata and tables; optional workspace for intermediate outputs (e.g., GENESIS.EVE_WORKSPACE)

Core workflows

Discover metadata

Inspect source/target schemas (names, types, keys, constraints). Pull column stats/samples with safe limits.
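Here is a minimal sketch of this step. It uses a SQLite connection as a stand-in for whatever connector the agent actually uses; a warehouse such as Snowflake would read `INFORMATION_SCHEMA` instead. The sketch reads column metadata with `PRAGMA table_info`, counts nulls in one pass over a bounded window of rows, and caps the sample with `LIMIT`. `discover_table`, `SAMPLE_LIMIT`, and `STATS_LIMIT` are hypothetical names.

```python
import sqlite3

SAMPLE_LIMIT = 5       # hypothetical cap on example rows returned
STATS_LIMIT = 10_000   # hypothetical cap on rows scanned for column stats


def discover_table(conn: sqlite3.Connection, table: str) -> dict:
    """Collect column metadata, bounded null counts, and a small sample.

    `table` is interpolated into SQL, so it must come from trusted metadata.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [
        {"name": name, "type": ctype, "not_null": bool(notnull), "pk": bool(pk)}
        for _cid, name, ctype, notnull, _dflt, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        )
    ]
    if not columns:
        raise ValueError(f"table {table!r} not found")
    # Null counts over at most STATS_LIMIT rows, computed in a single pass.
    null_sums = ", ".join(
        f'SUM(CASE WHEN "{c["name"]}" IS NULL THEN 1 ELSE 0 END)' for c in columns
    )
    scanned, *nulls = conn.execute(
        f'SELECT COUNT(*), {null_sums} FROM (SELECT * FROM "{table}" LIMIT ?)',
        (STATS_LIMIT,),
    ).fetchone()
    for col, n in zip(columns, nulls):
        col["null_count"] = n or 0  # SUM over zero rows is NULL
    sample = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (SAMPLE_LIMIT,)).fetchall()
    return {"table": table, "rows_scanned": scanned, "columns": columns, "sample": sample}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CLAIMS (CLAIM_ID INTEGER PRIMARY KEY, CLAIM_DATE TEXT, TOTAL_CHARGE REAL)"
    )
    conn.executemany(
        "INSERT INTO CLAIMS VALUES (?, ?, ?)",
        [(i, "2024-01-15", None if i % 2 else 100.0) for i in range(1, 11)],
    )
    info = discover_table(conn, "CLAIMS")
    for col in info["columns"]:
        print(col["name"], col["type"], "nulls:", col["null_count"])
    print("rows scanned:", info["rows_scanned"], "sample rows:", len(info["sample"]))
```

Scanning only a bounded window keeps the stats pass cheap on large production tables. The trade-off is that null counts describe that window, not the whole table.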

Generate candidate mappings

Use similarity signals (names, types, semantics, glossary hits, value patterns). Produce pairs with confidence and rationale.
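This minimal sketch shows how such signals might be combined into a confidence score with a rationale. The name signal uses Python's `difflib.SequenceMatcher`. Value-pattern signals are left out for brevity. The weights, glossary entries, type families, and the 0.5 floor are hypothetical, and the agent's actual scoring may differ.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical signal weights; they sum to 1.0, so confidence stays in [0, 1].
WEIGHTS = {"name": 0.5, "type": 0.2, "glossary": 0.3}

# Hypothetical glossary: business term -> column names known to carry it.
GLOSSARY = {
    "patient identifier": {"PATIENT_ID", "PAT_ID", "MEMBER_ID"},
    "allowed amount": {"ALLOWED_AMOUNT", "ALLOWED_AMT"},
}

# Hypothetical coarse type families used for a compatibility check.
TYPE_FAMILIES = {
    "INTEGER": "number", "NUMBER": "number", "REAL": "number", "DECIMAL": "number",
    "TEXT": "string", "VARCHAR": "string", "DATE": "temporal", "TIMESTAMP": "temporal",
}


@dataclass
class Candidate:
    source_column: str
    target_column: str
    confidence: float
    rationale: str


def type_family(dtype: str) -> str:
    # Drop length/precision, e.g. VARCHAR(50) -> VARCHAR.
    return TYPE_FAMILIES.get(dtype.upper().split("(")[0].strip(), "other")


def glossary_term(a: str, b: str) -> Optional[str]:
    return next((t for t, names in GLOSSARY.items() if a in names and b in names), None)


def score(src: tuple, tgt: tuple) -> Candidate:
    """Combine name, type, and glossary signals into one weighted confidence."""
    (s_name, s_type), (t_name, t_type) = src, tgt
    s, t = s_name.upper(), t_name.upper()
    name_sim = SequenceMatcher(None, s, t).ratio()
    same_type = type_family(s_type) == type_family(t_type)
    term = glossary_term(s, t)
    confidence = (
        WEIGHTS["name"] * name_sim
        + WEIGHTS["type"] * float(same_type)
        + WEIGHTS["glossary"] * float(term is not None)
    )
    rationale = (
        f"name similarity {name_sim:.2f}; types {'compatible' if same_type else 'differ'}; "
        f"glossary: {term or 'no hit'}"
    )
    return Candidate(s_name, t_name, round(confidence, 2), rationale)


def propose(sources, targets, min_confidence: float = 0.5) -> list:
    """Keep the best-scoring target for each source column above a floor."""
    proposals = []
    for src in sources:
        scored = [score(src, tgt) for tgt in targets]
        if scored:
            best = max(scored, key=lambda c: c.confidence)
            if best.confidence >= min_confidence:
                proposals.append(best)
    return proposals


if __name__ == "__main__":
    sources = [("PATIENT_ID", "INTEGER"), ("CLAIM_DATE", "TEXT")]
    targets = [("PAT_ID", "NUMBER"), ("CLAIM_DATE", "DATE")]
    for c in propose(sources, targets):
        print(c.source_column, "->", c.target_column, c.confidence, "|", c.rationale)
```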

Apply rules & constraints

Enforce naming/typing standards; incorporate joins/lookups and code-set normalization; capture transformation logic with notes.
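To illustrate a naming standard and code-set normalization, this sketch assumes an UPPER_SNAKE_CASE convention and a made-up GENDER code set. It renders the normalization as a SQL `CASE` expression that could be recorded as a `transform_rule`. It then checks that expression in SQLite against the equivalent Python mapping. All names here (`to_standard_name`, `GENDER_CODE_SET`, `code_set_case_sql`, and so on) are hypothetical.

```python
import re
import sqlite3

# Hypothetical code set: raw GENDER values -> canonical domain values.
GENDER_CODE_SET = {"M": "MALE", "MALE": "MALE", "F": "FEMALE", "FEMALE": "FEMALE"}


def to_standard_name(name: str) -> str:
    """Apply a hypothetical UPPER_SNAKE_CASE naming standard."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())  # split camelCase
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # spaces/punctuation -> underscore
    return name.strip("_").upper()


def normalize_code(value, code_set: dict, default: str = "UNKNOWN") -> str:
    """Map a raw value onto the code set in Python (e.g. for profiling samples)."""
    if value is None:
        return default
    return code_set.get(str(value).strip().upper(), default)


def _sql_literal(text: str) -> str:
    return "'" + text.replace("'", "''") + "'"  # escape embedded quotes


def code_set_case_sql(column: str, code_set: dict, default: str = "UNKNOWN") -> str:
    """Render the same normalization as a SQL CASE, to record as transform_rule."""
    whens = " ".join(
        f"WHEN {_sql_literal(raw)} THEN {_sql_literal(canon)}"
        for raw, canon in sorted(code_set.items())  # sorted for deterministic output
    )
    return f"CASE UPPER(TRIM({column})) {whens} ELSE {_sql_literal(default)} END"


if __name__ == "__main__":
    print(to_standard_name("patientId"), to_standard_name("Claim Date"))
    rule = code_set_case_sql("GENDER", GENDER_CODE_SET)
    print(rule)
    # Check that the SQL rule and the Python helper agree on a few raw values.
    conn = sqlite3.connect(":memory:")
    for raw in ["m", " Female ", "x", None]:
        (via_sql,) = conn.execute(f"SELECT {rule} FROM (SELECT ? AS GENDER)", (raw,)).fetchone()
        assert via_sql == normalize_code(raw, GENDER_CODE_SET), raw
```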

Validate & refine

Surface conflicts, low-confidence areas, and unmapped fields. Suggest alternatives and log open questions as mission tasks.
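This sketch implements the checks this step describes. It assumes mappings are held as simple records keyed by TABLE.COLUMN strings, with n:1 mappings carrying several sources. The 0.85 review threshold and the helper names are hypothetical. The sample mappings reuse rules from the candidate-mapping preview in the example workflow, while the source and target column lists are made up.

```python
from collections import defaultdict
from dataclasses import dataclass

LOW_CONFIDENCE = 0.85  # hypothetical review threshold


@dataclass(frozen=True)
class Mapping:
    sources: tuple[str, ...]  # one or more "TABLE.COLUMN" inputs (n:1 allowed)
    target: str               # "TABLE.COLUMN"
    transform_rule: str
    confidence: float


def validate(mappings, source_columns, target_columns, low: float = LOW_CONFIDENCE) -> list:
    """Return open questions for the Mission Owner, in a stable order."""
    questions = []
    used_sources = {s for m in mappings for s in m.sources}
    mapped_targets = {m.target for m in mappings}
    for col in sorted(set(target_columns) - mapped_targets):
        questions.append(f"Unmapped target {col}: no candidate source proposed")
    for col in sorted(set(source_columns) - used_sources):
        questions.append(f"Unused source {col}: confirm it is intentionally dropped")
    # A conflict is one target fed by more than one distinct rule.
    rules = defaultdict(set)
    for m in mappings:
        rules[m.target].add(m.transform_rule)
    for target in sorted(t for t, r in rules.items() if len(r) > 1):
        questions.append(f"Conflict on {target}: {len(rules[target])} competing rules")
    for m in sorted(mappings, key=lambda m: (m.target, m.sources)):
        if m.confidence < low:
            questions.append(f"Low confidence ({m.confidence:.2f}) on {m.target}: review rationale")
    return questions


if __name__ == "__main__":
    mappings = [
        Mapping(("CLAIMS.CLAIM_ID",), "CLAIM_SUMMARY.CLAIM_ID", "CLAIM_ID", 0.99),
        Mapping(("CLAIMS.INSURANCE_PAID", "CLAIMS.PATIENT_RESPONSIBILITY"),
                "CLAIM_SUMMARY.ALLOWED_AMOUNT",
                "coalesce(INSURANCE_PAID,0)+coalesce(PATIENT_RESPONSIBILITY,0)", 0.90),
        Mapping(("CLAIM_DETAILS.DIAGNOSIS_CODE",), "FCT_READMISSIONS.DIAGNOSIS_GRP",
                "substr(DIAGNOSIS_CODE,1,3)", 0.84),
    ]
    sources = ["CLAIMS.CLAIM_ID", "CLAIMS.INSURANCE_PAID", "CLAIMS.PATIENT_RESPONSIBILITY",
               "CLAIMS.CLAIM_STATUS", "CLAIM_DETAILS.DIAGNOSIS_CODE"]
    targets = ["CLAIM_SUMMARY.CLAIM_ID", "CLAIM_SUMMARY.ALLOWED_AMOUNT",
               "FCT_READMISSIONS.DIAGNOSIS_GRP"]
    for q in validate(mappings, sources, targets):
        print(q)
```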

Export deliverables

Export mapping table/workbook with lineage and rationale; optional SQL/dbt snippets and YAML stubs; gap/assumption summary.
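This sketch shows how a SQL snippet might be rendered from mapping rows. It assumes each row is a dictionary with `target_table`, `target_column`, and `transform_rule` keys, and that the caller supplies the FROM/JOIN clause. `render_select` is a hypothetical helper. The output is a plain SELECT of the kind a dbt model body contains, and the sketch smoke-tests it against an in-memory SQLite table.

```python
import sqlite3


def render_select(target_table: str, rows: list, from_clause: str) -> str:
    """Hypothetical helper: render a reviewable SELECT for one target table."""
    picked = [r for r in rows if r["target_table"] == target_table]
    if not picked:
        raise ValueError(f"no mapping rows for {target_table}")
    select_list = ",\n".join(
        f"    {r['transform_rule']} AS {r['target_column']}" for r in picked
    )
    header = f"-- {target_table}: generated from {len(picked)} mapping rows; review before use"
    return f"{header}\nSELECT\n{select_list}\nFROM {from_clause}"


# Rules taken from the candidate-mapping preview in the example workflow.
ROWS = [
    {"target_table": "CLAIM_SUMMARY", "target_column": "CLAIM_ID", "transform_rule": "CLAIM_ID"},
    {"target_table": "CLAIM_SUMMARY", "target_column": "CLAIM_DATE", "transform_rule": "date(CLAIM_DATE)"},
    {"target_table": "CLAIM_SUMMARY", "target_column": "ALLOWED_AMOUNT",
     "transform_rule": "coalesce(INSURANCE_PAID,0)+coalesce(PATIENT_RESPONSIBILITY,0)"},
]

if __name__ == "__main__":
    sql = render_select("CLAIM_SUMMARY", ROWS, "CLAIMS")
    print(sql)
    # Smoke-test the snippet against a throwaway in-memory table with made-up data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CLAIMS (CLAIM_ID, CLAIM_DATE, INSURANCE_PAID, PATIENT_RESPONSIBILITY)")
    conn.execute("INSERT INTO CLAIMS VALUES (1, '2024-03-01 10:00:00', 80.0, NULL)")
    print(conn.execute(sql).fetchall())  # prints [(1, '2024-03-01', 80.0)]
```

Join rules would go into `from_clause`. The preview's `join_or_lookup_rule` values use a shorthand (`JOIN CLAIMS.PATIENT_ID = PATIENTS.PATIENT_ID`) that needs the joined table name and an `ON` keyword before it is valid SQL. Unqualified rules such as `PATIENT_ID` also become ambiguous once two tables that share that column are joined.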

Default outputs

Mapping table/workbook (with lineage notes, confidence, and rationale)
Optional snippets (SQL/dbt) and YAML stubs (docs/tests)
Summary report: gaps, assumptions, next actions

Mapping table schema (columns)

source_system
source_table
source_column
source_datatype
target_model
target_table
target_column
target_datatype
transform_rule
join_or_lookup_rule
default_value_or_null_handling
business_term
code_set_or_domain
constraints
lineage_notes
confidence_score
rationale
status
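The same schema can be expressed as a record type, with a CSV writer for the workbook export. This is a sketch under assumptions. The defaults for optional fields, the range check on `confidence_score`, and the sort order are choices made here, not documented behavior. `claims_mart` in the usage lines is a made-up target model name.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class MappingRow:
    """One mapping-table row; field order follows the schema list above."""
    source_system: str
    source_table: str
    source_column: str
    source_datatype: str
    target_model: str
    target_table: str
    target_column: str
    target_datatype: str
    transform_rule: str
    join_or_lookup_rule: str = ""
    default_value_or_null_handling: str = ""
    business_term: str = ""
    code_set_or_domain: str = ""
    constraints: str = ""
    lineage_notes: str = ""
    confidence_score: float = 0.0
    rationale: str = ""
    status: str = "proposed"

    def __post_init__(self):
        # Assumed convention: confidence is a probability-like score in [0, 1].
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError(f"confidence_score out of range: {self.confidence_score}")


COLUMNS = [f.name for f in fields(MappingRow)]


def write_mapping_csv(rows, stream) -> None:
    """Write rows as CSV, sorted by target then source so exports diff cleanly."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    ordered = sorted(
        rows,
        key=lambda r: (r.target_table, r.target_column, r.source_table, r.source_column),
    )
    for row in ordered:
        writer.writerow(asdict(row))


if __name__ == "__main__":
    buf = io.StringIO()
    write_mapping_csv([
        MappingRow("hcls_demo_1_sources", "CLAIMS", "CLAIM_ID", "INTEGER",
                   "claims_mart", "CLAIM_SUMMARY", "CLAIM_ID", "INTEGER", "CLAIM_ID",
                   confidence_score=0.99, rationale="exact name and type match"),
    ], buf)
    print(buf.getvalue())
```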

Tools and permissions

Common: project_manager_tools, data_connector_tools, document_index_tools, file_manager_tools, artifact_manager_tools, google_drive_tools, git_action, slack_tools, delegate_work
Optional: snowflake_tools, web_access_tools
Posture: read-first on production; workspace writes only when requested; no external sharing without approval

Safety and operational notes

Use safe LIMITs and narrow scans; avoid cross joins without justification.
Mask PII by default; sample carefully and only when necessary.
Confirm before creating/writing persistent objects; prefer GENESIS.EVE_WORKSPACE.
Record assumptions, caveats, and decisions in the exported report.
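This minimal sketch covers the first two notes and assumes queries arrive as SQL strings. `guard_query` admits only a single SELECT or WITH statement and wraps it in a row cap. `mask_row` blanks values in columns tagged as PII. The cap of 100, the PII column set, and both helper names are hypothetical. The keyword screen is a conservative heuristic, not a SQL parser.

```python
import re
import sqlite3

MAX_ROWS = 100  # hypothetical row cap for exploratory reads
PII_COLUMNS = {"FIRST_NAME", "LAST_NAME", "DATE_OF_BIRTH"}  # hypothetical PII tags

# Conservative keyword screen: it may reject some harmless queries (for example a
# string literal containing 'delete'), which errs in the safe direction.
_WRITE = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|CREATE|DROP|ALTER|TRUNCATE|GRANT|REVOKE|COPY|PUT)\b",
    re.IGNORECASE,
)


def guard_query(sql: str, max_rows: int = MAX_ROWS) -> str:
    """Reject anything but a single read-only query and wrap it in a row cap."""
    body = sql.strip().rstrip(";").strip()
    if not re.match(r"(SELECT|WITH)\b", body, re.IGNORECASE):
        raise PermissionError("only SELECT/WITH queries run without approval")
    if ";" in body or _WRITE.search(body):
        raise PermissionError("multiple statements or write keywords need approval")
    return f"SELECT * FROM ({body}) LIMIT {int(max_rows)}"


def mask_row(columns, row, pii=PII_COLUMNS):
    """Replace non-null values in PII-tagged columns with a fixed mask."""
    return tuple(
        "***" if col.upper() in pii and value is not None else value
        for col, value in zip(columns, row)
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PATIENTS (PATIENT_ID, FIRST_NAME, GENDER)")
    conn.execute("INSERT INTO PATIENTS VALUES (1, 'Ana', 'F')")
    cur = conn.execute(guard_query("SELECT PATIENT_ID, FIRST_NAME, GENDER FROM PATIENTS"))
    names = [d[0] for d in cur.description]
    for row in cur.fetchall():
        print(mask_row(names, row))  # prints (1, '***', 'F')
```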

Configuration

Discovery: include/exclude lists; sampling thresholds
Scoring: weights for name/type/glossary/value signals; auto-accept thresholds
Rules: naming/type coercion standards; synonyms/stopwords; code-set sources
Outputs: table (e.g., Snowflake), CSV/XLSX, Sheets, dbt seed/YAML stubs; stage/paths
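This sketch shows one way these options might be held in code. `MappingConfig`, its field names, and the `accepted` and `needs_review` statuses are hypothetical; only `proposed` appears in this document. Leaving `auto_accept` unset keeps every mapping at `proposed`, which is consistent with the human-in-the-loop review described above.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class MappingConfig:
    """Hypothetical configuration object mirroring the options listed above."""
    include: list = field(default_factory=lambda: ["*"])  # table-name globs
    exclude: list = field(default_factory=list)
    sample_rows: int = 100                # per-table sampling threshold
    auto_accept: Optional[float] = None   # None: every mapping waits for review
    review_floor: float = 0.6             # below this, flag for closer review

    def __post_init__(self):
        if self.auto_accept is not None and not self.review_floor <= self.auto_accept <= 1.0:
            raise ValueError("auto_accept must be between review_floor and 1.0")

    def in_scope(self, table: str) -> bool:
        # Case-insensitive glob matching that behaves the same on every OS.
        name = table.upper()
        included = any(fnmatchcase(name, p.upper()) for p in self.include)
        excluded = any(fnmatchcase(name, p.upper()) for p in self.exclude)
        return included and not excluded

    def status_for(self, confidence: float) -> str:
        if self.auto_accept is not None and confidence >= self.auto_accept:
            return "accepted"
        return "proposed" if confidence >= self.review_floor else "needs_review"


if __name__ == "__main__":
    cfg = MappingConfig(include=["CLAIM*", "PATIENTS"], exclude=["*_ARCHIVE"])
    for table in ["CLAIMS", "CLAIMS_ARCHIVE", "PROVIDERS"]:
        print(table, cfg.in_scope(table))
```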

Example workflow (Healthcare Data Pipeline)

This illustration uses source tables from hcls_demo_1_sources.main; adapt it to your domain as needed.

Context: sources and targets

Sources (hcls_demo_1_sources.main)

CLAIMS(CLAIM_ID, PATIENT_ID, PROVIDER_ID, CLAIM_DATE, ADMISSION_DATE, DISCHARGE_DATE, CLAIM_TYPE, TOTAL_CHARGE, INSURANCE_PAID, PATIENT_RESPONSIBILITY, CLAIM_STATUS, …)
CLAIM_DETAILS(CLAIM_DETAIL_ID, CLAIM_ID, SERVICE_DATE, PROCEDURE_CODE, DIAGNOSIS_CODE, CHARGE_AMOUNT, UNITS, …)
PATIENTS(PATIENT_ID, FIRST_NAME, LAST_NAME, GENDER, DATE_OF_BIRTH, …)
PROVIDERS(PROVIDER_ID, PROVIDER_NAME, PROVIDER_TYPE, NPI_NUMBER, SPECIALTY, …)

Targets (from mission specs)

CLAIM_SUMMARY, PATIENT_SUMMARY, PROVIDER_SUMMARY, FCT_READMISSIONS

Candidate mappings (preview)

source_table	source_column	target_table	target_column	transform_rule	join_or_lookup_rule	confidence_score	status
CLAIMS	CLAIM_ID	CLAIM_SUMMARY	CLAIM_ID	CLAIM_ID	—	0.99	proposed
CLAIMS	CLAIM_DATE	CLAIM_SUMMARY	CLAIM_DATE	date(CLAIM_DATE)	—	0.95	proposed
CLAIMS	CLAIM_TYPE	CLAIM_SUMMARY	CLAIM_TYPE	upper(CLAIM_TYPE)	—	0.92	proposed
CLAIMS	INSURANCE_PAID, PATIENT_RESPONSIBILITY	CLAIM_SUMMARY	ALLOWED_AMOUNT	coalesce(INSURANCE_PAID,0)+coalesce(PATIENT_RESPONSIBILITY,0)	—	0.90	proposed
CLAIMS	PATIENT_ID	PATIENT_SUMMARY	PATIENT_ID	PATIENT_ID	JOIN CLAIMS.PATIENT_ID = PATIENTS.PATIENT_ID	0.99	proposed
PATIENTS	FIRST_NAME, LAST_NAME	PATIENT_SUMMARY	PATIENT_FULL_NAME	FIRST_NAME || ' ' || LAST_NAME	JOIN CLAIMS.PATIENT_ID = PATIENTS.PATIENT_ID	0.88	proposed
PATIENTS	DATE_OF_BIRTH	PATIENT_SUMMARY	PATIENT_AGE	cast((julianday('now')-julianday(DATE_OF_BIRTH))/365.25 as integer)	—	0.86	proposed
PROVIDERS	PROVIDER_ID	PROVIDER_SUMMARY	PROVIDER_ID	PROVIDER_ID	JOIN CLAIMS.PROVIDER_ID = PROVIDERS.PROVIDER_ID	0.99	proposed
CLAIM_DETAILS	DIAGNOSIS_CODE	FCT_READMISSIONS	DIAGNOSIS_GRP	substr(DIAGNOSIS_CODE,1,3)	JOIN CLAIM_DETAILS.CLAIM_ID = CLAIMS.CLAIM_ID	0.84	proposed
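Before the preview goes to review, its SQL rules can be smoke-tested on a few rows. This sketch assumes a local SQLite copy, since the PATIENT_AGE rule uses SQLite's `julianday()`. It loads made-up rows shaped like the source tables and pins 'now' to a hypothetical fixed date so the ages are repeatable. `build_demo_db`, `preview`, and `AS_OF` are hypothetical names.

```python
import sqlite3

# Transform rules copied from the preview table; julianday() is SQLite syntax.
RULES = {
    "ALLOWED_AMOUNT": "coalesce(INSURANCE_PAID,0)+coalesce(PATIENT_RESPONSIBILITY,0)",
    "PATIENT_FULL_NAME": "FIRST_NAME || ' ' || LAST_NAME",
    "PATIENT_AGE": "cast((julianday('now')-julianday(DATE_OF_BIRTH))/365.25 as integer)",
    "DIAGNOSIS_GRP": "substr(DIAGNOSIS_CODE,1,3)",
}
AS_OF = "2025-01-01"  # hypothetical fixed date replacing 'now' so results repeat


def build_demo_db() -> sqlite3.Connection:
    """Tiny made-up rows shaped like the hcls_demo_1_sources tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CLAIMS (CLAIM_ID, PATIENT_ID, INSURANCE_PAID, PATIENT_RESPONSIBILITY);
        CREATE TABLE PATIENTS (PATIENT_ID, FIRST_NAME, LAST_NAME, DATE_OF_BIRTH);
        CREATE TABLE CLAIM_DETAILS (CLAIM_DETAIL_ID, CLAIM_ID, DIAGNOSIS_CODE);
        INSERT INTO CLAIMS VALUES (1, 10, 800.0, 200.0), (2, 11, 500.0, NULL);
        INSERT INTO PATIENTS VALUES (10, 'Ana', 'Diaz', '1980-06-15'), (11, 'Lee', NULL, '1975-01-02');
        INSERT INTO CLAIM_DETAILS VALUES (100, 1, 'I50.9'), (101, 2, 'E11.65');
    """)
    return conn


def preview(conn: sqlite3.Connection) -> dict:
    """Evaluate each rule using the joins listed in the preview table."""
    age = RULES["PATIENT_AGE"].replace("'now'", f"'{AS_OF}'")
    queries = {
        "ALLOWED_AMOUNT": f"SELECT CLAIM_ID, {RULES['ALLOWED_AMOUNT']} FROM CLAIMS",
        "PATIENT_FULL_NAME": f"SELECT PATIENTS.PATIENT_ID, {RULES['PATIENT_FULL_NAME']} "
                             "FROM CLAIMS JOIN PATIENTS ON CLAIMS.PATIENT_ID = PATIENTS.PATIENT_ID",
        "PATIENT_AGE": f"SELECT PATIENT_ID, {age} FROM PATIENTS",
        "DIAGNOSIS_GRP": f"SELECT CLAIMS.CLAIM_ID, {RULES['DIAGNOSIS_GRP']} FROM CLAIM_DETAILS "
                         "JOIN CLAIMS ON CLAIM_DETAILS.CLAIM_ID = CLAIMS.CLAIM_ID",
    }
    return {target: conn.execute(sql + " ORDER BY 1").fetchall() for target, sql in queries.items()}


if __name__ == "__main__":
    for target, rows in preview(build_demo_db()).items():
        print(target, rows)
```

On these rows, ALLOWED_AMOUNT treats the missing PATIENT_RESPONSIBILITY as 0, as intended. PATIENT_FULL_NAME, however, comes back NULL for the patient with no LAST_NAME, because in SQLite `||` yields NULL when any operand is NULL. The 365.25 divisor in PATIENT_AGE is also an approximation. For example, a DATE_OF_BIRTH of 2001-01-01 evaluated on 2002-01-01 gives 365 / 365.25, which truncates to 0 rather than 1. Both details are worth logging as open questions for the Mission Owner.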

Next steps from review
