Autonomous Data Pipeline Building
This section provides a comprehensive walkthrough of how Genbots can be utilized to automate and streamline workflows for data mapping and validation.
Introduction
In this walkthrough, 4 Genbots work together with 1 human in the loop. The Genbots research, map, and assess the NUMBER_OF_ORDERS column in the DEMO_CUSTOMER_TGT table. By following this guide, you will gain a clear understanding of how to leverage Genbots for similar tasks in your own projects.
IMPORTANT: This example is prepared to work with the specific bot names, database names, and table names.
With a click of a button you can copy them and paste them to get started.
Please don’t change any of the names as the bot instructions and project instructions are tailored to fit the names mentioned within this walkthrough.
Key Objectives of the Workflow
Step-by-Step Walkthrough
Identify Reliable Data Sources:
To locate and analyze potential tables in the DEMO_RAW database (JAFFLE_SHOP and STRIPE schemas) that could serve as sources for calculating or validating the NUMBER_OF_ORDERS column.
Propose a Mapping:
To define a clear and accurate mapping strategy for deriving the NUMBER_OF_ORDERS column, ensuring it is based on raw, reliable data.
Validate the Mapping:
To assess the proposed mapping for technical soundness, alignment with business logic, and adherence to data standards.
Evaluate Confidence:
To provide a confidence score for the mapping, ensuring stakeholders can trust the proposed solution.
Document the Process:
To create a transparent and structured record of the research, mapping, and validation steps, enabling reproducibility and clarity for future reference.
Try It Out!
To follow along with this example, make sure to do the following:
- Create a database named DEMO_RAW and load into it the dataset used in this walkthrough, which can be found here.
- Create a database named DEMO_ANALYTICS and add to it a table named DEMO_CUSTOMER_TGT. This is the table from which the first bot sources its column and table definitions. Ensure that the table contains the columns
- The custom instructions for all 4 Genbots can be found below and copied with a click of a button.
- The individual tools for each bot can also be found below and copied with a click of a button.
Starting Point
Ending Result
Genbots Involved
Requirements Project Manager
Investigator
Data Mapper
Confidence Assessor
Creating The Bot Team
Start a conversation with Eve
To start, ask Eve to create the 4 Genbots that will be used in this example.
NOTE: All instructions in this example are tailored to the names of the respective bots. If you decide to change the bot names during creation, be sure to make the same changes in the instructions.
Genbot Tip
The custom instructions for all 4 Genbots are provided below for you to copy and paste if you would like to try this specific scenario.
Genbot Custom Instructions To Follow Along
Click “Copy” for each of the tabs below one at a time during bot creation.
Genbot Tools For The Bot Team
Please let Eve know to add the following tools to each Genbot to follow along with the JaffleShop example.
Once your bot team has been created with the instructions and tools above, it is time to create the Project.
Setting Up The Project And Creating The TODOs
First, let’s explain what a project is and what TODOs are.
A project is a list of objectives used to manage the workflow systematically, much as a real-world project manager would. Below we will be creating a project named JAFFLE-Proj-20B.
NOTE: A project can normally be given any name, but for this example please keep JAFFLE-Proj-20B, because the instructions below refer to that name.
Next, TODOs. Each actionable objective within a Project is known as a TODO.
Reference Example:
A TODO item was added to the project with the following task:
- Research, validate, and document potential sources for the NUMBER_OF_ORDERS column in the DEMO_CUSTOMER_TGT table.
Reasoning: This ensures that the task is tracked, dependencies are managed, and progress is documented.
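To make the relationship between a project and its TODOs concrete, here is a minimal Python sketch. It is not the Genbots API: the class names, the `work_log` field, the `complete` method, and the NEW status are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NEW = "NEW"              # hypothetical initial status
    COMPLETED = "COMPLETED"  # status name used in this walkthrough


@dataclass
class Todo:
    task: str
    status: Status = Status.NEW
    work_log: list[str] = field(default_factory=list)  # hypothetical field

    def complete(self, summary: str) -> None:
        """Record a summary of the work performed and mark the TODO done."""
        self.work_log.append(summary)
        self.status = Status.COMPLETED


@dataclass
class Project:
    name: str
    todos: list[Todo] = field(default_factory=list)

    def add_todo(self, task: str) -> Todo:
        """Create a TODO, attach it to the project, and return it."""
        todo = Todo(task)
        self.todos.append(todo)
        return todo


project = Project("JAFFLE-Proj-20B")
todo = project.add_todo(
    "Research, validate, and document potential sources for the "
    "NUMBER_OF_ORDERS column in the DEMO_CUSTOMER_TGT table."
)
```

Keeping the work log on the TODO itself mirrors the idea that each task carries its own record of what was done.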
Note
If you downloaded the Jaffle Shop dataset above, copy the prompt below under “User Prompt” to follow along.
User Prompt
Bot Team in Action
Once the project is created, the team will begin working.
NOTE: The breakdown and explanation of what is happening throughout this example can be seen below.
Delegating Research to the Investigator Bot
Objective: To identify potential data sources for the NUMBER_OF_ORDERS column.
Action: The Investigator bot was tasked with researching the DEMO_RAW database, focusing on the JAFFLE_SHOP and STRIPE schemas.
Process:
- The bot analyzed metadata to locate relevant tables and columns.
- It identified two potential sources:
  - Primary Source: DEMO_RAW.JAFFLE_SHOP.ORDERS
    - Contains raw order data, including USER_ID (customer identifier) and ORDER_DATE.
    - Suitable for fresh calculations by counting orders per USER_ID.
  - Alternative Source: DEMO_RAW.STRIPE.DEMO_CUSTOMER_TGT
    - Already contains the NUMBER_OF_ORDERS column, which might be pre-aggregated.
- The bot documented its findings in a structured format.
Output: The research findings were saved in the Git repository at:
demoresearch/DEMO_CUSTOMER_TGT_NUMBER_OF_ORDERS_demoresearch.txt
Reasoning: Automating the research step ensures thorough and consistent analysis, saving time and reducing errors.
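The metadata search described above can be pictured roughly as follows. This is a simplified sketch, not the Investigator bot's actual implementation: the in-memory `CATALOG`, the `find_candidate_sources` function, and the DEMO_RAW.EXAMPLE.UNRELATED table are hypothetical, and only the columns named in this walkthrough are listed for the two real tables.

```python
# Hypothetical metadata catalog: fully qualified table name -> column names.
CATALOG = {
    "DEMO_RAW.STRIPE.DEMO_CUSTOMER_TGT": ["NUMBER_OF_ORDERS"],
    "DEMO_RAW.JAFFLE_SHOP.ORDERS": ["USER_ID", "ORDER_DATE"],
    "DEMO_RAW.EXAMPLE.UNRELATED": ["SOME_COLUMN"],  # made up, should be skipped
}


def find_candidate_sources(catalog, target_column, key_columns):
    """Return (table, kind) pairs, listing raw sources before pre-aggregated ones."""
    candidates = []
    for table, columns in catalog.items():
        if target_column in columns:
            # The table already holds the target column.
            candidates.append((table, "pre-aggregated"))
        elif set(key_columns) <= set(columns):
            # The table holds the raw columns needed to compute the target.
            candidates.append((table, "raw"))
    # sorted() is stable; False sorts before True, so "raw" entries come first.
    return sorted(candidates, key=lambda item: item[1] != "raw")


sources = find_candidate_sources(CATALOG, "NUMBER_OF_ORDERS", ["USER_ID"])
for table, kind in sources:
    print(f"{kind}: {table}")
```

Listing raw tables first reflects the preference, stated later in this walkthrough, for deriving values from raw data rather than from pre-aggregated columns.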
Validating Research Results
Objective: To ensure the research findings are accurate and complete.
Action: The research file was reviewed to confirm:
- All relevant tables and columns were identified.
- The findings aligned with the task objectives.
Output: The file was validated and deemed ready for the next step.
Reasoning: Validation ensures that the workflow proceeds with reliable data.
Delegating Mapping to the Data Mapper Bot
Objective: To propose a mapping for the NUMBER_OF_ORDERS column based on the research findings.
Action: The Data Mapper bot was tasked with:
- Analyzing the research findings.
- Proposing a mapping strategy.
- Documenting the mapping proposal with justification.
Process:
- The bot selected the ORDERS table as the primary source for fresh calculations.
- It proposed mapping logic that derives NUMBER_OF_ORDERS from this table.
- The bot justified its choice:
  - The ORDERS table provides raw data, ensuring accurate and fresh calculations.
  - This approach aligns with the principle of deriving values from raw data rather than relying on pre-aggregated values.
Output: The mapping proposal was saved in the Git repository at: demoresearch/DEMO_CUSTOMER_TGT_NUMBER_OF_ORDERS_demoproposal.txt
Reasoning: Automating the mapping step ensures consistency and adherence to best practices.
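As an illustration of deriving the value from raw data, the sketch below counts orders per USER_ID. It is an assumption about what such a mapping could look like, not the Data Mapper bot's actual proposal. It uses Python's built-in sqlite3 module with a tiny in-memory ORDERS table so that it runs anywhere. The ID column, the CUSTOMER_ID alias, and the sample rows are made up. In this walkthrough the real source would be DEMO_RAW.JAFFLE_SHOP.ORDERS.

```python
import sqlite3

# In-memory stand-in for DEMO_RAW.JAFFLE_SHOP.ORDERS (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS (ID INTEGER, USER_ID INTEGER, ORDER_DATE TEXT)")
conn.executemany(
    "INSERT INTO ORDERS VALUES (?, ?, ?)",
    [
        (1, 10, "2018-01-01"),
        (2, 10, "2018-01-05"),
        (3, 20, "2018-01-07"),
    ],
)

# Hypothetical mapping: one row per customer with the number of raw order rows.
MAPPING_SQL = """
SELECT USER_ID AS CUSTOMER_ID,
       COUNT(*) AS NUMBER_OF_ORDERS
FROM ORDERS
GROUP BY USER_ID
ORDER BY USER_ID
"""

rows = conn.execute(MAPPING_SQL).fetchall()
print(rows)  # [(10, 2), (20, 1)]
conn.close()
```

Note that a plain GROUP BY on the orders table only returns customers who have at least one order. Whether customers with zero orders should appear in the target is a business decision that the mapping review would need to settle.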
Validating the Mapping Proposal
Objective: To ensure the mapping proposal aligns with the task objectives and is technically sound.
Action: The mapping file was reviewed to confirm:
- The proposed logic was accurate and feasible.
- The justification was clear and aligned with business requirements.
Output: The file was validated and deemed ready for the next step.
Reasoning: Validation ensures that the mapping logic is robust and reliable.
Delegating Confidence Assessment to the Confidence Assessor Bot
Objective: To evaluate the mapping proposal and provide a confidence score.
Action: The Confidence Assessor bot was tasked with:
- Reviewing the mapping proposal.
- Scoring the mapping based on five criteria:
  - Business Intuition: Does the logic align with business expectations?
  - Single Source Clarity: Is the source table well-defined and appropriate?
  - Mapping Simplicity: Is the logic straightforward and easy to implement?
  - Historical Similarity: Has similar logic been used successfully before?
  - Documentation Availability: Is the mapping well-documented?
- Documenting the confidence score and justification.
Process:
- The bot assigned a confidence score of 90/100.
- It provided detailed justification for the score.
Output: The confidence assessment was saved in the Git repository at: demoresearch/DEMO_CUSTOMER_TGT_NUMBER_OF_ORDERS_demoQAscore.txt
Reasoning: Automating the assessment step ensures objective and consistent evaluation.
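One way to picture how five criteria could roll up into a score out of 100 is sketched below. The walkthrough reports only the total (90/100), so the equal weighting of 20 points per criterion and the per-criterion values in `example_scores` are assumptions chosen for illustration, not the Confidence Assessor bot's actual rubric.

```python
CRITERIA = (
    "business_intuition",
    "single_source_clarity",
    "mapping_simplicity",
    "historical_similarity",
    "documentation_availability",
)
MAX_PER_CRITERION = 20  # assumption: five equally weighted criteria, 100 total


def confidence_score(scores: dict[str, int]) -> int:
    """Sum per-criterion scores after checking all five are present and in range."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in CRITERIA:
        if not 0 <= scores[name] <= MAX_PER_CRITERION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_CRITERION}")
    return sum(scores[name] for name in CRITERIA)


# Invented breakdown that happens to total 90; the real breakdown is not given.
example_scores = {
    "business_intuition": 18,
    "single_source_clarity": 20,
    "mapping_simplicity": 18,
    "historical_similarity": 16,
    "documentation_availability": 18,
}
total = confidence_score(example_scores)
print(f"{total}/100")  # 90/100
```

Rejecting incomplete or out-of-range input keeps the total comparable from one mapping to the next.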
Updating the TODO Status
Objective: To document the work performed and mark the task as completed.
Action:
- The TODO item was updated with a summary of the work performed.
- The status was changed to COMPLETED.
Reasoning: Updating the TODO ensures that the workflow is tracked and documented for future reference.
Key Output: Below are the key outputs from each bot.
Genbot | Output | File Path |
---|---|---|
Investigator | Research Findings | demoresearch/DEMO_CUSTOMER_TGT_NUMBER_OF_ORDERS_demoresearch.txt |
Data Mapper | Mapping Proposal | demoresearch/DEMO_CUSTOMER_TGT_NUMBER_OF_ORDERS_demoproposal.txt |
Confidence Assessor | Confidence Assessment | demoresearch/DEMO_CUSTOMER_TGT_NUMBER_OF_ORDERS_demoQAscore.txt |
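The three file paths in the table follow a common pattern: the target table name, the column name, and a stage suffix. A small helper like the one below could generate them. The function name `output_path` and its parameters are hypothetical. Only the resulting paths come from this walkthrough.

```python
def output_path(table: str, column: str, stage: str) -> str:
    """Build the repository path for one bot's output file."""
    return f"demoresearch/{table}_{column}_demo{stage}.txt"


# Stage suffixes as they appear in the file names above.
STAGES = {
    "Investigator": "research",
    "Data Mapper": "proposal",
    "Confidence Assessor": "QAscore",
}

paths = {
    bot: output_path("DEMO_CUSTOMER_TGT", "NUMBER_OF_ORDERS", stage)
    for bot, stage in STAGES.items()
}
for bot, path in paths.items():
    print(f"{bot}: {path}")
```

A predictable naming scheme makes it easy for a later step, or a human reviewer, to locate the output of an earlier one.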
Tools Used and Their Purposes
How These Tools Worked Together
- The project and TODO management tools ensured the workflow was structured and tracked.
- The delegation tool enabled specialized Genbots to handle specific tasks efficiently.
- The Git management tool provided a centralized location for saving and validating outputs.
- The work recording tool documented progress, ensuring transparency and accountability.
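The overall flow of delegating a step, validating its output, and only then moving on can be sketched as a simple loop. This is a conceptual illustration and not how Genbots are implemented: `run_assembly_line`, the stub stage functions, and the `approve` callback (standing in for the validation step) are all hypothetical.

```python
from typing import Callable

# A stage takes the previous step's artifact and returns a new artifact.
Stage = Callable[[str], str]


def run_assembly_line(
    task: str,
    stages: list[tuple[str, Stage]],
    approve: Callable[[str, str], bool],
) -> list[str]:
    """Run each stage in order, stopping at the first output that is not approved."""
    work_log = []
    artifact = task
    for bot_name, stage in stages:
        artifact = stage(artifact)          # delegate the step to one bot
        if not approve(bot_name, artifact): # validation gate before the next step
            work_log.append(f"{bot_name}: rejected")
            break
        work_log.append(f"{bot_name}: approved")
    return work_log


# Stub stages that only label their input; real bots would do the actual work.
stages = [
    ("Investigator", lambda task: f"research for {task}"),
    ("Data Mapper", lambda research: f"mapping from {research}"),
    ("Confidence Assessor", lambda mapping: f"score for {mapping}"),
]

work_log = run_assembly_line(
    "NUMBER_OF_ORDERS", stages, approve=lambda bot_name, artifact: True
)
print(work_log)
```

Placing the approval check between stages means a rejected research file never reaches the mapping step, which matches the validate-then-proceed order used throughout this walkthrough.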
Genbot Tip
This example demonstrates how Genbots can be utilized to automate and streamline workflows for data mapping and validation. By leveraging bots like Investigator, Data Mapper, and Confidence Assessor, users can create a digital assembly line of Genbots all working towards the same objective.