Overview

Tool Name

github_connector_tools

Purpose

The github_connector_tools enable direct integration between Genesis Data Agents and GitHub repositories. You can manage repositories, files, branches, pull requests, and workflows programmatically, all within your data agent conversations. Typical uses include automated documentation, code generation, data pipeline versioning, and collaborative data projects.

Functions Available

  1. git_action: Core Git operations including repository management, branching, committing, and history tracking.
  2. file: File operations for reading, writing, searching, and managing repository content.
  3. git_action (GitHub-specific actions): GitHub platform features including pull requests, remote repository management, and credential storage.

Key Features

Repository Management

Clone, create, list, and delete repositories with full version control integration.

Branch Operations

Create, switch, and merge branches, and manage parallel development workflows.

Pull Request Automation

Create, review, merge, and close pull requests programmatically for collaborative workflows.

File Management

Read, write, search, copy, move, and delete files with pattern matching and bulk operations.

Commit History

Access commit logs, fetch history, and track changes across repository timelines.

Secure Authentication

Store GitHub credentials securely with support for Personal Access Tokens (PAT).

Input Parameters for Each Function

git_action

Parameters
| Name | Definition | Format |
|---|---|---|
| action | Git operation to perform. Values: commit, get_history, create_branch, switch_branch, get_status, clone_repo, list_repos, pull, push, create_pull_request, etc. | String (required) |
| repo_id | Repository identifier. Required for most operations except list_repos and credential storage. | String |
| commit_message | Message for the commit action. | String |
| branch_name | Name of the branch for create/switch/pull/push operations. | String |
| url | Git repository URL for cloning or remote operations. | String |
| max_count | Maximum number of commits to return in history (for get_history). | Integer |

file

Parameters
| Name | Definition | Format |
|---|---|---|
| action | File operation. Values: read, write, delete, list, find, search, copy, move, commit, etc. | String (required) |
| repo_id | Repository identifier where files are located. | String (required) |
| file_path | Relative path to the file in the repository (no leading /). | String |
| content | Content to write to the file (for the write action). | String |
| pattern | Glob pattern for find/search operations (e.g., *.py, **/*.md). | String |
| source_patterns | List of glob patterns for bulk copy/move/delete operations. | Array of Strings |
| target_path | Destination directory for copy/move operations. | String |
| message | Commit message (for the commit action). | String |

Note: Use repo_id consistently across git_action and file operations to work within the same repository context. List available repos with git_action(action="list_repos").
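
The parameter rules above can be checked before a call is issued. The following is a minimal sketch, not part of the tool: `missing_params`, `NO_REPO_ID`, and `EXTRA_REQUIRED` are hypothetical names, and the per-action requirements are assumptions inferred from the parameter tables rather than taken from the tool's actual schema.

```python
# Hypothetical pre-flight check. The required-parameter sets below are
# assumptions inferred from the parameter tables, not the tool's real schema.
NO_REPO_ID = {"list_repos", "store_github_credentials"}

EXTRA_REQUIRED = {
    "commit": {"commit_message"},
    "clone_repo": {"url"},
    "create_branch": {"branch_name"},
    "switch_branch": {"branch_name"},
}


def missing_params(action, **params):
    """Return the sorted names of parameters that appear to be missing."""
    required = set(EXTRA_REQUIRED.get(action, set()))
    if action not in NO_REPO_ID:
        required.add("repo_id")
    # A parameter counts as missing when it is absent or empty.
    return sorted(name for name in required if not params.get(name))


if __name__ == "__main__":
    print(missing_params("clone_repo", repo_id="analytics_pipeline"))
    print(missing_params("list_repos"))
```

A check like this only catches omissions early; the tool itself remains the authority on which parameters each action accepts.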

Use Cases

  1. Automated Documentation: Generate and commit data dictionaries, analysis reports, or pipeline documentation directly to GitHub repositories.
  2. Code Generation & Versioning: Create dbt models, SQL scripts, or Python notebooks and version them in GitHub for team collaboration.
  3. Data Pipeline Deployment: Clone configuration repositories, update parameters, commit changes, and create pull requests for review.
  4. Collaborative Data Projects: Share analysis notebooks, datasets, and findings with team members through structured GitHub workflows.
  5. Backup & Disaster Recovery: Automatically commit critical configurations, metadata, and artifacts to GitHub for version-controlled backups.

Workflow/How It Works

  1. Step 1: Authenticate with GitHub. Store GitHub credentials securely using a Personal Access Token:
    git_action(
        action="store_github_credentials",
        username="your_github_username",
        token="ghp_your_personal_access_token"
    )
    
  2. Step 2: Create or Clone Repository. Start with a new repository or clone an existing one:
    # Create new repository
    git_action(
        action="create_repo",
        repo_id="my_data_project"
    )
    
    # Or clone from GitHub
    git_action(
        action="clone_repo",
        repo_id="analytics_pipeline",
        url="https://github.com/organization/analytics-pipeline.git"
    )
    
  3. Step 3: Manage Files. Create, read, and organize files within the repository:
    # Write documentation
    file(
        action="write",
        repo_id="my_data_project",
        file_path="docs/data_dictionary.md",
        content="# Data Dictionary\n\n## Customer Table\n..."
    )
    
    # Search for files
    file(
        action="find",
        repo_id="my_data_project",
        pattern="*.sql"
    )
    
  4. Step 4: Commit Changes. Track changes with descriptive commit messages:
    git_action(
        action="commit",
        repo_id="my_data_project",
        commit_message="Add customer data dictionary and initial ETL scripts"
    )
    
  5. Step 5: Push to GitHub. Sync local changes to the remote repository so the branch exists there before a pull request is opened:
    git_action(
        action="push",
        repo_id="my_data_project"
    )
    
  6. Step 6: Create Pull Request. Collaborate with the team through pull requests:
    git_action(
        action="create_pull_request",
        repo_id="my_data_project",
        branch_name="feature/new-pipeline",
        title="Add new customer segmentation pipeline",
        body="This PR introduces customer segmentation logic with dbt models."
    )
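
The workflow steps can be chained in a single helper. This is a sketch under stated assumptions: `git_action` and `file` are agent tools rather than importable Python functions, so they are passed in as callables, and `publish_change` and `make_recorder` are hypothetical names introduced only for illustration. The sketch pushes before opening the pull request, as the complete example later in this document does, so that the branch exists on the remote.

```python
from typing import Any, Callable, Dict, List

ToolCall = Callable[..., Any]


def publish_change(git_action: ToolCall, file: ToolCall, repo_id: str,
                   branch: str, path: str, content: str, message: str,
                   title: str, body: str) -> None:
    """Write one file on a feature branch, then commit, push, and open a PR."""
    git_action(action="create_branch", repo_id=repo_id, branch_name=branch)
    git_action(action="switch_branch", repo_id=repo_id, branch_name=branch)
    file(action="write", repo_id=repo_id, file_path=path, content=content)
    git_action(action="commit", repo_id=repo_id, commit_message=message)
    # Push first so the branch exists on the remote before the PR is opened.
    git_action(action="push", repo_id=repo_id, branch_name=branch)
    git_action(action="create_pull_request", repo_id=repo_id,
               branch_name=branch, title=title, body=body)


def make_recorder(log: List[str]) -> ToolCall:
    """Stand-in for a tool: records the requested action instead of running it."""
    def record(**kwargs: Any) -> Dict[str, Any]:
        log.append(kwargs["action"])
        return {"ok": True}
    return record


if __name__ == "__main__":
    calls: List[str] = []
    stub = make_recorder(calls)
    publish_change(stub, stub, "my_data_project", "feature/new-pipeline",
                   "docs/data_dictionary.md", "# Data Dictionary\n",
                   "docs: add data dictionary", "Add data dictionary",
                   "Adds the initial data dictionary.")
    print(calls)
```

Passing the tools in as arguments keeps the helper testable with a recording stub, as shown in the `__main__` block, without touching a real repository.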
    

Integration Relevance

  • project_manager_tools: Track data projects and link GitHub repositories to missions.
  • data_connector_tools: Export analysis results and commit them to version control.
  • dbt_action: Manage dbt projects in GitHub with full version control.
  • file_manager_tools: Organize artifacts before committing to repositories.
  • slack_tools: Notify teams when pull requests are created or commits are pushed.

Configuration Details

  • Personal Access Token: Generate from GitHub Settings → Developer settings → Personal access tokens. Required scopes: repo, workflow.
  • Repository Naming: Use lowercase with hyphens (e.g., data-pipeline-prod) for consistency.
  • Branch Strategy: Use feature branches (feature/, bugfix/) for changes; protect main branch.
  • Commit Messages: Follow conventional commits format: feat:, fix:, docs:, refactor:.
  • File Paths: Always use forward slashes (/) and relative paths without leading slash.
  • Large Files: GitHub has a 100MB file size limit. For larger artifacts, use Git LFS outside this tool (the connector does not manage LFS directly) or store them in a data platform.

Warning: Never commit sensitive credentials, API keys, or passwords to GitHub repositories. Use environment variables or secret management systems instead.
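
The conventions above lend themselves to simple pre-commit checks. The helpers below are a minimal sketch, not part of the tool: every function name is hypothetical, the commit prefix list covers only the four prefixes named above, and the size constant assumes "100MB" means 100 × 1024 × 1024 bytes.

```python
import re

# Lowercase letters and digits separated by single hyphens, e.g. data-pipeline-prod.
REPO_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Only the prefixes listed above; an optional "(scope)" is also allowed.
COMMIT_PREFIX = re.compile(r"(feat|fix|docs|refactor)(\([^)]+\))?: \S")
# Assumption: the 100MB limit is interpreted as binary megabytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def normalize_path(path: str) -> str:
    """Convert backslashes to forward slashes and drop any leading slash."""
    return path.replace("\\", "/").lstrip("/")


def is_repo_name_ok(name: str) -> bool:
    """True when the name is lowercase with hyphen separators."""
    return REPO_NAME.fullmatch(name) is not None


def is_commit_message_ok(message: str) -> bool:
    """True when the message starts with one of the listed prefixes."""
    return COMMIT_PREFIX.match(message) is not None


def fits_size_limit(size_bytes: int) -> bool:
    """True when a file of this size is within the assumed limit."""
    return size_bytes <= MAX_FILE_BYTES


if __name__ == "__main__":
    print(normalize_path("/docs\\data_dictionary.md"))
    print(is_repo_name_ok("data-pipeline-prod"))
    print(is_commit_message_ok("feat: Add customer churn analysis with documentation"))
    print(fits_size_limit(250 * 1024 * 1024))
```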

Limitations or Notes

  1. File Size Limits: GitHub limits individual files to 100MB and recommends keeping repositories under 1GB.
  2. API Rate Limits: The GitHub API is rate limited (5,000 requests/hour for authenticated users); batch operations when possible.
  3. Private Repository Access: Requires appropriate permissions and PAT scopes for private repos.
  4. Binary Files: Git is optimized for text files; large binary files can slow repository performance.
  5. Merge Conflicts: Automatic merging may fail with conflicts; manual resolution required.
  6. Branch Protection: Protected branches may prevent direct pushes; use pull requests instead.
  7. Network Dependency: All operations require internet connectivity to GitHub servers.
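
To make the rate limit concrete, the arithmetic below shows what 5,000 requests per hour implies for pacing and batch sizing. It is an illustrative sketch: the function names are hypothetical, and it assumes each tool call maps to exactly one API request, which may not hold in practice.

```python
import math

REQUESTS_PER_HOUR = 5000  # authenticated limit quoted in the list above


def min_interval_seconds(limit_per_hour: int = REQUESTS_PER_HOUR) -> float:
    """Average spacing between requests that stays within the hourly limit."""
    return 3600 / limit_per_hour


def hours_needed(total_requests: int,
                 limit_per_hour: int = REQUESTS_PER_HOUR) -> int:
    """Whole hours required to issue total_requests without exceeding the limit."""
    return math.ceil(total_requests / limit_per_hour)


if __name__ == "__main__":
    print(min_interval_seconds())   # seconds between requests
    print(hours_needed(12000))      # hours for a 12,000-request job
```

At the quoted limit, requests spaced about 0.72 seconds apart stay within budget, and a 12,000-request job spans three hourly windows, which is why bulk operations (one call covering many files) are preferable to per-file calls.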

Supported Actions

clone_repo - Clone repository from GitHub URL
create_repo - Create new local repository
list_repos - Show all available repositories
commit - Commit staged changes
push - Push commits to remote
pull - Pull changes from remote
create_branch - Create new branch
switch_branch - Switch to different branch
get_status - Show repository status
get_history - View commit history
create_pull_request - Create PR on GitHub
merge_pull_request - Merge PR
store_github_credentials - Save authentication
list_remote_repos - List repos on GitHub
file operations - Full CRUD on repository files

Not Supported

❌ Git submodules or subtrees
❌ Git LFS (Large File Storage) direct management
❌ GitHub Actions workflow execution
❌ GitHub Issues or Projects management
❌ Repository webhooks configuration
❌ GitHub Pages deployment
❌ Repository transfer or ownership changes
❌ Advanced merge strategies (squash, rebase from tool)

Output

  • clone_repo: Confirmation with repository path and clone details.
  • commit: Commit hash, files changed, and commit message confirmation.
  • push: Push status, branch name, and remote URL.
  • get_history: List of commits with hash, author, date, and message.
  • get_status: Current branch, staged/unstaged files, and sync status.
  • create_pull_request: PR number, URL, and creation confirmation.
  • file operations: Success confirmation, file paths, and content previews.
  • list_repos: Table of repositories with names, paths, and status.
  • Errors: Detailed error messages with resolution guidance (e.g., authentication failures, merge conflicts).

Best Practices

Commit Often

Make small, focused commits with clear messages. Small commits are easier to review, revert, and understand in the history.

Branch Strategy

Use feature branches for development. Keep main/master stable and deployable.

Pull Before Push

Always pull latest changes before pushing to avoid conflicts and ensure smooth integration.

Meaningful Messages

Write descriptive commit messages that explain why a change was made, not just what changed.
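
The pull-before-push practice can be wrapped in a small helper so the two calls always happen in that order. This is a sketch only: `sync_branch` is a hypothetical name, and `git_action` is passed in as a callable because it is an agent tool rather than an importable Python function. The return value of each call is passed through untouched, since the tool's output format is not assumed here.

```python
from typing import Any, Callable


def sync_branch(git_action: Callable[..., Any], repo_id: str,
                branch_name: str) -> Any:
    """Pull the latest remote changes, then push local commits."""
    git_action(action="pull", repo_id=repo_id, branch_name=branch_name)
    return git_action(action="push", repo_id=repo_id, branch_name=branch_name)


if __name__ == "__main__":
    seen = []

    def fake_git_action(**kwargs: Any) -> str:
        # Stand-in for the real tool: record the action and echo it back.
        seen.append(kwargs["action"])
        return kwargs["action"]

    sync_branch(fake_git_action, "customer_analytics", "feature/churn-analysis")
    print(seen)
```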

Example: Complete Data Project Workflow

# 1. Store GitHub credentials (one-time setup)
git_action(
    action="store_github_credentials",
    username="data_engineer",
    token="ghp_abc123..."
)

# 2. Clone existing project repository
git_action(
    action="clone_repo",
    repo_id="customer_analytics",
    url="https://github.com/company/customer-analytics.git"
)

# 3. Create feature branch for new analysis
git_action(
    action="create_branch",
    repo_id="customer_analytics",
    branch_name="feature/churn-analysis"
)

git_action(
    action="switch_branch",
    repo_id="customer_analytics",
    branch_name="feature/churn-analysis"
)

# 4. Create analysis notebook
file(
    action="write",
    repo_id="customer_analytics",
    file_path="notebooks/churn_analysis.py",
    content="""
# Customer Churn Analysis
import pandas as pd

# Load data
customers = pd.read_csv('data/customers.csv')

# Analysis logic...
"""
)

# 5. Create documentation
file(
    action="write",
    repo_id="customer_analytics",
    file_path="docs/churn_methodology.md",
    content="# Churn Analysis Methodology\n\n## Overview\n..."
)

# 6. Commit changes
git_action(
    action="commit",
    repo_id="customer_analytics",
    commit_message="feat: Add customer churn analysis with documentation"
)

# 7. Push to GitHub
git_action(
    action="push",
    repo_id="customer_analytics",
    branch_name="feature/churn-analysis"
)

# 8. Create pull request for review
git_action(
    action="create_pull_request",
    repo_id="customer_analytics",
    branch_name="feature/churn-analysis",
    title="Add Customer Churn Analysis",
    body="This PR adds:\n- Churn prediction model\n- Analysis notebooks\n- Documentation"
)

# 9. Check repository status
git_action(
    action="get_status",
    repo_id="customer_analytics"
)

Advanced Features

Bulk File Operations

Copy multiple files matching patterns:
file(
    action="copy",
    repo_id="customer_analytics",
    source_patterns=["*.sql", "*.py"],
    target_path="archive/",
    preserve_structure=True
)

Search Across Files

Find specific content in repository:
file(
    action="search",
    repo_id="customer_analytics",
    pattern="SELECT.*FROM customers",
    regex=True
)
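
To preview what a pattern like the one above would match, the same expression can be tried locally. The sketch below uses Python's `re` module and assumes the tool's regex dialect behaves similarly (case-sensitive, with `.` not matching newlines); that is an assumption, not documented behavior, and `matching_lines` is a hypothetical helper.

```python
import re

PATTERN = re.compile(r"SELECT.*FROM customers")


def matching_lines(text: str):
    """Return (line_number, line) pairs whose line matches PATTERN."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if PATTERN.search(line)]


SAMPLE = """SELECT id, name FROM customers WHERE active = 1
SELECT id FROM orders
select * from customers
SELECT *
FROM customers"""

if __name__ == "__main__":
    for number, line in matching_lines(SAMPLE):
        print(number, line)
```

Under these assumptions only the first sample line matches: the lowercase query and the query split across two lines are missed, so a broader pattern or a case-insensitive option may be needed for real SQL files.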

Fetch Deep History

Get more commit history for analysis:
git_action(
    action="fetch_more_history",
    repo_id="customer_analytics",
    depth=100
)

git_action(
    action="get_history",
    repo_id="customer_analytics",
    max_count=50
)

Troubleshooting

Authentication fails

  • Verify the Personal Access Token is valid and not expired
  • Check the token has the required scopes: repo, workflow
  • Re-store credentials with store_github_credentials

Push is rejected

  • Pull the latest changes first: git_action(action="pull")
  • Check for merge conflicts in get_status
  • Verify you have write permissions to the repository

File not found

  • Verify file_path uses forward slashes (/)
  • Check the path is relative, without a leading slash
  • Use file(action="list") to see available files

Branch already exists

  • Switch to the existing branch instead of creating it
  • Use get_branch to check the current branch
  • Delete the old branch or use a different name