Overview

Tool Name

code_executor_tools

Purpose

The code_executor_tools enable secure execution of Python code within Genesis Data Agents. They support dynamic calculations, data transformations, API integrations, and custom logic without pre-defined functions, and are well suited to ad-hoc analysis, prototyping, data science workflows, and extending agent capabilities with custom code on the fly.

Functions Available

  1. python_exec: Execute Python code in a secure sandboxed environment with access to common libraries, data sources, and file systems.

Key Features

Secure Execution

Run code in an isolated sandbox with resource limits and security constraints to prevent system abuse.

Rich Library Access

Pre-loaded with pandas, numpy, requests, and other common data science and utility libraries.

Data Source Integration

Direct access to configured databases, file systems, and APIs within executed code.

File System Access

Read and write files to Git repositories and storage locations from within executed code.

Dynamic Variables

Pass variables into code execution and retrieve results for further processing.

Error Handling

Comprehensive error messages with stack traces for debugging failed executions.

Input Parameters for Each Function

python_exec

Parameters
Name       | Definition                                                                               | Format
code       | Python code to execute. Can be single line or multi-line block.                         | String (required)
variables  | Dictionary of variables to make available in the execution context.                     | Object
timeout    | Maximum execution time in seconds (default: 30, max: 300).                              | Integer
return_var | Name of variable to return from execution context (default: returns all variables).    | String
libraries  | Additional libraries to import (beyond pre-loaded defaults).                            | Array of Strings
safe_mode  | Enable additional security restrictions (default: True).                                | Boolean
Use the variables parameter to pass data from the agent context into your code. This is more efficient than embedding large data directly in code strings.
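
A minimal sketch combining these parameters (the data and values are illustrative):

python_exec(
    code="""
discounted = [price * (1 - discount_rate) for price in prices]
result = {'discounted_prices': discounted, 'total': round(sum(discounted), 2)}
""",
    variables={'prices': [100, 250, 75], 'discount_rate': 0.1},
    timeout=15,
    return_var='result',
    safe_mode=True
)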

Use Cases

  1. Ad-Hoc Data Analysis - Perform quick calculations, statistical analysis, or data exploration without creating permanent functions or scripts.
  2. Data Transformation - Execute custom transformation logic on datasets that does not fit into standard SQL or built-in functions.
  3. API Integration - Make HTTP requests, parse responses, and integrate with external APIs dynamically based on user requirements.
  4. Prototyping & Testing - Test algorithms, validate assumptions, or prototype solutions before implementing them as permanent features.
  5. Custom Business Logic - Implement organization-specific calculations, rules, or validations that vary by use case or customer.

Workflow/How It Works

  1. Step 1: Simple Calculation - Execute basic Python for quick results:
    python_exec(
        code="result = 42 * 365; print(f'Days in 42 years: {result}')"
    )
    
  2. Step 2: Data Analysis with Pandas - Analyze data using pre-loaded libraries:
    python_exec(
        code="""
    import pandas as pd
    import numpy as np
    
    data = {
        'product': ['A', 'B', 'C', 'D', 'E'],
        'sales': [150, 200, 175, 300, 225],
        'cost': [100, 150, 125, 200, 175]
    }
    
    df = pd.DataFrame(data)
    df['profit'] = df['sales'] - df['cost']
    df['margin'] = (df['profit'] / df['sales'] * 100).round(2)
    
    summary = {
        'total_sales': df['sales'].sum(),
        'total_profit': df['profit'].sum(),
        'avg_margin': df['margin'].mean(),
        'top_product': df.loc[df['profit'].idxmax(), 'product']
    }
    
    print("Sales Analysis:")
    print(df)
    print(f"Summary: {summary}")
    """
    )
    
  3. Step 3: Pass Variables from Context - Use data from previous operations:
    python_exec(
        code="""
    import pandas as pd
    
    df = pd.DataFrame(customer_data)
    
    df['segment'] = pd.cut(
        df['total_spend'], 
        bins=[0, 1000, 5000, float('inf')],
        labels=['Bronze', 'Silver', 'Gold']
    )
    
    segment_summary = df.groupby('segment').agg({
        'customer_id': 'count',
        'total_spend': 'sum'
    }).round(2)
    
    print(segment_summary)
    result = segment_summary.to_dict()
    """,
        variables={'customer_data': customer_data},
        return_var='result'
    )
    
  4. Step 4: API Integration - Make external API calls and process responses:
    python_exec(
        code="""
    import requests
    
    response = requests.get(
        'https://api.openweathermap.org/data/2.5/weather',
        params={
            'q': city_name,
            'appid': api_key,
            'units': 'metric'
        }
    )
    
    if response.status_code == 200:
        weather_data = response.json()
        result = {
            'temperature': weather_data['main']['temp'],
            'humidity': weather_data['main']['humidity'],
            'description': weather_data['weather'][0]['description']
        }
        print(f"Weather in {city_name}: {result['temperature']}°C")
    else:
        result = {'error': f"API call failed: {response.status_code}"}
    """,
        variables={'city_name': 'London', 'api_key': 'your_api_key'},
        return_var='result'
    )
    
  5. Step 5: File Operations - Read and write files within execution:
    python_exec(
        code="""
    import json
    
    with open('/workspace/config.json', 'r') as f:
        config = json.load(f)
    
    config['last_run'] = '2024-01-15'
    config['status'] = 'completed'
    
    with open('/workspace/config.json', 'w') as f:
        json.dump(config, f, indent=2)
    
    print(f"Config updated: {config}")
    result = config
    """,
        return_var='result'
    )
    
  6. Step 6: Complex Data Processing - Implement multi-step algorithms:
    python_exec(
        code="""
    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(transaction_data)
    df['date'] = pd.to_datetime(df['date'])
    df = df.sort_values('date')
    
    df['rolling_avg_7d'] = df['amount'].rolling(window=7).mean()
    df['rolling_sum_30d'] = df['amount'].rolling(window=30).sum()
    
    mean = df['amount'].mean()
    std = df['amount'].std()
    df['z_score'] = (df['amount'] - mean) / std
    df['is_anomaly'] = abs(df['z_score']) > 3
    
    anomalies = df[df['is_anomaly']]
    summary = {
        'total_transactions': len(df),
        'anomaly_count': len(anomalies),
        'anomaly_dates': anomalies['date'].dt.strftime('%Y-%m-%d').tolist(),
        'avg_7d_trend': float(df['rolling_avg_7d'].iloc[-1])
    }
    
    print(f"Detected {len(anomalies)} anomalies")
    result = summary
    """,
        variables={'transaction_data': transactions},
        return_var='result',
        timeout=60
    )
    

Integration Relevance

  • data_connector_tools to fetch data from databases, then process with custom Python code.
  • file tools to read/write files before or after code execution.
  • github_connector_tools / gitlab_connector_tools to version control generated scripts.
  • image_tools to process images or generate visualizations from code execution results.
  • web_access_tools to fetch data via HTTP within executed code.
  • project_manager_tools to execute custom code as part of automated mission tasks.

Configuration Details

  • Execution Environment: Python 3.9+ with isolated namespace per execution.
  • Pre-loaded Libraries: pandas, numpy, requests, json, datetime, os, sys, re, math, statistics.
  • Resource Limits: CPU time limit (default 30s), memory limit (default 512MB).
  • File System Access: Read/write to designated workspace directories and Git repositories.
  • Network Access: HTTP/HTTPS requests allowed; other protocols restricted.
  • Security Mode: Restricted access to system operations, subprocess execution, and dangerous functions.
  • Return Value: Can return primitive types, dictionaries, lists, or serializable objects.
Code execution happens in a sandboxed environment, but always validate and sanitize user-provided code to prevent malicious operations. Never execute untrusted code without review.

Limitations or Notes

  1. Execution Timeout: Default 30 seconds, maximum 300 seconds (5 minutes). Long-running operations may be terminated.
  2. Memory Limits: Default 512MB. Large dataset operations may exceed limits and fail.
  3. No Subprocess: Cannot spawn subprocesses or execute system commands for security.
  4. Library Restrictions: Cannot install arbitrary packages; limited to pre-loaded libraries.
  5. Network Limitations: Only HTTP/HTTPS allowed; no raw socket access or other protocols.
  6. File System Scope: Access limited to designated directories; cannot access system files.
  7. Serialization: Return values must be JSON-serializable or primitive types.
  8. State Persistence: Each execution is isolated; no state persists between calls. Pass results forward explicitly via the variables parameter (see the sketch after this list).
  9. Standard Output: Print statements captured but may be truncated for very large outputs.
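
Because no state persists, results from one call must be passed into the next explicitly. A minimal sketch of chaining two executions (variable names are illustrative; the access pattern mirrors the complete workflow example further below):

first = python_exec(
    code="result = [x * 2 for x in raw_values]",
    variables={'raw_values': [1, 2, 3]},
    return_var='result'
)

# Feed the previous result into the next, isolated execution
second = python_exec(
    code="result = sum(doubled)",
    variables={'doubled': first['result']},
    return_var='result'
)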

Supported Operations

Basic Python operations - Variables, functions, loops, conditionals
Data manipulation - pandas DataFrames, numpy arrays, list/dict operations
File I/O - Read/write text and binary files in workspace
HTTP requests - GET, POST, PUT, DELETE with requests library
JSON/CSV processing - Parse and generate structured data formats
Date/time calculations - datetime, timedelta operations
Regular expressions - Pattern matching and text processing
Mathematical operations - Statistics, linear algebra, calculations
String manipulation - Formatting, parsing, encoding
Exception handling - Try/except blocks and error recovery

Not Supported

❌ Installing new packages (pip install) during execution
❌ Subprocess execution (subprocess, os.system, etc.)
❌ System file access (outside workspace)
❌ Database connections (use data_connector_tools instead)
❌ Multi-threading or multiprocessing
❌ Raw socket programming
❌ Modifying Python environment or interpreter
❌ Accessing environment variables (except explicitly passed)
❌ Infinite loops (terminated by timeout)
❌ Heavy machine learning model training (use dedicated tools)

Output

  • Success: Execution result with returned variable(s), stdout output, and execution time.
  • Variables: Dictionary of all variables in the execution context (unless a specific return_var is specified).
  • Standard Output: Captured print statements and logging output.
  • Execution Time: Time taken to execute code in seconds.
  • Errors: Exception type, error message, and full stack trace for debugging.
  • Warnings: Any resource warnings (approaching timeout, high memory usage).

Best Practices

Small & Focused

Keep code blocks focused on single tasks. Break complex operations into multiple executions.

Error Handling

Always use try/except blocks for operations that might fail (API calls, file I/O, parsing).
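
For example, a minimal sketch that catches failures inside the executed code (reusing the workspace config path from the workflow above):

python_exec(
    code="""
import json

try:
    with open('/workspace/config.json', 'r') as f:
        result = json.load(f)
except (FileNotFoundError, json.JSONDecodeError) as e:
    # Surface the failure as data instead of letting the execution raise
    result = {'error': str(e)}
""",
    return_var='result'
)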

Resource Awareness

Monitor execution time and memory usage. Use sampling for large datasets.
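
For large inputs, sampling inside the executed code is one way to stay within the limits. An illustrative sketch (the records variable and amount column are made up):

python_exec(
    code="""
import pandas as pd

# records is assumed to be a list of row dicts passed in via variables
df = pd.DataFrame(records)

# Work on a 10% sample to stay within time and memory limits
sample = df.sample(frac=0.1, random_state=42)
result = {
    'sample_rows': len(sample),
    'estimated_total_amount': float(sample['amount'].sum() * 10)
}
""",
    variables={'records': records},
    return_var='result'
)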

Data Validation

Validate input data and function parameters before processing to avoid runtime errors.

Logging

Use print statements to track execution progress and debug issues.

Return Values

Always assign results to variables for retrieval. Do not rely solely on print output.
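
A minimal contrast (the access pattern mirrors the complete workflow example below):

# Relying only on print output: nothing is available for downstream steps
python_exec(code="print(sum([1, 2, 3]))")

# Assigning to a variable and naming it in return_var makes the value retrievable
totals = python_exec(
    code="result = sum([1, 2, 3])",
    return_var='result'
)
print(totals['result'])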

Example: Complete Data Analysis Workflow

# Step 1: Fetch data from database
query_result = _query_database(
    connection_id="analytics_db",
    query="SELECT customer_id, order_date, order_amount, product_category FROM orders WHERE order_date >= '2024-01-01' LIMIT 10000",
    max_rows=10000
)

# Step 2: Perform complex analysis
analysis_result = python_exec(
    code="""
import pandas as pd
import numpy as np

df = pd.DataFrame(data)
df['order_date'] = pd.to_datetime(df['order_date'])
df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')
df = df.dropna()

print(f"Loaded {len(df)} orders")

customer_stats = df.groupby('customer_id').agg({
    'order_amount': ['sum', 'mean', 'count'],
    'order_date': ['min', 'max']
}).reset_index()

customer_stats.columns = ['customer_id', 'total_spend', 'avg_order', 'order_count', 'first_order', 'last_order']
customer_stats['lifetime_days'] = (customer_stats['last_order'] - customer_stats['first_order']).dt.days

def segment_customer(row):
    if row['total_spend'] > 5000:
        return 'VIP'
    elif row['total_spend'] > 2000:
        return 'Gold'
    elif row['total_spend'] > 500:
        return 'Silver'
    else:
        return 'Bronze'

customer_stats['segment'] = customer_stats.apply(segment_customer, axis=1)

category_stats = df.groupby('product_category').agg({
    'order_amount': ['sum', 'mean', 'count'],
    'customer_id': 'nunique'
}).reset_index()

category_stats.columns = ['category', 'total_revenue', 'avg_order', 'order_count', 'unique_customers']
category_stats = category_stats.sort_values('total_revenue', ascending=False)

daily_sales = df.groupby(df['order_date'].dt.date)['order_amount'].agg(['sum', 'count']).reset_index()
daily_sales.columns = ['date', 'revenue', 'orders']
daily_sales['avg_order_value'] = (daily_sales['revenue'] / daily_sales['orders']).round(2)
daily_sales['revenue_7d_avg'] = daily_sales['revenue'].rolling(window=7).mean().round(2)
daily_sales['revenue_growth'] = daily_sales['revenue'].pct_change() * 100

summary = {
    'total_revenue': float(df['order_amount'].sum()),
    'total_orders': len(df),
    'unique_customers': df['customer_id'].nunique(),
    'avg_order_value': float(df['order_amount'].mean()),
    'segments': customer_stats['segment'].value_counts().to_dict(),
    'top_category': category_stats.iloc[0]['category'],
    'top_category_revenue': float(category_stats.iloc[0]['total_revenue'])
}

# Convert datetime columns to ISO date strings so the returned value stays JSON-serializable
customer_stats['first_order'] = customer_stats['first_order'].dt.strftime('%Y-%m-%d')
customer_stats['last_order'] = customer_stats['last_order'].dt.strftime('%Y-%m-%d')
daily_sales['date'] = daily_sales['date'].astype(str)

result = {
    'summary': summary,
    'customer_segments': customer_stats.to_dict('records'),
    'category_performance': category_stats.to_dict('records'),
    'daily_trends': daily_sales.tail(30).to_dict('records')
}
""",
    variables={'data': query_result['data']},
    return_var='result',
    timeout=60
)

print(f"Summary: {analysis_result['result']['summary']}")

Advanced Features

Statistical Analysis

python_exec(
    code="""
import numpy as np
from scipy import stats

data = np.array(values)

result = {
    'mean': float(np.mean(data)),
    'median': float(np.median(data)),
    'std': float(np.std(data)),
    'min': float(np.min(data)),
    'max': float(np.max(data)),
    'q25': float(np.percentile(data, 25)),
    'q75': float(np.percentile(data, 75))
}

stat, p_value = stats.normaltest(data)
result['is_normal'] = p_value > 0.05
result['normality_p_value'] = float(p_value)

print(f"Distribution analysis: {result}")
""",
    variables={'values': [23, 45, 67, 89, 34, 56, 78, 90, 12, 45]},
    libraries=['scipy'],
    return_var='result'
)

Text Processing

python_exec(
    code="""
import re
from collections import Counter

text = text_data.lower()
text = re.sub(r'[^a-z0-9 ]', '', text)
words = text.split()

stop_words = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for'}
words = [w for w in words if w not in stop_words and len(w) > 2]

word_freq = Counter(words)
top_words = word_freq.most_common(20)

result = {
    'total_words': len(words),
    'unique_words': len(set(words)),
    'top_words': [{'word': w, 'count': c} for w, c in top_words]
}

print(f"Analyzed {result['total_words']} words")
""",
    variables={'text_data': long_text_content},
    return_var='result'
)

Data Validation

python_exec(
    code="""
import pandas as pd

df = pd.DataFrame(dataset)

validation_results = {
    'row_count': len(df),
    'column_count': len(df.columns),
    'missing_values': df.isnull().sum().to_dict(),
    'duplicate_rows': df.duplicated().sum(),
    'data_types': df.dtypes.astype(str).to_dict()
}

errors = []

required_cols = ['customer_id', 'order_date', 'amount']
missing_cols = [col for col in required_cols if col not in df.columns]
if missing_cols:
    errors.append(f"Missing required columns: {missing_cols}")

if 'amount' in df.columns:
    negative_amounts = (df['amount'] < 0).sum()
    if negative_amounts > 0:
        errors.append(f"Found {negative_amounts} negative amounts")

if 'order_date' in df.columns:
    try:
        pd.to_datetime(df['order_date'])
    except Exception:
        errors.append("Invalid date format in order_date column")

validation_results['errors'] = errors
validation_results['is_valid'] = len(errors) == 0

print(f"Validation {'passed' if validation_results['is_valid'] else 'failed'}")
result = validation_results
""",
    variables={'dataset': data_to_validate},
    return_var='result'
)

API Pagination Handler

python_exec(
    code="""
import requests
import time

def fetch_paginated_data(base_url, api_key, max_pages=10):
    all_data = []
    page = 1
    
    while page <= max_pages:
        response = requests.get(
            base_url,
            params={'page': page, 'per_page': 100},
            headers={'Authorization': f'Bearer {api_key}'}
        )
        
        if response.status_code != 200:
            print(f"Error on page {page}: {response.status_code}")
            break
        
        data = response.json()
        if not data.get('items'):
            break
        
        all_data.extend(data['items'])
        print(f"Fetched page {page}: {len(data['items'])} items")
        
        if not data.get('has_next'):
            break
        
        page += 1
        time.sleep(0.5)
    
    return all_data

result = fetch_paginated_data(api_url, token)
print(f"Total items fetched: {len(result)}")
""",
    variables={'api_url': 'https://api.example.com/data', 'token': 'your_token'},
    return_var='result',
    timeout=120
)

Troubleshooting

Execution Timeouts

  • Reduce dataset size or use sampling
  • Optimize algorithms (vectorize operations with numpy/pandas)
  • Increase timeout parameter (up to 300 seconds)
  • Break into smaller execution steps
  • Use generators or lazy evaluation for large data

Memory Limit Errors

  • Process data in chunks instead of loading all at once
  • Use pandas iterator options (chunksize parameter)
  • Delete large intermediate variables with del variable
  • Use more memory-efficient data types (int8 vs int64)
  • Consider using data_connector_tools for heavy processing

Import Errors

  • Verify library is in pre-loaded list
  • Check for typos in import statement
  • Use alternative built-in libraries when possible
  • Request library addition from Genesis team
  • Implement functionality manually if simple

File Not Found Errors

  • Verify file path is relative to workspace
  • Check file exists using os.path.exists()
  • Ensure file was created in previous step
  • Use absolute paths within workspace directories
  • Check file permissions and access rights

Serialization Errors

  • Convert numpy arrays to lists: array.tolist()
  • Convert pandas DataFrames to dicts: df.to_dict()
  • Use float() for numpy float types
  • Use int() for numpy integer types
  • Convert datetime to ISO strings: dt.isoformat()
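
An illustrative sketch of these conversions (the data is made up):

python_exec(
    code="""
import pandas as pd
import numpy as np
from datetime import datetime

df = pd.DataFrame({'amount': np.array([10, 20, 30], dtype=np.int64)})

# Convert numpy, pandas, and datetime objects to JSON-serializable types before returning
result = {
    'amounts': df['amount'].tolist(),
    'records': df.to_dict('records'),
    'total': float(df['amount'].sum()),
    'count': int(len(df)),
    'generated_at': datetime.now().isoformat()
}
""",
    return_var='result'
)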

API Request Failures

  • Check internet connectivity (if applicable)
  • Verify API endpoint URL is correct
  • Check authentication credentials
  • Add error handling with try/except
  • Check for rate limiting and add delays
  • Verify API response format matches expectations

Syntax Errors

  • Check for proper indentation (use 4 spaces)
  • Verify quotes are properly closed
  • Check for balanced parentheses/brackets
  • Test code in local Python environment first
  • Use syntax highlighter to identify issues

Security Considerations

Code Execution Risks: Always validate user-provided code before execution. Never execute code from untrusted sources without thorough review.

Sandbox Protections

The code executor implements multiple security layers:
  • Restricted Imports: Cannot import subprocess or call os.system, eval, exec, compile, or other dangerous functions
  • File System Isolation: Access limited to designated workspace directories
  • Network Filtering: Only HTTP/HTTPS; no raw sockets or other protocols
  • Resource Limits: CPU time and memory caps prevent denial-of-service
  • Namespace Isolation: Each execution has isolated variable scope
  • No Privilege Escalation: Cannot access system files or escalate permissions

Safe Coding Practices

Input Validation

Validate and sanitize all user inputs before using them in code execution.
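
One illustrative safeguard is to pass user-supplied values in through the variables parameter instead of interpolating them into the code string (user_input is hypothetical):

# Risky: untrusted text becomes part of the executed code
# python_exec(code=f"name = '{user_input}'")

# Safer: the value is passed as data and never parsed as code
python_exec(
    code="result = name.strip().lower()",
    variables={'name': user_input},  # user_input is a hypothetical user-supplied string
    return_var='result'
)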

Error Containment

Use try/except blocks to prevent error propagation and information leakage.

Resource Monitoring

Monitor execution time and memory usage. Set appropriate timeouts.

Audit Logging

Log all code executions with user context for security auditing.

Performance Optimization

Efficient Data Processing

# Inefficient - Row-by-row iteration
python_exec(
    code="""
result = []
for row in data:
    result.append(row['value'] * 2)
"""
)

# Efficient - Vectorized operations
python_exec(
    code="""
import pandas as pd
df = pd.DataFrame(data)
result = (df['value'] * 2).tolist()
"""
)

Memory Management

python_exec(
    code="""
import pandas as pd

chunks = []
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    processed = chunk[chunk['amount'] > 100]
    chunks.append(processed)

result = pd.concat(chunks, ignore_index=True)
del chunks
"""
)