Skip to main content
The Business Analyst Agent frames questions, aligns metric definitions, discovers data, produces validated queries & summaries, and turns findings into actionable tasks and artifacts.

Purpose

Translate business questions into precise metrics and reproducible analyses.

Scope

Requirements & metric definitions, data discovery, queries/validation, visuals & summaries, backlog of questions.

Design Goals

Grounded SQL/tables/docs, traceable assumptions, read-first safe defaults on production data.
  • Define & validate KPI packs (e.g., PMPM, denial rate, readmission, ALOS)
  • Build query prototypes for dashboards and ad-hoc decisions
  • Reconcile metric discrepancies across teams/sources
  • Prepare analysis briefs for stakeholder reviews
  • Support model design with clarified business rules & testable definitions
    Start with a single KPI pack and a short feedback loop to calibrate definitions across stakeholders.
Access
  • Read access to connected sources (e.g., hcls_demo_1_sources; Snowflake if used)
  • Optional: Google Drive service account for Sheets/Docs
Context
  • Prior docs (requirements, mapping tables, ERDs, glossary)
  • Stakeholders and decision timelines
Workspace (optional but recommended)
  • Default schema: GENESIS.EVE_WORKSPACE
  • Default stage: GENESIS.EVE_WORKSPACE.MY_STAGE
Keep a lightweight glossary handy—synonyms & allowed values speed up alignment and validation.
1

1) Frame the question

Clarify decision, timeframe, grain, filters, and acceptance criteria. Identify stakeholders and review cadence.
2

2) Discover data

Enumerate candidate tables/columns, keys, and constraints. Link to existing models/mappings and known lineage.
3

3) Define metrics

Specify name, formula, grain, dimensions, and filters. Record rationale, caveats, and test cases.
4

4) Generate analyses

Write SQL prototypes with safe LIMITs. Produce aggregations & comparisons (time/cohort/segment). Add compact visuals.
5

5) Validate & iterate

Reconcile results across sources/samples. Log open questions. Mark metrics as proposed / accepted / rejected.
6

6) Publish deliverables

Export query library, KPI dictionary, and results. Attach artifacts to the mission and notify stakeholders.
Every published KPI has acceptance criteria and a validation query.
  • Analysis brief (objectives, assumptions, highlights, risks)
  • KPI/metric dictionary with acceptance criteria
  • SQL query library (organized by topic/metric)
  • Result tables or CSV/XLSX/Sheets exports
  • Lightweight visuals & narrative summary
  • Backlog of tasks (open questions, data issues, follow-ups)
Commonly used
  • project_manager_tools, data_connector_tools, snowflake_tools
  • document_index_tools, file_manager_tools, artifact_manager_tools
  • google_drive_tools, git_action, slack_tools, web_access_tools, delegate_work
Access defaults
  • Read-first on production; workspace writes only when requested
  • No external sharing without explicit approval
Grant the least privilege needed; review access during each milestone.
  • Enforce LIMITs and filters; avoid full scans/cross joins
  • Mask PHI/PII in outputs; exclude raw personal data by default
  • Confirm before creating persistent tables; use the workspace schema
  • Record assumptions and data quality caveats in the brief
    Never export row-level PHI/PII outside approved environments or without redaction.
  • Naming: metric IDs, folder/tags for the query library
  • Sampling defaults and cost guards
  • Export modes: CSV/XLSX/Sheets; directory/stage locations
  • Review cadence and Slack channels for updates
  • Glossary/linking to shared business terms
    Store coverage targets (e.g., every KPI has a unit test query) in a shared config to track compliance.
Mission: Healthcare Data Pipeline (proj_Eve_eCvaHE)Available source tables (hcls_demo_1_sources.main):
  • CLAIMS: claim header info incl. dates, amounts, status
  • CLAIM_DETAILS: service-level details with procedure/diagnosis codes
  • PATIENTS: patient demographics
  • PROVIDERS: provider directory & specialty
Business questions
  • PMPM allowed cost by claim type (last 6 months)
  • Inpatient 30-day readmission rate by diagnosis code (last 6 months)
Frame
  • Grain/dims:
    • PMPM → month, claim_type
    • Readmission → discharge month, diagnosis_code
  • Filters:
    • PMPM → CLAIM_STATUS = 'PAID'
    • Readmission → CLAIM_TYPE = 'INPATIENT', non-null admission/discharge
  • Acceptance:
    • PMPM within ±1% of finance rollup (if available)
    • Readmission within ±0.5 pp of quality reference (if available)
Discover (notes)
  • Eligibility not present → approx member-months by distinct PATIENT_ID per month
  • Use CLAIM_DETAILS.DIAGNOSIS_CODE as DRG proxy when DRG absent
  • Provider keys available for future drilldowns
Metric definitions (preview)
metric_idnameformulagraindimsfilters
pmpm_allowedPMPM Allowed Costsum(INSURANCE_PAID + PATIENT_RESPONSIBILITY) / membersmonthclaim_typeCLAIM_STATUS = ‘PAID’
readmit_30d_rate30d Readmission Rate (IP)sum(readmit_30d) / count(*)monthdiagnosis_codeCLAIM_TYPE = ‘INPATIENT’
Minimal result preview
month_startclaim_typetotal_allowedmemberspmpm_allowed
2025-03-01INPATIENT1,234,567.898,152151.52
2025-03-01OUTPATIENT987,654.3210,04498.33

SQL Prototypes

PMPM (approximate denominator = distinct patients with paid claims per month)
with claims as (
  select
    strftime('%Y-%m-01', CLAIM_DATE) as month_start,
    CLAIM_TYPE,
    PATIENT_ID,
    coalesce(INSURANCE_PAID, 0.0) + coalesce(PATIENT_RESPONSIBILITY, 0.0) as allowed_amt
  from CLAIMS
  where CLAIM_DATE >= date('now','-6 months')
    and CLAIM_STATUS = 'PAID'
),
members as (
  select month_start, CLAIM_TYPE, count(distinct PATIENT_ID) as members
  from claims
  group by month_start, CLAIM_TYPE
),
agg as (
  select month_start, CLAIM_TYPE, sum(allowed_amt) as total_allowed
  from claims
  group by month_start, CLAIM_TYPE
)
select
  a.month_start,
  a.CLAIM_TYPE,
  a.total_allowed,
  m.members,
  case when m.members > 0 then a.total_allowed * 1.0 / m.members end as pmpm_allowed
from agg a
join members m using(month_start, CLAIM_TYPE)
order by 1,2
limit 1000;
30-day readmission rate (inpatient claims; next admission within 30 days of discharge)
with ip as (
  select
    c.PATIENT_ID,
    c.ADMISSION_DATE as admit_dt,
    c.DISCHARGE_DATE as discharge_dt,
    (select d.DIAGNOSIS_CODE
     from CLAIM_DETAILS d
     where d.CLAIM_ID = c.CLAIM_ID
     order by d.SERVICE_DATE limit 1) as diagnosis_code
  from CLAIMS c
  where c.CLAIM_TYPE = 'INPATIENT'
    and c.DISCHARGE_DATE is not null
    and c.DISCHARGE_DATE >= date('now','-6 months')
),
chain as (
  select
    PATIENT_ID,
    admit_dt,
    discharge_dt,
    diagnosis_code,
    lead(admit_dt) over (partition by PATIENT_ID order by admit_dt) as next_admit_dt
  from ip
),
flagged as (
  select *,
         case
           when next_admit_dt is not null
            and (julianday(next_admit_dt) - julianday(discharge_dt)) between 0 and 30
           then 1 else 0
         end as readmit_30d
  from chain
)
select
  strftime('%Y-%m-01', discharge_dt) as discharge_month,
  coalesce(diagnosis_code, 'UNKNOWN') as diagnosis_code,
  count(*) as discharges,
  sum(readmit_30d) as readmits_30d,
  sum(readmit_30d) * 1.0 / nullif(count(*), 0) as readmit_rate_30d
from flagged
group by 1,2
order by 1,2
limit 1000;

Data Flow Overview