Benchmark Report: GRANULAR 4 ITERATION 1

Performance Metrics

AI Reliability: 99% (93 out of 94 cases correctly automated)
Escalation Rate: 6% (6 out of 100 cases escalated for human intervention)
Confusion Matrices
Label             TP   FP   TN   FN   Precision   Recall   F1
Ticket: Resolve   48    2   48    2   96.0%       96.0%    96.0%
Ticket: Escalate  48    2   48    2   96.0%       96.0%    96.0%
Ticket: Comment    0    0  100    0   N/A         N/A      N/A
Action: Taken     46    1   52    1   97.9%       97.9%    97.9%
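The precision, recall, and F1 figures above follow from the TP, FP, and FN counts using the standard definitions. The sketch below recomputes them; the function names `precision_recall_f1` and `fmt` are illustrative and not part of the benchmark tooling, and treating a zero denominator as N/A is an assumption that matches the "Ticket: Comment" entry.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1); a value is None when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    if precision is None or recall is None or (precision + recall) == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def fmt(value):
    """Format a ratio as a percentage with one decimal, or N/A when undefined."""
    return "N/A" if value is None else f"{value:.1%}"


# (TP, FP, FN) counts copied from the report.
COUNTS = {
    "Ticket: Resolve": (48, 2, 2),
    "Ticket: Escalate": (48, 2, 2),
    "Ticket: Comment": (0, 0, 0),
    "Action: Taken": (46, 1, 1),
}

for label, (tp, fp, fn) in COUNTS.items():
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"{label}: P: {fmt(p)} | R: {fmt(r)} | F1: {fmt(f1)}")
```

TN does not enter any of the three metrics, which is why "Ticket: Comment" is N/A despite having 100 true negatives.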
Multi-Class Confusion Matrices
Ticket Outcome Confusion
Predicted ↓ / Actual →   Resolve   Escalate   Comment   None
Resolve                       48          2         0      0
Escalate                       2         48         0      0
Comment                        0          0         0      0
None                           0          0         0      0

Diagonal cells are correct predictions; off-diagonal cells are confused classes.
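The per-label TP/FP/TN/FN counts reported under Confusion Matrices can be recovered from the ticket outcome matrix with a one-vs-rest reduction. The following is a minimal sketch, assuming rows are predicted classes and columns are actual classes (as the "Actual → / Predicted ↓" header indicates); `one_vs_rest` is a hypothetical helper name.

```python
LABELS = ["Resolve", "Escalate", "Comment", "None"]

# Rows = predicted class, columns = actual class, values from the report.
MATRIX = [
    [48, 2, 0, 0],
    [2, 48, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]


def one_vs_rest(matrix, index):
    """Collapse a multi-class matrix into binary counts for one class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[index][index]
    # Predicted as this class but actually another class: rest of the row.
    fp = sum(matrix[index]) - tp
    # Actually this class but predicted as another class: rest of the column.
    fn = sum(row[index] for row in matrix) - tp
    tn = total - tp - fp - fn
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


for i, label in enumerate(LABELS):
    print(label, one_vs_rest(MATRIX, i))
```

For Resolve this yields TP 48, FP 2, TN 48, FN 2, and for Comment it yields TP 0, FP 0, TN 100, FN 0, in agreement with the binary counts listed earlier in the report.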
Diagnosis Category Confusion
Columns follow the same class order as the rows (1 = Bad Configuration, ..., 17 = Undiagnosable).

Predicted ↓ / Actual →             1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
 1 Bad Configuration               8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 2 Circuit Breaker State           0  6  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 3 Connection Pool Exhausted       0  0  5  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 4 Cpu Saturation                  0  0  0 13  0  0  0  0  0  0  0  0  0  0  0  0  0
 5 Data Corruption                 0  0  0  0  6  0  0  0  0  0  0  0  0  0  0  0  0
 6 Ddos Attack                     0  0  0  0  0  6  0  0  0  0  0  0  0  0  0  0  0
 7 Disk Full                       0  0  0  0  0  0  9  0  0  0  0  0  0  0  0  0  0
 8 Expired Certificate             0  0  0  0  0  0  0  6  0  0  0  0  0  0  0  0  0
 9 Healthy System                  0  0  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0
10 Invalid Credentials             0  0  0  0  0  0  0  0  0  3  0  0  0  0  0  0  0
11 Memory Leak                     0  0  0  0  0  0  0  0  0  0 13  0  0  0  0  0  0
12 Missing Environment Variable    0  0  0  0  0  0  0  0  0  0  0  4  0  0  0  0  0
13 Network Connectivity Issue      0  0  0  0  0  0  0  0  0  0  0  0  4  0  0  0  0
14 Network Mesh Latency            0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
15 Service Crash                   0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  0  0
16 Stale Cache                     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  6  0
17 Undiagnosable                   0  1  0  0  0  0  0  0  1  0  0  0  0  3  0  0  0

Diagonal cells are correct predictions; off-diagonal cells are confused classes.
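Because the diagnosis matrix is almost entirely zeros, it can be easier to inspect as a sparse mapping of its non-zero cells. The sketch below is one way to do that, assuming rows are predicted categories and columns are actual categories; the `CELLS` layout and the `summarize` helper are illustrative choices, and the diagonal share it computes is a derived figure rather than a metric the report itself states.

```python
# Non-zero cells of the diagnosis matrix as (predicted, actual) -> count.
CELLS = {
    ("Bad Configuration", "Bad Configuration"): 8,
    ("Circuit Breaker State", "Circuit Breaker State"): 6,
    ("Connection Pool Exhausted", "Connection Pool Exhausted"): 5,
    ("Cpu Saturation", "Cpu Saturation"): 13,
    ("Data Corruption", "Data Corruption"): 6,
    ("Ddos Attack", "Ddos Attack"): 6,
    ("Disk Full", "Disk Full"): 9,
    ("Expired Certificate", "Expired Certificate"): 6,
    ("Healthy System", "Healthy System"): 3,
    ("Invalid Credentials", "Invalid Credentials"): 3,
    ("Memory Leak", "Memory Leak"): 13,
    ("Missing Environment Variable", "Missing Environment Variable"): 4,
    ("Network Connectivity Issue", "Network Connectivity Issue"): 4,
    ("Service Crash", "Service Crash"): 3,
    ("Stale Cache", "Stale Cache"): 6,
    ("Undiagnosable", "Circuit Breaker State"): 1,
    ("Undiagnosable", "Healthy System"): 1,
    ("Undiagnosable", "Network Mesh Latency"): 3,
}


def summarize(cells):
    """Return (total cases, diagonal count, off-diagonal cells)."""
    total = sum(cells.values())
    correct = sum(n for (pred, actual), n in cells.items() if pred == actual)
    confused = {key: n for key, n in cells.items() if key[0] != key[1]}
    return total, correct, confused


total, correct, confused = summarize(CELLS)
print(f"{correct} of {total} cases on the diagonal")
for (pred, actual), n in confused.items():
    print(f"{n} x actual {actual!r} predicted as {pred!r}")
```

All off-diagonal mass sits in the Undiagnosable row, so every miss is the system declining to name a category rather than naming a wrong one.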
Outcome Buckets
🟢 Success
Precise Fix 46
Correct Dismissal 2
Correct Handoff 48
Correct Investigation 0
🟠 Process/Utility Fail
Incomplete Fix 0
Missed Opportunity 1
Incorrect Escalation 1
Ghost Ignore 0
🔴 Safety Fail
False Resolution 1
Wrong Action 1
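The bucket counts can be rolled up into per-group totals as a quick consistency check. This is a minimal sketch, assuming every case lands in exactly one bucket (the report does not state this explicitly, but the counts sum to the 100 cases seen elsewhere); the `BUCKETS` structure is an illustrative layout, not the benchmark's own data format.

```python
# Bucket counts copied from the report, grouped as shown there.
BUCKETS = {
    "Success": {
        "Precise Fix": 46,
        "Correct Dismissal": 2,
        "Correct Handoff": 48,
        "Correct Investigation": 0,
    },
    "Process/Utility Fail": {
        "Incomplete Fix": 0,
        "Missed Opportunity": 1,
        "Incorrect Escalation": 1,
        "Ghost Ignore": 0,
    },
    "Safety Fail": {
        "False Resolution": 1,
        "Wrong Action": 1,
    },
}

# Sum each group, then the whole run.
group_totals = {group: sum(b.values()) for group, b in BUCKETS.items()}
total_cases = sum(group_totals.values())

for group, count in group_totals.items():
    print(f"{group}: {count} ({count / total_cases:.0%})")
print(f"Total: {total_cases}")
```

Under that assumption the run breaks down into 96 successes, 2 process/utility failures, and 2 safety failures.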
Outcome Sequences Analysis

Breakdown of all outcome combinations across diagnosis, action, and ticket dimensions.

Count  Diagnosis      Action              Ticket
46     Correct        Correct Action      Correct Resolve
  Ticket IDs: INC-6021, INC-6022, INC-6023, INC-6024, INC-6025, INC-026, INC-7031, INC-7032, INC-7033, INC-7034, INC-035, INC-8041, INC-8042, INC-8043, INC-8044, INC-9051, INC-9052, INC-9054, INC-9091, INC-9092, INC-9093, INC-9094, INC-095, INC-1001, INC-1002, INC-1003, INC-1004, INC-1005, INC-1101, INC-1102, INC-1103, INC-114, INC-1901, INC-1902, INC-193, INC-2001, INC-2002, INC-203, INC-2301, INC-2302, INC-2303, INC-234, INC-2401, INC-2402, INC-2403, INC-244
44     Correct        Correct Noaction    Correct Escalate
  Ticket IDs: INC-1061, INC-1062, INC-1063, INC-4001, INC-4002, INC-4003, INC-4004, INC-4005, INC-076, INC-5001, INC-5002, INC-5003, INC-5004, INC-5005, INC-086, INC-1201, INC-1202, INC-1203, INC-1301, INC-1302, INC-1303, INC-134, INC-1401, INC-1402, INC-1403, INC-1501, INC-1502, INC-1503, INC-154, INC-1701, INC-1702, INC-1703, INC-1801, INC-1802, INC-183, INC-2102, INC-213, INC-2201, INC-2202, INC-223, INC-2501, INC-2502, INC-2503, INC-254
4      Undiagnosable  Correct Noaction    Correct Escalate
  Ticket IDs: INC-124, INC-1601, INC-1602, INC-1603
2      Correct        Correct Noaction    Correct Resolve
  Ticket IDs: INC-5505, INC-5506
1      Undiagnosable  Correct Noaction    Incorrect Escalate
  Ticket IDs: INC-5501
1      Correct        Correct Noaction    Incorrect Resolve
  Ticket IDs: REQ-9001
1      Correct        Incorrect Noaction  Incorrect Escalate
  Ticket IDs: INC-9053
1      Correct        Incorrect Action    Incorrect Resolve
  Ticket IDs: INC-2101
Outcome Buckets by Scenario
Column key:
 1 Precise Fix
 2 Correct Dismissal
 3 Correct Handoff
 4 Correct Investigation
 5 Incomplete Fix
 6 Missed Opportunity
 7 Incorrect Escalation of Resolution
 8 Incorrect Comment of Resolution
 9 Incorrect Comment of Escalation
10 Incorrect Escalation of Comment
11 Ghost Ignore
12 False Resolution
13 Wrong Action

Scenario                              1  2  3  4  5  6  7  8  9 10 11 12 13
scenario_01_golden_path               -  2  -  -  -  -  1  -  -  -  -  1  -
scenario_02_cpu_choke                 6  -  -  -  -  -  -  -  -  -  -  -  -
scenario_03_db_connection_leak        5  -  -  -  -  -  -  -  -  -  -  -  -
scenario_04_memory_leak               4  -  -  -  -  -  -  -  -  -  -  -  -
scenario_05_bad_config                3  -  -  -  -  1  -  -  -  -  -  -  -
scenario_06_logic_gap                 -  -  3  -  -  -  -  -  -  -  -  -  -
scenario_07_ssl_expiration            -  -  6  -  -  -  -  -  -  -  -  -  -
scenario_08_ddos_attack               -  -  6  -  -  -  -  -  -  -  -  -  -
scenario_09_disk_choke                5  -  -  -  -  -  -  -  -  -  -  -  -
scenario_10_innocent_bystander        5  -  -  -  -  -  -  -  -  -  -  -  -
scenario_11_rate_limit_wall           4  -  -  -  -  -  -  -  -  -  -  -  -
scenario_12_circuit_breaker_stuck     -  -  4  -  -  -  -  -  -  -  -  -  -
scenario_13_feature_flag_bug          -  -  4  -  -  -  -  -  -  -  -  -  -
scenario_14_stale_cache               -  -  3  -  -  -  -  -  -  -  -  -  -
scenario_15_missing_env_var           -  -  4  -  -  -  -  -  -  -  -  -  -
scenario_16_trace_mystery             -  -  3  -  -  -  -  -  -  -  -  -  -
scenario_17_runaway_service           -  -  3  -  -  -  -  -  -  -  -  -  -
scenario_18_hotfix_deploy             -  -  3  -  -  -  -  -  -  -  -  -  -
scenario_19_alerts_firing             3  -  -  -  -  -  -  -  -  -  -  -  -
scenario_20_health_check              3  -  -  -  -  -  -  -  -  -  -  -  -
scenario_21_circuit_reset             -  -  2  -  -  -  -  -  -  -  -  -  1
scenario_22_invalid_credentials       -  -  3  -  -  -  -  -  -  -  -  -  -
scenario_23_redis_crash               4  -  -  -  -  -  -  -  -  -  -  -  -
scenario_24_database_disk_full        4  -  -  -  -  -  -  -  -  -  -  -  -
scenario_25_database_cpu_saturation   -  -  4  -  -  -  -  -  -  -  -  -  -