ASPIRE: Search Evaluation Visual Analytics Tool
I developed ASPIRE, a web-based visual analytics tool for evaluating and comparing search systems and ranking variants beyond average metrics. It supports deeper analysis of system performance through query-level inspection, error patterns, and diagnostic views that help explain why a model works or fails.
ASPIRE was developed at the University of Milano-Bicocca and is used in research and teaching to support reproducible evaluation workflows and evidence-based reporting for information retrieval experiments.
What ASPIRE provides
System comparison with statistical testing
Compare multiple search systems and ranking variants using standard effectiveness measures and statistical significance tests.
Impact: Helps teams ship ranking changes with confidence and avoid decisions based only on small metric deltas that may be noise.
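To illustrate the kind of significance testing meant here, the sketch below runs a paired sign-flip randomization test on per-query scores from two systems. It is a minimal illustration, not ASPIRE's implementation: the function name, parameters, and the choice of this particular test are assumptions made for the example.

```python
import random


def paired_randomization_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided sign-flip randomization test on per-query score differences.

    scores_a and scores_b hold one effectiveness score per query (for example
    nDCG), aligned so that index i refers to the same query in both lists.
    Hypothetical helper written for illustration only.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same queries")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the two system labels are exchangeable for
        # each query, so every difference keeps or flips its sign at random.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)
```

A small p-value suggests the mean difference is unlikely to arise from per-query noise alone. With few queries the test has little power, so a large p-value does not show that the systems are equivalent.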
Query-level diagnostics
Drill down from overall results to per-query differences to see where improvements happen and where regressions appear.
Impact: Turns evaluation into actionable debugging by revealing which information needs are not served well.
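The drill-down idea can be sketched as a per-query delta table. The example assumes each system's results are available as a dictionary from query ID to a score. That data layout and the function name are assumptions for illustration, not the tool's actual interface.

```python
def per_query_deltas(baseline, variant):
    """Return (query_id, delta) pairs ordered from worst regression to best gain.

    baseline and variant map query IDs to a per-query effectiveness score.
    Only queries scored by both systems are compared.
    """
    shared = baseline.keys() & variant.keys()
    deltas = [(qid, variant[qid] - baseline[qid]) for qid in shared]
    # Sort by delta first, then by query ID so ties come out in a stable order.
    return sorted(deltas, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    baseline = {"q1": 0.5, "q2": 0.8, "q3": 0.2}
    variant = {"q1": 0.7, "q2": 0.5, "q3": 0.2}
    for qid, delta in per_query_deltas(baseline, variant):
        print(f"{qid}: {delta:+.2f}")
```

Reading the list from the top gives the queries to debug first. Reading it from the bottom shows where the variant's gains come from.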
Query grouping and characteristics
Analyse performance by query properties and explore groups of similar queries to understand patterns rather than isolated cases.
Impact: Supports structured analysis of user information needs and clarifies why a change helps one segment but harms another.
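One simple way to look at performance by query property is to average per-query deltas within each group. In the sketch below the grouping (for example, short versus long queries) is supplied by the caller. The function name and the example property are hypothetical and are not taken from ASPIRE.

```python
from collections import defaultdict


def mean_delta_by_group(deltas, group_of):
    """Average per-query score deltas within each query group.

    deltas maps query IDs to (variant - baseline) score differences.
    group_of maps the same query IDs to a group label, such as a length bucket.
    """
    grouped = defaultdict(list)
    for qid, delta in deltas.items():
        grouped[group_of[qid]].append(delta)
    return {group: sum(values) / len(values) for group, values in grouped.items()}


if __name__ == "__main__":
    deltas = {"q1": 0.5, "q2": -0.25, "q3": 0.25, "q4": -0.5}
    group_of = {"q1": "short", "q2": "long", "q3": "short", "q4": "long"}
    print(mean_delta_by_group(deltas, group_of))
```

A change that looks neutral on average can hide opposite effects in different groups, which is the pattern this kind of breakdown is meant to surface.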
Relevance coverage and content insights
Inspect relevance labels, recurring relevant content, and where that content appears in rankings to understand coverage and gaps.
Impact: Helps identify coverage gaps and prioritise ranking improvements for high-value content.
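As a rough illustration of inspecting where relevant content appears, the sketch below reports the rank of each judged-relevant document in a ranking and the share of relevant documents retrieved in the top k (recall at k). It assumes a ranking is a list of unique document IDs and relevance is a set of IDs. Both helpers are hypothetical and written only for this example.

```python
def relevant_positions(ranking, relevant):
    """Map each relevant document found in the ranking to its 1-based rank."""
    return {doc: rank for rank, doc in enumerate(ranking, start=1) if doc in relevant}


def recall_at_k(ranking, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    found = sum(1 for doc in ranking[:k] if doc in relevant)
    return found / len(relevant)


if __name__ == "__main__":
    ranking = ["d3", "d1", "d7", "d2"]
    relevant = {"d1", "d2", "d9"}
    print(relevant_positions(ranking, relevant))
    print(recall_at_k(ranking, relevant, k=3))
```

Relevant documents missing from the position map (here `d9`) were not retrieved at all, which points to a coverage gap rather than a ranking problem.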
How it is used today
- Teaching: Used in my Information Retrieval Master’s course and in the Bachelor’s course on Information Retrieval and Recommender Systems in AI to show that evaluation requires deep analysis, not only average metrics.
- Industry proof-of-concept: Demonstrates a practical workflow for search evaluation and ranking diagnostics, enabling fast and transparent debugging to guide ranking improvements.
Artifacts
- Repository: github.com/GiorgosPeikos/ASPIRE
- Demo: aspire-ir-eval.streamlit.app
- Paper: ACM Digital Library
- Video: YouTube use case walkthrough