FinLLM Safety Report Part II: Data, Training & Evaluation

September 1, 2025

Aveni Labs has released the second installment of our FinLLM safety framework, focusing on how safety measures are implemented across data collection, model training, and evaluation. This report shows how we turn governance principles into practical safeguards against the specific risks facing AI systems in financial services.

What’s Inside

The report details our systematic approach to building safety into every stage of FinLLM development, from initial data collection through final model evaluation. We document the specific tools, techniques, and metrics we use to identify and mitigate seven key risk categories, including toxicity, bias, hallucination, and privacy violations.

Key areas covered include:

Data Collection Safeguards: Our data cleaning pipeline applies more than 50 quality metrics and implements pseudonymisation, toxicity detection, and bias filtering. We detail our three-tier pseudonymisation system, which preserves factual information about public figures while protecting private individuals, and show how our finance-specific taxonomy ensures UK regulatory alignment.

Training Methodologies: Systematic safety integration through supervised fine-tuning and alignment techniques. We document our progressive safety data mixing approach (10% to 30% safety content) and demonstrate how we use proprietary vulnerability detection datasets from anonymised Aveni Detect call transcripts to train models for real-world financial applications.

Evaluation Framework: Risk-specific benchmarking using dedicated datasets for each safety category, enabling targeted assessment and mitigation. Our preliminary results show FinLLM SFT v1 achieving a 67.39 average safety score, outperforming OLMo 7B Instruct and Qwen2.5 7B Instruct through system prompt engineering and targeted fine-tuning.

Technical Implementation: Detailed methodology using Microsoft Presidio for personal data identification, Detoxify for toxicity detection, and Celadon for bias classification. We provide specific performance metrics showing 98%+ unbiased content across major bias categories and toxicity scores below 0.006 across our training data.
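
To make the progressive safety data mixing mentioned under Training Methodologies more concrete, here is a minimal sketch of one way such a schedule could work. Only the 10% and 30% endpoints come from the report. The linear ramp, the number of stages, the batch size, and the function names `safety_fraction` and `mix_stage` are illustrative assumptions, not a description of the actual FinLLM training setup.

```python
import random


def safety_fraction(stage, num_stages, start=0.10, end=0.30):
    """Share of safety examples at a 0-indexed stage.

    Assumption: a linear ramp between the two endpoints. The report
    gives only the 10% and 30% figures, not the shape of the schedule.
    """
    if num_stages < 1 or not 0 <= stage < num_stages:
        raise ValueError("stage must satisfy 0 <= stage < num_stages")
    if num_stages == 1:
        return end
    return start + (end - start) * stage / (num_stages - 1)


def mix_stage(general, safety, fraction, size, rng):
    """Draw `size` examples with roughly `fraction` of them from `safety`.

    Samples without replacement, so each pool must hold at least as
    many examples as are requested from it.
    """
    n_safety = round(size * fraction)
    n_general = size - n_safety
    batch = rng.sample(safety, n_safety) + rng.sample(general, n_general)
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical placeholder pools standing in for real training data.
    general_pool = [("general", i) for i in range(1000)]
    safety_pool = [("safety", i) for i in range(1000)]

    for stage in range(3):
        fraction = safety_fraction(stage, num_stages=3)
        batch = mix_stage(general_pool, safety_pool, fraction, 100, rng)
        n_safety = sum(1 for source, _ in batch if source == "safety")
        print(f"stage {stage}: {n_safety} safety examples of {len(batch)}")
```

A stepped schedule like this keeps the early stages dominated by general and domain data, then raises the safety share gradually. Whether the actual ramp is linear, stepped differently, or tuned per risk category is something the full report would need to confirm.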

This technical framework ensures FinLLM models maintain safety standards throughout development while preserving the domain expertise required for financial services applications. The systematic evaluation approach allows continuous monitoring and improvement of safety performance against evolving regulatory requirements.

The final installment will cover our red teaming methodologies, ongoing monitoring systems, and deployment safety measures.

Download FinLLM Safety Part II: Data, Training & Evaluation