Experiments

My personal experiments with AI models, where I document both the brilliant insights and the spectacular failures that occur when you ask large language models to solve real business problems.

May 19, 2025

Long-Context Financial QA: An Empirical Evaluation of Large Language Models on Financial Document Analysis

Executive Summary: This whitepaper presents findings from an experimental evaluation of large language models with extended...

Ben Reeve

May 12, 2025

DeepCredit Experiment: Evaluating LLM Performance on Hard CRR Questions

Introduction: This report details our latest DeepCredit experiment, which represents a significant step toward building a general credit risk analysis agent....

Ben Reeve

Apr 6, 2025

DeepCredit v0.2: Advancing AI-Powered Credit Analysis

Executive Summary: This report documents the progress of DeepCredit, our custom AI system for generating comprehensive credit rating reports. The experiment compares our improved DeepCredit v0.2 against OpenAI’s Deep Research,...

Ben Reeve

Apr 6, 2025

Comparing Credit Rating Systems: Deep Research vs. DeepCredit v0.1

Experiment Overview: I recently conducted an experiment comparing OpenAI’s Deep Research feature against the first version of my custom-built “DeepCredit” agent for generating credit rating reports. As part of the...

Ben Reeve
