Experiments
My personal experiments with AI models, where I document both the brilliant insights and the spectacular failures that occur when you ask large language models to solve real business problems.
Long-Context Financial QA: An Empirical Evaluation of Large Language Models on Financial Document Analysis
Executive Summary: This whitepaper presents findings from an experimental evaluation of large language models with extended...
DeepCredit Experiment: Evaluating LLM Performance on Hard CRR Questions
Introduction: This report details our latest DeepCredit experiment, which represents a significant step toward building a general credit risk analysis agent....
DeepCredit v0.2: Advancing AI-Powered Credit Analysis
Executive Summary: This report documents the progress of DeepCredit, our custom AI system for generating comprehensive credit rating reports. The experiment compares our improved DeepCredit v0.2 against OpenAI’s Deep Research,...
Comparing Credit Rating Systems: Deep Research vs. DeepCredit v0.1
Experiment Overview: I recently conducted an experiment comparing OpenAI’s Deep Research feature against the first version of my custom-built “DeepCredit” agent for generating credit rating reports. As part of the...