Humanitarians.AI -- Madison Framework

Madison Intelligence Agent
RL System

Reinforcement learning for adaptive source selection. UCB contextual bandits and REINFORCE policy gradient, trained against live APIs.

Student: Madhumitha Nandhikatti
ID: 002304994
Framework: Humanitarians.AI -- Madison
Environment: Google Colab CPU

+0.283 UCB reward gain
0.873 UCB late avg
+0.088 REINFORCE gain
0.778 baseline avg
5 live APIs

Full PDF Report

System architecture, mathematical formulation, experimental results, analysis, ethical considerations, and references.

Madison_RL_Report.pdf

Colab Notebook

Full implementation including UCB Bandit, REINFORCE, training loop, reward function, real API fetchers, statistical analysis, and visualizations.

Madison_RL_Agent.ipynb
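
For readers who want a feel for the UCB side before opening the notebook, here is a minimal sketch of a tabular UCB1 bandit that keeps one value table per topic context. It is not the notebook's code: the class name `UCBContextualBandit`, the exploration constant `c`, the simulated `true_means`, and the noise level are all assumptions made for illustration. Only the 7 contexts and 5 sources come from this page.

```python
import math
import random


class UCBContextualBandit:
    """Tabular UCB1 with a separate value table per context (illustrative sketch)."""

    def __init__(self, n_contexts, n_actions, c=2.0):
        self.c = c  # exploration strength (assumed value, not from the notebook)
        self.q = [[0.0] * n_actions for _ in range(n_contexts)]  # mean reward estimates
        self.n = [[0] * n_actions for _ in range(n_contexts)]    # selection counts

    def select(self, context):
        counts = self.n[context]
        # Try every source once before trusting the confidence bounds.
        for action, count in enumerate(counts):
            if count == 0:
                return action
        total = sum(counts)
        scores = [
            self.q[context][a] + self.c * math.sqrt(math.log(total) / counts[a])
            for a in range(len(counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, context, action, reward):
        self.n[context][action] += 1
        # Incremental mean: Q <- Q + (r - Q) / N
        self.q[context][action] += (reward - self.q[context][action]) / self.n[context][action]


# Simulated usage: 7 topic contexts, 5 sources, made-up source quality.
random.seed(0)
bandit = UCBContextualBandit(n_contexts=7, n_actions=5)
true_means = [0.2, 0.5, 0.9, 0.4, 0.1]  # hypothetical, NOT measured API rewards
for step in range(500):
    ctx = step % 7
    act = bandit.select(ctx)
    bandit.update(ctx, act, random.gauss(true_means[act], 0.1))
```

The rows of `bandit.q` correspond to what a Q-value heatmap would display: one row per context, one column per source.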

Learning Curves

Per-episode reward, cumulative reward, Q-value heatmap, REINFORCE policy distribution, policy entropy over training, and source selection frequency.

Learning curves for UCB and REINFORCE agents over 50 training episodes
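
The policy distribution and policy entropy panels can be read against the following sketch of a softmax policy with a REINFORCE update. It is a simplified illustration under stated assumptions: it ignores topic context, and the helper names (`softmax`, `entropy`, `reinforce_step`), the learning rate, and the baseline are hypothetical rather than taken from the notebook.

```python
import math


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more exploratory policy."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def reinforce_step(prefs, action, reward, baseline=0.0, lr=0.1):
    """One REINFORCE update for a softmax policy.

    Uses the gradient of log pi(action): 1[a == action] - pi(a).
    """
    probs = softmax(prefs)
    advantage = reward - baseline
    return [
        theta + lr * advantage * ((1.0 if a == action else 0.0) - probs[a])
        for a, theta in enumerate(prefs)
    ]


prefs = [0.0] * 5                      # five sources, uniform starting policy
h_before = entropy(softmax(prefs))     # equals ln(5) for a uniform policy
prefs = reinforce_step(prefs, action=2, reward=1.0)
h_after = entropy(softmax(prefs))      # slightly lower: policy now favors source 2
```

A falling entropy curve therefore indicates the policy concentrating on fewer sources, which is useful only if those sources are actually the higher-reward ones.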

RewardSignalEngine

A standalone, reusable multi-component reward scoring tool built specifically for Madison's source selection problem. It evaluates three independent quality dimensions (fetch success, content length, and keyword relevance) rather than binary success/failure alone.

Reward components -- Cell 5 of notebook

Component     Condition               Value
r_success     Fetch succeeded         +1.0
r_success     Fetch failed            −1.0
r_length      Word count > 50         +0.5
r_length      Word count < 10         −0.2
r_relevance   Keyword overlap ratio   0 to 0.3
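
The components above might combine roughly as in the sketch below. Several details are assumptions, since the page does not state them: that the total reward is the plain sum of the three components, that word counts from 10 to 50 score 0 for length, and that relevance is 0.3 times the fraction of topic keywords found in the text. The function name `compute_reward` and its parameters are hypothetical.

```python
def compute_reward(fetch_ok, text, topic_keywords):
    """Sum of success, length, and relevance components (illustrative sketch)."""
    # r_success: +1.0 on a successful fetch, -1.0 on failure.
    r_success = 1.0 if fetch_ok else -1.0

    # r_length: bonus for substantial text, penalty for near-empty text.
    words = text.lower().split()  # naive tokenization; punctuation is not stripped
    if len(words) > 50:
        r_length = 0.5
    elif len(words) < 10:
        r_length = -0.2
    else:
        r_length = 0.0  # assumed: the page gives no value for 10 to 50 words

    # r_relevance: keyword overlap ratio scaled into the range 0 to 0.3 (assumed scaling).
    keywords = {k.lower() for k in topic_keywords}
    overlap = len(keywords & set(words)) / len(keywords) if keywords else 0.0
    r_relevance = 0.3 * overlap

    return r_success + r_length + r_relevance


# A long, half-relevant document versus a failed fetch with no text.
good = compute_reward(True, " ".join(["climate"] * 60), ["climate", "policy"])
bad = compute_reward(False, "", ["climate", "policy"])
```

Under these assumptions the reward ranges from −1.2 (failed fetch, no text) to +1.8 (successful fetch, long and fully relevant text).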

Results Summary

Comparison across all three agents. Welch's t-test: t = 0.987, p = 0.329, which is not statistically significant at alpha = 0.05 and is consistent with high API variance over 50 episodes.
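
As a reminder of what the Welch statistic measures, here is a small standard-library sketch that computes the t value and the Welch-Satterthwaite degrees of freedom for two reward samples. The sample values are placeholders chosen for easy arithmetic, not this project's episode rewards, and `welch_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation; no equal-variance assumption.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Placeholder data, NOT the reported results.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(sample_a, sample_b)
```

Converting t and the degrees of freedom into a p-value needs the t distribution; in a Colab environment, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test in one call.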

Agent performance -- 50 training episodes across 7 topic contexts and 5 information sources

Metric                         Random Baseline   UCB Bandit   REINFORCE   Winner
Avg Reward (Random baseline)   0.778             --           --          Baseline
Avg Reward (Late training)     0.778             0.873        0.553       UCB
Improvement over random        --                +0.096       −0.225      UCB

Project Demo Video

10-minute walkthrough covering notebook structure, live training, learning curves analysis, and before/after performance comparison.

Madison RL Agent -- Full Demo

Madhumitha Nandhikatti -- 002304994