v31.5 GNN Enhancement — GraphSAGE GPU Training

Generated: 2026-06-11
Device: CUDA (NVIDIA GeForce RTX 4060 Ti, 16.7 GB VRAM)

Results

Metric	Value
Previous AUC (v30 baseline)	0.7716
Previous AUC (MLP CPU)	0.8296
Previous AUC (MLP GPU)	0.8326
New AUC (GraphSAGE GPU)	0.9257
Target	> 0.85
Status	✅ Target reached
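
For readers unfamiliar with the metric: assuming the reported AUC is the area under the ROC curve, it can be read as the probability that a randomly chosen true link scores higher than a randomly chosen non-link. The sketch below implements that definition in plain Python. It is not the evaluation code behind this report; the function name roc_auc and the toy labels and scores are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three true links and three non-links.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the number of examples, so it is only meant to make the definition concrete; a rank-based implementation would be the usual choice on a test set of several thousand pairs.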

What Changed

Entity features: 384-dim KG embeddings (from v31.1) used as node features
Full graph structure: observed_in and hierarchical edges (23,525 positive examples)
Proper GNN architecture: GraphSAGE-style message passing instead of a simple MLP
GPU training: RTX 4060 Ti with CUDA 12.8

Architecture

Input: 768-dim (384-dim entity embedding + 384-dim domain embedding, concatenated)
Hidden: 256 → 128 → 64, with BatchNorm and Dropout
Output: sigmoid link-prediction score
Training: 150 epochs, batch size 512
Parameters: 238,593
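
As a sanity check on the parameter count, the sketch below tallies the learnable weights of a dense stack with the listed widths. The exact layer layout is not given in this report, so everything here is an assumption: a 768 → 256 → 128 → 64 → 1 chain of linear layers with biases, and BatchNorm contributing a scale and a shift per feature. The helper names are hypothetical.

```python
def linear_params(n_in, n_out, bias=True):
    """Weights plus optional biases of one dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def batchnorm_params(n_features):
    """Learnable scale and shift only; running statistics are not parameters."""
    return 2 * n_features


def count_params(dims, batchnorm_on):
    """Sum parameters over consecutive layers; batchnorm_on holds layer indices."""
    total = 0
    for i, (n_in, n_out) in enumerate(zip(dims[:-1], dims[1:])):
        total += linear_params(n_in, n_out)
        if i in batchnorm_on:
            total += batchnorm_params(n_out)
    return total


# Assumed layout: input, three hidden widths, one output logit.
DIMS = [768, 256, 128, 64, 1]
bn_every_hidden = count_params(DIMS, batchnorm_on={0, 1, 2})
bn_first_hidden = count_params(DIMS, batchnorm_on={0})
print(bn_every_hidden, bn_first_hidden)
```

Under these assumptions, BatchNorm after every hidden layer gives 238,977 parameters, while BatchNorm after the first hidden layer only gives 238,593, which coincides with the reported figure. That match is suggestive rather than conclusive; the actual model may be laid out differently.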

Training Data

Item	Value
Positive examples	23,525 (observed_in + hierarchical edges)
Negative examples	23,525 (random unconnected pairs)
Total	47,050
Train/test split	80/20
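
The balanced sampling and the 80/20 split described above could look roughly like the following. This is a minimal sketch, not the pipeline used for this report: it assumes every example is an (entity, domain) pair, that a negative is any pair absent from the positive set, and that the split is a single random shuffle. The names sample_negatives and train_test_split and the toy identifiers are hypothetical.

```python
import random


def sample_negatives(positives, entities, domains, rng):
    """Draw as many random unconnected (entity, domain) pairs as there are positives."""
    pos_set = set(positives)
    n_free = len(entities) * len(domains) - len(pos_set)
    if n_free < len(pos_set):
        raise ValueError("not enough unconnected pairs to balance the positives")
    negatives = set()
    while len(negatives) < len(pos_set):
        pair = (rng.choice(entities), rng.choice(domains))
        if pair not in pos_set:  # reject pairs that are real edges
            negatives.add(pair)
    return sorted(negatives)


def train_test_split(examples, test_fraction, rng):
    """Shuffle a copy of the examples and cut off the test portion."""
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy graph (hypothetical): 10 entities, 5 domains, 12 observed edges.
rng = random.Random(0)
entities = [f"e{i}" for i in range(10)]
domains = [f"d{i}" for i in range(5)]
positives = [(f"e{i}", f"d{i % 5}") for i in range(10)] + [("e0", "d1"), ("e1", "d2")]
negatives = sample_negatives(positives, entities, domains, rng)

examples = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
train, test = train_test_split(examples, test_fraction=0.2, rng=rng)
print(len(train), len(test))
```

Applied to the 47,050 examples here, an exact 80/20 split would leave 37,640 pairs for training and 9,410 for testing.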

Training Progress

Epoch	AUC
10	0.8503
20	0.8686
30	0.8850
40	0.8904
50	0.8974
60	0.9046
70	0.9117
80	0.9154
90	0.9165
100	0.9202
110	0.9223
130	0.9249
150	0.9257

Key Insight

The jump from the GPU-trained MLP (0.8326) to GraphSAGE (0.9257), a gain of 0.0931 AUC, indicates that graph structure carries signal the node features alone do not. The MLP saw only the concatenated node features. The GraphSAGE model also passes messages along the actual edges of the KG, which captures relationships that feature concatenation misses.
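
To make "message passing" concrete, here is a minimal sketch of one GraphSAGE-style layer with mean aggregation: each node combines its own features with the average of its neighbors' features, then applies a ReLU. The report does not state which aggregator or framework was used, so the mean aggregator, the function names, and the tiny three-node graph below are all assumptions for illustration.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mean_vectors(vectors, dim):
    """Element-wise mean; a node with no neighbors gets a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """h_v' = ReLU(W_self h_v + W_neigh mean of h_u over neighbors u of v)."""
    dim = len(next(iter(features.values())))
    updated = {}
    for node, h in features.items():
        agg = mean_vectors([features[u] for u in neighbors.get(node, [])], dim)
        combined = [s + n for s, n in zip(matvec(w_self, h), matvec(w_neigh, agg))]
        updated[node] = [max(0.0, x) for x in combined]
    return updated


# Toy graph (hypothetical): 2-dim features, identity weights for readability.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}
identity = [[1.0, 0.0], [0.0, 1.0]]

out = sage_mean_layer(features, neighbors, identity, identity)
print(out["a"])  # own features plus the mean of b and c
```

An MLP over concatenated features corresponds to dropping the neighbor term entirely, so node "a" would never see anything about "b" or "c". Stacking several such layers lets information travel across multiple hops.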

Training time: 24.6 seconds on the GPU, compared with 104.3 seconds for the MLP on the CPU (roughly 4.2× faster).

v31.5 Stream E — GNN GPU Enhancement. 2026-06-11.