v31.5 GNN Enhancement: GraphSAGE GPU Training

Generated: 2026-06-11
Device: CUDA (NVIDIA GeForce RTX 4060 Ti, 16.7 GB VRAM)

Results

Metric                          Value
Previous AUC (v30 baseline)     0.7716
Previous AUC (MLP CPU)          0.8296
Previous AUC (MLP GPU)          0.8326
New AUC (GraphSAGE GPU)         0.9257
Target                          > 0.85
Status                          ✅ Target reached

What Changed

  1. Entity features: 384-dim KG embeddings (from v31.1) as node features
  2. Full graph structure: observed_in and hierarchical edges used as positive examples (23,525 total)
  3. GNN architecture: GraphSAGE-style message passing instead of a plain MLP
  4. GPU training: RTX 4060 Ti with CUDA 12.8
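One way to read item 2 in code: merge the two edge lists into a single deduplicated set of positive pairs, plus an adjacency map that message passing can walk. This is a minimal sketch. The function name `build_graph` and the choice to treat every edge as undirected are assumptions, not details taken from the report.

```python
from collections import defaultdict


def build_graph(observed_in, hierarchical):
    """Merge two edge lists into deduplicated undirected positives plus an adjacency map.

    Assumption: edges are treated as undirected and node IDs are mutually comparable.
    """
    positives = set()
    for u, v in list(observed_in) + list(hierarchical):
        if u == v:
            continue  # ignore self-loops
        positives.add((min(u, v), max(u, v)))  # canonical order: (a, b) and (b, a) collapse
    adj = defaultdict(set)
    for u, v in positives:
        adj[u].add(v)
        adj[v].add(u)
    return positives, adj


pos, adj = build_graph([("a", "b"), ("b", "a")], [("b", "c"), ("c", "c")])
print(sorted(pos))  # [('a', 'b'), ('b', 'c')]
```

Storing each pair in canonical order means the later negative sampler only needs one membership check per candidate pair.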

Architecture

  • Input: 768-dim (384 entity + 384 domain embedding, concatenated)
  • Hidden: 256 → 128 → 64 with BatchNorm and Dropout
  • Output: Sigmoid link prediction
  • 150 epochs, batch size 512
  • 238,593 parameters
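The parameter count can be cross-checked with simple arithmetic. Assume a plain fully connected stack that matches the bullets. This is a hypothetical reading, because the report does not list the exact layers. Under that assumption, the linear layers alone hold 238,081 parameters, and each BatchNorm layer adds two learnable values per feature. The reported 238,593 is 512 above the linear total. The exact breakdown therefore depends on where normalization and any GraphSAGE-specific weights sit, which the report does not specify.

```python
def linear_params(widths):
    """Weights plus biases for consecutive fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def batchnorm_params(normalized_widths):
    """Learnable BatchNorm parameters: one scale and one shift per feature.

    Running mean/variance are stored as buffers, not trainable parameters.
    """
    return sum(2 * width for width in normalized_widths)


# Hypothetical stack: 768-dim pair input -> 256 -> 128 -> 64 -> 1 logit for the sigmoid
widths = [768, 256, 128, 64, 1]
print(linear_params(widths))             # 238081
print(batchnorm_params([256, 128, 64]))  # 896 if every hidden layer is normalized
```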

Training Data

Item                 Count
Positive examples    23,525 (observed_in + hierarchical edges)
Negative examples    23,525 (random unconnected pairs)
Total                47,050
Train/test split     80/20
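A balanced dataset like the one in the table can be produced by rejection sampling: draw random node pairs, keep those that are not positive edges, and stop once there are as many negatives as positives. The sketch below assumes undirected pairs stored in canonical `(min, max)` order. `sample_negatives` and `train_test_split` are hypothetical helpers, and the report does not say whether its split was stratified by label. For the 47,050 examples above, an 80/20 cut gives 37,640 training examples and 9,410 test examples.

```python
import random


def sample_negatives(nodes, positives, k, seed=0):
    """Draw k distinct node pairs that are not positive edges (rejection sampling).

    Assumes undirected pairs stored as (min, max). Loops forever if fewer than
    k non-edges exist, so it is only suitable for sparse graphs.
    """
    rng = random.Random(seed)
    pool = sorted(nodes)  # sorted so the seed gives a reproducible draw
    negatives = set()
    while len(negatives) < k:
        u, v = rng.sample(pool, 2)  # two distinct nodes
        pair = (min(u, v), max(u, v))
        if pair not in positives:
            negatives.add(pair)
    return negatives


def train_test_split(examples, train_frac=0.8, seed=0):
    """Shuffle labeled examples and cut them into train and test lists."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


positives = {(0, 1), (1, 2), (2, 3)}
negatives = sample_negatives(range(6), positives, k=len(positives))
examples = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
train, test = train_test_split(examples)
```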

Training Progress

Epoch   AUC
10      0.8503
20      0.8686
30      0.8850
40      0.8904
50      0.8974
60      0.9046
70      0.9117
80      0.9154
90      0.9165
100     0.9202
110     0.9223
130     0.9249
150     0.9257
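The report does not state how AUC is computed. Standard ROC AUC is the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as half. The sketch below computes it from ranks (the Mann-Whitney U statistic) in O(n log n). It should agree with library implementations such as scikit-learn's `roc_auc_score`. `roc_auc` itself is a hypothetical helper.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, averaging ranks over tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.9257 therefore means a held-out true edge outranks a random non-edge roughly 93% of the time.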

Key Insight

The jump from MLP (0.8326) to GraphSAGE (0.9257) indicates that graph structure matters. The MLP saw only concatenated node features, whereas the GraphSAGE model passes messages along the KG's actual edges, capturing relationships that feature concatenation misses.
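The following minimal sketch shows what message passing adds over feature concatenation. It implements one GraphSAGE layer with the mean aggregator: each node combines its own vector with the average of its neighbors' vectors, h_v' = ReLU(W_self·h_v + W_neigh·mean(h_u for u in N(v))). This is one common formulation, not necessarily the one used in v31.5. The report gives no layer-level details, and `sage_mean_layer` and its weight arguments are hypothetical.

```python
def sage_mean_layer(features, adj, w_self, w_neigh):
    """One GraphSAGE layer with mean aggregation followed by ReLU.

    features: dict mapping node -> list of floats (length d_in)
    adj:      dict mapping node -> iterable of neighbor nodes
    w_self, w_neigh: weight matrices as lists of d_out rows, each of length d_in
    """
    d_in = len(next(iter(features.values())))
    out = {}
    for v, h_v in features.items():
        neighbors = [features[u] for u in adj.get(v, ())]
        if neighbors:
            # element-wise mean of the neighbor vectors
            h_neigh = [sum(col) / len(neighbors) for col in zip(*neighbors)]
        else:
            h_neigh = [0.0] * d_in  # isolated node receives no message
        z = [
            sum(w * x for w, x in zip(row_s, h_v)) + sum(w * x for w, x in zip(row_n, h_neigh))
            for row_s, row_n in zip(w_self, w_neigh)
        ]
        out[v] = [max(0.0, value) for value in z]  # ReLU
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1, 2], 1: [0], 2: [0]}
print(sage_mean_layer(feats, adj, identity, identity)[0])  # [1.5, 1.0]
```

Stacking two such layers lets each node see its two-hop neighborhood. An MLP over concatenated features never receives this relational context.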

Training time: 24.6 seconds for GraphSAGE on the GPU, vs. 104.3 seconds for the MLP on the CPU.


v31.5 Stream E: GNN GPU Enhancement. 2026-06-11.
