This vignette summarizes DDESONN results across 1000 randomized seeds (two separate 500-seed runs) and compares them against a Keras benchmark summary stored in an Excel workbook bundled with the package.
The purpose of this benchmark is not to showcase a single favorable run. Instead, it evaluates distributional behavior across many random initializations, with emphasis on stability across seeds.
In this context, stronger stability across seeds is important because it indicates that the training procedure is less sensitive to random initialization and therefore more dependable at scale.
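The quantity emphasized here can be made concrete with a minimal sketch: collect one accuracy per seed and measure its dispersion. The vector `acc_by_seed` below is illustrative, not data shipped with the package.

```r
# Illustrative per-seed accuracies (hypothetical values, not package data)
acc_by_seed <- c(0.992, 0.999, 1.000, 0.998, 0.993)

seed_sd <- sd(acc_by_seed)               # spread across initializations
seed_cv <- seed_sd / mean(acc_by_seed)   # scale-free coefficient of variation
```

A smaller `seed_sd` (or `seed_cv`) across a large seed sweep is exactly the stability property this benchmark measures.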
The four RDS artifacts included with the package are stored under:

```
inst/extdata/heart_failure_runs/
├─ run1/
│  ├─ SingleRun_Train_Acc_Val_Metrics_500_seeds_20251025.rds
│  └─ SingleRun_Test_Metrics_500_seeds_20251025.rds
└─ run2/
   ├─ SingleRun_Train_Acc_Val_Metrics_500_seeds_20251026.rds
   └─ SingleRun_Test_Metrics_500_seeds_20251026.rds
```
Each folder represents one 500-seed run performed locally; together they form the 1000-seed composite.
This benchmark addresses a focused research question:
Can a fully R-native, from-first-principles neural network implementation achieve competitive statistical stability relative to an established deep-learning framework under repeated randomized initialization?
The Keras comparison is included as a reference benchmark, not as an implementation template. DDESONN was built independently from scratch and was not derived from Keras source code.
```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tibble)
  library(knitr)
})

if (!requireNamespace("DDESONN", quietly = TRUE)) {
  message("DDESONN not installed in this build session; skipping evaluation.")
  knitr::opts_chunk$set(eval = FALSE)
}

# Render a table via the package helper when available, else fall back to kable()
.render_tbl <- function(x, title = NULL, digits = 4) {
  if (requireNamespace("DDESONN", quietly = TRUE) &&
      exists("ddesonn_viewTables", envir = asNamespace("DDESONN"), inherits = FALSE)) {
    get("ddesonn_viewTables", envir = asNamespace("DDESONN"))(x, title = title)
  } else {
    if (!is.null(title)) cat("\n\n###", title, "\n\n")
    knitr::kable(x, digits = digits, format = "html")
  }
}

heart_failure_root <- system.file("extdata", "heart_failure_runs", package = "DDESONN")
if (!nzchar(heart_failure_root)) {
  # Fallback when building from source before installation
  heart_failure_root <- file.path("..", "inst", "extdata", "heart_failure_runs")
}
stopifnot(dir.exists(heart_failure_root))

train_run1_path <- file.path(
  heart_failure_root, "run1",
  "SingleRun_Train_Acc_Val_Metrics_500_seeds_20251025.rds"
)
test_run1_path <- file.path(
  heart_failure_root, "run1",
  "SingleRun_Test_Metrics_500_seeds_20251025.rds"
)
train_run2_path <- file.path(
  heart_failure_root, "run2",
  "SingleRun_Train_Acc_Val_Metrics_500_seeds_20251026.rds"
)
test_run2_path <- file.path(
  heart_failure_root, "run2",
  "SingleRun_Test_Metrics_500_seeds_20251026.rds"
)

stopifnot(
  file.exists(train_run1_path),
  file.exists(test_run1_path),
  file.exists(train_run2_path),
  file.exists(test_run2_path)
)

train_run1 <- readRDS(train_run1_path)
test_run1  <- readRDS(test_run1_path)
train_run2 <- readRDS(train_run2_path)
test_run2  <- readRDS(test_run2_path)
```
```r
train_all <- dplyr::bind_rows(train_run1, train_run2)
test_all  <- dplyr::bind_rows(test_run1, test_run2)

# Keep the best-performing row per seed
train_seed <- train_all %>%
  group_by(seed) %>%
  slice_max(order_by = best_val_acc, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  transmute(
    seed,
    train_acc = best_train_acc,
    val_acc   = best_val_acc
  )

test_seed <- test_all %>%
  group_by(seed) %>%
  slice_max(order_by = accuracy, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  transmute(
    seed,
    test_acc = accuracy
  )

merged <- inner_join(train_seed, test_seed, by = "seed") %>%
  arrange(seed)

# describe()-style summary for one numeric column
summarize_column <- function(x) {
  pct <- function(p) stats::quantile(x, probs = p, names = FALSE, type = 7)
  data.frame(
    count = length(x),
    mean  = mean(x),
    std   = sd(x),
    min   = min(x),
    `25%` = pct(0.25),
    `50%` = pct(0.50),
    `75%` = pct(0.75),
    max   = max(x),
    check.names = FALSE
  )
}

summary_train <- summarize_column(merged$train_acc)
summary_val   <- summarize_column(merged$val_acc)
summary_test  <- summarize_column(merged$test_acc)

summary_all <- data.frame(
  stat = c("count", "mean", "std", "min", "25%", "50%", "75%", "max"),
  train_acc = unlist(summary_train[1, ]),
  val_acc   = unlist(summary_val[1, ]),
  test_acc  = unlist(summary_test[1, ]),
  check.names = FALSE
)

round4 <- function(x) if (is.numeric(x)) round(x, 4) else x
pretty_summary <- as.data.frame(lapply(summary_all, round4))
```
```r
.render_tbl(
  pretty_summary,
  title = "DDESONN — 1000-seed summary (train/val/test)"
)
```

| stat | train_acc | val_acc | test_acc |
|---|---|---|---|
| count | 1000.0000 | 1000.0000 | 1000.0000 |
| mean | 0.9928 | 0.9992 | 0.9992 |
| std | 0.0014 | 0.0013 | 0.0013 |
| min | 0.9854 | 0.9893 | 0.9920 |
| 25% | 0.9920 | 0.9987 | 0.9987 |
| 50% | 0.9929 | 1.0000 | 1.0000 |
| 75% | 0.9937 | 1.0000 | 1.0000 |
| max | 0.9963 | 1.0000 | 1.0000 |
Keras parity results are stored in an Excel workbook included with the package under `inst/scripts/vsKeras/1000SEEDSRESULTSvsKeras/1000seedsKeras.xlsx`. The file is accessed programmatically with `system.file()`, so the path remains CRAN-safe and cross-platform.
```r
if (!requireNamespace("readxl", quietly = TRUE)) {
  message("Skipping keras-summary chunk: 'readxl' not installed.")
} else {
  keras_path <- system.file(
    "scripts", "vsKeras", "1000SEEDSRESULTSvsKeras", "1000seedsKeras.xlsx",
    package = "DDESONN"
  )
  if (nzchar(keras_path) && file.exists(keras_path)) {
    keras_stats <- readxl::read_excel(keras_path, sheet = 2)
    .render_tbl(
      keras_stats,
      title = "Keras — 1000-seed summary (Sheet 2)"
    )
  } else {
    cat("Keras Excel not found in installed package.\n")
  }
}
```

| stat | seed | train_loss | train_acc | val_loss | val_acc | val_auc | val_auprc | test_loss | test_acc | test_auc | test_auprc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1000.0000 | 1000.0000000 | 1000.0000000 | 1000.0000000 | 1000.0000000 | 1000.0000000 | 1.00000e+03 | 1000.0000000 | 1000.0000000 | 1000.0000000 | 1000.0000000 |
| mean | 500.0999 | 0.1285539 | 0.9853164 | 0.0923288 | 0.9943097 | 0.9981954 | 9.97682e-01 | 0.0801902 | 0.9968086 | 0.9992427 | 0.9989122 |
| std | 288.9524 | 0.2685126 | 0.0031003 | 0.1312915 | 0.0048215 | 0.0046459 | 6.14290e-03 | 0.1931197 | 0.0035695 | 0.0031493 | 0.0052804 |
| min | 1.0000 | 0.0810060 | 0.9705710 | 0.0511130 | 0.9653330 | 0.9612140 | 9.03136e-01 | 0.0498250 | 0.9786670 | 0.9691980 | 0.8900960 |
| 25% | 250.0000 | 0.0951690 | 0.9837140 | 0.0653810 | 0.9920000 | 0.9989820 | 9.98331e-01 | 0.0591380 | 0.9946670 | 0.9999330 | 0.9998530 |
| 50% | 500.0000 | 0.1022350 | 0.9857140 | 0.0709260 | 0.9960000 | 0.9997410 | 9.99473e-01 | 0.0629980 | 0.9973330 | 1.0000000 | 1.0000000 |
| 75% | 750.0000 | 0.1121150 | 0.9874290 | 0.0812150 | 0.9986670 | 0.9999420 | 9.99873e-01 | 0.0697250 | 1.0000000 | 1.0000000 | 1.0000000 |
| max | 1000.0000 | 6.0387850 | 0.9925710 | 2.1582980 | 1.0000000 | 1.0000000 | 1.00000e+00 | 5.5763540 | 1.0000000 | 1.0000000 | 1.0000000 |
Across 1000 random neural network initializations, DDESONN demonstrated stronger stability than the Keras benchmark model on this heart-failure task.
```r
benchmark_results <- data.frame(
  Metric = c(
    "Mean Test Accuracy",
    "Standard Deviation",
    "Minimum Test Accuracy",
    "Maximum Test Accuracy"
  ),
  DDESONN = c("≈ 99.92%", "≈ 0.0013", "≈ 99.20%", "100%"),
  Keras   = c("≈ 99.69%", "≈ 0.0036", "≈ 97.82%", "100%"),
  check.names = FALSE
)

.render_tbl(
  benchmark_results,
  title = "Benchmark results across 1000 seeds"
)
```

| Metric | DDESONN | Keras |
|---|---|---|
| Mean Test Accuracy | ≈ 99.92% | ≈ 99.69% |
| Standard Deviation | ≈ 0.0013 | ≈ 0.0036 |
| Minimum Test Accuracy | ≈ 99.20% | ≈ 97.82% |
| Maximum Test Accuracy | 100% | 100% |
These results suggest that DDESONN achieved a higher mean test accuracy together with markedly lower dispersion across seeds. This is important because lower variance implies the model is less sensitive to randomized initialization and more dependable across repeated training runs.
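The variance gap can be read directly off the benchmark table. A minimal sketch, using the approximate standard deviations reported above:

```r
# Test-accuracy standard deviations, copied (approximately) from the
# benchmark table above
ddesonn_sd <- 0.0013
keras_sd   <- 0.0036

sd_ratio <- keras_sd / ddesonn_sd  # Keras spread is roughly 2.8x wider
```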
In large corporate environments, teams may train hundreds or thousands of models across changing datasets, validation windows, and deployment cycles. A lower-variance model reduces the need for repeated retraining simply to obtain a “good seed,” which lowers compute cost and improves operational predictability.
In trading, portfolio analytics, execution modeling, or risk forecasting, model instability can create inconsistent outputs across retrains. A model that is more stable across seeds can improve confidence in those downstream outputs.
This does not guarantee trading profitability, but it does support stronger engineering reliability and more reproducible model behavior.
In healthcare and other regulated domains, reproducibility matters because stakeholders need confidence that retraining the same workflow will not produce materially unstable outcomes. Lower dispersion across seeds can help support validation, governance, and auditability.
In mission-critical environments such as autonomous control or space-related analytics, reproducibility and reliability are essential. More stable training behavior can be valuable when models need to be trusted under constrained or high-stakes deployment settings.
These results aggregate two independent 500-seed runs performed locally.
A master seed was not set for those original runs. Since then:
- Companion scripts `TestDDESONN_1000seeds.R` and `TestKeras_1000seeds.py` have been added to the package.
- Keras raw and summary outputs are compiled in `inst/scripts/vsKeras/1000SEEDSRESULTSvsKeras/1000seedsKeras.xlsx`.
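Since the original runs did not fix a master seed, a future composite could be made fully reproducible with a master-seed pattern like the following sketch (`master_seed` is illustrative, not a value used in the original runs):

```r
# Derive all per-run seeds deterministically from one master seed
master_seed <- 12345                                  # illustrative value
set.seed(master_seed)
run_seeds <- sample.int(.Machine$integer.max, 1000)   # one seed per training run

# Each training run i would then call set.seed(run_seeds[i]) before fitting,
# so the entire 1000-seed sweep is reproducible from master_seed alone.
```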
The results shown here were computed locally.
For large-scale experiments involving hundreds or thousands of seeds, DDESONN can be executed in distributed environments to reduce wall-clock time significantly. Distributed orchestration and development-stage scaling scripts are maintained in the GitHub repository and are intentionally excluded from the CRAN package so this vignette remains focused on validated results and benchmark methodology.