Architecture Guide

This guide explains how BaseAttentive works internally, what changed in v2.0.0, and how to use the new registry / resolver / assembly system. If you are migrating from v1.0.0, read the breaking changes section first.


Overview

BaseAttentive is an encoder-decoder neural network for sequence-to-sequence time series forecasting. It accepts three distinct feature streams:

  1. Static features — time-invariant properties (batch, static_dim)

  2. Dynamic features — historical time series (batch, T, dynamic_dim)

  3. Future features — known future exogenous variables (batch, H, future_dim)

┌─────────────────────────────────────┐
│           Inputs (3 types)          │
├─────────────────────────────────────┤
│ static:   (batch, S)                │
│ dynamic:  (batch, T, D)             │
│ future:   (batch, H, F)             │
└────────────────┬────────────────────┘
                 │
                 ▼
        ┌─────────────────────┐
        │   Encoder-Decoder   │
        └────────────┬────────┘
                     │
             ┌───────┴────────┐
             │                │
             ▼                ▼
        Point Forecast      With Quantiles
      (B, H, output_dim)  (B, H, Q, output_dim)

Conceptual flow:

  1. Select — Variable Selection Networks (VSN) weight each input feature

  2. Project — Transform features into a shared embedding space

  3. Encode — Process temporal context (hybrid LSTM or pure transformer)

  4. Attend — Apply the decoder attention stack (cross / hierarchical / memory)

  5. Pool — Collapse the sequence representation into a fixed vector

  6. Forecast — Generate point or probabilistic outputs


Encoder Architectures

Hybrid Mode (objective="hybrid")

Multi-scale LSTM with attention. Each LSTM processes a down-sampled version of the sequence at scale s, then the outputs are aggregated before entering the decoder:

import numpy as np
from base_attentive import BaseAttentive

model = BaseAttentive(
    static_input_dim=4,
    dynamic_input_dim=8,
    future_input_dim=6,
    output_dim=1,
    forecast_horizon=24,
    objective="hybrid",
    scales=[1, 2, 4],          # sequence sub-sampled at ×1, ×2, ×4
    multi_scale_agg="average", # how to merge the scale outputs
    embed_dim=32,
)

scales=[1, 2, 4] creates three parallel LSTMs. At scale s, every s-th time step is kept, so the LSTM at scale 4 sees a quarter of the full history. This lets the model capture both fine-grained and coarse temporal patterns simultaneously.
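
The sub-sampling described above can be pictured with plain NumPy slicing. This is only a sketch: it assumes a simple strided slice starting at the first time step, which may differ from how the library actually down-samples.

```python
import numpy as np

# Toy dynamic input: batch of 2, T = 100 time steps, 8 features.
x = np.random.randn(2, 100, 8).astype("float32")

# Assumption: "keep every s-th time step" is a strided slice on the time axis.
scales = [1, 2, 4]
subsampled = {s: x[:, ::s, :] for s in scales}

for s, xs in subsampled.items():
    print(f"scale {s}: {xs.shape}")
```

With T = 100 the three views have 100, 50 and 25 time steps, so the scale-4 branch works on a quarter of the history.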

multi_scale_agg choices:

Value       Effect
---------   --------------------------------------------------------------------
"last"      Keep the final hidden state of each scale; concatenate, then project
"average"   Average all hidden states across time, then merge
"flatten"   Flatten the full output sequence of each scale, then project
"sum"       Sum hidden states element-wise across time
"concat"    Concatenate all time-step outputs end-to-end
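
To make the difference between the reductions concrete, here is a rough NumPy sketch of three of them. The per-scale outputs are random stand-ins, and merging by concatenation on the feature axis is an assumption (the final projection is left out).

```python
import numpy as np

rng = np.random.default_rng(0)
units = 16

# Hypothetical per-scale LSTM outputs: (batch, T // s, units) for s = 1, 2, 4.
scale_outputs = [rng.standard_normal((2, 100 // s, units)) for s in (1, 2, 4)]

# "last": final hidden state of each scale, concatenated on the feature axis.
last = np.concatenate([h[:, -1, :] for h in scale_outputs], axis=-1)

# "average": mean over time for each scale, merged the same way (assumed).
average = np.concatenate([h.mean(axis=1) for h in scale_outputs], axis=-1)

# "sum": element-wise sum over time for each scale.
summed = np.concatenate([h.sum(axis=1) for h in scale_outputs], axis=-1)

print(last.shape, average.shape, summed.shape)
```

All three reductions remove the time axis, which is why scales with different sequence lengths can be merged at all.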

Transformer Mode (objective="transformer")

Pure self-attention encoder — better parallelism on shorter sequences (T < 500):

model = BaseAttentive(
    static_input_dim=4,
    dynamic_input_dim=8,
    future_input_dim=6,
    output_dim=1,
    forecast_horizon=24,
    objective="transformer",
    num_encoder_layers=4,
    num_heads=8,
    embed_dim=64,
)

Decoder Attention Stack

After encoding, a configurable stack of attention mechanisms bridges the encoded history with the future feature context.

Attention types:

Type             Purpose                                        Use case
--------------   --------------------------------------------   ---------------------------------------------
"cross"          Bridge encoder outputs to future context       Default; works for all forecasting tasks
"hierarchical"   Multi-level temporal patterns in the decoder   Seasonal / structured data with nested cycles
"memory"         Retrieve patterns from a learned memory bank   Long-range dependencies, repeated anomalies
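
As an illustration of what the "cross" level does conceptually, the sketch below runs single-head scaled dot-product attention in NumPy, with the future context as queries and the encoded history as keys and values. It has no learned weights, and the function name and shapes are assumptions made for the example rather than the library's layer.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned weights."""
    d = queries.shape[-1]
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(d)   # (B, H, T)
    scores = scores - scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over T
    return weights @ values, weights

rng = np.random.default_rng(0)
future_ctx = rng.standard_normal((2, 24, 32))   # (B, H, E): one query per horizon step
encoded = rng.standard_normal((2, 100, 32))     # (B, T, E): encoded history

out, w = cross_attention(future_ctx, encoded, encoded)
print(out.shape, w.shape)
```

Each of the H horizon steps receives its own weighting over the T encoded history steps, so the output keeps the horizon length.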

Controlling the stack with attention_levels:

# All three levels
model = BaseAttentive(..., attention_levels=None)

# Single level by name
model = BaseAttentive(..., attention_levels="cross")

# Two levels by list
model = BaseAttentive(..., attention_levels=["cross", "memory"])

# Single level by integer (1=cross, 2=hierarchical, 3=memory)
model = BaseAttentive(..., attention_levels=2)
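
The four accepted forms can be reduced to a single list of names. The helper below is hypothetical (normalize_attention_levels is not a library function); it only encodes the mapping stated in the comments above.

```python
ALL_LEVELS = ("cross", "hierarchical", "memory")

def normalize_attention_levels(value):
    """Hypothetical helper: turn any accepted form into a list of level names."""
    if value is None:
        return list(ALL_LEVELS)                      # None means all three levels
    if isinstance(value, int) and not isinstance(value, bool):
        if not 1 <= value <= len(ALL_LEVELS):
            raise ValueError(f"integer level must be 1..{len(ALL_LEVELS)}, got {value}")
        return [ALL_LEVELS[value - 1]]               # 1=cross, 2=hierarchical, 3=memory
    names = [value] if isinstance(value, str) else list(value)
    unknown = [n for n in names if n not in ALL_LEVELS]
    if unknown:
        raise ValueError(f"unknown attention level(s): {unknown}")
    return names

print(normalize_attention_levels(None))
print(normalize_attention_levels(2))
print(normalize_attention_levels(["cross", "memory"]))
```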

Operational Mode Shortcuts

The mode parameter applies a named configuration profile, wiring up encoder type, attention stack, and decoder in one step:

Value                    Effect
----------------------   --------------------------------------------------------------------------
None (default)           Manual configuration: use objective, architecture_config, etc.
"tft" / "tft_like"       Temporal Fusion Transformer style: VSN + gated residuals + cross attention
"pihal" / "pihal_like"   Physics-Informed HAL style: memory-augmented + hierarchical stack

# TFT-like mode — no need to specify objective or attention_levels
model = BaseAttentive(
    static_input_dim=4,
    dynamic_input_dim=8,
    future_input_dim=6,
    output_dim=1,
    forecast_horizon=24,
    mode="tft",
    embed_dim=32,
)

Output Modes

# Point forecast — shape (batch, H, output_dim)
model = BaseAttentive(..., output_dim=2, forecast_horizon=24)

# Quantile forecast — shape (batch, H, Q, output_dim)
model = BaseAttentive(..., quantiles=[0.1, 0.5, 0.9])

# Probabilistic (Gaussian mixture, for CRPSLoss)
model = BaseAttentive(..., output_mode="gaussian_mixture")
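
For the quantile layout (batch, H, Q, output_dim), a loss has to broadcast the target across the quantile axis. The sketch below is a generic NumPy pinball (quantile) loss written for that layout. It is not one of the library's loss classes, and the function name is made up for the example.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantiles):
    """Mean pinball loss.

    y_true: (B, H, output_dim)
    y_pred: (B, H, Q, output_dim), quantile axis before output_dim
    """
    q = np.asarray(quantiles, dtype=y_pred.dtype).reshape(1, 1, -1, 1)
    error = y_true[:, :, None, :] - y_pred            # broadcast target over Q
    return float(np.mean(np.maximum(q * error, (q - 1.0) * error)))

y_true = np.ones((2, 24, 1), dtype="float32")
y_pred = np.zeros((2, 24, 3, 1), dtype="float32")
print(pinball_loss(y_true, y_pred, [0.1, 0.5, 0.9]))
```

Under-prediction is penalised by q and over-prediction by 1 - q, which is what pushes each slice along the Q axis towards its own quantile.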

V2 Architecture: Registry / Resolver / Assembly

Version 2.0.0 replaces the monolithic class hierarchy of v1.0.0 with a registry / resolver / assembly system. Every model component is now registered under a string key and resolved at build time. This makes the model fully pluggable and backend-neutral.

Why this matters

In v1.0.0 the encoder, attention heads, and forecast head were hard-coded inside BaseAttentive. Customising them required subclassing internal layers and overriding private methods — fragile and backend-specific.

In v2.0.0:

  • Each component is a builder function stored in a registry.

  • BaseAttentiveSpec / BaseAttentiveComponentSpec describe the model purely as data (no Keras imports required at spec-creation time).

  • BaseAttentiveV2Assembly reads the spec, resolves each component from the registry, and wires everything together.

  • Swapping a component is a one-line registry call — no subclassing.
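
The pattern in the bullets above can be reduced to a few lines of plain Python. TinyRegistry and build below are illustrative stand-ins, not the library's classes. They only show how string keys, builder functions and a data-only spec fit together.

```python
class TinyRegistry:
    """Illustrative registry: maps "<category>.<name>" keys to builder functions."""

    def __init__(self):
        self._entries = {}

    def register(self, key, builder, *, backend="generic", description=""):
        category, _, name = key.partition(".")
        if not category or not name:
            raise ValueError(f"key must look like '<category>.<name>', got {key!r}")
        self._entries[key] = {
            "builder": builder,
            "backend": backend,
            "description": description,
        }

    def has(self, key):
        return key in self._entries

    def get(self, key):
        if key not in self._entries:
            raise KeyError(f"no component registered under {key!r}")
        return self._entries[key]["builder"]

    def list_keys(self):
        return sorted(self._entries)


registry = TinyRegistry()
registry.register("pool.mean", lambda **kw: "mean-pooling layer")
registry.register("pool.last", lambda **kw: "last-step layer")

def build(component_keys):
    """Resolve every slot of a data-only spec into a built component."""
    return {slot: registry.get(key)() for slot, key in component_keys.items()}

built = build({"sequence_pooling": "pool.last"})
print(built)
```

Swapping the pooling strategy means changing one string in the spec. Nothing is subclassed.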

The Two Registries

ComponentRegistry

Stores builder functions for individual layers (encoders, projections, attention heads, pooling, forecast heads). Key format: "<category>.<name>".

ModelRegistry

Stores assembler functions that construct the full model from a spec.

Both registries are available as singletons:

from base_attentive.registry import (
    DEFAULT_COMPONENT_REGISTRY,
    DEFAULT_MODEL_REGISTRY,
)

Registering a custom encoder

from base_attentive.registry import DEFAULT_COMPONENT_REGISTRY

def wavenet_encoder_builder(*, context, units, hidden_units, **kw):
    """
    A WaveNet-style dilated causal encoder.
    context: BaseAttentiveSpec — gives access to embed_dim, dropout_rate, etc.
    """
    from my_layers import WaveNetBlock
    return WaveNetBlock(
        units=units,
        dilation_rates=[1, 2, 4, 8],
        dropout=context.dropout_rate,
    )

DEFAULT_COMPONENT_REGISTRY.register(
    "encoder.wavenet",
    wavenet_encoder_builder,
    backend="generic",          # works across TF / Torch / JAX
    description="WaveNet dilated causal encoder.",
)

Then use the key in a spec:

from base_attentive.config import BaseAttentiveSpec, BaseAttentiveComponentSpec

spec = BaseAttentiveSpec(
    static_input_dim=4,
    dynamic_input_dim=8,
    future_input_dim=6,
    output_dim=1,
    forecast_horizon=24,
    embed_dim=64,
    components=BaseAttentiveComponentSpec(
        temporal_encoder="encoder.wavenet",   # <-- custom component
    ),
)

from base_attentive.assembly import BaseAttentiveV2Assembly
assembler = BaseAttentiveV2Assembly()
model = assembler.build(spec)

BaseAttentiveSpec

A frozen dataclass that fully describes a model without any framework imports. All fields have defaults.

from base_attentive.config import BaseAttentiveSpec, BaseAttentiveComponentSpec

spec = BaseAttentiveSpec(
    # ── Input dimensions ────────────────────────────────────────────
    static_input_dim=4,
    dynamic_input_dim=8,
    future_input_dim=6,
    output_dim=1,
    forecast_horizon=24,

    # ── Model capacity ──────────────────────────────────────────────
    embed_dim=32,
    hidden_units=64,
    attention_heads=4,
    dropout_rate=0.1,
    activation="relu",
    layer_norm_epsilon=1e-6,

    # ── Backend / head ──────────────────────────────────────────────
    backend_name="tensorflow",   # or "torch" / "jax"
    head_type="point",           # or "quantile"
    quantiles=(),                # e.g. (0.1, 0.5, 0.9)

    # ── Component overrides ─────────────────────────────────────────
    components=BaseAttentiveComponentSpec(
        sequence_pooling="pool.last",      # override pooling
        temporal_encoder="encoder.wavenet",# override encoder
    ),
)

BaseAttentiveComponentSpec accepts the following keys (all optional — omitted keys use the registry default):

Field                Registry key resolved
------------------   -------------------------------------------------
static_projection    "projection.static"
dynamic_projection   "projection.dynamic"
future_projection    "projection.future"
hidden_projection    "projection.hidden"
temporal_encoder     "encoder.temporal_self_attention"
sequence_pooling     "pool.mean"
feature_fusion       "fusion.concat"
forecast_head        "head.point_forecast" or "head.quantile_forecast"

Default component keys (built-in, "generic" backend):

Registry key                        Purpose
---------------------------------   -----------------------------------
"projection.static"                 Static feature linear projection
"projection.dynamic"                Dynamic sequence projection
"projection.future"                 Future covariate projection
"projection.hidden"                 Post-fusion hidden projection
"projection.dense"                  Generic dense projection (fallback)
"encoder.temporal_self_attention"   Temporal self-attention encoder
"pool.mean"                         Sequence mean pooling
"pool.last"                         Last-step pooling
"fusion.concat"                     Feature concatenation
"head.point_forecast"               Point forecast head
"head.quantile_forecast"            Quantile forecast head
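
How omitted fields fall back to their default keys can be sketched with a frozen dataclass. ComponentOverrides and resolve_keys are hypothetical stand-ins for the real spec and resolver. The default keys are the ones listed above, and tying the forecast head to head_type is an assumption based on the two head keys.

```python
from dataclasses import dataclass, fields
from typing import Optional

DEFAULT_KEYS = {
    "static_projection": "projection.static",
    "dynamic_projection": "projection.dynamic",
    "future_projection": "projection.future",
    "hidden_projection": "projection.hidden",
    "temporal_encoder": "encoder.temporal_self_attention",
    "sequence_pooling": "pool.mean",
    "feature_fusion": "fusion.concat",
}

@dataclass(frozen=True)
class ComponentOverrides:
    """Hypothetical stand-in for BaseAttentiveComponentSpec (None = use default)."""
    static_projection: Optional[str] = None
    dynamic_projection: Optional[str] = None
    future_projection: Optional[str] = None
    hidden_projection: Optional[str] = None
    temporal_encoder: Optional[str] = None
    sequence_pooling: Optional[str] = None
    feature_fusion: Optional[str] = None
    forecast_head: Optional[str] = None

def resolve_keys(overrides, head_type="point"):
    """Return the registry key for every slot, applying defaults where unset."""
    resolved = {}
    for f in fields(overrides):
        chosen = getattr(overrides, f.name)
        if chosen is None and f.name == "forecast_head":
            # Assumed rule: the default head follows head_type.
            chosen = ("head.quantile_forecast" if head_type == "quantile"
                      else "head.point_forecast")
        elif chosen is None:
            chosen = DEFAULT_KEYS[f.name]
        resolved[f.name] = chosen
    return resolved

keys = resolve_keys(ComponentOverrides(sequence_pooling="pool.last"),
                    head_type="quantile")
print(keys["sequence_pooling"], keys["temporal_encoder"], keys["forecast_head"])
```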

Inspecting the registry

from base_attentive.registry import DEFAULT_COMPONENT_REGISTRY

# List all registered keys
for key in DEFAULT_COMPONENT_REGISTRY.list_keys():
    print(key)

# Check if a key exists
if DEFAULT_COMPONENT_REGISTRY.has("encoder.wavenet"):
    print("custom encoder registered")

# Retrieve builder metadata
info = DEFAULT_COMPONENT_REGISTRY.get_info("encoder.temporal_self_attention")
print(info["description"])

Full v2 build-from-spec example

import numpy as np
from base_attentive.config import BaseAttentiveSpec
from base_attentive.assembly import BaseAttentiveV2Assembly

spec = BaseAttentiveSpec(
    static_input_dim=4,
    dynamic_input_dim=8,
    future_input_dim=6,
    output_dim=1,
    forecast_horizon=24,
    embed_dim=32,
    hidden_units=64,
    attention_heads=4,
    backend_name="tensorflow",
    head_type="quantile",
    quantiles=(0.1, 0.5, 0.9),
)

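model = BaseAttentiveV2Assembly().build(spec)
# "mse" keeps the example short; a quantile head is normally trained with a
# quantile (pinball) loss rather than a squared-error loss.
model.compile(optimizer="adam", loss="mse")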

x_static  = np.random.randn(16, 4).astype("float32")
x_dynamic = np.random.randn(16, 100, 8).astype("float32")
x_future  = np.random.randn(16, 24, 6).astype("float32")
y         = np.random.randn(16, 24, 1).astype("float32")

model.fit([x_static, x_dynamic, x_future], y, epochs=2)

Using BaseAttentive (facade)

The BaseAttentive class is a convenience facade that builds the model from keyword arguments without requiring you to construct a spec manually. It delegates to the same registry/assembly system under the hood:

from base_attentive import BaseAttentive

# This is equivalent to building through BaseAttentiveSpec + Assembly
model = BaseAttentive(
    static_input_dim=4,
    dynamic_input_dim=8,
    future_input_dim=6,
    output_dim=1,
    forecast_horizon=24,
    embed_dim=32,
    num_heads=4,
    quantiles=[0.1, 0.5, 0.9],
)

Breaking Changes in v2.0.0

v2.0.0 is a major release. If you are upgrading from v1.0.0, the following changes require action.

Note

These changes are intentional. The v1.0.0 API was tightly coupled to TensorFlow; v2.0.0 achieves full backend neutrality through these structural changes.

1. Keras 3 required

v1.0.0 used tensorflow.keras directly. v2.0.0 uses Keras 3 (import keras) as the framework abstraction layer.

What breaks: Any code that imports from tensorflow.keras or passes tf.Tensor objects to model inputs may need updating.

Migration:

# v1.0.0 — TensorFlow-coupled
import tensorflow as tf
model = BaseAttentive(...)
x = tf.random.normal([32, 100, 8])

# v2.0.0 — backend-neutral
import numpy as np
model = BaseAttentive(...)
x = np.random.randn(32, 100, 8).astype("float32")
# or use the active backend's tensor type directly
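
With Keras 3 the backend is chosen through the KERAS_BACKEND environment variable (or the Keras config file), and it must be set before keras is first imported. The snippet below is a minimal sketch of that step. Whether BaseAttentive adds its own backend selection on top of this is not covered here.

```python
import os

# Keras 3 reads KERAS_BACKEND once, at the first `import keras`, so set it
# before any import that pulls Keras in. setdefault keeps a value that was
# already exported in the shell.
os.environ.setdefault("KERAS_BACKEND", "tensorflow")   # or "jax" / "torch"

import numpy as np

# NumPy arrays are backend-neutral inputs for fit / predict.
x_dynamic = np.random.randn(32, 100, 8).astype("float32")
print(os.environ["KERAS_BACKEND"], x_dynamic.shape)
```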

2. Internal layer paths removed

In v1.0.0, internal layer classes were importable from base_attentive.layers.* and base_attentive.models.components.*. These paths no longer exist in v2.0.0. All components are accessed through the registry.

What breaks: Direct imports of internal layer classes.

Migration:

# v1.0.0 (breaks in v2.0.0)
from base_attentive.layers import HierarchicalAttention

# v2.0.0 — use registry or components_reference API
from base_attentive.registry import DEFAULT_COMPONENT_REGISTRY
builder = DEFAULT_COMPONENT_REGISTRY.get("attention.hierarchical")

3. architecture_config dict keys changed

Several architecture_config keys were renamed for clarity:

v1.0.0 key        v2.0.0 key           Notes
---------------   ------------------   ------------------------------
"encoder_units"   "embed_dim"          Unified dimension name
"decoder_heads"   "num_heads"          Consistent with Keras naming
"use_attention"   "attention_levels"   Now accepts name, list, or int
"temporal_mode"   "objective"          "hybrid" or "transformer"

Migration:

# v1.0.0
model = BaseAttentive(
    ...,
    architecture_config={"encoder_units": 64, "use_attention": True},
)

# v2.0.0
model = BaseAttentive(
    ...,
    embed_dim=64,
    attention_levels=["cross"],
)

4. output_mode default changed

In v1.0.0 the output mode defaulted to "quantile" whenever quantiles was set. v2.0.0 infers the output mode from the combination of quantiles and output_mode. Passing quantiles without output_mode still produces a quantile forecast, but the layout of the output tensor changed:

Setting                                 v1.0.0 output shape     v2.0.0 output shape
-------------------------------------   ---------------------   ----------------------------------
output_dim=2, no quantiles              (B, H, 2)               (B, H, 2) (unchanged)
output_dim=2, quantiles=[0.1,0.5,0.9]   (B, H, 2, 3) ← Q last   (B, H, 3, 2) ← Q before output_dim

Migration: If you index the quantile axis, update from [..., i] (v1) to [:, :, i, :] (v2).
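
If stored v1 predictions or downstream code still use the old layout, moving the quantile axis is a single NumPy call. The arrays below are random placeholders with the shapes from the table above (output_dim=2, three quantiles).

```python
import numpy as np

B, H, Q, D = 4, 24, 3, 2                  # D stands for output_dim here
v1_pred = np.random.randn(B, H, D, Q)     # v1.0.0 layout: Q last

# Move the last axis (Q) to position 2 to obtain the v2.0.0 layout.
v2_pred = np.moveaxis(v1_pred, -1, 2)     # (B, H, Q, D)

i = 1                                     # index of the 0.5 quantile
median_v1 = v1_pred[..., i]               # v1 indexing
median_v2 = v2_pred[:, :, i, :]           # v2 indexing

print(v2_pred.shape, np.array_equal(median_v1, median_v2))
```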


Data Flow Diagram (v2)

Static (B,S)  Dynamic (B,T,D)  Future (B,H,F)
     │               │                │
     │        ┌──────▼──────┐         │
     │        │ VSN / Dense │         │
     │        └──────┬──────┘         │
     │               │                │
┌────▼────┐   ┌──────▼──────┐  ┌──────▼──────┐
│ Static  │   │  Temporal   │  │  Future     │
│ Proj.   │   │  Encoder    │  │  Proj.      │
│ (Dense) │   │ (LSTM/Attn) │  │  (Dense)    │
└────┬────┘   └──────┬──────┘  └──────┬──────┘
     │               │                │
     └───────────────┴────────────────┘
                     │
          ┌──────────▼──────────┐
          │   Feature Fusion    │
          │   (concat + proj)   │
          └──────────┬──────────┘
                     │
          ┌──────────▼──────────┐
          │   Attention Stack   │
          │   (cross → hier     │
          │    → memory)        │
          └──────────┬──────────┘
                     │
          ┌──────────▼──────────┐
          │  Sequence Pooling   │
          │  (mean / last)      │
          └──────────┬──────────┘
                     │
          ┌──────────▼──────────┐
          │  Hidden Projection  │
          └──────────┬──────────┘
                     │
          ┌──────────┴──────────┐
          │                     │
    Point Forecast        Quantile Forecast
   (B, H, output_dim)   (B, H, Q, output_dim)

Configuration Hierarchy

Precedence (lowest → highest):

  1. Built-in defaults (DEFAULT_ARCHITECTURE)

  2. Explicit keyword arguments (objective, mode, attention_levels, …)

  3. architecture_config dict (overrides all)

model = BaseAttentive(
    ...,
    objective="hybrid",            # step 2
    architecture_config={
        "encoder_type": "transformer",  # step 3 — wins over step 2
    },
)
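
The precedence order amounts to a layered dictionary merge in which later layers win. The sketch below uses placeholder values for DEFAULT_ARCHITECTURE and a single shared key name for simplicity. Both are assumptions for illustration, not the library's actual defaults or key mapping.

```python
# Placeholder defaults; the real DEFAULT_ARCHITECTURE contents are not shown here.
DEFAULT_ARCHITECTURE = {"encoder_type": "hybrid", "attention_levels": None}

def resolve_config(defaults, keyword_args, architecture_config=None):
    """Merge the three layers; later layers override earlier ones."""
    return {**defaults, **keyword_args, **(architecture_config or {})}

resolved = resolve_config(
    DEFAULT_ARCHITECTURE,
    {"encoder_type": "hybrid", "attention_levels": ["cross"]},   # level 2
    {"encoder_type": "transformer"},                             # level 3 wins
)
print(resolved)
```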

Performance Notes

Mode          Encoder            Complexity   Notes
-----------   ----------------   ----------   -----------------------
Hybrid        Multi-scale LSTM   O(T·h²)      Recommended for T > 500
Transformer   Self-attention     O(T²·h)      Recommended for T < 500
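
A quick way to read the two complexity terms is to compare them directly. The sketch below counts only the leading terms from the table. It ignores constant factors, the number of scales and layers, and the fact that attention parallelises over time while an LSTM does not, so it is a rough guide rather than a benchmark.

```python
def lstm_cost(T, h):
    """Leading term for a recurrent encoder: O(T * h^2)."""
    return T * h * h

def attention_cost(T, h):
    """Leading term for self-attention: O(T^2 * h)."""
    return T * T * h

h = 64
for T in (32, 64, 500, 2000):
    ratio = attention_cost(T, h) / lstm_cost(T, h)   # simplifies to T / h
    print(f"T={T:5d}  attention/LSTM term ratio = {ratio:.2f}")
```

The ratio of the two terms is T / h, so the attention term grows faster as the history gets longer. Because constants and parallelism are left out, a practical threshold such as the T ≈ 500 guideline above need not coincide with T = h.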


See Also