Components Reference
This page summarizes the reusable neural-network components that power
BaseAttentive. All components are importable from
base_attentive.components.
from base_attentive.components import (
    # Attention
    CrossAttention, HierarchicalAttention, MemoryAugmentedAttention,
    ExplainableAttention, TemporalAttentionLayer,
    MultiResolutionAttentionFusion,
    # Temporal
    MultiScaleLSTM, DynamicTimeWindow,
    # Gating / normalization
    VariableSelectionNetwork, GatedResidualNetwork,
    LearnedNormalization, StaticEnrichmentLayer,
    # Transformer blocks
    TransformerEncoderLayer, TransformerDecoderLayer,
    TransformerEncoderBlock, TransformerDecoderBlock,
    MultiDecoder,
    # Forecast heads
    PointForecastHead, QuantileHead, GaussianHead,
    MixtureDensityHead, QuantileDistributionModeling, CombinedHeadLoss,
    # Positional / embedding
    PositionalEncoding, TSPositionalEncoding,
    MultiModalEmbedding, Activation,
    # Losses
    AdaptiveQuantileLoss, MultiObjectiveLoss, CRPSLoss, AnomalyLoss,
    QuantileLoss, MeanSquaredErrorLoss, HuberLoss, WeightedLoss,
    # Layer utilities
    Gate, LayerScale, ResidualAdd, SqueezeExcite1D, StochasticDepth,
)
Core Components
Variable Selection Network
Import: base_attentive.components.VariableSelectionNetwork
Signature: VariableSelectionNetwork(units, num_inputs)
Learns feature-wise gating weights and projects inputs into a more useful representation. Each feature receives a learnable importance weight.
The gating can be written as \(\tilde{\mathbf{x}} = g(\mathbf{x}) \odot \mathbf{x}\), where \(g(\mathbf{x})\) is a gating function and \(\odot\) is element-wise multiplication.
vsn = VariableSelectionNetwork(units=32, num_inputs=8)
output = vsn([feat_1, feat_2, ..., feat_8]) # list of input tensors
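To make the element-wise gating concrete, here is a minimal plain-Python sketch of a gate applied to a feature vector. The helper name `gate_features`, the sigmoid form of the gate, and the parameters `w` and `b` are assumptions for illustration only; the library's actual gating function may differ (TFT-style networks, for instance, commonly use a softmax over variables).

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gate_features(x, w, b):
    """Hypothetical per-feature gate: g_i = sigmoid(w_i * x_i + b_i).

    Returns the gated features g(x) * x (element-wise) and the gates.
    """
    gates = [sigmoid(wi * xi + bi) for xi, wi, bi in zip(x, w, b)]
    gated = [g * xi for g, xi in zip(gates, x)]
    return gated, gates


x = [2.0, -1.0, 0.5]
w = [0.0, 0.0, 0.0]  # zero weights and biases: every gate is sigmoid(0) = 0.5
b = [0.0, 0.0, 0.0]
gated, gates = gate_features(x, w, b)
```

With all gate parameters at zero, each feature is simply halved; training would move individual gates toward 0 (feature suppressed) or 1 (feature passed through).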
Gated Residual Network
Import: base_attentive.components.GatedResidualNetwork
Signature: GatedResidualNetwork(units, dropout_rate=0.0)
Applies a gated linear unit followed by a residual connection and layer normalization. This block is the backbone of TFT-style (Temporal Fusion Transformer) gating.
grn = GatedResidualNetwork(units=32, dropout_rate=0.1)
output = grn(x) # without context
output = grn([x, context_vector]) # with optional context
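The three steps named above (gated linear unit, residual connection, layer normalization) can be sketched on a single vector in plain Python. This is a simplified illustration, not the library's implementation: a real layer would compute `values` and `gate_logits` with learned dense projections, and the function names here are hypothetical.

```python
import math


def glu(values, gate_logits):
    """Gated linear unit: values * sigmoid(gate_logits), element-wise."""
    return [v / (1.0 + math.exp(-g)) for v, g in zip(values, gate_logits)]


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def gated_residual(x, values, gate_logits):
    """LayerNorm(x + GLU(values, gate_logits))."""
    gated = glu(values, gate_logits)
    return layer_norm([xi + gi for xi, gi in zip(x, gated)])


x = [1.0, 2.0, 3.0]
values = [0.5, -0.5, 1.0]
closed_gate = [-50.0, -50.0, -50.0]  # sigmoid(-50) is ~0, so the gate is shut
out = gated_residual(x, values, closed_gate)
```

When the gate is closed, the block reduces to a normalized skip connection, which is what lets a gated residual block bypass a transformation that is not useful for a given input.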
Learned Normalization
Import: base_attentive.components.LearnedNormalization
Learnable layer normalization with per-channel scale and shift.
Static Enrichment Layer
Import: base_attentive.components.StaticEnrichmentLayer
Signature: StaticEnrichmentLayer(units)
Enriches temporal features with static context information.
enriched = StaticEnrichmentLayer(units=32)([temporal, static_context])
Multi-Scale LSTM
Import: base_attentive.components.MultiScaleLSTM
Signature: MultiScaleLSTM(lstm_units, scales=None, return_sequences=False)
Encodes temporal context across multiple resolutions. Each scale processes a sub-sampled copy of the sequence, and the per-scale outputs are concatenated.
lstm = MultiScaleLSTM(lstm_units=64, scales=[1, 2, 4])
output = lstm(x) # x: (batch, time_steps, features)
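The sketch below shows what each scale might see as input, assuming that sub-sampling means taking every `scale`-th time step. That assumption and the helper name `subsample` are illustrative; the library could equally use pooling or another down-sampling scheme.

```python
def subsample(sequence, scale):
    """Keep every `scale`-th time step (strided sub-sampling)."""
    return sequence[::scale]


time_steps = list(range(8))  # stand-in for 8 time steps of features
views = {scale: subsample(time_steps, scale) for scale in (1, 2, 4)}
lengths = {scale: len(view) for scale, view in views.items()}
```

Under this assumption, scales=[1, 2, 4] on an 8-step sequence yield views of 8, 4, and 2 steps, so coarser scales cover the same time span with fewer recurrent steps.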
Dynamic Time Window
Import: base_attentive.components.DynamicTimeWindow
Signature: DynamicTimeWindow(max_window_size)
Adaptively selects temporal context windows up to max_window_size steps.
dtw = DynamicTimeWindow(max_window_size=10)
output = dtw(x) # x: (batch, time_steps, features)
Attention Layers
Cross-Attention
Import: base_attentive.components.CrossAttention
Signature: CrossAttention(units, num_heads)
Fuses decoder queries with encoder memory using scaled dot-product attention.
ca = CrossAttention(units=32, num_heads=4)
output = ca([query, context])
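For readers unfamiliar with scaled dot-product attention, the following plain-Python sketch computes it for a single head without learned projections. It illustrates the standard formula softmax(QK^T / sqrt(d_k)) V only; the multi-head splitting and projection layers that CrossAttention presumably applies are omitted, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of query, key, and value vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


queries = [[0.0, 0.0]]           # a zero query scores every key equally
keys = [[1.0, 0.0], [0.0, 1.0]]  # encoder memory (keys)
values = [[2.0, 0.0], [0.0, 4.0]]
outputs, weights = scaled_dot_product_attention(queries, keys, values)
```

In a cross-attention setting the queries come from the decoder and the keys and values come from the encoder memory, matching the `[query, context]` input order shown above.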
Hierarchical Attention
Import: base_attentive.components.HierarchicalAttention
Signature: HierarchicalAttention(units, num_heads)
Multi-level attention for richer contextual relationships.
ha = HierarchicalAttention(units=32, num_heads=4)
output = ha(x)
Memory-Augmented Attention
Import: base_attentive.components.MemoryAugmentedAttention
Signature: MemoryAugmentedAttention(units, num_heads)
Maintains a persistent memory bank for long-range pattern retrieval.
maa = MemoryAugmentedAttention(units=32, num_heads=4)
output = maa(x)
Temporal Attention Layer
Import: base_attentive.components.TemporalAttentionLayer
Signature: TemporalAttentionLayer(units, num_heads)
General-purpose temporal multi-head attention.
tal = TemporalAttentionLayer(units=32, num_heads=4)
output = tal([x, x]) # self-attention
Explainable Attention
Import: base_attentive.components.ExplainableAttention
Signature: ExplainableAttention(units, num_heads)
Returns attention weights alongside the output for interpretability.
Multi-Resolution Attention Fusion
Import: base_attentive.components.MultiResolutionAttentionFusion
Signature: MultiResolutionAttentionFusion(units, num_heads)
Fuses representations from multiple temporal resolutions using attention.
mraf = MultiResolutionAttentionFusion(units=32, num_heads=4)
output = mraf([fine_features, coarse_features])
Transformer Blocks
TransformerEncoderLayer
Import: base_attentive.components.TransformerEncoderLayer
Signature: TransformerEncoderLayer(embed_dim, num_heads, ffn_dim, dropout_rate=0.1)
Single transformer encoder layer: self-attention + feed-forward + layer norm.
enc = TransformerEncoderLayer(embed_dim=32, num_heads=4, ffn_dim=64)
output = enc(x) # x: (batch, seq_len, embed_dim)
TransformerDecoderLayer
Import: base_attentive.components.TransformerDecoderLayer
Signature: TransformerDecoderLayer(embed_dim, num_heads, ffn_dim, dropout_rate=0.1)
Single transformer decoder layer: masked self-attention + cross-attention + feed-forward + layer norm.
dec = TransformerDecoderLayer(embed_dim=32, num_heads=4, ffn_dim=64)
output = dec([tgt, memory]) # tgt: (B, T_dec, D), memory: (B, T_enc, D)
TransformerEncoderBlock
Import: base_attentive.components.TransformerEncoderBlock
Signature: TransformerEncoderBlock(embed_dim, num_heads, ffn_dim, num_layers, dropout_rate=0.1)
Stack of TransformerEncoderLayer blocks.
TransformerDecoderBlock
Import: base_attentive.components.TransformerDecoderBlock
Signature: TransformerDecoderBlock(embed_dim, num_heads, ffn_dim, num_layers, dropout_rate=0.1)
Stack of TransformerDecoderLayer blocks.
Multi-Decoder
Import: base_attentive.components.MultiDecoder
Signature: MultiDecoder(output_dim, num_horizons)
Projects the latent representation into horizon-wise forecast outputs with one dense layer per horizon step.
md = MultiDecoder(output_dim=2, num_horizons=24)
output = md(h) # h: (batch, embed_dim)
Forecast Heads
PointForecastHead
Import: base_attentive.components.PointForecastHead
Signature: PointForecastHead(output_dim, forecast_horizon)
Single dense output head for point forecasting.
head = PointForecastHead(output_dim=2, forecast_horizon=24)
output = head(h) # output: (batch, 24, 2)
QuantileHead
Import: base_attentive.components.QuantileHead
Signature: QuantileHead(quantiles, output_dim, forecast_horizon)
Outputs a separate dense projection for each quantile level.
head = QuantileHead(quantiles=[0.1, 0.5, 0.9], output_dim=2, forecast_horizon=24)
output = head(h) # output: (batch, 24, 3, 2)
GaussianHead
Import: base_attentive.components.GaussianHead
Signature: GaussianHead(output_dim)
Outputs mean and log-variance parameters for a Gaussian predictive distribution.
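A head that predicts mean and log-variance is usually trained with the Gaussian negative log-likelihood. The sketch below evaluates that quantity for one scalar target; it is a generic formula, not a confirmed part of the library, and `gaussian_nll` is a hypothetical name.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var))."""
    return 0.5 * (
        math.log(2.0 * math.pi) + log_var + (y - mean) ** 2 / math.exp(log_var)
    )


nll_exact = gaussian_nll(y=1.0, mean=1.0, log_var=0.0)  # prediction hits the target
nll_off = gaussian_nll(y=3.0, mean=1.0, log_var=0.0)    # prediction misses by 2
```

Predicting the log-variance rather than the variance keeps the network output unconstrained while guaranteeing a positive variance after exponentiation.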
MixtureDensityHead
Import: base_attentive.components.MixtureDensityHead
Signature: MixtureDensityHead(num_components, output_dim)
Outputs mixture weights, means, and variances for a Gaussian Mixture Model.
QuantileDistributionModeling
Import: base_attentive.components.QuantileDistributionModeling
Signature: QuantileDistributionModeling(quantiles, output_dim)
Full quantile distribution modeling head.
CombinedHeadLoss
Import: base_attentive.components.CombinedHeadLoss
Signature: CombinedHeadLoss(output_dim, forecast_horizon)
Combines a forecast head with its loss computation.
Positional Encoding and Embedding
PositionalEncoding
Import: base_attentive.components.PositionalEncoding
Signature: PositionalEncoding(max_length=2048)
Sinusoidal positional encoding (added to input embeddings).
pe = PositionalEncoding(max_length=200)
output = pe(x) # x: (batch, seq_len, embed_dim)
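As a reference point, this plain-Python sketch builds the sinusoidal table from the original Transformer formulation: even channels use sine, odd channels use cosine, and wavelengths grow geometrically with base 10000. Whether PositionalEncoding uses exactly this base and channel layout is an assumption, and `sinusoidal_encoding` is a hypothetical helper.

```python
import math


def sinusoidal_encoding(max_length, embed_dim):
    """Return a max_length x embed_dim table of sinusoidal position codes."""
    table = []
    for pos in range(max_length):
        row = []
        for i in range(embed_dim):
            # Channels 2k and 2k+1 share the same frequency.
            angle = pos / (10000.0 ** (2 * (i // 2) / embed_dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe_table = sinusoidal_encoding(max_length=4, embed_dim=6)
```

The first `seq_len` rows of such a table would be added to an input of shape (batch, seq_len, embed_dim), which is why `max_length` only needs to be an upper bound on the sequence length.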
TSPositionalEncoding
Import: base_attentive.components.TSPositionalEncoding
Signature: TSPositionalEncoding(max_position, embed_dim)
Time-series-specific positional encoding with explicit position and embedding dimensions.
tspe = TSPositionalEncoding(max_position=100, embed_dim=32)
output = tspe(x)
MultiModalEmbedding
Import: base_attentive.components.MultiModalEmbedding
Signature: MultiModalEmbedding(embed_dim)
Embeds multi-modal inputs into a shared embedding space.
Activation
Import: base_attentive.components.Activation
Signature: Activation(activation)
Wrapped Keras activation layer supporting all standard activations:
'relu', 'elu', 'selu', 'sigmoid', 'tanh', 'linear',
'gelu', 'swish'.
Loss Functions
All loss classes extend keras.losses.Loss and are registered as
Keras-serializable.
| Class | Description |
|---|---|
| MeanSquaredErrorLoss | Standard MSE loss |
| QuantileLoss | Pinball / quantile loss for given quantile levels |
| HuberLoss | Huber (smooth L1) loss |
| WeightedLoss | Wraps any loss with a scalar weight |
| AdaptiveQuantileLoss | Adaptive quantile regression loss |
| MultiObjectiveLoss | Combines multiple loss terms |
| CRPSLoss | Continuous Ranked Probability Score |
| AnomalyLoss | Loss for anomaly-aware training |
from base_attentive.components import AdaptiveQuantileLoss
loss = AdaptiveQuantileLoss(quantiles=[0.1, 0.5, 0.9])
result = loss(y_true, y_pred)
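The pinball loss behind the quantile-based losses can be written in a few lines. The sketch below evaluates it for a single prediction in plain Python; it shows the generic formula only, and `pinball_loss` is a hypothetical helper rather than a library function.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for one prediction at quantile level q."""
    error = y_true - y_pred
    # Under-prediction (error > 0) costs q per unit,
    # over-prediction (error < 0) costs (1 - q) per unit.
    return max(q * error, (q - 1.0) * error)


under = pinball_loss(y_true=10.0, y_pred=8.0, q=0.9)   # predicted too low
over = pinball_loss(y_true=10.0, y_pred=12.0, q=0.9)   # predicted too high
```

At q=0.9 an under-prediction of 2 units costs about 1.8 while an over-prediction of the same size costs about 0.2, which is what pushes the prediction toward the 90th percentile.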
Layer Utilities
| Class / Function | Description |
|---|---|
| Gate | GLU-style gating layer |
| LayerScale | Per-channel learnable scaling (ViT-style) |
| ResidualAdd | Adds two tensors as a residual connection |
| SqueezeExcite1D | Squeeze-and-excite channel re-weighting for 1D sequences |
| StochasticDepth | Stochastic depth / drop-path regularization |
|  | Functional residual: |
|  | Broadcast 2D tensor to 3D by expanding a time axis |
|  | Expand dims until tensor reaches minimum rank |
|  | Expand 2D |
|  | Functional drop-path |
|  | Create upper-triangular causal attention mask |
|  | Combine two boolean masks ( |
|  | Boolean padding mask from sequence lengths |
|  | Apply a 2D mask to a 3D tensor |
|  | Aggregate multi-scale 3D features |
|  | Aggregate dynamic time window output |
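The two mask helpers described above can be illustrated with nested lists. The function names `causal_mask` and `padding_mask` are hypothetical, and the conventions chosen here (True marks a blocked future position in the causal mask, True marks a real time step in the padding mask) are assumptions; the library's helpers may use the opposite polarity.

```python
def causal_mask(size):
    """Strictly upper-triangular mask: True where column j is after row i."""
    return [[j > i for j in range(size)] for i in range(size)]


def padding_mask(lengths, max_len):
    """One row per sequence: True for real time steps, False for padding."""
    return [[t < n for t in range(max_len)] for n in lengths]


mask = causal_mask(3)
pad = padding_mask([3, 1], max_len=3)
```

Row i of the causal mask marks every later position, so applying it in self-attention prevents step i from attending to steps it should not yet see.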
Utility Functions (components.utils)
resolve_attn_levels(att_levels)
Converts att_levels to a canonical list of attention type strings.
from base_attentive.components.utils import resolve_attn_levels
resolve_attn_levels(None) # ['cross', 'hierarchical', 'memory']
resolve_attn_levels("hier") # ['hierarchical']
resolve_attn_levels([1, 3]) # ['cross', 'memory']
configure_architecture(objective, use_vsn, attention_levels, architecture_config)
Builds the final architecture config dict from layered inputs.
resolve_fusion_mode(fusion_mode)
Returns 'integrated' or 'disjoint'.
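The three documented resolve_attn_levels calls suggest that None selects every level, strings match by prefix, and integers are 1-based indices. The stand-alone sketch below reproduces only that observed behavior. It is a hypothetical re-implementation (`resolve_levels_sketch`), not the library's code, and the real function may accept other inputs or handle errors differently.

```python
CANONICAL_LEVELS = ["cross", "hierarchical", "memory"]


def resolve_levels_sketch(att_levels=None):
    """Hypothetical resolver consistent with the documented examples."""
    if att_levels is None:
        return list(CANONICAL_LEVELS)
    if isinstance(att_levels, (str, int)):
        att_levels = [att_levels]  # allow a single level as shorthand
    resolved = []
    for level in att_levels:
        if isinstance(level, int):
            # Assumption: integers are 1-based positions in the canonical list.
            if not 1 <= level <= len(CANONICAL_LEVELS):
                raise ValueError(f"level index out of range: {level}")
            resolved.append(CANONICAL_LEVELS[level - 1])
        else:
            # Assumption: strings match a canonical name by unique prefix.
            matches = [n for n in CANONICAL_LEVELS if n.startswith(level.lower())]
            if len(matches) != 1:
                raise ValueError(f"unknown or ambiguous level: {level!r}")
            resolved.append(matches[0])
    return resolved


all_levels = resolve_levels_sketch(None)
by_prefix = resolve_levels_sketch("hier")
by_index = resolve_levels_sketch([1, 3])
```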
See Also
API Reference — Full API autodoc
Architecture Guide — Architecture overview
Usage — Usage patterns