=======================
Configuration Guide
=======================

Parameter Reference
====================

Required Parameters
-------------------

.. code-block:: python

   from base_attentive import BaseAttentive

   model = BaseAttentive(
       static_input_dim=4,      # Number of static features
       dynamic_input_dim=8,     # Number of dynamic features
       future_input_dim=6,      # Number of future features
       output_dim=2,            # Number of output variables
       forecast_horizon=24,     # Forecast horizon (steps)
   )

.. list-table:: Required Parameters
   :header-rows: 1
   :widths: 22 12 12 54

   * - Parameter
     - Type
     - Constraints
     - Description
   * - ``static_input_dim``
     - int
     - >= 0
     - Static feature dimension (0 = no static input)
   * - ``dynamic_input_dim``
     - int
     - >= 1
     - Dynamic (historical) feature dimension
   * - ``future_input_dim``
     - int
     - >= 0
     - Future covariate dimension (0 = no future input)
   * - ``output_dim``
     - int
     - >= 1
     - Number of output variables
   * - ``forecast_horizon``
     - int
     - >= 1
     - Forecast length in time steps
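
The constraints in the table can be checked before a model is constructed. The helper below is a minimal sketch: ``check_required_dims`` is a hypothetical name, not part of ``base_attentive``, and the library may run its own validation.

.. code-block:: python

   def check_required_dims(static_input_dim, dynamic_input_dim,
                           future_input_dim, output_dim, forecast_horizon):
       """Hypothetical pre-flight check; returns a list of error messages."""
       # (value, documented minimum) for each required parameter.
       minimums = {
           "static_input_dim": (static_input_dim, 0),
           "dynamic_input_dim": (dynamic_input_dim, 1),
           "future_input_dim": (future_input_dim, 0),
           "output_dim": (output_dim, 1),
           "forecast_horizon": (forecast_horizon, 1),
       }
       errors = []
       for name, (value, minimum) in minimums.items():
           # bool is a subclass of int, so reject it explicitly.
           if not isinstance(value, int) or isinstance(value, bool):
               errors.append(f"{name} must be an int, got {type(value).__name__}")
           elif value < minimum:
               errors.append(f"{name} must be >= {minimum}, got {value}")
       return errors

An empty list means all five values satisfy the documented constraints.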

Architectural Parameters
------------------------

.. code-block:: python

   model = BaseAttentive(
       static_input_dim=4, dynamic_input_dim=8, future_input_dim=6,
       output_dim=2, forecast_horizon=24,
       embed_dim=32,
       hidden_units=64,
       lstm_units=64,
       attention_units=32,
       num_heads=4,
       num_encoder_layers=2,
       max_window_size=10,
       memory_size=100,
   )

.. list-table:: Architectural Parameters
   :header-rows: 1
   :widths: 22 10 10 15 43

   * - Parameter
     - Type
     - Default
     - Range
     - Description
   * - ``embed_dim``
     - int
     - 32
     - [8, 512]
     - Shared embedding dimension
   * - ``hidden_units``
     - int
     - 64
     - [16, 1024]
     - Dense hidden layer width
   * - ``lstm_units``
     - int
     - 64
     - [16, 1024]
     - LSTM hidden size (used when ``objective='hybrid'``)
   * - ``attention_units``
     - int
     - 32
     - [16, 1024]
     - Attention projection dimension
   * - ``num_heads``
     - int
     - 4
     - [1, 16]
     - Multi-head attention heads (``embed_dim`` must be divisible by ``num_heads``)
   * - ``num_encoder_layers``
     - int
     - 2
     - [1, 12]
     - Stacked encoder layer count
   * - ``max_window_size``
     - int
     - 10
     - [1, ∞)
     - Maximum dynamic time window size
   * - ``memory_size``
     - int
     - 100
     - [1, ∞)
     - Memory bank size for memory-augmented attention
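
The ranges above can be kept in one place and checked together. This is a sketch only: ``ARCH_RANGES`` and ``out_of_range`` are hypothetical names, the bounds are copied from the table, and whether the library itself rejects out-of-range values is not stated here.

.. code-block:: python

   import math

   # Inclusive (low, high) bounds copied from the table above.
   ARCH_RANGES = {
       "embed_dim": (8, 512),
       "hidden_units": (16, 1024),
       "lstm_units": (16, 1024),
       "attention_units": (16, 1024),
       "num_heads": (1, 16),
       "num_encoder_layers": (1, 12),
       "max_window_size": (1, math.inf),
       "memory_size": (1, math.inf),
   }

   def out_of_range(**kwargs):
       """Return {name: value} for every keyword outside its documented range."""
       bad = {}
       for name, value in kwargs.items():
           low, high = ARCH_RANGES[name]
           if not low <= value <= high:
               bad[name] = value
       return bad

For example, ``out_of_range(embed_dim=4, num_heads=4)`` reports only ``embed_dim``, because 4 is below the documented minimum of 8.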

Temporal Aggregation Parameters
---------------------------------

.. code-block:: python

   model = BaseAttentive(
       ...,
       scales=[1, 2, 4],
       multi_scale_agg="last",
       final_agg="last",
   )

.. list-table:: Temporal Aggregation
   :header-rows: 1
   :widths: 22 12 12 54

   * - Parameter
     - Type
     - Default
     - Description
   * - ``scales``
     - list[int] / 'auto' / None
     - None
     - LSTM sub-sampling strides. ``None`` uses a single scale, ``[1]``; ``'auto'`` lets the model choose the strides.
   * - ``multi_scale_agg``
     - str
     - 'last'
     - Merge multi-scale outputs: ``'last'``, ``'average'``, ``'flatten'``, ``'sum'``, ``'concat'``
   * - ``final_agg``
     - str
     - 'last'
     - Final temporal aggregation: ``'last'``, ``'average'``, ``'flatten'``
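
The NumPy sketch below shows one common reading of these options for an array shaped ``(batch, time, features)``. The function names are hypothetical, and the library's own layers may implement sub-sampling and aggregation differently.

.. code-block:: python

   import numpy as np

   def subsample(x, stride):
       """Keep every ``stride``-th time step of a (batch, time, features) array."""
       return x[:, ::stride, :]

   def aggregate_time(x, how="last"):
       """Collapse the time axis of a (batch, time, features) array."""
       if how == "last":
           return x[:, -1, :]                 # final time step only
       if how == "average":
           return x.mean(axis=1)              # mean over all time steps
       if how == "flatten":
           return x.reshape(x.shape[0], -1)   # time and features side by side
       raise ValueError(f"unknown aggregation: {how!r}")

   x = np.arange(24, dtype=float).reshape(2, 4, 3)   # batch=2, time=4, features=3
   time_steps = {s: subsample(x, s).shape[1] for s in (1, 2, 4)}

With four time steps, strides 1, 2, and 4 leave 4, 2, and 1 steps respectively. ``'last'`` and ``'average'`` both return shape ``(2, 3)``, while ``'flatten'`` returns ``(2, 12)``.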

Regularization Parameters
--------------------------

.. list-table:: Regularization
   :header-rows: 1
   :widths: 22 10 10 58

   * - Parameter
     - Type
     - Default
     - Description
   * - ``dropout_rate``
     - float
     - 0.1
     - Dropout probability, in the range [0, 1]
   * - ``activation``
     - str
     - 'relu'
     - ``'relu'``, ``'elu'``, ``'selu'``, ``'sigmoid'``, ``'tanh'``, ``'linear'``, ``'gelu'``, ``'swish'``
   * - ``use_batch_norm``
     - bool
     - False
     - Apply batch normalization
   * - ``use_residuals``
     - bool
     - True
     - Use residual connections

Feature Processing Parameters
-------------------------------

.. list-table:: Feature Processing
   :header-rows: 1
   :widths: 22 10 10 58

   * - Parameter
     - Type
     - Default
     - Description
   * - ``use_vsn``
     - bool
     - True
     - Enable Variable Selection Network
   * - ``vsn_units``
     - int or None
     - None
     - VSN projection size (defaults to ``embed_dim``)
   * - ``apply_dtw``
     - bool
     - True
     - Apply Dynamic Time Warping alignment

Configuration / Routing Parameters
-------------------------------------

.. list-table:: Configuration / Routing
   :header-rows: 1
   :widths: 22 12 12 54

   * - Parameter
     - Type
     - Default
     - Description
   * - ``objective``
     - str
     - 'hybrid'
     - Encoder type: ``'hybrid'`` or ``'transformer'``
   * - ``mode``
     - str or None
     - None
     - Mode shortcut: ``'tft'``, ``'tft_like'``, ``'pihal'``, ``'pihal_like'``, or ``None``
   * - ``attention_levels``
     - str / list / int / None
     - None
     - Decoder attention stack control (see `Attention Level Shortcuts`_)
   * - ``quantiles``
     - list[float] or None
     - None
     - Quantile levels to predict, e.g. ``[0.1, 0.5, 0.9]``. Setting this enables probabilistic output.
   * - ``architecture_config``
     - dict or None
     - None
     - Structural overrides (highest precedence)
   * - ``verbose``
     - int
     - 0
     - Logging verbosity

Architecture Configuration
==========================

Use ``architecture_config`` to set structural choices explicitly. Its entries take precedence over the other configuration and routing parameters:

.. code-block:: python

   config = {
       "encoder_type": "hybrid",
       "decoder_attention_stack": ["cross", "hierarchical", "memory"],
       "feature_processing": "vsn",
   }

   model = BaseAttentive(
       static_input_dim=4, dynamic_input_dim=8, future_input_dim=6,
       output_dim=2, forecast_horizon=24,
       architecture_config=config,
   )

Attention Level Shortcuts
--------------------------

.. code-block:: python

   model = BaseAttentive(..., attention_levels=None)                 # all three levels
   model = BaseAttentive(..., attention_levels="cross")              # one level, by name
   model = BaseAttentive(..., attention_levels=["cross", "memory"])  # list of names
   model = BaseAttentive(..., attention_levels=1)                    # 1=cross, 2=hierarchical, 3=memory
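
The four input forms can be normalized to a single list of level names. The sketch below is illustrative only: ``resolve_attention_levels`` is a hypothetical helper, and it assumes that an integer selects exactly one level. The library's actual resolution rules may differ.

.. code-block:: python

   ALL_LEVELS = ["cross", "hierarchical", "memory"]

   def resolve_attention_levels(levels):
       """Normalize None / str / list / int into a list of level names."""
       if levels is None:
           return list(ALL_LEVELS)            # default: all three levels
       if isinstance(levels, int) and not isinstance(levels, bool):
           # Assumption: 1, 2, 3 each pick a single level by position.
           if not 1 <= levels <= len(ALL_LEVELS):
               raise ValueError(f"integer level must be 1..3, got {levels}")
           return [ALL_LEVELS[levels - 1]]
       if isinstance(levels, str):
           levels = [levels]
       unknown = [name for name in levels if name not in ALL_LEVELS]
       if unknown:
           raise ValueError(f"unknown attention levels: {unknown}")
       return list(levels)

Normalizing early keeps the rest of the configuration code working with a single representation.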

V2 Schema Configuration
========================

For programmatic, backend-neutral construction use ``BaseAttentiveSpec``:

.. code-block:: python

   from base_attentive.config import BaseAttentiveSpec, BaseAttentiveComponentSpec

   spec = BaseAttentiveSpec(
       static_input_dim=4,
       dynamic_input_dim=8,
       future_input_dim=6,
       output_dim=1,
       forecast_horizon=24,
       embed_dim=32,
       hidden_units=64,
       attention_heads=4,
       layer_norm_epsilon=1e-6,
       dropout_rate=0.1,
       activation="relu",
       backend_name="tensorflow",
       head_type="point",
       quantiles=(),
       components=BaseAttentiveComponentSpec(
           sequence_pooling="pool.last",
       ),
   )

Configuration Presets
=====================

Minimal Configuration
---------------------

.. code-block:: python

   MINIMAL = dict(embed_dim=8, hidden_units=16, lstm_units=16,
                  attention_units=16, num_heads=1, dropout_rate=0.1)

   model = BaseAttentive(
       static_input_dim=4, dynamic_input_dim=8, future_input_dim=6,
       output_dim=2, forecast_horizon=24, **MINIMAL,
   )

Standard Configuration
----------------------

.. code-block:: python

   STANDARD = dict(embed_dim=32, hidden_units=64, lstm_units=64,
                   attention_units=32, num_heads=4, dropout_rate=0.1,
                   use_residuals=True)

Large Configuration
-------------------

.. code-block:: python

   LARGE = dict(embed_dim=128, hidden_units=256, lstm_units=256,
                attention_units=128, num_heads=8, dropout_rate=0.3,
                use_batch_norm=True, use_residuals=True)

Hybrid Preset
-------------

.. code-block:: python

   HYBRID = dict(
       objective="hybrid",
       scales=[1, 2, 4],
       multi_scale_agg="last",
       embed_dim=32,
       num_heads=4,
       dropout_rate=0.1,
   )

Transformer Preset
------------------

.. code-block:: python

   TRANSFORMER = dict(
       objective="transformer",
       num_encoder_layers=4,
       embed_dim=64,
       num_heads=8,
       dropout_rate=0.15,
   )
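
Presets are plain dictionaries, so they can be combined with the data dimensions and with per-experiment overrides before being passed to the constructor. The sketch below uses hypothetical names (``DATA_DIMS``, ``build_kwargs``) and leaves the constructor call commented out.

.. code-block:: python

   # Data dimensions, kept separate from the architecture preset.
   DATA_DIMS = dict(
       static_input_dim=4, dynamic_input_dim=8, future_input_dim=6,
       output_dim=2, forecast_horizon=24,
   )

   TRANSFORMER = dict(
       objective="transformer", num_encoder_layers=4,
       embed_dim=64, num_heads=8, dropout_rate=0.15,
   )

   def build_kwargs(dims, preset, **overrides):
       """Merge into a new dict; later sources win, inputs are not mutated."""
       return {**dims, **preset, **overrides}

   kwargs = build_kwargs(DATA_DIMS, TRANSFORMER, dropout_rate=0.2)
   # model = BaseAttentive(**kwargs)

Because the merge builds a new dictionary, the ``TRANSFORMER`` preset keeps its original ``dropout_rate`` and can be reused across experiments.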

Tuning Guidelines
=================

For Longer Sequences (T > 500)
--------------------------------

.. code-block:: python

   model = BaseAttentive(
       ..., objective="hybrid", scales=[1, 2, 4],
       embed_dim=32, dropout_rate=0.15,
   )

For Complex Patterns
---------------------

.. code-block:: python

   model = BaseAttentive(
       ..., embed_dim=64, num_heads=8,
       use_batch_norm=True, use_residuals=True, dropout_rate=0.2,
   )

For Fast Inference
------------------

.. code-block:: python

   model = BaseAttentive(
       ..., objective="hybrid", embed_dim=16,
       hidden_units=32, dropout_rate=0.1,
       attention_levels="cross",
   )

For Probabilistic Forecasts
-----------------------------

.. code-block:: python

   model = BaseAttentive(
       ..., quantiles=[0.1, 0.5, 0.9],
       dropout_rate=0.2, use_residuals=True,
   )
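
Quantile forecasts are conventionally trained and evaluated with the pinball (quantile) loss. The NumPy sketch below shows that loss under the assumption that predictions are shaped ``(batch, horizon, n_quantiles)``. Both the function and the layout are illustrative, not the library's own loss or output format.

.. code-block:: python

   import numpy as np

   def pinball_loss(y_true, y_pred, quantiles):
       """Mean pinball loss.

       y_true: (batch, horizon); y_pred: (batch, horizon, n_quantiles).
       """
       q = np.asarray(quantiles, dtype=float)
       error = y_true[..., None] - y_pred           # broadcast over quantiles
       # Under-prediction is weighted by q, over-prediction by (1 - q).
       loss = np.maximum(q * error, (q - 1.0) * error)
       return float(loss.mean())

For the 0.9 quantile, predicting too low costs nine times as much as predicting too high by the same amount, which pushes that output toward the upper end of the distribution.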

Configuration Management
=========================

Get Configuration
------------------

.. code-block:: python

   config = model.get_config()
   print(config)
   # {'static_input_dim': 4, ..., 'scales': None, 'mode': None, ...}

Create from Configuration
--------------------------

.. code-block:: python

   new_model = BaseAttentive.from_config(model.get_config())
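
A configuration dictionary can also be written to disk and reloaded later. The helpers below are a sketch with hypothetical names, and they assume every value returned by ``get_config()`` is JSON-serializable, which is not confirmed here.

.. code-block:: python

   import json
   from pathlib import Path

   def save_config(config, path):
       """Write a config dict to disk as JSON."""
       Path(path).write_text(json.dumps(config, indent=2, sort_keys=True))

   def load_config(path):
       """Read a config dict back from a JSON file."""
       return json.loads(Path(path).read_text())

With a real model this would read ``save_config(model.get_config(), "model_config.json")``, followed later by ``BaseAttentive.from_config(load_config("model_config.json"))``. JSON has no tuple type, so tuples in the config come back as lists.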

Reconfigure Model
-----------------

.. code-block:: python

   model2 = model.reconfigure({"encoder_type": "transformer"})

Common Mistakes
===============

Mismatched input dimensions
----------------------------

.. code-block:: python

   import numpy as np

   # Wrong: the model expects 4 static features, but the array has 5
   model = BaseAttentive(..., static_input_dim=4)
   static = np.random.randn(32, 5)

   # Correct: the last axis matches static_input_dim
   static = np.random.randn(32, 4)
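
A mismatch like this is easier to diagnose when it is caught before the array reaches the model. The check below is a minimal sketch. ``check_feature_dim`` is a hypothetical helper that compares only the last axis, on the assumption that features are stored there.

.. code-block:: python

   import numpy as np

   def check_feature_dim(name, array, expected_dim):
       """Raise ValueError if the last axis does not match the configured size."""
       actual = array.shape[-1]
       if actual != expected_dim:
           raise ValueError(
               f"{name}: expected {expected_dim} features, got {actual} "
               f"(array shape {array.shape})"
           )

   static = np.random.randn(32, 4)
   check_feature_dim("static", static, expected_dim=4)   # passes silently

The same call works for dynamic and future inputs, since only the last axis is inspected.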

num_heads must divide embed_dim
---------------------------------

.. code-block:: python

   # Wrong: 32 is not divisible by 6
   model = BaseAttentive(..., embed_dim=32, num_heads=6)

   # Correct
   model = BaseAttentive(..., embed_dim=32, num_heads=4)
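
When choosing ``num_heads`` for a given ``embed_dim``, it can help to list the head counts that satisfy the divisibility rule. ``valid_num_heads`` below is a hypothetical helper, and its default upper bound of 16 is taken from the documented ``num_heads`` range.

.. code-block:: python

   def valid_num_heads(embed_dim, max_heads=16):
       """Return every head count in 1..max_heads that divides embed_dim."""
       return [h for h in range(1, max_heads + 1) if embed_dim % h == 0]

   options = valid_num_heads(32)

For ``embed_dim=32`` the valid choices are 1, 2, 4, 8, and 16, so ``num_heads=6`` is rejected.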

See Also
========

- :doc:`quick_start` — Quick start guide
- :doc:`architecture_guide` — Architecture details
- :doc:`api_reference` — Full API reference