Time Series Transformer
This model was released on 2022-12-01 and added to Hugging Face Transformers on 2022-09-30.
Overview
The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting. This model was contributed by kashif.
Usage tips
- Similar to other models in the library, `TimeSeriesTransformerModel` is the raw Transformer without any head on top, and `TimeSeriesTransformerForPrediction` adds a distribution head on top of the former, which can be used for time series forecasting. Note that this is a so-called probabilistic forecasting model, not a point forecasting model: the model learns a distribution from which one can sample, rather than directly outputting values.
- `TimeSeriesTransformerForPrediction` consists of two blocks: an encoder, which takes a `context_length` of time series values as input (called `past_values`), and a decoder, which predicts a `prediction_length` of time series values into the future (called `future_values`). During training, one needs to provide pairs of (`past_values`, `future_values`) to the model.
- In addition to the raw (`past_values`, `future_values`), one typically provides additional features to the model (see the configuration sketch after this list). These can be the following:
  - `past_time_features`: temporal features which the model will add to `past_values`. These serve as "positional encodings" for the Transformer encoder. Examples are "day of the month", "month of the year", etc. as scalar values, stacked together as a vector. For example, if a given time series value was obtained on August 11th, one could have [11, 8] as the time feature vector (11 being "day of the month", 8 being "month of the year").
  - `future_time_features`: temporal features which the model will add to `future_values`. These serve as "positional encodings" for the Transformer decoder, in the same format as `past_time_features` above.
  - `static_categorical_features`: categorical features which are static over time (i.e., have the same value for all `past_values` and `future_values`). An example is the store ID or region ID that identifies a given time series. Note that these features need to be known for ALL data points, including those in the future.
  - `static_real_features`: real-valued features which are static over time (i.e., have the same value for all `past_values` and `future_values`). An example is an image representation of the product for which you have the time series (like a ResNet embedding of a "shoe" picture, if your time series is about shoe sales). Note that these features need to be known for ALL data points, including those in the future.
- The model is trained using "teacher forcing", similar to how a Transformer is trained for machine translation. This means that, during training, one shifts the `future_values` one position to the right as input to the decoder, prepended by the last value of `past_values`. At each time step, the model then has to predict the next target. The training set-up is thus similar to a GPT model for language, except that there is no notion of a `decoder_start_token_id`; we just use the last value of the context as the initial input for the decoder (see the training sketch below).
- At inference time, we give the final value of `past_values` as input to the decoder. Next, we can sample from the model to make a prediction at the next time step, which is then fed back to the decoder in order to make the next prediction (also called autoregressive generation; see the generation sketch below).
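For concreteness, here is a minimal sketch of how the inputs described above map onto a configuration. All sizes below (prediction length, context length, feature counts, cardinality) are made-up illustrative values, not recommendations:

```python
from transformers import TimeSeriesTransformerConfig, TimeSeriesTransformerModel

# Illustrative, made-up sizes: forecast 24 future steps from 48 past steps,
# with 2 time features (e.g. day-of-month and month-of-year) and 1 static
# categorical feature (e.g. a store ID taking one of 366 possible values).
config = TimeSeriesTransformerConfig(
    prediction_length=24,
    context_length=48,
    num_time_features=2,
    num_static_categorical_features=1,
    cardinality=[366],         # number of distinct values per categorical feature
    embedding_dimension=[32],  # embedding size per categorical feature
)

# Randomly initialized raw Transformer (no distribution head)
model = TimeSeriesTransformerModel(config)
```

Note that, because lagged values of the series are used as extra features, the model expects `past_values` of length `context_length + max(config.lags_sequence)` (48 + 7 = 55 with the default `lags_sequence`).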
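The training sketch below follows the example from the model documentation: it downloads a pre-processed batch of the Tourism Monthly dataset together with a checkpoint trained on it, and runs a training-style forward pass. The teacher-forcing shift of `future_values` happens inside the model, so the unshifted targets are passed directly, and the returned loss is the negative log-likelihood of the predicted distribution:

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import TimeSeriesTransformerForPrediction

# Pre-processed batch of the Tourism Monthly dataset used in the documentation
file = hf_hub_download(
    repo_id="hf-internal-testing/tourism-monthly-batch",
    filename="train-batch.pt",
    repo_type="dataset",
)
batch = torch.load(file)

model = TimeSeriesTransformerForPrediction.from_pretrained(
    "huggingface/time-series-transformer-tourism-monthly"
)

# Training: provide both past and future values, plus the additional features;
# past_observed_mask flags which past values were actually observed (vs. missing).
outputs = model(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    static_categorical_features=batch["static_categorical_features"],
    static_real_features=batch["static_real_features"],
    future_values=batch["future_values"],
    future_time_features=batch["future_time_features"],
)

loss = outputs.loss
loss.backward()
```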
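At inference time (continuing from the same batch and model), `generate()` draws `config.num_parallel_samples` sample trajectories from the learned distribution, one step at a time; averaging over the sample dimension gives a point forecast:

```python
# Inference: only past values (and features) are provided; the model generates
# future values autoregressively by sampling from the predicted distribution.
outputs = model.generate(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    static_categorical_features=batch["static_categorical_features"],
    static_real_features=batch["static_real_features"],
    future_time_features=batch["future_time_features"],
)

# outputs.sequences has shape (batch_size, num_parallel_samples, prediction_length)
mean_prediction = outputs.sequences.mean(dim=1)
```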
Resources
Section titled “Resources”A list of official Hugging Face and community (indicated by 🌎) resources to help you get started. If you’re interested in submitting a resource to be included here, please feel free to open a Pull Request and we’ll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
- Check out the Time Series Transformer blog post on the Hugging Face blog: Probabilistic Time Series Forecasting with 🤗 Transformers
TimeSeriesTransformerConfig
[[autodoc]] TimeSeriesTransformerConfig
TimeSeriesTransformerModel
[[autodoc]] TimeSeriesTransformerModel
    - forward
TimeSeriesTransformerForPrediction
[[autodoc]] TimeSeriesTransformerForPrediction
    - forward