🌟 QUANT-SCHOLAR 🌟

Automatically Updated Quantitative Finance Papers List

🚩 Updated on 2025.04.02

📜 Contents
  1. 📌 Machine Learning in Finance
  2. 📌 Deep Learning in Finance
  3. 📌 Reinforcement Learning in Finance
  4. 📌 Time Series Forecasting

📌 Machine Learning in Finance

📅 Publish Date 📖 Title 👨‍💻 Authors 🔗 PDF 💻 Code 💬 Comment 📜 Abstract
2025-03-31 Asymmetry in Distributions of Accumulated Gains and Losses in Stock Returns Hamed Farahani, R. A. Serota et.al. 2503.24241 16 pages, 17 figures, 3 tables
Abstract: We study decades-long historic distributions of accumulated S&P500 returns, from daily returns to those over several weeks. The time series of the returns emphasize major upheavals in the markets -- Black Monday, Tech Bubble, Financial Crisis and Covid Pandemic -- which are reflected in the tail ends of the distributions. De-trending the overall gain, we concentrate on comparing distributions of gains and losses. Specifically, we compare the tails of the distributions, which are believed to exhibit power-law behavior and possibly contain outliers. Towards this end we find confidence intervals of the linear fits of the tails of the complementary cumulative distribution functions on a log-log scale, as well as conduct a statistical U-test in order to detect outliers. We also study probability density functions of the full distributions of the returns with the emphasis on their asymmetry. The key empirical observations are that the mean of de-trended distributions increases near-linearly with the number of days of accumulation while the overall skew is negative -- consistent with the heavier tails of losses -- and depends little on the number of days of accumulation. At the same time the variance of the distributions exhibits near-perfect linear dependence on the number of days of accumulation, that is, it remains constant if scaled to the latter. Finally, we discuss the theoretical framework for understanding accumulated returns. Our main conclusion is that the current state of theory, which predicts symmetric or near-symmetric distributions of returns, cannot explain the aggregate of empirical results.
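As an illustration of the tail analysis mentioned in this abstract, the following is a minimal sketch of a linear fit to the tail of an empirical complementary cumulative distribution function (CCDF) on a log-log scale. It is not the authors' code: the function name `tail_slope`, the `tail_fraction` parameter, and the synthetic sample are assumptions made for the example, and the confidence intervals and U-test from the paper are not covered. The sample must be strictly positive (for losses, use absolute values).

```python
import math

def tail_slope(sample, tail_fraction=0.1):
    """OLS slope of log(CCDF) against log(x) over the largest observations."""
    xs = sorted(sample)
    n = len(xs)
    k = max(2, int(n * tail_fraction))  # number of tail points used in the fit
    # Empirical CCDF at the i-th order statistic: share of points >= xs[i].
    pts = [(math.log(xs[i]), math.log((n - i) / n)) for i in range(n - k, n)]
    mx = sum(p[0] for p in pts) / k
    my = sum(p[1] for p in pts) / k
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

# Synthetic sample (hypothetical data) whose empirical CCDF is an exact
# power law with exponent 3, so the fitted slope should be close to -3.
n = 1000
sample = [((n - i) / n) ** (-1.0 / 3.0) for i in range(n)]
slope = tail_slope(sample)
```

A power-law tail appears as a straight line on the log-log plot, and the negative of the fitted slope estimates the tail exponent.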
2025-03-31 A cost of capital approach to determining the LGD discount rate Janette Larney, Arno Botha, Gerrit Lodewicus Grobler et.al. 2503.23992 7374 words, 5 figures
Abstract: Loss Given Default (LGD) is a key risk parameter in determining a bank's regulatory capital. During LGD-estimation, realised recovery cash flows are to be discounted at an appropriate rate. Regulatory guidance mandates that this rate should allow for the time value of money, as well as include a risk premium that reflects the "undiversifiable risk" within these recoveries. Having extensively reviewed earlier methods of determining this rate, we propose a new approach that is inspired by the cost of capital approach from the Solvency II regulatory regime. Our method involves estimating a market-consistent price for a portfolio of defaulted loans, from which an associated discount rate may be inferred. We apply this method to mortgage and personal loans data from a large South African bank. The results reveal the main drivers of the discount rate to be the mean and variance of these recoveries, as well as the bank's cost of capital in excess of the risk-free rate. Our method therefore produces a discount rate that reflects both the undiversifiable risk of recoveries and the time value of money, thereby satisfying regulatory requirements. This work can subsequently enhance the LGD-component within the modelling of both regulatory and economic capital.
2025-03-14 Bridging Language Models and Financial Analysis Alejandro Lopez-Lira, Jihoon Kwon, Sangwoon Yoon et.al. 2503.22693 28 pages
Abstract: The rapid advancements in Large Language Models (LLMs) have unlocked transformative possibilities in natural language processing, particularly within the financial sector. Financial data is often embedded in intricate relationships across textual content, numerical tables, and visual charts, posing challenges that traditional methods struggle to address effectively. However, the emergence of LLMs offers new pathways for processing and analyzing this multifaceted data with increased efficiency and insight. Despite the fast pace of innovation in LLM research, there remains a significant gap in their practical adoption within the finance industry, where cautious integration and long-term validation are prioritized. This disparity has led to a slower implementation of emerging LLM techniques, despite their immense potential in financial applications. As a result, many of the latest advancements in LLM technology remain underexplored or not fully utilized in this domain. This survey seeks to bridge this gap by providing a comprehensive overview of recent developments in LLM research and examining their applicability to the financial sector. Building on previous survey literature, we highlight several novel LLM methodologies, exploring their distinctive capabilities and their potential relevance to financial data analysis. By synthesizing insights from a broad range of studies, this paper aims to serve as a valuable resource for researchers and practitioners, offering direction on promising research avenues and outlining future opportunities for advancing LLM applications in finance.
2025-03-27 From Deep Learning to LLMs: A survey of AI in Quantitative Investment Bokai Cao, Saizhuo Wang, Xinyi Lin et.al. 2503.21422
Abstract: Quantitative investment (quant) is an emerging, technology-driven approach in asset management, increasingly shaped by advancements in artificial intelligence. Recent advances in deep learning and large language models (LLMs) for quant finance have improved predictive modeling and enabled agent-based automation, suggesting a potential paradigm shift in this field. In this survey, taking alpha strategy as a representative example, we explore how AI contributes to the quantitative investment pipeline. We first examine the early stage of quant research, centered on human-crafted features and traditional statistical models with an established alpha pipeline. We then discuss the rise of deep learning, which enabled scalable modeling across the entire pipeline from data processing to order execution. Building on this, we highlight the emerging role of LLMs in extending AI beyond prediction, empowering autonomous agents to process unstructured data, generate alphas, and support self-iterative workflows.
2025-03-27 Dynamic Asset Pricing Theory for Life Contingent Risks Patrick Ling et.al. 2503.21256
Abstract: Although the valuation of life contingent assets has been thoroughly investigated under the framework of mathematical statistics, little financial economics research pays attention to the pricing of these assets in a non-arbitrage, complete market. In this paper, we first revisit the Fundamental Theorem of Asset Pricing (FTAP) and the short proof of it. Then we point out that the discounted asset price is a martingale only when dividends are zero under all random states of the world, using a simple proof based on the pricing kernel. Next, we apply the FTAP to find valuation formulas for life contingent assets including life insurance policies and life contingent annuities. Last but not least, we state the assumption of static portfolio in a dynamic economy, and clarify the FTAP that accommodates the valuation of a portfolio of life contingent policies.
2025-03-01 Ornstein-Uhlenbeck Process for Horse Race Betting: A Micro-Macro Analysis of Herding and Informed Bettors Tomoya Sugawara, Shintaro Mori et.al. 2503.16470 20 pages, 5 figures
Abstract: We model the time evolution of single win odds in Japanese horse racing as a stochastic process, deriving an Ornstein-Uhlenbeck process by analyzing the probability dynamics of vote shares and the empirical time series of odds movements. Our framework incorporates two types of bettors: herders, who adjust their bets based on current odds, and fundamentalists, who wager based on a horse's true winning probability. Using data from 3450 Japan Racing Association races in 2008, we identify a microscopic probability rule governing individual bets and a mean-reverting macroscopic pattern in odds convergence. This structure parallels financial markets, where traders' decisions are influenced by market fluctuations, and the interplay between herding and fundamentalist strategies shapes price dynamics. These results highlight the broader applicability of our approach to non-equilibrium financial and betting markets, where mean-reverting dynamics emerge from simple behavioral interactions.
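To make the mean-reverting dynamics concrete, here is a minimal sketch that simulates a generic Ornstein-Uhlenbeck process, dX = theta (mu - X) dt + sigma dW, with the Euler-Maruyama scheme. It is not the model calibrated in the paper: the function `simulate_ou` and all parameter values (a vote share starting at 0.9 and reverting toward 0.3) are hypothetical and chosen only for illustration.

```python
import math
import random

def simulate_ou(x0, theta, mu, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of dX = theta * (mu - X) dt + sigma dW."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        path.append(x + theta * (mu - x) * dt + sigma * dw)
    return path

# Hypothetical parameters: start far from the long-run level mu and
# observe the path being pulled back toward it.
path = simulate_ou(x0=0.9, theta=2.0, mu=0.3, sigma=0.05, dt=0.01, n_steps=2000)
tail_mean = sum(path[-500:]) / 500
```

Here `theta` controls the speed of reversion and `sigma` the noise level; with `sigma=0` the path decays deterministically toward `mu`.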
2025-03-19 HQNN-FSP: A Hybrid Classical-Quantum Neural Network for Regression-Based Financial Stock Market Prediction Prashant Kumar Choudhary, Nouhaila Innan, Muhammad Shafique et.al. 2503.15403 11 pages and 11 figures
Abstract: Financial time-series forecasting remains a challenging task due to complex temporal dependencies and market fluctuations. This study explores the potential of hybrid quantum-classical approaches to assist in financial trend prediction by leveraging quantum resources for improved feature representation and learning. A custom Quantum Neural Network (QNN) regressor is introduced, designed with a novel ansatz tailored for financial applications. Two hybrid optimization strategies are proposed: (1) a sequential approach where classical recurrent models (RNN/LSTM) extract temporal dependencies before quantum processing, and (2) a joint learning framework that optimizes classical and quantum parameters simultaneously. Systematic evaluation using TimeSeriesSplit, k-fold cross-validation, and predictive error analysis highlights the ability of these hybrid models to integrate quantum computing into financial forecasting workflows. The findings demonstrate how quantum-assisted learning can contribute to financial modeling, offering insights into the practical role of quantum resources in time-series analysis.
2025-03-18 A Note on the Asymptotic Properties of the GLS Estimator in Multivariate Regression with Heteroskedastic and Autocorrelated Errors Koichiro Moriya, Akihiko Noda et.al. 2503.13950 10 pages, 2 tables
Abstract: We study the asymptotic properties of the GLS estimator in multivariate regression with heteroskedastic and autocorrelated errors. We derive Wald statistics for linear restrictions and assess their performance. The statistics remain robust to heteroskedasticity and autocorrelation.
2025-03-06 Matrix H-theory approach to stock market fluctuations Luan M. T. de Moraes, Antônio M. S. Macedo, Raydonal Ospina et.al. 2503.08697 26 pages, 10 figures. Published on Physical Review E
Abstract: We introduce matrix H theory, a framework for analyzing collective behavior arising from multivariate stochastic processes with hierarchical structure. The theory models the joint distribution of the multiple variables (the measured signal) as a compound of a large-scale multivariate distribution with the distribution of a slowly fluctuating background. The background is characterized by a hierarchical stochastic evolution of internal degrees of freedom, representing the correlations between stocks at different time scales. As in its univariate version, the matrix H-theory formalism also has two universality classes: Wishart and inverse Wishart, enabling a concise description of both the background and the signal probability distributions in terms of Meijer G-functions with matrix argument. Empirical analysis of daily returns of stocks within the S&P500 demonstrates the effectiveness of matrix H theory in describing fluctuations in stock markets. These findings contribute to a deeper understanding of multivariate hierarchical processes and offer potential for developing more informed portfolio strategies in financial markets.
2025-03-05 Multimodal Stock Price Prediction: A Case Study of the Russian Securities Market Kasymkhan Khubiev, Mikhail Semenov et.al. 2503.08696 NSCF-2024, PROGRAM SYSTEMS: THEORY AND APPLICATIONS
Abstract: Classical asset price forecasting methods primarily rely on numerical data, such as price time series, trading volumes, limit order book data, and technical analysis indicators. However, the news flow plays a significant role in price formation, making the development of multimodal approaches that combine textual and numerical data for improved prediction accuracy highly relevant. This paper addresses the problem of forecasting financial asset prices using the multimodal approach that combines candlestick time series and textual news flow data. A unique dataset was collected for the study, which includes time series for 176 Russian stocks traded on the Moscow Exchange and 79,555 financial news articles in Russian. For processing textual data, pre-trained models RuBERT and Vikhr-Qwen2.5-0.5b-Instruct (a large language model) were used, while time series and vectorized text data were processed using an LSTM recurrent neural network. The experiments compared models based on a single modality (time series only) and two modalities, as well as various methods for aggregating text vector representations. Prediction quality was estimated using two key metrics: Accuracy (direction of price movement prediction: up or down) and Mean Absolute Percentage Error (MAPE), which measures the deviation of the predicted price from the true price. The experiments showed that incorporating textual modality reduced the MAPE value by 55%. The resulting multimodal dataset holds value for the further adaptation of language models in the financial sector. Future research directions include optimizing textual modality parameters, such as the time window, sentiment, and chronological order of news messages.
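The two evaluation metrics named in this abstract can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' evaluation code: the function names, the convention of measuring direction against the previous price, and the sample numbers are all hypothetical.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual prices must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def direction_accuracy(previous, actual, predicted):
    """Share of steps where the predicted move (up vs. not up) matches the realized move."""
    hits = sum(
        (a > p0) == (p > p0)
        for p0, a, p in zip(previous, actual, predicted)
    )
    return hits / len(actual)

# Hypothetical prices for three forecast steps.
previous = [100.0, 100.0, 100.0]
actual = [105.0, 95.0, 102.0]
predicted = [103.0, 101.0, 101.0]

error_pct = mape(actual, predicted)
accuracy = direction_accuracy(previous, actual, predicted)
```

The two metrics are complementary: a forecast can have a small percentage error and still get the direction wrong, as the second step in the example shows.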
2025-03-02 Liquidity-adjusted Return and Volatility, and Autoregressive Models Qi Deng, Zhong-guo Zhou et.al. 2503.08693
Abstract: We construct liquidity-adjusted return and volatility using purposely designed liquidity metrics (liquidity jump and liquidity diffusion) that incorporate additional liquidity information. Based on these measures, we introduce a liquidity-adjusted ARMA-GARCH framework to address the limitations of traditional ARMA-GARCH models, which are not effective in modeling illiquid assets with high liquidity variability, such as cryptocurrencies. We demonstrate that the liquidity-adjusted model improves model fit for cryptocurrencies, with greater volatility sensitivity to past shocks and reduced volatility persistence of erratic past volatility. Our model is validated by the empirical evidence that the liquidity-adjusted mean-variance (LAMV) portfolios outperform the traditional mean-variance (TMV) portfolios.
2025-02-27 Detecting Crypto Pump-and-Dump Schemes: A Thresholding-Based Approach to Handling Market Noise Mahya Karbalaii et.al. 2503.08692
Abstract: We propose a simple yet robust unsupervised model to detect pump-and-dump events on tokens listed on the Poloniex Exchange platform. By combining threshold-based criteria with exponentially weighted moving averages (EWMA) and volatility measures, our approach effectively distinguishes genuine anomalies from minor trading fluctuations, even for tokens with low liquidity and prolonged inactivity. These characteristics present a unique challenge, as standard anomaly-detection methods often over-flag negligible volume spikes. Our framework overcomes this issue by tailoring both price and volume thresholds to the specific trading patterns observed, resulting in a model that balances high true-positive detection with minimal noise.
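A threshold rule on top of an EWMA baseline, in the spirit of what this abstract describes, might look like the sketch below. It is a simplified illustration, not the authors' detector: the function `flag_spikes`, the smoothing factor, and both multipliers are hypothetical, and the volatility measures mentioned in the abstract are omitted.

```python
def flag_spikes(prices, volumes, alpha=0.2, price_mult=1.5, volume_mult=3.0):
    """Return indices where price AND volume both exceed a multiple of their EWMA.

    Each bar is compared with the EWMA of the bars before it, so a spike
    does not inflate its own baseline.
    """
    flags = []
    price_avg, volume_avg = prices[0], volumes[0]
    for i in range(1, len(prices)):
        if prices[i] > price_mult * price_avg and volumes[i] > volume_mult * volume_avg:
            flags.append(i)
        # Update the baselines after the comparison.
        price_avg = alpha * prices[i] + (1 - alpha) * price_avg
        volume_avg = alpha * volumes[i] + (1 - alpha) * volume_avg
    return flags

# Hypothetical series: a flat market with one joint price/volume spike at index 10.
prices = [1.0] * 10 + [2.0] + [1.0] * 5
volumes = [10.0] * 10 + [100.0] + [10.0] * 5
flags = flag_spikes(prices, volumes)
```

Requiring both conditions at once is one way to avoid flagging isolated volume spikes that are not accompanied by a price move.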
2025-03-18 Large language models in finance : what is financial sentiment? Kemal Kirtac, Guido Germano et.al. 2503.03612 There are two different articles with the same content and different names (see arXiv:2412.19245)
Abstract: Financial sentiment has become a crucial yet complex concept in finance, increasingly used in market forecasting and investment strategies. Despite its growing importance, there remains a need to define and understand what financial sentiment truly represents and how it can be effectively measured. We explore the nature of financial sentiment and investigate how large language models (LLMs) contribute to its estimation. We trace the evolution of sentiment measurement in finance, from market-based and lexicon-based methods to advanced natural language processing techniques. The emergence of LLMs has significantly enhanced sentiment analysis, providing deeper contextual understanding and greater accuracy in extracting sentiment from financial text. We examine how BERT-based models, such as RoBERTa and FinBERT, are optimized for structured sentiment classification, while GPT-based models, including GPT-4, OPT, and LLaMA, excel in financial text generation and real-time sentiment interpretation. A comparative analysis of bidirectional and autoregressive transformer architectures highlights their respective roles in investor sentiment analysis, algorithmic trading, and financial decision-making. By exploring what financial sentiment is and how it is estimated within LLMs, we provide insights into the growing role of AI-driven sentiment analysis in finance.
2025-03-04 VWAP Execution with Signature-Enhanced Transformers: A Multi-Asset Learning Approach Remi Genet et.al. 2503.02680 link
Abstract: In this paper I propose a novel approach to Volume Weighted Average Price (VWAP) execution that addresses two key practical challenges: the need for asset-specific model training and the capture of complex temporal dependencies. Building upon my recent work in dynamic VWAP execution arXiv:2502.18177, I demonstrate that a single neural network trained across multiple assets can achieve performance comparable to or better than traditional asset-specific models. The proposed architecture combines a transformer-based design inspired by arXiv:2406.02486 with path signatures for capturing geometric features of price-volume trajectories, as in arXiv:2406.17890. The empirical analysis, conducted on hourly cryptocurrency trading data from 80 trading pairs, shows that the globally-fitted model with signature features (GFT-Sig) achieves superior performance in both absolute and quadratic VWAP loss metrics compared to asset-specific approaches. Notably, these improvements persist for out-of-sample assets, demonstrating the model's ability to generalize across different market conditions. The results suggest that combining global parameter sharing with signature-based feature extraction provides a scalable and robust approach to VWAP execution, offering significant practical advantages over traditional asset-specific implementations.
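For readers unfamiliar with the benchmark, the sketch below computes a VWAP and the gap between an execution schedule's achieved price and the market VWAP. It only illustrates the quantity being targeted; the paper's absolute and quadratic VWAP loss definitions are not given in the abstract, so the `slippage` measure, the function name, and the numbers here are assumptions.

```python
def vwap(prices, volumes):
    """Volume Weighted Average Price: sum(price * volume) / sum(volume)."""
    total_volume = sum(volumes)
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume

# Hypothetical hourly data: the market trades most of its volume late,
# while the execution schedule splits the order evenly across hours.
market_prices = [100.0, 101.0, 102.0]
market_volumes = [10.0, 30.0, 60.0]
executed_volumes = [1.0, 1.0, 1.0]

market_vwap = vwap(market_prices, market_volumes)      # benchmark price
achieved_vwap = vwap(market_prices, executed_volumes)  # price of our own fills
slippage = achieved_vwap - market_vwap                 # signed gap to the benchmark
```

An execution schedule whose volume profile matches the market's profile has zero gap, which is why VWAP execution is often framed as a volume allocation problem.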
2025-03-04 Extrapolating the long-term seasonal component of electricity prices for forecasting in the day-ahead market Katarzyna Chęć, Bartosz Uniejewski, Rafał Weron et.al. 2503.02518
Abstract: Recent studies provide evidence that decomposing the electricity price into the long-term seasonal component (LTSC) and the remaining part, predicting both separately, and then combining their forecasts can bring significant accuracy gains in day-ahead electricity price forecasting. However, not much attention has been paid to predicting the LTSC, and the last 24 hourly values of the estimated pattern are typically copied for the target day. To address this gap, we introduce a novel approach which extracts the trend-seasonal pattern from a price series extrapolated using price forecasts for the next 24 hours. We assess it using two 5-year long test periods from the German and Spanish power markets, covering the Covid-19 pandemic, the 2021/2022 energy crisis, and the war in Ukraine. Considering parsimonious autoregressive and LASSO-estimated models, we find that improvements in predictive accuracy range from 3% to 15% in terms of the root mean squared error and exceed 1% in terms of profits from a realistic trading strategy involving day-ahead bidding and battery storage.
2025-03-01 Understanding the Commodity Futures Term Structure Through Signatures Hari P. Krishnan, Stephan Sturm et.al. 2503.00603 19 pages, 1 figure
Abstract: Signature methods have been widely and effectively used as a tool for feature extraction in statistical learning methods, notably in mathematical finance. They lack, however, interpretability: in the general case, it is unclear why signatures actually work. The present article aims to address this issue directly, by introducing and developing the concept of signature perturbations. In particular, we construct a regular perturbation of the signature of the term structure of log prices for various commodities, in terms of the convenience yield. Our perturbation expansion and rigorous convergence estimates help explain the success of signature-based classification of commodities markets according to their term structure, with the volatility of the convenience yield as the major discriminant.
2025-03-04 Using quantile time series and historical simulation to forecast financial risk multiple steps ahead Richard Gerlach, Antonio Naimoli, Giuseppe Storti et.al. 2502.20978
Abstract: A method for quantile-based, semi-parametric historical simulation estimation of multiple step ahead Value-at-Risk (VaR) and Expected Shortfall (ES) models is developed. It uses the quantile loss function, analogous to how the quasi-likelihood is employed by standard historical simulation methods. The returns data are scaled by the estimated quantile series, then resampling is employed to estimate the forecast distribution one and multiple steps ahead, allowing tail risk forecasting. The proposed method is applicable to any data or model where the relationship between VaR and ES does not change over time and can be extended to allow a measurement equation incorporating realized measures, thus including Realized GARCH and Realized CAViaR type models. Its finite sample properties, and its comparison with existing historical simulation methods, are evaluated via a simulation study. A forecasting study assesses the relative accuracy of the 1% and 2.5% VaR and ES one-day-ahead and ten-day-ahead forecasting results for the proposed class of models compared to several competitors.
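Two building blocks referenced in this abstract, the quantile (pinball) loss and a plain historical-simulation estimate of VaR and ES, can be sketched as follows. This is a textbook-style illustration rather than the semi-parametric method proposed in the paper: the function names, the simple empirical-quantile convention, and the sample returns are assumptions.

```python
def quantile_loss(returns, var_forecasts, q):
    """Average pinball loss of quantile forecasts at level q (lower is better)."""
    total = 0.0
    for r, v in zip(returns, var_forecasts):
        hit = 1.0 if r < v else 0.0  # 1 when the return breaches the forecast
        total += (r - v) * (q - hit)
    return total / len(returns)

def historical_var_es(returns, q):
    """Empirical VaR (q-quantile of returns) and ES (mean of returns at or below it)."""
    ordered = sorted(returns)
    k = max(1, int(q * len(ordered)))  # number of observations in the tail
    tail = ordered[:k]
    return tail[-1], sum(tail) / k

# Hypothetical daily returns.
returns = [-0.04, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.03]
var_25, es_25 = historical_var_es(returns, q=0.25)
loss = quantile_loss(returns, [var_25] * len(returns), q=0.25)
```

With this sign convention VaR and ES are negative returns, and ES is always at or below VaR because it averages the observations in the tail.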
2025-02-26 Corporate Fraud Detection in Rich-yet-Noisy Financial Graph Shiqi Wang, Zhibo Zhang, Libing Fang et.al. 2502.19305 link
Abstract: Corporate fraud detection aims to automatically recognize companies that conduct wrongful activities such as fraudulent financial statements or illegal insider trading. Previous learning-based methods fail to effectively integrate rich interactions in the company network. To close this gap, we collect 18-year financial records in China to form three graph datasets with fraud labels. We analyze the characteristics of the financial graphs, highlighting two pronounced issues: (1) information overload: the dominance of (noisy) non-company nodes over company nodes hinders the message-passing process in Graph Convolution Networks (GCN); and (2) hidden fraud: there exists a large percentage of possible undetected violations in the collected data. The hidden fraud problem will introduce noisy labels in the training dataset and compromise fraud detection results. To handle such challenges, we propose a novel graph-based method, namely, Knowledge-enhanced GCN with Robust Two-stage Learning (\({\rm KeGCN}_{R}\)), which leverages Knowledge Graph Embeddings to mitigate the information overload and effectively learns rich representations. The proposed model adopts a two-stage learning method to enhance robustness against hidden frauds. Extensive experimental results not only confirm the importance of interactions but also show the superiority of \({\rm KeGCN}_{R}\) over a number of strong baselines in terms of fraud detection effectiveness and robustness.
2025-02-25 Recurrent Neural Networks for Dynamic VWAP Execution: Adaptive Trading Strategies with Temporal Kolmogorov-Arnold Networks Remi Genet et.al. 2502.18177 link
Abstract: The execution of Volume Weighted Average Price (VWAP) orders remains a critical challenge in modern financial markets, particularly as trading volumes and market complexity continue to increase. In my previous work arXiv:2502.13722, I introduced a novel deep learning approach that demonstrated significant improvements over traditional VWAP execution methods by directly optimizing the execution problem rather than relying on volume curve predictions. However, that model was static because it employed the fully linear approach described in arXiv:2410.21448, which is not designed for dynamic adjustment. This paper extends that foundation by developing a dynamic neural VWAP framework that adapts to evolving market conditions in real time. We introduce two key innovations: first, the integration of recurrent neural networks to capture complex temporal dependencies in market dynamics, and second, a sophisticated dynamic adjustment mechanism that continuously optimizes execution decisions based on market feedback. The empirical analysis, conducted across five major cryptocurrency markets, demonstrates that this dynamic approach achieves substantial improvements over both traditional methods and our previous static implementation, with execution performance gains of 10 to 15% in liquid markets and consistent outperformance across varying conditions. These results suggest that adaptive neural architectures can effectively address the challenges of modern VWAP execution while maintaining computational efficiency suitable for practical deployment.
2025-02-25 LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena Tianmi Ma, Jiawei Du, Wenxin Huang et.al. 2502.17967 link
Abstract: Recent advancements in large language models (LLMs) have significantly improved performance in natural language processing tasks. However, their ability to generalize to dynamic, unseen tasks, particularly in numerical reasoning, remains a challenge. Existing benchmarks mainly evaluate LLMs on problems with predefined optimal solutions, which may not align with real-world scenarios where clear answers are absent. To bridge this gap, we design the Agent Trading Arena, a virtual numerical game simulating complex economic systems through zero-sum games, where agents invest in stock portfolios. Our experiments reveal that LLMs, including GPT-4o, struggle with algebraic reasoning when dealing with plain-text stock data, often focusing on local details rather than global trends. In contrast, LLMs perform significantly better with geometric reasoning when presented with visual data, such as scatter plots or K-line charts, suggesting that visual representations enhance numerical reasoning. This capability is further improved by incorporating the reflection module, which aids in the analysis and interpretation of complex data. We validate our findings on the NASDAQ Stock dataset, where LLMs demonstrate stronger reasoning with visual data compared to text. Our code and data are publicly available at https://github.com/wekjsdvnm/Agent-Trading-Arena.git.
2025-02-24 A data-driven econo-financial stress-testing framework to estimate the effect of supply chain networks on financial systemic risk Jan Fialkowski, Christian Diem, András Borsos et.al. 2502.17044 link
Abstract: Supply chain disruptions constitute an often underestimated risk for financial stability. As in financial networks, systemic risks in production networks arise when the local failure of one firm impacts the production of others and might trigger cascading disruptions that affect significant parts of the economy. Here, we study how systemic risk in production networks translates into financial systemic risk through a mechanism where supply chain contagion leads to correlated bank-firm loan defaults. We propose a financial stress-testing framework for micro- and macro-prudential applications that features a national firm level supply chain network in combination with interbank network layers. The model is calibrated by using a unique data set including about 1 million firm-level supply links, practically all bank-firm loans, and all interbank loans in a small European economy. As a showcase we implement a real COVID-19 shock scenario on the firm level. This model allows us to study how the disruption dynamics in the real economy can lead to interbank solvency contagion dynamics. We estimate to what extent this amplifies financial systemic risk. We discuss the relative importance of these contagion channels and find an increase of interbank contagion by 70% when production network contagion is present. We then examine the financial systemic risk firms bring to banks and find an increase of up to 28% in the presence of the interbank contagion channel. This framework is the first financial systemic risk model to take agent-level dynamics of the production network and shocks of the real economy into account, which opens a path to a direct, event-driven understanding of the dynamical interaction between the real economy and financial systems.
2025-02-22 Contrastive Similarity Learning for Market Forecasting: The ContraSim Framework Nicholas Vinden, Raeid Saqur, Zining Zhu et.al. 2502.16023 8 pages, 3 appendices
Abstract: We introduce the Contrastive Similarity Space Embedding Algorithm (ContraSim), a novel framework for uncovering the global semantic relationships between daily financial headlines and market movements. ContraSim operates in two key stages: (I) Weighted Headline Augmentation, which generates augmented financial headlines along with a semantic fine-grained similarity score, and (II) Weighted Self-Supervised Contrastive Learning (WSSCL), an extended version of classical self-supervised contrastive learning that uses the similarity metric to create a refined weighted embedding space. This embedding space clusters semantically similar headlines together, facilitating deeper market insights. Empirical results demonstrate that integrating ContraSim features into financial forecasting tasks improves classification accuracy from WSJ headlines by 7%. Moreover, leveraging an information density analysis, we find that the similarity spaces constructed by ContraSim intrinsically cluster days with homogeneous market movement directions, indicating that ContraSim captures market dynamics independent of ground truth labels. Additionally, ContraSim enables the identification of historical news days that closely resemble the headlines of the current day, providing analysts with actionable insights to predict market trends by referencing analogous past events.
2025-02-21 Multi-Agent Stock Prediction Systems: Machine Learning Models, Simulations, and Real-Time Trading Strategies Daksh Dave, Gauransh Sawhney, Vikhyat Chauhan et.al. 2502.15853
Abstract: This paper presents a comprehensive study on stock price prediction, leveraging advanced machine learning (ML) and deep learning (DL) techniques to improve financial forecasting accuracy. The research evaluates the performance of various recurrent neural network (RNN) architectures, including Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and attention-based models. These models are assessed for their ability to capture complex temporal dependencies inherent in stock market data. Our findings show that attention-based models outperform other architectures, achieving the highest accuracy by capturing both short and long-term dependencies. This study contributes valuable insights into AI-driven financial forecasting, offering practical guidance for developing more accurate and efficient trading systems.
2025-02-20 Financial fraud detection system based on improved random forest and gradient boosting machine (GBM) Tianzuo Hu et.al. 2502.15822
Abstract (click to expand)This paper proposes a financial fraud detection system based on improved Random Forest (RF) and Gradient Boosting Machine (GBM). Specifically, the system introduces a novel model architecture called GBM-SSRF (Gradient Boosting Machine with Simplified and Strengthened Random Forest), which cleverly combines the powerful optimization capabilities of the gradient boosting machine (GBM) with improved randomization. The computational efficiency and feature extraction capabilities of the Simplified and Strengthened Random Forest (SSRF) significantly improve the performance of financial fraud detection. Although the traditional random forest model has good classification capabilities, it has high computational complexity when faced with large-scale data and has certain limitations in feature selection. As a commonly used ensemble learning method, the GBM model has significant advantages in optimizing performance and handling nonlinear problems. However, GBM takes a long time to train and is prone to overfitting when data samples are unbalanced. In response to these limitations, this paper optimizes the structure of the random forest, reducing the computational complexity and improving the feature selection ability through structural simplification and enhancement. In addition, the optimized random forest is embedded into the GBM framework, and the model can maintain efficiency and stability with the help of GBM's gradient optimization capability. Experiments show that the GBM-SSRF model not only has good performance, but also has good robustness and generalization capabilities, providing an efficient and reliable solution for financial fraud detection.
2025-02-21 Network topology of the Euro Area interbank market Ilias Aarab, Thomas Gottron et.al. 2502.15611 This is the preprint version of the paper published in: Aarab, I., Gottron, T. (2024). Network Topology of the Euro Area Interbank Market. In: Mingione, M., Vichi, M., Zaccaria, G. (eds) High-quality and Timely Statistics. CESS 2022. Studies in Theoretical and Applied Statistics. Springer, Cham. https://doi.org/10.1007/978-3-031-63630-1_1
Abstract (click to expand)The rapidly increasing availability of large amounts of granular financial data, paired with the advances of big data related technologies induces the need of suitable analytics that can represent and extract meaningful information from such data. In this paper we propose a multi-layer network approach to distill the Euro Area (EA) banking system in different distinct layers. Each layer of the network represents a specific type of financial relationship between banks, based on various sources of EA granular data collections. The resulting multi-layer network allows one to describe, analyze and compare the topology and structure of EA banks from different perspectives, eventually yielding a more complete picture of the financial market. This granular information representation has the potential to enable researchers and practitioners to better apprehend financial system dynamics as well as to support financial policies to manage and monitor financial risk from a more holistic point of view.
2025-02-21 Clustered Network Connectedness: A New Measurement Framework with Application to Global Equity Markets Bastien Buchwalter, Francis X. Diebold, Kamil Yilmaz et.al. 2502.15458
Abstract (click to expand)Network connections, both across and within markets, are central in countless economic contexts. In recent decades, a large literature has developed and applied flexible methods for measuring network connectedness and its evolution, based on variance decompositions from vector autoregressions (VARs), as in Diebold and Yilmaz (2014). Those VARs are, however, typically identified using full orthogonalization (Sims, 1980), or no orthogonalization (Koop, Pesaran, and Potter, 1996; Pesaran and Shin, 1998), which, although useful, are special and extreme cases of a more general framework that we develop in this paper. In particular, we allow network nodes to be connected in "clusters", such as asset classes, industries, regions, etc., where shocks are orthogonal across clusters (Sims style orthogonalized identification) but correlated within clusters (Koop-Pesaran-Potter-Shin style generalized identification), so that the ordering of network nodes is relevant across clusters but irrelevant within clusters. After developing the clustered connectedness framework, we apply it in a detailed empirical exploration of sixteen country equity markets spanning three global regions.
2025-02-20 Modelling the term-structure of default risk under IFRS 9 within a multistate regression framework Arno Botha, Tanja Verster, Roland Breedt et.al. 2502.14479 33 pages, 8192 words, 12 figures
Abstract (click to expand)The lifetime behaviour of loans is notoriously difficult to model, which can compromise a bank's financial reserves against future losses, if modelled poorly. Therefore, we present a data-driven comparative study amongst three techniques in modelling a series of default risk estimates over the lifetime of each loan, i.e., its term-structure. The behaviour of loans can be described using a nonstationary and time-dependent semi-Markov model, though we model its elements using a multistate regression-based approach. As such, the transition probabilities are explicitly modelled as a function of a rich set of input variables, including macroeconomic and loan-level inputs. Our modelling techniques are deliberately chosen in ascending order of complexity: 1) a Markov chain; 2) beta regression; and 3) multinomial logistic regression. Using residential mortgage data, our results show that each successive model outperforms the previous, likely as a result of greater sophistication. This finding required devising a novel suite of simple model diagnostics, which can itself be reused in assessing sampling representativeness and the performance of other modelling techniques. These contributions surely advance the current practice within banking when conducting multistate modelling. Consequently, we believe that the estimation of loss reserves will be more timeous and accurate under IFRS 9.
2025-02-20 Causality Analysis of COVID-19 Induced Crashes in Stock and Commodity Markets: A Topological Perspective Buddha Nath Sharma, Anish Rai, SR Luwang et.al. 2502.14431
Abstract (click to expand)The paper presents a comprehensive causality analysis of the US stock and commodity markets during the COVID-19 crash. The dynamics of different sectors are also compared. We use Topological Data Analysis (TDA) on multidimensional time-series to identify crashes in stock and commodity markets. The Wasserstein Distance (\(WD\)) shows distinct spikes signaling the crash for both stock and commodity markets. We then compare the persistence diagrams of stock and commodity markets using the \(WD\) metric. A significant spike in the \(WD\) between stock and commodity markets is observed during the crisis, suggesting significant topological differences between the markets. Similar spikes are observed between the sectors of the US market as well. Spikes obtained may be due to either a difference in the magnitude of crashes in the two markets (or sectors), or from the temporal lag between the two markets suggesting information flow. We study the Granger-causality between stock and commodity markets and also between different sectors. The results show a bidirectional Granger-causality between commodity and stock during the crash period, demonstrating the greater interdependence of financial markets during the crash. However, the overall analysis shows that the causal direction is from stock to commodity. A pairwise Granger-causal analysis between US sectors is also conducted. There is a significant increase in the interdependence between the sectors during the crash period. TDA combined with Granger-causality effectively analyzes the interdependence and sensitivity of different markets and sectors.

📌 Deep Learning in Finance

📅 Publish Date 📖 Title 👨‍💻 Authors 🔗 PDF 💻 Code 💬 Comment 📜 Abstract
2025-03-31 Graph Neural Network-Based Predictive Modeling for Robotic Plaster Printing Diego Machain Rivera, Selen Ercan Jenny, Ping Hsun Tsai et.al. 2503.24130
Abstract (click to expand)This work proposes a Graph Neural Network (GNN) modeling approach to predict the resulting surface from a particle based fabrication process. The latter consists of spray-based printing of cementitious plaster on a wall and is facilitated with the use of a robotic arm. The predictions are computed using the robotic arm trajectory features, such as position, velocity and direction, as well as the printing process parameters. The proposed approach, based on a particle representation of the wall domain and the end effector, allows for the adoption of a graph-based solution. The GNN model consists of an encoder-processor-decoder architecture and is trained using data from laboratory tests, while the hyperparameters are optimized by means of a Bayesian scheme. The aim of this model is to act as a simulator of the printing process, and ultimately to be used for the generation of the robotic arm trajectory and the optimization of the printing parameters, towards the materialization of an autonomous plastering process. The performance of the proposed model is assessed in terms of the prediction error against unseen ground truth data, which shows its generality in varied scenarios, as well as in comparison with the performance of an existing benchmark model. The results demonstrate a significant improvement over the benchmark model, with notably better performance and enhanced error scaling across prediction steps.
2025-03-31 Organizations, teams, and job mobility: A social microdynamics approach Bryan Adams, Valentín Vergara Hidd, Daniel Stimpson et.al. 2503.24117
Abstract (click to expand)The internal structures of large organizations determine much of what occurs inside including the way in which tasks are performed, the workers that perform them, and the mobility of those workers within the organization. However, regarding this latter process, most of the theoretical and modeling approaches used to understand organizational worker mobility are highly stylized, using idealizations such as structureless organizations, indistinguishable workers, and a lack of social bonding of the workers. In this article, aided by a decade of precise, temporally resolved data of a large US government organization, we introduce a new model to describe organizations as composites of teams within which individuals perform specific tasks and where social connections develop. By tracking the personnel composition of organizational teams, we find that workers that change jobs are highly influenced by a preference to reunite with past co-workers. In this organization, 34% of all moves lead to worker reunions, a percentage well above expectation. We find that workers are more likely to reunite the more time they have spent together and the smaller the team they shared, supporting the notion of increased familiarity and trust behind such reunions and the dominant role of social capital in the evolution of large organizations.
2025-03-31 Estimation of thermal properties and boundary heat transfer coefficient of the ground with a Bayesian technique Zhanat Karashbayeva, Julien Berger, Helcio R. B. Orlande et.al. 2503.24072
Abstract (click to expand)Urbanization is the key contributor to climate change. An increasing urbanization rate causes an urban heat island (UHI) effect, which strongly depends on the short- and long-wave radiation balance heat flux between the surfaces. To calculate this heat flux accurately, one must assess the surface temperature, which depends on knowledge of the thermal properties and the surface heat transfer coefficients in the heat transfer problem. The aim of this paper is to estimate the thermal properties of the ground and the time varying surface heat transfer coefficient by solving an inverse problem. The Dufort-Frankel scheme is applied for solving the unsteady heat transfer problem. For the inverse problem, a Markov chain Monte Carlo method is used to estimate the posterior probability density function of unknown parameters within the Bayesian framework of statistics, by applying the Metropolis-Hastings algorithm for random sample generation. Actual temperature measurements available at different ground depths were used for the solution of the inverse problem. Different time discretizations were examined for the transient heat transfer coefficient at the ground surface, which then involved different prior distributions. Results of different case studies show that the estimated values of the unknown parameters were in accordance with literature values. Moreover, with the present solution of the inverse problem the temperature residuals were smaller than those obtained by using literature values for the unknowns.
2025-03-31 Evaluating Variational Quantum Eigensolver and Quantum Dynamics Algorithms on the Advection-Diffusion Equation A. Barış Özgüler et.al. 2503.24045 7 pages, 2 figures
Abstract (click to expand)We investigate the potential of near-term quantum algorithms for solving partial differential equations (PDEs), focusing on a linear one-dimensional advection-diffusion equation as a test case. This study benchmarks a ground-state algorithm, Variational Quantum Eigensolver (VQE), against three leading quantum dynamics algorithms, Trotterization, Variational Quantum Imaginary Time Evolution (VarQTE), and Adaptive Variational Quantum Dynamics Simulation (AVQDS), applied to the same PDE on small quantum hardware. While Trotterization is fully quantum, VarQTE and AVQDS are variational algorithms that reduce circuit depth for noisy intermediate-scale quantum (NISQ) devices. However, hardware results from these dynamics methods show sizable errors due to noise and limited shot statistics. To establish a noise-free performance baseline, we implement the VQE-based solver on a noiseless statevector simulator. Our results show VQE can reach final-time infidelities as low as \(O(10^{-9})\) with \(N=4\) qubits and moderate circuit depths, outperforming hardware-deployed dynamics methods that show infidelities \(\gtrsim 10^{-2}\). By comparing noiseless VQE to shot-based and hardware-run algorithms, we assess their accuracy and resource demands, providing a baseline for future quantum PDE solvers. We conclude with a discussion of limitations and potential extensions to higher-dimensional, nonlinear PDEs relevant to engineering and finance.
2025-03-30 Exact Characterization of Aggregate Flexibility via Generalized Polymatroids Karan Mukhi, Georg Loho, Alessandro Abate et.al. 2503.23458
Abstract (click to expand)There is growing interest in utilizing the flexibility in populations of distributed energy resources (DER) to mitigate the intermittency and uncertainty of renewable generation and provide additional grid services. To enable this, aggregators must effectively represent the flexibility in the populations they control to the market or system operator. A key challenge is accurately computing the aggregate flexibility of a population, which can be formally expressed as the Minkowski sum of a collection of polytopes - a problem that is generally computationally intractable. However, the flexibility polytopes of many DERs exhibit structural symmetries that can be exploited for computational efficiency. To this end, we introduce generalized polymatroids - a family of polytopes - into the flexibility aggregation literature. We demonstrate that individual flexibility sets belong to this family, enabling efficient computation of their Minkowski sum. For homogeneous populations of DERs we further derive simplifications that yield more succinct representations of aggregate flexibility. Additionally, we develop an efficient optimization framework over these sets and propose a vertex-based disaggregation method to allocate aggregate flexibility among individual DERs. Finally, we validate the optimality and computational efficiency of our approach through comparisons with existing methods.
2025-03-30 AI Agents in Engineering Design: A Multi-Agent Framework for Aesthetic and Aerodynamic Car Design Mohamed Elrefaie, Janet Qian, Raina Wu et.al. 2503.23315
Abstract (click to expand)We introduce the concept of "Design Agents" for engineering applications, particularly focusing on the automotive design process, while emphasizing that our approach can be readily extended to other engineering and design domains. Our framework integrates AI-driven design agents into the traditional engineering workflow, demonstrating how these specialized computational agents interact seamlessly with engineers and designers to augment creativity, enhance efficiency, and significantly accelerate the overall design cycle. By automating and streamlining tasks traditionally performed manually, such as conceptual sketching, styling enhancements, 3D shape retrieval and generative modeling, computational fluid dynamics (CFD) meshing, and aerodynamic simulations, our approach reduces certain aspects of the conventional workflow from weeks and days down to minutes. These agents leverage state-of-the-art vision-language models (VLMs), large language models (LLMs), and geometric deep learning techniques, providing rapid iteration and comprehensive design exploration capabilities. We ground our methodology in industry-standard benchmarks, encompassing a wide variety of conventional automotive designs, and utilize high-fidelity aerodynamic simulations to ensure practical and applicable outcomes. Furthermore, we present design agents that can swiftly and accurately predict simulation outcomes, empowering engineers and designers to engage in more informed design optimization and exploration. This research underscores the transformative potential of integrating advanced generative AI techniques into complex engineering tasks, paving the way for broader adoption and innovation across multiple engineering disciplines.
2025-03-29 Ethereum Price Prediction Employing Large Language Models for Short-term and Few-shot Forecasting Eftychia Makri, Georgios Palaiokrassas, Sarah Bouraga et.al. 2503.23190
Abstract (click to expand)Cryptocurrencies have transformed financial markets with their innovative blockchain technology and volatile price movements, presenting both challenges and opportunities for predictive analytics. Ethereum, being one of the leading cryptocurrencies, has experienced significant market fluctuations, making its price prediction an attractive yet complex problem. This paper presents a comprehensive study on the effectiveness of Large Language Models (LLMs) in predicting Ethereum prices for short-term and few-shot forecasting scenarios. The main challenge in training models for time series analysis is the lack of data. We address this by leveraging a novel approach that adapts existing pre-trained LLMs on natural language or images from billions of tokens to the unique characteristics of Ethereum price time series data. Through thorough experimentation and comparison with traditional and contemporary models, our results demonstrate that selectively freezing certain layers of pre-trained LLMs achieves state-of-the-art performance in this domain. This approach consistently surpasses benchmarks across multiple metrics, including Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), demonstrating its effectiveness and robustness. Our research not only contributes to the existing body of knowledge on LLMs but also provides practical insights in the cryptocurrency prediction domain. The adaptability of pre-trained LLMs to handle the nature of Ethereum prices suggests a promising direction for future research, potentially including the integration of sentiment analysis to further refine forecasting accuracy.
2025-04-01 HRET: A Self-Evolving LLM Evaluation Toolkit for Korean Hanwool Lee, Soo Yong Kim, Dasol Choi et.al. 2503.22968
Abstract (click to expand)Recent advancements in Korean large language models (LLMs) have spurred numerous benchmarks and evaluation methodologies, yet the lack of a standardized evaluation framework has led to inconsistent results and limited comparability. To address this, we introduce HRET (Haerae Evaluation Toolkit), an open-source, self-evolving evaluation framework tailored specifically for Korean LLMs. HRET unifies diverse evaluation methods, including logit-based scoring, exact-match, language-inconsistency penalization, and LLM-as-a-Judge assessments. Its modular, registry-based architecture integrates major benchmarks (HAE-RAE Bench, KMMLU, KUDGE, HRM8K) and multiple inference backends (vLLM, HuggingFace, OpenAI-compatible endpoints). With automated pipelines for continuous evolution, HRET provides a robust foundation for reproducible, fair, and transparent Korean NLP research.
2025-03-28 Co-design of materials, structures and stimuli for magnetic soft robots with large deformation and dynamic contacts Liwei Wang et.al. 2503.22767
Abstract (click to expand)Magnetic soft robots embedded with hard magnetic particles enable untethered actuation via external magnetic fields, offering remote, rapid, and precise control, which is highly promising for biomedical applications. However, designing such systems is challenging due to the complex interplay of magneto-elastic dynamics, large deformation, solid contacts, time-varying stimuli, and posture-dependent loading. As a result, most existing research relies on heuristics and trial-and-error methods or focuses on the independent design of stimuli or structures under static conditions. We propose a topology optimization framework for magnetic soft robots that simultaneously designs structures, location-specific material magnetization and time-varying magnetic stimuli, accounting for large deformations, dynamic motion, and solid contacts. This is achieved by integrating generalized topology optimization with the magneto-elastic material point method, which supports GPU-accelerated parallel simulations and auto-differentiation for sensitivity analysis. We applied this framework to design magnetic robots for various tasks, including multi-task shape morphing and locomotion, in both 2D and 3D. The method autonomously generates optimized robotic systems to achieve target behaviors without requiring human intervention. Despite the nonlinear physics and large design space, it demonstrates exceptional efficiency, completing all cases within minutes. This proposed framework represents a significant step toward the automatic co-design of magnetic soft robots for applications such as metasurfaces, drug delivery, and minimally invasive procedures.
2025-03-28 A high order multigrid-preconditioned immersed interface solver for the Poisson equation with boundary and interface conditions James Gabbard, Andrea Paris, Wim M. van Rees et.al. 2503.22455
Abstract (click to expand)This work presents a multigrid preconditioned high order immersed finite difference solver to accurately and efficiently solve the Poisson equation on complex 2D and 3D domains. The solver employs a low order Shortley-Weller multigrid method to precondition a high order matrix-free Krylov subspace solver. The matrix-free approach enables full compatibility with high order IIM discretizations of boundary and interface conditions, as well as high order wavelet-adapted multiresolution grids. Through verification and analysis on 2D domains, we demonstrate the ability of the algorithm to provide high order accurate results to Laplace and Poisson problems with Dirichlet, Neumann, and/or interface jump boundary conditions, all effectively preconditioned using the multigrid method. We further show that the proposed method is able to efficiently solve high order discretizations of Laplace and Poisson problems on complex 3D domains using thousands of compute cores and on multiresolution grids. To our knowledge, this work presents the largest problem sizes tackled with high order immersed methods applied to elliptic partial differential equations, and the first high order results on 3D multiresolution adaptive grids. Together, this work paves the way for employing high order immersed methods to a variety of 3D partial differential equations with boundary or interface conditions, including linear and non-linear elasticity problems, the incompressible Navier-Stokes equations, and fluid-structure interactions.
2025-03-28 Numerical optimization of aviation decarbonization scenarios: balancing traffic and emissions with maturing energy carriers and aircraft technology Ian Costa-Alves, Nicolas Gourdain, François Gallard et.al. 2503.22435
Abstract (click to expand)Despite being considered a hard-to-abate sector, aviation's emissions will play an important role in long-term climate mitigation of transportation. The introduction of low-carbon energy carriers and the deployment of new aircraft in the current fleet are modeled as a technology-centered decarbonization policy, and supply constraints in targeted market segments are modeled as demand-side policy. Shared socioeconomic pathways (SSP) are used to estimate the trend traffic demand and limit the sectoral consumption of electricity and biomass. Mitigation scenarios are formulated as optimization problems and three applications are demonstrated: single-policy optimization, scenario-robust policy, and multiobjective policy trade-off. Overall, we find that the choice of energy carrier to embark is highly dependent on assumptions regarding aircraft technology and background energy system, and that market-targeted traffic constraints are required to align trend scenarios with the Paris Agreement. The usual burdens associated with nonlinear optimization with high-dimensional variables are dealt with by jointly using libraries for Multidisciplinary Optimization (GEMSEO) and Automatic Differentiation (JAX), which resulted in speedups of two orders of magnitude at the optimization level, while reducing associated implementation efforts.
2025-03-28 Inverse design of dual-band valley-Hall topological photonic crystals with arbitrary pseudospin states Yuki Sato, Shrinathan Esaki Muthu Pandara Kone, Junpei Oba et.al. 2503.22206 13 pages, 9 figures
Abstract (click to expand)Valley photonic crystals (VPCs) offer topological kink states that ensure robust, unidirectional, and backscattering-immune light propagation. The design of VPCs is typically based on analogies with condensed-matter topological insulators that exhibit the quantum valley Hall effect; trial-and-error approaches are often used to tailor the photonic band structures and their topological properties, which are characterized by the local Berry curvatures. In this paper, we present an inverse design framework based on frequency-domain analysis for VPCs with arbitrary pseudospin states. Specifically, we utilize the transverse spin angular momentum (TSAM) at the band edge to formulate the objective function for engineering the desired topological properties. Numerical experiments demonstrate that our proposed design approach can successfully produce photonic crystal waveguides exhibiting dual-band operation, enabling frequency-dependent light routing. Our pseudospin-engineering method thus provides a cost-effective alternative for designing topological photonic waveguides, offering novel functionalities.
2025-03-28 Convolutional optimization with convex kernel and power lift Zhipeng Lu et.al. 2503.22135
Abstract (click to expand)We focus on establishing the foundational paradigm of a novel optimization theory based on convolution with convex kernels. Our goal is to devise a morally deterministic model of locating the global optima of an arbitrary function, which is distinguished from most commonly used statistical models. Limited preliminary numerical results are provided to test the efficiency of some specific algorithms derived from our paradigm, which we hope will stimulate further practical interest.
2025-03-28 A production planning benchmark for real-world refinery-petrochemical complexes Wenli Du, Chuan Wang, Chen Fan et.al. 2503.22057
Abstract (click to expand)To achieve digital intelligence transformation and carbon neutrality, effective production planning is crucial for integrated refinery-petrochemical complexes. Modern refinery planning relies on advanced optimization techniques, whose development requires reproducible benchmark problems. However, existing benchmarks lack practical context or impose oversimplified assumptions, limiting their applicability to enterprise-wide optimization. To bridge the substantial gap between theoretical research and industrial applications, this paper introduces the first open-source, demand-driven benchmark for industrial-scale refinery-petrochemical complexes with transparent model formulations and comprehensive input parameters. The benchmark incorporates a novel port-stream hybrid superstructure for modular modeling and broad generalizability. Key secondary processing units are represented using the delta-base approach grounded in historical data. Three real-world cases have been constructed to encompass distinct scenario characteristics, respectively addressing (1) a stand-alone refinery without integer variables, (2) chemical site integration with inventory-related integer variables, and (3) multi-period planning. All model parameters are fully accessible. Additionally, this paper provides an analysis of computational performance, ablation experiments on delta-base modeling, and application scenarios for the proposed benchmark.
2025-03-27 Multimodal Data Integration for Sustainable Indoor Gardening: Tracking Anyplant with Time Series Foundation Model Seyed Hamidreza Nabaei, Zeyang Zheng, Dong Chen et.al. 2503.21932 Accepted at ASCE International Conference on Computing in Civil Engineering (i3ce)
Abstract (click to expand)Indoor gardening within sustainable buildings offers a transformative solution to urban food security and environmental sustainability. Urban farming, including Controlled Environment Agriculture (CEA) and vertical farming, is expected to grow at a compound annual growth rate (CAGR) of 13.2% from 2024 to 2030, according to market reports. This growth is fueled by advancements in Internet of Things (IoT) technologies, sustainable innovations such as smart growing systems, and the rising interest in green interior design. This paper presents a novel framework that integrates computer vision, machine learning (ML), and environmental sensing for the automated monitoring of plant health and growth. Unlike previous approaches, this framework combines RGB imagery, plant phenotyping data, and environmental factors such as temperature and humidity, to predict plant water stress in a controlled growth environment. The system utilizes high-resolution cameras to extract phenotypic features, such as RGB, plant area, height, and width, while employing the Lag-Llama time series model to analyze and predict water stress. Experimental results demonstrate that integrating RGB, size ratios, and environmental data significantly enhances predictive accuracy, with the fine-tuned model achieving the lowest errors (MSE = 0.420777, MAE = 0.595428) and reduced uncertainty. These findings highlight the potential of multimodal data and intelligent systems to automate plant care, optimize resource consumption, and align indoor gardening with sustainable building management practices, paving the way for resilient, green urban spaces.
2025-03-27 Data-Driven Nonlinear Model Reduction to Spectral Submanifolds via Oblique Projection Leonardo Bettini, Bálint Kaszás, Bernhard Zybach et.al. 2503.21895
Abstract (click to expand)The dynamics in a primary Spectral Submanifold (SSM) constructed over the slowest modes of a dynamical system provide an ideal reduced-order model for nearby trajectories. Modeling the dynamics of trajectories further away from the primary SSM, however, is difficult if the linear part of the system exhibits strong non-normal behavior. Such non-normality implies that simply projecting trajectories onto SSMs along directions normal to the slow linear modes will not pair those trajectories correctly with their reduced counterparts on the SSMs. In principle, a well-defined nonlinear projection along a stable invariant foliation exists and would exactly match the full dynamics to the SSM-reduced dynamics. This foliation, however, cannot realistically be constructed from practically feasible amounts and distributions of experimental data. Here we develop an oblique projection technique that is able to approximate this foliation efficiently, even from a single experimental trajectory of a significantly non-normal and nonlinear beam.
2025-03-27 CMADiff: Cross-Modal Aligned Diffusion for Controllable Protein Generation Changjian Zhou, Yuexi Qiu, Tongtong Ling et.al. 2503.21450
Abstract (click to expand)AI-assisted protein design has emerged as a critical tool for advancing biotechnology, as deep generative models have demonstrated their reliability in this domain. However, most existing models primarily utilize protein sequence or structural data for training, neglecting the physicochemical properties of proteins. Moreover, they offer limited control over protein generation under intuitive conditions. To address these limitations, we propose CMADiff, a novel framework that enables controllable protein generation by aligning the physicochemical properties of protein sequences with text-based descriptions through a latent diffusion process. Specifically, CMADiff employs a Conditional Variational Autoencoder (CVAE) to integrate physicochemical features as conditional input, forming a robust latent space that captures biological traits. In this latent space, we apply a conditional diffusion process, which is guided by BioAligner, a contrastive learning-based module that aligns text descriptions with protein features, enabling text-driven control over protein sequence generation. Validated by a series of evaluations including AlphaFold3, the experimental results indicate that CMADiff outperforms protein sequence generation benchmarks and holds strong potential for future applications. The implementation and code are available at https://github.com/HPC-NEAU/PhysChemDiff.
2025-03-27 Large Language Models for Traffic and Transportation Research: Methodologies, State of the Art, and Future Opportunities Yimo Yan, Yejia Liao, Guanhao Xu et.al. 2503.21330
Abstract (click to expand)The rapid rise of Large Language Models (LLMs) is transforming traffic and transportation research, with significant advancements emerging between the years 2023 and 2025 -- a period marked by the inception and swift growth of adopting and adapting LLMs for various traffic and transportation applications. However, despite these significant advancements, a systematic review and synthesis of the existing studies remain lacking. To address this gap, this paper provides a comprehensive review of the methodologies and applications of LLMs in traffic and transportation, highlighting their ability to process unstructured textual data to advance transportation research. We explore key applications, including autonomous driving, travel behavior prediction, and general transportation-related queries, alongside methodologies such as zero- or few-shot learning, prompt engineering, and fine-tuning. Our analysis identifies critical research gaps. From the methodological perspective, many research gaps can be addressed by integrating LLMs with existing tools and refining LLM architectures. From the application perspective, we identify numerous opportunities for LLMs to tackle a variety of traffic and transportation challenges, building upon existing research. By synthesizing these findings, this review not only clarifies the current state of LLM adoption and adaptation in traffic and transportation but also proposes future research directions, paving the way for smarter and more sustainable transportation systems.
2025-03-27 ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition Yujie Liu, Zonglin Yang, Tong Xie et.al. 2503.21248
Abstract (click to expand)Large language models (LLMs) have demonstrated potential in assisting scientific research, yet their ability to discover high-quality research hypotheses remains unexamined due to the lack of a dedicated benchmark. To address this gap, we introduce the first large-scale benchmark for evaluating LLMs with a near-sufficient set of sub-tasks of scientific discovery: inspiration retrieval, hypothesis composition, and hypothesis ranking. We develop an automated framework that extracts critical components - research questions, background surveys, inspirations, and hypotheses - from scientific papers across 12 disciplines, with expert validation confirming its accuracy. To prevent data contamination, we focus exclusively on papers published in 2024, ensuring minimal overlap with LLM pretraining data. Our evaluation reveals that LLMs perform well in retrieving inspirations, an out-of-distribution task, suggesting their ability to surface novel knowledge associations. This positions LLMs as "research hypothesis mines", capable of facilitating automated scientific discovery by generating innovative hypotheses at scale with minimal human intervention.
2025-03-27 GPU-Accelerated Charge-Equilibration for Shadow Molecular Dynamics in Python Mehmet Cagri Kaymak, Nicholas Lubbers, Christian F. A. Negre et.al. 2503.21176
Abstract (click to expand)With recent advancements in machine learning for interatomic potentials, Python has become the go-to programming language for exploring new ideas. While machine-learning potentials are often developed in Python-based frameworks, existing molecular dynamics software is predominantly written in lower-level languages. This disparity complicates the integration of machine learning potentials into these molecular dynamics libraries. Additionally, machine learning potentials typically focus on local features, often neglecting long-range electrostatics due to computational complexities. This is a key limitation as applications can require long-range electrostatics and even flexible charges to achieve the desired accuracy. Recent charge equilibration models can address these issues, but they require iterative solvers to assign relaxed flexible charges to the atoms. Conventional implementations also demand very tight convergence to achieve long-term stability, further increasing computational cost. In this work, we present a scalable Python implementation of a recently proposed shadow molecular dynamics scheme based on a charge equilibration model, which avoids the convergence problem while maintaining long-term energy stability and accuracy of observable properties. To deliver a functional and user-friendly Python-based library, we implemented an efficient neighbor list algorithm, Particle Mesh Ewald, and traditional Ewald summation techniques, leveraging the GPU-accelerated power of Triton and PyTorch. We integrated these approaches with the Python-based shadow molecular dynamics scheme, enabling fast charge equilibration for scalable machine learning potentials involving systems with hundreds of thousands of atoms.
2025-03-26 FinAudio: A Benchmark for Audio Large Language Models in Financial Applications Yupeng Cao, Haohang Li, Yangyang Yu et.al. 2503.20990
Abstract (click to expand)Audio Large Language Models (AudioLLMs) have received widespread attention and have significantly improved performance on audio tasks such as conversation, audio understanding, and automatic speech recognition (ASR). Despite these advancements, there is an absence of a benchmark for assessing AudioLLMs in financial scenarios, where audio data, such as earnings conference calls and CEO speeches, are crucial resources for financial analysis and investment decisions. In this paper, we introduce FinAudio, the first benchmark designed to evaluate the capacity of AudioLLMs in the financial domain. We first define three tasks based on the unique characteristics of the financial domain: 1) ASR for short financial audio, 2) ASR for long financial audio, and 3) summarization of long financial audio. Then, we curate two short and two long audio datasets, respectively, and develop a novel dataset for financial audio summarization, comprising the FinAudio benchmark. Finally, we evaluate seven prevalent AudioLLMs on FinAudio. Our evaluation reveals the limitations of existing AudioLLMs in the financial domain and offers insights for improving AudioLLMs. All datasets and code will be released.
2025-03-26 TransDiffSBDD: Causality-Aware Multi-Modal Structure-Based Drug Design Xiuyuan Hu, Guoqing Liu, Can Chen et.al. 2503.20913
Abstract (click to expand)Structure-based drug design (SBDD) is a critical task in drug discovery, requiring the generation of molecular information across two distinct modalities: discrete molecular graphs and continuous 3D coordinates. However, existing SBDD methods often overlook two key challenges: (1) the multi-modal nature of this task and (2) the causal relationship between these modalities, limiting their plausibility and performance. To address both challenges, we propose TransDiffSBDD, an integrated framework combining autoregressive transformers and diffusion models for SBDD. Specifically, the autoregressive transformer models discrete molecular information, while the diffusion model samples continuous distributions, effectively resolving the first challenge. To address the second challenge, we design a hybrid-modal sequence for protein-ligand complexes that explicitly respects the causality between modalities. Experiments on the CrossDocked2020 benchmark demonstrate that TransDiffSBDD outperforms existing baselines.
2025-03-26 Technical Note: Continuum Theory of Mixture for Three-phase Thermomechanical Model of Fiber-reinforced Aerogel Composites Pratyush Kumar Singh, Danial Faghihi et.al. 2503.20713
Abstract (click to expand)We present a thermodynamically consistent three-phase model for the coupled thermal transport and mechanical deformation of ceramic aerogel porous composite materials, which is formulated via continuum mixture theory. The composite comprises a solid silica skeleton, a gaseous fluid phase, and dispersed solid fibers. The thermal transport model incorporates the effects of meso- and macro-pore size variations due to the Knudsen effect, achieved by upscaling phonon transport relations to derive constitutive equations for the fluid thermal conductivity. The mechanical model captures solid-solid and solid-fluid interactions through momentum exchange between phases. A mixed finite element formulation is employed to solve the multiphase model, and numerical studies are conducted to analyze key features of the computational model.
2025-03-26 General Method for Conversion Between Multimode Network Parameters Alexander Zhuravlev, Juan D. Baena et.al. 2503.20298
Abstract (click to expand)Network parameters of various types have long been used in electronics. The most typical network parameters, but not the only ones, are \(S\), \(T\), \(ABCD\), \(Z\), \(Y\), and \(h\), which relate input and output signals in different ways. There exist practical formulas for conversion between them. Due to the development of powerful software tools that can deal efficiently and accurately with higher-order modes in each port, researchers need conversion rules between multimode network parameters. However, the usual way to get each conversion rule is to carry out cumbersome algebraic manipulations which, in the end, are useful only for one specific conversion. Here, we propose a general algebraic method to obtain any conversion rule between different multimode network parameters. It is based on the assumption of a state vector space, in which each conversion rule between network parameters can be interpreted as a simple change of basis. This procedure explains any conversion between multimode network parameters under the same algebraic steps.
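As a rough illustration of treating a conversion as a change of variables, the sketch below converts between \(S\) and \(Z\) for an N-port network. It is not the paper's general method. It uses only the standard textbook relations \(S = (Z - z_0 I)(Z + z_0 I)^{-1}\) and \(Z = z_0 (I + S)(I - S)^{-1}\), and it assumes the same real reference impedance `z0` at every port. The function names are hypothetical.

```python
import numpy as np

def z_to_s(Z, z0=50.0):
    """Impedance matrix -> scattering matrix.

    Assumes one real reference impedance z0 shared by all ports.
    """
    Z = np.asarray(Z, dtype=complex)
    I = np.eye(Z.shape[0])
    # Wave variables are linear combinations of port voltages and currents,
    # so the conversion is a linear change of variables.
    return (Z - z0 * I) @ np.linalg.inv(Z + z0 * I)

def s_to_z(S, z0=50.0):
    """Scattering matrix -> impedance matrix (inverse of z_to_s).

    Requires (I - S) to be invertible.
    """
    S = np.asarray(S, dtype=complex)
    I = np.eye(S.shape[0])
    return z0 * (I + S) @ np.linalg.inv(I - S)

# Usage: a round trip on an arbitrary two-port impedance matrix.
Z_example = np.array([[75.0, 25.0], [25.0, 75.0]])
S_example = z_to_s(Z_example)
Z_back = s_to_z(S_example)
```

Multimode ports can be handled the same way by giving each mode its own row and column, provided the shared-`z0` assumption still holds.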
2025-03-26 Dynamic Learning and Productivity for Data Analysts: A Bayesian Hidden Markov Model Perspective Yue Yin et.al. 2503.20233 29 pages; a shorter 11-page version is accepted by HCI International (HCII) 2025
Abstract (click to expand)Data analysts are essential in organizations, transforming raw data into insights that drive decision-making and strategy. This study explores how analysts' productivity evolves on a collaborative platform, focusing on two key learning activities: writing queries and viewing peer queries. While traditional research often assumes static models, where performance improves steadily with cumulative learning, such models fail to capture the dynamic nature of real-world learning. To address this, we propose a Hidden Markov Model (HMM) that tracks how analysts transition between distinct learning states based on their participation in these activities. Using an industry dataset with 2,001 analysts and 79,797 queries, this study identifies three learning states: novice, intermediate, and advanced. Productivity increases as analysts advance to higher states, reflecting the cumulative benefits of learning. Writing queries benefits analysts across all states, with the largest gains observed for novices. Viewing peer queries supports novices but may hinder analysts in higher states due to cognitive overload or inefficiencies. Transitions between states are also uneven, with progression from intermediate to advanced being particularly challenging. This study advances understanding of the dynamic learning behavior of knowledge workers and offers practical implications for designing systems, optimizing training, enabling personalized learning, and fostering effective knowledge sharing.
2025-03-26 Solving 2-D Helmholtz equation in the rectangular, circular, and elliptical domains using neural networks D. Veerababu, Prasanta K. Ghosh et.al. 2503.20222 59 pages
Abstract (click to expand)Physics-informed neural networks offered an alternate way to solve several differential equations that govern complicated physics. However, their success in predicting the acoustic field is limited by the vanishing-gradient problem that occurs when solving the Helmholtz equation. In this paper, a formulation is presented that addresses this difficulty. The problem of solving the two-dimensional Helmholtz equation with the prescribed boundary conditions is posed as an unconstrained optimization problem using trial solution method. According to this method, a trial neural network that satisfies the given boundary conditions prior to the training process is constructed using the technique of transfinite interpolation and the theory of R-functions. This ansatz is initially applied to the rectangular domain and later extended to the circular and elliptical domains. The acoustic field predicted from the proposed formulation is compared with that obtained from the two-dimensional finite element methods. Good agreement is observed in all three domains considered. Minor limitations associated with the proposed formulation and their remedies are also discussed.
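The trial solution idea can be sketched on the simplest case. The example below is not the paper's construction, which uses transfinite interpolation and R-functions. It assumes the unit square with a constant Dirichlet boundary value and uses the product \(x(1-x)y(1-y)\) as the function that vanishes on the boundary. The small untrained network is a hypothetical stand-in for the trainable part.

```python
import numpy as np

# Hypothetical, untrained two-layer network standing in for the trainable part.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 16))
b1 = rng.normal(size=16)
W2 = rng.normal(size=(16, 1))
b2 = rng.normal(size=1)

def network(xy):
    """Map points of shape (n, 2) to scalar outputs of shape (n,)."""
    return (np.tanh(xy @ W1 + b1) @ W2 + b2)[:, 0]

def trial_solution(xy, boundary_value=0.0):
    """u(x, y) = g + B(x, y) * N(x, y) on the unit square.

    B vanishes on all four edges, so u equals the constant boundary value g
    there for any network weights, before and after training.
    """
    x, y = xy[:, 0], xy[:, 1]
    B = x * (1.0 - x) * y * (1.0 - y)
    return boundary_value + B * network(xy)

# Usage: evaluate on one point per edge and on one interior point.
edge_points = np.array([[0.0, 0.3], [1.0, 0.7], [0.4, 0.0], [0.6, 1.0]])
interior_point = np.array([[0.5, 0.5]])
u_edge = trial_solution(edge_points)
u_inside = trial_solution(interior_point)
```

Because the boundary condition holds by construction, a training loss would only need the residual of the governing equation at interior points, which is what makes the optimization unconstrained.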
2025-03-25 Lossy Compression of Scientific Data: Applications Constrains and Requirements Franck Cappello, Allison Baker, Ebru Bozda et.al. 2503.20031 33 pages
Abstract (click to expand)Increasing data volumes from scientific simulations and instruments (supercomputers, accelerators, telescopes) often exceed network, storage, and analysis capabilities. The scientific community's response to this challenge is scientific data reduction. Reduction can take many forms, such as triggering, sampling, filtering, quantization, and dimensionality reduction. This report focuses on a specific technique: lossy compression. Lossy compression retains all data points, leveraging correlations and controlled reduced accuracy. Quality constraints, especially for quantities of interest, are crucial for preserving scientific discoveries. User requirements also include compression ratio and speed. While many papers have been published on lossy compression techniques and reference datasets are shared by the community, there is a lack of detailed specifications of application needs that can guide lossy compression researchers and developers. This report fills this gap by reporting on the requirements and constraints of nine scientific applications covering a large spectrum of domains (climate, combustion, cosmology, fusion, light sources, molecular dynamics, quantum circuit simulation, seismology, and system logs). The report also details key lossy compression technologies (SZ, ZFP, MGARD, LC, SPERR, DCTZ, TEZip, LibPressio), discussing their history, principles, error control, hardware support, features, and impact. By presenting both application needs and compression technologies, the report aims to inspire new research to fill existing gaps.
2025-03-25 A comparative study of calibration techniques for finite strain elastoplasticity: Numerically-exact sensitivities for FEMU and VFM Sanjeev Kumar, D. Thomas Seidl, Brian N. Granzow et.al. 2503.19782 44 pages, 15 figures
Abstract (click to expand)Accurate identification of material parameters is crucial for predictive modeling in computational mechanics. The two primary approaches in the experimental mechanics community for calibration from full-field digital image correlation data are known as finite element model updating (FEMU) and the virtual fields method (VFM). In VFM, the objective function is a squared mismatch between internal and external virtual work or power. In FEMU, the objective function quantifies the weighted mismatch between model predictions and corresponding experimentally measured quantities of interest. It is minimized by iteratively updating the parameters of an FE model. While FEMU is seen as more flexible, VFM is commonly used instead because of FEMU's considerably greater computational expense. However, comparisons between the two methods usually involve approximations of gradients or sensitivities with finite difference schemes, thereby making direct assessments difficult. Hence, in this study, we rigorously compare VFM and FEMU in the context of numerically-exact sensitivities obtained through local sensitivity analyses and the application of automatic differentiation software. To this end, both methods are tested on a finite strain elastoplasticity model. We conduct a series of test cases to assess both methods' robustness under practical challenges.
2025-03-25 Decoupled Dynamics Framework with Neural Fields for 3D Spatio-temporal Prediction of Vehicle Collisions Sanghyuk Kim, Minsik Seo, Namwoo Kang et.al. 2503.19712 24 pages, 13 figures
Abstract (click to expand)This study proposes a neural framework that predicts 3D vehicle collision dynamics by independently modeling global rigid-body motion and local structural deformation. Unlike approaches directly predicting absolute displacement, this method explicitly separates the vehicle's overall translation and rotation from its structural deformation. Two specialized networks form the core of the framework: a quaternion-based Rigid Net for rigid motion and a coordinate-based Deformation Net for local deformation. By independently handling fundamentally distinct physical phenomena, the proposed architecture achieves accurate predictions without requiring separate supervision for each component. The model, trained on only 10% of available simulation data, significantly outperforms baseline models, including single multi-layer perceptron (MLP) and deep operator networks (DeepONet), with prediction errors reduced by up to 83%. Extensive validation demonstrates strong generalization to collision conditions outside the training range, accurately predicting responses even under severe impacts involving extreme velocities and large impact angles. Furthermore, the framework successfully reconstructs high-resolution deformation details from low-resolution inputs without increased computational effort. Consequently, the proposed approach provides an effective, computationally efficient method for rapid and reliable assessment of vehicle safety across complex collision scenarios, substantially reducing the required simulation data and time while preserving prediction fidelity.
2025-03-25 Characteristic boundary conditions for Hybridizable Discontinuous Galerkin methods Jan Ellmenreich, Matteo Giacomini, Antonio Huerta et.al. 2503.19684
Abstract (click to expand)In this work we introduce the concept of characteristic boundary conditions (CBCs) within the framework of Hybridizable Discontinuous Galerkin (HDG) methods, including both the Navier-Stokes characteristic boundary conditions (NSCBCs) and a novel approach to generalized characteristic relaxation boundary conditions (GRCBCs). CBCs are based on the characteristic decomposition of the compressible Euler equations and are designed to prevent the reflection of waves at the domain boundaries. We show the effectiveness of the proposed method for weakly compressible flows through a series of numerical experiments by comparing the results with common boundary conditions in the HDG setting and reference solutions available in the literature. In particular, HDG with CBCs show superior performance minimizing the reflection of vortices at artificial boundaries, for both inviscid and viscous flows.
2025-03-25 Estimation of the Acoustic Field in a Uniform Duct with Mean Flow using Neural Networks D. Veerababu, Prasanta K. Ghosh et.al. 2503.19412 23 pages
Abstract (click to expand)The study of sound propagation in a uniform duct having a mean flow has many applications, such as in the design of gas turbines, heating, ventilation and air conditioning ducts, automotive intake and exhaust systems, and in the modeling of speech. In this paper, the convective effects of the mean flow on the plane wave acoustic field inside a uniform duct were studied using artificial neural networks. The governing differential equation and the associated boundary conditions form a constrained optimization problem. It is converted to an unconstrained optimization problem and solved by approximating the acoustic field variable to a neural network. The complex-valued acoustic pressure and particle velocity were predicted at different frequencies, and validated against the analytical solution and the finite element models. The effect of the mean flow is studied in terms of the acoustic impedance. A closed-form expression that describes the influence of various factors on the acoustic field is derived.
2025-03-24 Multi-Physics Inverse Design of Varifocal Optical Devices using Data-Driven Surrogates and Differential Modeling Zeqing Jin, Zhaocheng Liu, Nagi Elabbasi et.al. 2503.18911 15 pages, 4 figures
Abstract (click to expand)Designing a new varifocal architecture in AR glasses poses significant challenges due to the complex interplay of multiple physics disciplines, including innovative piezoelectric materials, solid mechanics, electrostatics, and optics. Traditional design methods, which treat each physics separately, are insufficient for this problem as they fail to establish the intricate relationships among design parameters in such a large and sensitive space, leading to suboptimal solutions. To address this challenge, we propose a novel design pipeline, mPhDBBs (multi-Physics Differential Building Blocks), that integrates these diverse physics through a graph neural network-based surrogate model and a differentiable ray tracing model. A hybrid optimization method combining evolutionary and gradient approaches is employed to efficiently determine superior design variables that achieve desired optical objectives, such as focal length and focusing quality. Our results demonstrate the effectiveness of mPhDBBs, achieving high accuracy with minimal training data and computational resources, resulting in a speedup of at least 1000 times compared to non-gradient-based methods. This work offers a promising paradigm shift in product design, enabling rapid and accurate optimization of complex multi-physics systems, and demonstrates its adaptability to other inverse design problems.
2025-03-24 Differentiable Simulator for Electrically Reconfigurable Electromagnetic Structures Johannes Müller, Dennis Philipp, Matthias Günther et.al. 2503.18479
Abstract (click to expand)This paper introduces a novel CUDA-enabled PyTorch-based framework designed for the gradient-based optimization of reconfigurable electromagnetic structures with electrically tunable parameters. Traditional optimization techniques for these structures often rely on non-gradient-based methods, limiting efficiency and flexibility. Our framework leverages automatic differentiation, facilitating the application of gradient-based optimization methods. This approach is particularly advantageous for embedding within deep learning frameworks, enabling sophisticated optimization strategies. We demonstrate the framework's effectiveness through comprehensive simulations involving resonant structures with tunable parameters. Key contributions include the efficient solution of the inverse problem. The framework's performance is validated using three different resonant structures: a single-loop copper wire (Unit-Cell) as well as an 8x1 and an 8x8 array of resonant unit cells with multiple inductively coupled unit cells (1d and 2d Metasurfaces). Results show precise in-silico control over the magnetic field's component normal to the surface of each resonant structure, achieving desired field strengths with minimal error. The proposed framework is compatible with existing simulation software. This PyTorch-based framework sets the stage for advanced electromagnetic control strategies for resonant structures, with applications in, e.g., MRI, providing a robust platform for further exploration and innovation in the design and optimization of resonant electromagnetic structures.
2025-03-24 DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective Changlun Li, Yao Shi, Yuyu Luo et.al. 2503.18313 Work in progress
Abstract (click to expand)Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, but their effectiveness in financial decision-making, particularly in fund investment, remains inadequately evaluated. Current benchmarks primarily assess LLMs' understanding of financial documents rather than their ability to manage assets or analyze trading opportunities in dynamic market conditions. A critical limitation in existing evaluation methodologies is the backtesting approach, which suffers from information leakage when LLMs are evaluated on historical data they may have encountered during pretraining. This paper introduces DeepFund, a comprehensive platform for evaluating LLM-based trading strategies in a simulated live environment. Our approach implements a multi-agent framework where LLMs serve as both analysts and managers, creating a realistic simulation of investment decision-making. The platform employs a forward testing methodology that mitigates information leakage by evaluating models on market data released after their training cutoff dates. We provide a web interface that visualizes model performance across different market conditions and investment parameters, enabling detailed comparative analysis. Through DeepFund, we aim to provide a more accurate and fair assessment of LLMs' capabilities in fund investment, offering insights into their potential real-world applications in financial markets.
2025-03-23 The Power of Small LLMs in Geometry Generation for Physical Simulations Ossama Shafiq, Bahman Ghiassi, Alessio Alexiadis et.al. 2503.18178 24 pages, 17 figures
Abstract (click to expand)Engineers widely rely on simulation platforms like COMSOL or ANSYS to model and optimise processes. However, setting up such simulations requires expertise in defining geometry, generating meshes, establishing boundary conditions, and configuring solvers. This research aims to simplify this process by enabling engineers to describe their setup in plain language, allowing a Large Language Model (LLM) to generate the necessary input files for their specific application. This novel approach allows establishing a direct link between natural language and complex engineering tasks. Building on previous work that evaluated various LLMs for generating input files across simple and complex geometries, this study demonstrates that small LLMs - specifically, Phi-3 Mini and Qwen-2.5 1.5B - can be fine-tuned to generate precise engineering geometries in GMSH format. Through Low-Rank Adaptation (LoRA), we curated a dataset of 480 instruction-output pairs encompassing simple shapes (squares, rectangles, circles, and half circles) and more complex structures (I-beams, cylindrical pipes, and bent pipes). The fine-tuned models produced high-fidelity outputs, handling routine geometry generation with minimal intervention. While challenges remain with geometries involving combinations of multiple bodies, this study demonstrates that fine-tuned small models can outperform larger models like GPT-4o in specialised tasks, offering a precise and resource-efficient alternative for engineering applications.
2025-03-23 Strategic Prompt Pricing for AIGC Services: A User-Centric Approach Xiang Li, Bing Luo, Jianwei Huang et.al. 2503.18168 accepted in WiOpt 2025
Abstract (click to expand)The rapid growth of AI-generated content (AIGC) services has created an urgent need for effective prompt pricing strategies, yet current approaches overlook users' strategic two-step decision-making process in selecting and utilizing generative AI models. This oversight creates two key technical challenges: quantifying the relationship between user prompt capabilities and generation outcomes, and optimizing platform payoff while accounting for heterogeneous user behaviors. We address these challenges by introducing prompt ambiguity, a theoretical framework that captures users' varying abilities in prompt engineering, and developing an Optimal Prompt Pricing (OPP) algorithm. Our analysis reveals a counterintuitive insight: users with higher prompt ambiguity (i.e., lower capability) exhibit non-monotonic prompt usage patterns, first increasing then decreasing with ambiguity levels, reflecting complex changes in marginal utility. Experimental evaluation using a character-level GPT-like model demonstrates that our OPP algorithm achieves up to 31.72% improvement in platform payoff compared to existing pricing mechanisms, validating the importance of user-centric prompt pricing in AIGC services.
2025-03-23 (G)I-DLE: Generative Inference via Distribution-preserving Logit Exclusion with KL Divergence Minimization for Constrained Decoding Hanwool Lee et.al. 2503.18050 preprint
Abstract (click to expand)We propose (G)I-DLE, a new approach to constrained decoding that leverages KL divergence minimization to preserve the intrinsic conditional probability distribution of autoregressive language models while excluding undesirable tokens. Unlike conventional methods that naively set banned tokens' logits to \(-\infty\), which can distort the conversion from raw logits to posterior probabilities and increase output variance, (G)I-DLE re-normalizes the allowed token probabilities to minimize such distortion. We validate our method on the K2-Eval dataset, specifically designed to assess Korean language fluency, logical reasoning, and cultural appropriateness. Experimental results on Qwen2.5 models (ranging from 1.5B to 14B) demonstrate that (G)I-DLE not only boosts mean evaluation scores but also substantially reduces the variance of output quality.
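The abstract does not spell out the procedure, so the following is only a minimal sketch of what re-normalizing the allowed token probabilities could look like, not the authors' implementation. It assumes a single decoding step with a plain logit vector. The function names and the example logits are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    shifted = logits - np.max(logits)
    weights = np.exp(shifted)
    return weights / weights.sum()

def renormalize_allowed(logits, banned):
    """Zero out banned tokens and rescale the rest to sum to one.

    The relative probabilities of the allowed tokens are left unchanged.
    """
    p = softmax(np.asarray(logits, dtype=float))
    q = p.copy()
    q[np.asarray(list(banned), dtype=int)] = 0.0
    total = q.sum()
    if total == 0.0:
        raise ValueError("every token is banned")
    return q / total

def kl_divergence(q, p):
    """KL(q || p), summed over the support of q."""
    support = q > 0
    return float(np.sum(q[support] * np.log(q[support] / p[support])))

# Usage: ban token 2 from a four-token vocabulary.
example_logits = np.array([2.0, 1.0, 0.5, -1.0])
p_full = softmax(example_logits)
q_allowed = renormalize_allowed(example_logits, banned=[2])
divergence = kl_divergence(q_allowed, p_full)
```

For this rescaled distribution the divergence from the original equals the negative log of the total allowed probability mass, so it shrinks as less probability is excluded.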
2025-03-23 Financial Wind Tunnel: A Retrieval-Augmented Market Simulator Bokai Cao, Xueyuan Lin, Yiyan Qi et.al. 2503.17909
Abstract (click to expand)Market simulators aim to create high-quality synthetic financial data that mimics real-world market dynamics, which is crucial for model development and robust assessment. Despite continuous advances in simulation methodologies, market fluctuations vary in scale and source, and existing frameworks often excel at only specific tasks. To address this challenge, we propose Financial Wind Tunnel (FWT), a retrieval-augmented market simulator designed to generate controllable, reasonable, and adaptable market dynamics for model testing. FWT offers a more comprehensive and systematic generative capability across different data frequencies. By leveraging a retrieval method to discover cross-sectional information as the augmented condition, our diffusion-based simulator seamlessly integrates both macro- and micro-level market patterns. Furthermore, our framework allows the simulation to be controlled with wide applicability, including causal generation through "what-if" prompts or unprecedented cross-market trend synthesis. Additionally, we develop an automated optimizer for downstream quantitative models, using stress testing of simulated scenarios via FWT to enhance returns while controlling risks. Experimental results demonstrate that our approach enables generalizable and reliable market simulation and significantly improves the performance and adaptability of downstream models, particularly in highly complex and volatile market conditions. Our code and data sample are available at https://anonymous.4open.science/r/fwt_-E852
2025-03-22 Accelerating and enhancing thermodynamic simulations of electrochemical interfaces Xiaochen Du, Mengren Liu, Jiayu Peng et.al. 2503.17870 link 19 pages main text, 5 figures, supplementary information (SI) in ancillary files
Abstract (click to expand)Electrochemical interfaces are crucial in catalysis, energy storage, and corrosion, where their stability and reactivity depend on complex interactions between the electrode, adsorbates, and electrolyte. Predicting stable surface structures remains challenging, as traditional surface Pourbaix diagrams tend to either rely on expert knowledge or costly ab initio sampling, and neglect thermodynamic equilibration with the environment. Machine learning (ML) potentials can accelerate static modeling but often overlook dynamic surface transformations. Here, we extend the Virtual Surface Site Relaxation-Monte Carlo (VSSR-MC) method to autonomously sample surface reconstructions modeled under aqueous electrochemical conditions. Through fine-tuning foundational ML force fields, we accurately and efficiently predict surface energetics, recovering known Pt(111) phases and revealing new LaMnO\(_3\)(001) surface reconstructions. By explicitly accounting for bulk-electrolyte equilibria, our framework enhances electrochemical stability predictions, offering a scalable approach to understanding and designing materials for electrochemical applications.
2025-03-22 Generalized Scattering Matrix Synthesis for Hybrid Systems with Multiple Scatterers and Antennas Using Independent Structure Simulations Chenbo Shi, Shichen Liang, Jin Pan et.al. 2503.17616
Abstract (click to expand)This paper presents a unified formulation for calculating the generalized scattering matrix (GS-matrix) of hybrid systems involving multiple scatterers and antennas. The GS-matrix of the entire system is synthesized through the scattering matrices and GS-matrices of each independent component, using the addition theorem of vector spherical wavefunctions and fully matrix-based operations. Since our formulation is applicable to general antenna-scatterer hybrid systems, previous formulas for multiple scattering and antenna arrays become special cases of our approach. This also establishes our formulation as a universal domain decomposition method for analyzing the electromagnetic performance of hybrid systems. We provide numerous numerical examples to comprehensively demonstrate the capabilities and compatibility of the proposed formulation, including its potential application in studying the effects of structural rotation.
2025-03-21 Adjoint Sensitivities for the Optimization of Nonlinear Structural Dynamics via Spectral Submanifolds Matteo Pozzi, Jacopo Marconi, Shobhit Jain et.al. 2503.17431
Abstract (click to expand)This work presents an optimization framework for tailoring the nonlinear dynamic response of lightly damped mechanical systems using Spectral Submanifold (SSM) reduction. We derive the SSM-based backbone curve and its sensitivity with respect to parameters up to arbitrary polynomial orders, enabling efficient and accurate optimization of the nonlinear frequency-amplitude relation. We use the adjoint method to derive sensitivity expressions, which drastically reduces the computational cost compared to direct differentiation as the number of parameters increases. An important feature of this framework is the automatic adjustment of the expansion order of SSM-based ROMs using user-defined error tolerances during the optimization process. We demonstrate the effectiveness of the approach in optimizing the nonlinear response over several numerical examples of mechanical systems. Hence, the proposed framework extends the applicability of SSM-based optimization methods to practical engineering problems, offering a robust tool for the design and optimization of nonlinear mechanical structures.
2025-03-20 Accelerated Medicines Development using a Digital Formulator and a Self-Driving Tableting DataFactory Faisal Abbas, Mohammad Salehian, Peter Hou et.al. 2503.17411 link
Abstract (click to expand)Pharmaceutical tablet formulation and process development, traditionally a complex and multi-dimensional decision-making process, necessitates extensive experimentation and resources, often resulting in suboptimal solutions. This study presents an integrated platform for tablet formulation and manufacturing, built around a Digital Formulator and a Self-Driving Tableting DataFactory. By combining predictive modelling, optimisation algorithms, and automation, this system offers a material-to-product approach to predict and optimise critical quality attributes for different formulations, linking raw material attributes to key blend and tablet properties, such as flowability, porosity, and tensile strength. The platform leverages the Digital Formulator, an in-silico optimisation framework that employs a hybrid system of models - melding data-driven and mechanistic models - to identify optimal formulation settings for manufacturability. Optimised formulations then proceed through the self-driving Tableting DataFactory, which includes automated powder dosing, tablet compression and performance testing, followed by iterative refinement of process parameters through Bayesian optimisation methods. This approach accelerates the timeline from material characterisation to development of an in-specification tablet within 6 hours, utilising less than 5 grams of API, and manufacturing small batch sizes of up to 1,440 tablets with augmented and mixed reality enabled real-time quality control within 24 hours. Validation across multiple APIs and drug loadings underscores the platform's capacity to reliably meet target quality attributes, positioning it as a transformative solution for accelerated and resource-efficient pharmaceutical development.
2025-03-21 ML-Based Bidding Price Prediction for Pay-As-Bid Ancillary Services Markets: A Use Case in the German Control Reserve Market Vincent Bezold, Lukas Baur, Alexander Sauer et.al. 2503.17214
Abstract (click to expand)The increasing integration of renewable energy sources has led to greater volatility and unpredictability in electricity generation, posing challenges to grid stability. Ancillary service markets, such as the German control reserve market, allow industrial consumers and producers to offer flexibility in their power consumption or generation, contributing to grid stability while earning additional income. However, many participants use simple bidding strategies that may not maximize their revenues. This paper presents a methodology for forecasting bidding prices in pay-as-bid ancillary service markets, focusing on the German control reserve market. We evaluate various machine learning models, including Support Vector Regression, Decision Trees, and k-Nearest Neighbors, and compare their performance against benchmark models. To address the asymmetry in the revenue function of pay-as-bid markets, we introduce an offset adjustment technique that enhances the practical applicability of the forecasting models. Our analysis demonstrates that the proposed approach improves potential revenues by 27.43 % to 37.31 % compared to baseline models. When analyzing the relationship between the model forecasting errors and the revenue, a negative correlation is measured for three markets; according to the results, a reduction of 1 EUR/MW model price forecasting error (MAE) statistically leads to a yearly revenue increase between 483 EUR/MW and 3,631 EUR/MW. The proposed methodology enables industrial participants to optimize their bidding strategies, leading to increased earnings and contributing to the efficiency and stability of the electrical grid.
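The revenue asymmetry described in this abstract can be illustrated with a toy sketch. It is not the paper's procedure. It assumes that a bid is accepted whenever it is at or below the highest accepted (marginal) price, that an accepted bid earns exactly its own price, and that the offset is a single constant chosen by grid search on historical data. All function names and the toy numbers are hypothetical.

```python
def pay_as_bid_revenue(bid, marginal_price):
    """Revenue per MW: an accepted bid earns its own price, a rejected bid earns nothing."""
    return bid if bid <= marginal_price else 0.0


def total_revenue(forecasts, marginal_prices, offset):
    """Revenue when every forecast is shifted down by a constant offset before bidding."""
    return sum(
        pay_as_bid_revenue(f - offset, m)
        for f, m in zip(forecasts, marginal_prices)
    )


def best_offset(forecasts, marginal_prices, candidates):
    """Grid search for the offset that maximizes revenue on historical data."""
    return max(candidates, key=lambda o: total_revenue(forecasts, marginal_prices, o))


# Toy data invented for illustration: forecasts scatter around the true marginal prices.
marginal_prices = [50.0, 52.0, 48.0, 55.0, 51.0]
forecasts = [51.0, 50.0, 49.5, 56.0, 50.5]
candidates = [0.0, 0.5, 1.0, 1.5, 2.0]

offset = best_offset(forecasts, marginal_prices, candidates)
revenue_raw = total_revenue(forecasts, marginal_prices, 0.0)
revenue_adjusted = total_revenue(forecasts, marginal_prices, offset)
```

In this toy setting, overbidding by any amount forfeits the whole payment while underbidding only loses the difference, so a small downward shift of the forecast can raise total revenue even though it makes each individual forecast less accurate.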
2025-03-21 A Comprehensive Framework for Predictive Computational Modeling of Growth and Remodeling in Tissue-Engineered Cardiovascular Implants Mahmoud Sesa, Hagen Holthusen, Christian Böhm et.al. 2503.17151 Preprint submitted to Springer Nature
Abstract (click to expand)Developing clinically viable tissue-engineered cardiovascular implants remains a formidable challenge. Achieving reliable and durable outcomes requires a deeper understanding of the fundamental mechanisms driving tissue evolution during in vitro maturation. Although considerable progress has been made in modeling soft tissue growth and remodeling, studies focused on the early stages of tissue engineering remain limited. Here, we present a general, thermodynamically consistent model to predict tissue evolution and mechanical response throughout maturation. The formulation utilizes a stress-driven homeostatic surface to capture volumetric growth, coupled with an energy-based approach to describe collagen densification via the strain energy of the fibers. We further employ a co-rotated intermediate configuration to ensure the model's consistency and generality. The framework is demonstrated with two numerical examples: a uniaxially constrained tissue strip validated against experimental data, and a biaxially constrained specimen subjected to a perturbation load. These results highlight the potential of the proposed model to advance the design and optimization of tissue-engineered implants with clinically relevant performance.
2025-03-26 Assessing Consistency and Reproducibility in the Outputs of Large Language Models: Evidence Across Diverse Finance and Accounting Tasks Julian Junyan Wang, Victor Xiaoqi Wang et.al. 2503.16974 97 pages, 20 tables, 15 figures
Abstract (click to expand)This study provides the first comprehensive assessment of consistency and reproducibility in Large Language Model (LLM) outputs in finance and accounting research. We evaluate how consistently LLMs produce outputs given identical inputs through extensive experimentation with 50 independent runs across five common tasks: classification, sentiment analysis, summarization, text generation, and prediction. Using three OpenAI models (GPT-3.5-turbo, GPT-4o-mini, and GPT-4o), we generate over 3.4 million outputs from diverse financial source texts and data, covering MD&As, FOMC statements, finance news articles, earnings call transcripts, and financial statements. Our findings reveal substantial but task-dependent consistency, with binary classification and sentiment analysis achieving near-perfect reproducibility, while complex tasks show greater variability. More advanced models do not consistently demonstrate better consistency and reproducibility, with task-specific patterns emerging. LLMs significantly outperform expert human annotators in consistency and maintain high agreement even where human experts significantly disagree. We further find that simple aggregation strategies across 3-5 runs dramatically improve consistency. We also find that aggregation may come with an additional benefit of improved accuracy for sentiment analysis when using newer models. Simulation analysis reveals that despite measurable inconsistency in LLM outputs, downstream statistical inferences remain remarkably robust. These findings address concerns about what we term "G-hacking," the selective reporting of favorable outcomes from multiple Generative AI runs, by demonstrating that such risks are relatively low for finance and accounting tasks.
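The aggregation across repeated runs mentioned in this abstract could, for classification-style tasks, be as simple as a per-item majority vote. The sketch below assumes exactly that; the abstract does not specify the aggregation rule, and the labels and function names are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def aggregate_runs(runs):
    """runs[r][i] is the label for item i in run r; returns one label per item."""
    return [majority_vote(item_labels) for item_labels in zip(*runs)]


def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two label lists agree."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical sentiment labels from three repeated runs over four documents.
runs = [
    ["pos", "neg", "neu", "pos"],
    ["pos", "neg", "pos", "pos"],
    ["pos", "pos", "neu", "pos"],
]
aggregated = aggregate_runs(runs)
pairwise = agreement_rate(runs[0], runs[1])
```

Here any two single runs disagree on at least one document, while the aggregated labels smooth out the isolated disagreements.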
2025-03-19 Reliable Radiologic Skeletal Muscle Area Assessment -- A Biomarker for Cancer Cachexia Diagnosis Sabeen Ahmed, Nathan Parker, Margaret Park et.al. 2503.16556 47 pages, 19 figures, 9 Tables
Abstract (click to expand)Cancer cachexia is a common metabolic disorder characterized by severe muscle atrophy which is associated with poor prognosis and quality of life. Monitoring skeletal muscle area (SMA) longitudinally through computed tomography (CT) scans, an imaging modality routinely acquired in cancer care, is an effective way to identify and track this condition. However, existing tools often lack full automation and exhibit inconsistent accuracy, limiting their potential for integration into clinical workflows. To address these challenges, we developed SMAART-AI (Skeletal Muscle Assessment-Automated and Reliable Tool-based on AI), an end-to-end automated pipeline powered by deep learning models (nnU-Net 2D) trained on mid-third lumbar level CT images with 5-fold cross-validation, ensuring generalizability and robustness. SMAART-AI incorporates an uncertainty-based mechanism to flag high-error SMA predictions for expert review, enhancing reliability. We combined the SMA, skeletal muscle index, BMI, and clinical data to train a multi-layer perceptron (MLP) model designed to predict cachexia at the time of cancer diagnosis. Tested on the gastroesophageal cancer dataset, SMAART-AI achieved a Dice score of 97.80% +/- 0.93%, with SMA estimated across all four datasets in this study at a median absolute error of 2.48% compared to manual annotations with SliceOmatic. Uncertainty metrics (variance, entropy, and coefficient of variation) strongly correlated with SMA prediction errors (0.83, 0.76, and 0.73 respectively). The MLP model predicts cachexia with 79% precision, providing clinicians with a reliable tool for early diagnosis and intervention. By combining automation, accuracy, and uncertainty awareness, SMAART-AI bridges the gap between research and clinical application, offering a transformative approach to managing cancer cachexia.
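For readers unfamiliar with the Dice score reported in this abstract, the following is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), on binary masks. The flat-list mask representation and the toy masks are assumptions for illustration and have nothing to do with the SMAART-AI pipeline itself.

```python
def dice_score(pred, truth):
    """Dice coefficient for two binary masks given as flat sequences of 0/1."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks invented for illustration (1 = muscle pixel, 0 = background).
pred = [1, 1, 0, 1, 0]
truth = [1, 0, 0, 1, 1]
score = dice_score(pred, truth)
```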
2025-03-20 Deep Feynman-Kac Methods for High-dimensional Semilinear Parabolic Equations: Revisit Xiaotao Zheng, Xingye Yue, Jiyang Shi et.al. 2503.16407
Abstract (click to expand)The Deep Feynman-Kac method was first introduced by Beck et al. (SISC, V.43, 2021) to solve parabolic partial differential equations (PDEs); they named it the Deep Splitting method because they trained the neural networks step by step in the time direction. In this paper, we propose a new training approach with two different features. Firstly, neural networks are trained at all time steps globally, instead of step by step. Secondly, the training data are generated in a new way, in which the method is consistent with a direct Monte Carlo scheme when dealing with a linear parabolic PDE. Numerical examples show that our method significantly improves both efficiency and accuracy.
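The direct Monte Carlo scheme for a linear parabolic PDE that this abstract refers to can be sketched for the simplest case. The example assumes the one-dimensional heat equation u_t = (1/2) u_xx with initial condition u(0, x) = g(x), for which the Feynman-Kac formula gives u(t, x) = E[g(x + W_t)] with W_t a Brownian motion. The function name, sample count, and test function are hypothetical, and no neural network is involved.

```python
import math
import random


def feynman_kac_heat(g, x, t, n_samples=100_000, seed=0):
    """Estimate u(t, x) = E[g(x + W_t)] for u_t = 0.5 * u_xx, u(0, .) = g."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    scale = math.sqrt(t)       # W_t is normal with mean 0 and variance t
    total = 0.0
    for _ in range(n_samples):
        total += g(x + scale * rng.gauss(0.0, 1.0))
    return total / n_samples


# For g(y) = y^2 the exact solution is u(t, x) = x^2 + t.
estimate = feynman_kac_heat(lambda y: y * y, x=1.0, t=0.5)
exact = 1.0 ** 2 + 0.5
```

The estimator's error shrinks like one over the square root of the sample count regardless of dimension, which is the property that makes Monte Carlo representations attractive for high-dimensional PDEs.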
2025-03-20 Filters reveal emergent structure in computational morphogenesis Hazhir Aliahmadi, Aidan Sheedy, Greg van Anders et.al. 2503.16211 17 pages, 9 figures
Abstract (click to expand)Revolutionary advances in both manufacturing and computational morphogenesis raise critical questions about design sensitivity. Sensitivity questions are especially critical in contexts, such as topology optimization, that yield structures with emergent morphology. However, analyzing emergent structures via conventional, perturbative techniques can mask larger-scale vulnerabilities that could manifest in essential components. Risks that fail to appear in perturbative sensitivity analyses will only continue to proliferate as topology optimization-driven manufacturing penetrates more deeply into engineering design and consumer products. Here, we introduce Laplace-transform based computational filters that supplement computational morphogenesis with a set of nonperturbative sensitivity analyses. We demonstrate how this approach identifies important elements of a structure even in the absence of knowledge of the ultimate, optimal structure itself. We leverage techniques from molecular dynamics and implement these methods in open-source codes, demonstrating their application to compliance minimization problems in both 2D and 3D. Our implementation extends straightforwardly to topology optimization for other problems and benefits from the strong scaling properties observed in conventional molecular simulation.
2025-03-20 Sustainable Open-Data Management for Field Research: A Cloud-Based Approach in the Underlandscape Project Augusto Ciuffoletti, Letizia Chiti et.al. 2503.16042 8 pages, 4 figures
Abstract (click to expand)Field-based research projects require a robust suite of ICT services to support data acquisition, documentation, storage, and dissemination. A key challenge lies in ensuring the sustainability of data management - not only during the project's funded period but also beyond its conclusion, when maintenance and support often depend on voluntary efforts. In the Underlandscape project, we tackled this challenge by extensively leveraging public cloud services while minimizing reliance on complex custom infrastructure. This paper provides a comprehensive overview of the project's final infrastructure, detailing the adopted data formats, the cloud-based solutions enabling data management, and the custom applications developed for system integration.
2025-03-20 Practical Portfolio Optimization with Metaheuristics: Pre-assignment Constraint and Margin Trading Hang Kin Poon et.al. 2503.15965
Abstract (click to expand)Portfolio optimization is a critical area in finance, aiming to maximize returns while minimizing risk. Metaheuristic algorithms have been shown to solve complex optimization problems efficiently, with Genetic Algorithms and Particle Swarm Optimization being among the most popular methods. This paper introduces an innovative approach to portfolio optimization that incorporates pre-assignment to limit the search space according to investor preferences and to obtain better results. Additionally, it takes margin trading strategies into account and uses a rarely used performance ratio to evaluate portfolio efficiency. Through an illustrative example, this paper demonstrates that the metaheuristic-based methodology yields superior risk-adjusted returns compared to traditional benchmarks. The results highlight the potential of metaheuristics, with the help of asset filtering, to enhance portfolio performance in terms of risk-adjusted return.
2025-03-20 WeirdFlows: Anomaly Detection in Financial Transaction Flows Arthur Capozzi, Salvatore Vilella, Dario Moncalvo et.al. 2503.15896 12 pages, 6 figures, ITADATA2024
Abstract (click to expand)In recent years, the digitization and automation of anti-financial crime (AFC) investigative processes have faced significant challenges, particularly the need for interpretability of AI model results and the lack of labeled data for training. Network analysis has emerged as a valuable approach in this context. In this paper, we present WeirdFlows, a top-down search pipeline for detecting potentially fraudulent transactions and non-compliant agents. In a transaction network, fraud attempts are often based on complex transaction patterns that change over time to avoid detection. The WeirdFlows pipeline requires neither an a priori set of patterns nor a training set. In addition, by providing elements to explain the anomalies found, it facilitates and supports the work of an AFC analyst. We evaluate WeirdFlows on a dataset from Intesa Sanpaolo (ISP) bank, comprising 80 million cross-country transactions over 15 months, benchmarking our implementation of the algorithm. The results, corroborated by ISP AFC experts, highlight its effectiveness in identifying suspicious transactions and actors, particularly in the context of the economic sanctions imposed in the EU after February 2022. This demonstrates WeirdFlows' capability to handle large datasets, detect complex transaction patterns, and provide the necessary interpretability for formal AFC investigations.
2025-03-19 Impact of pH and chloride content on the biodegradation of magnesium alloys for medical implants: An in vitro and phase-field study S. Kovacevic, W. Ali, T. K. Mandal et.al. 2503.15700
Abstract (click to expand)The individual contributions of pH and chloride concentration to the corrosion kinetics of bioabsorbable magnesium (Mg) alloys remain unresolved despite their significant roles as driving factors in Mg corrosion. This study demonstrates and quantifies hitherto unknown separate effects of pH and chloride content on the corrosion of Mg alloys pertinent to biomedical implant applications. The experimental setup designed for this purpose enables the quantification of the dependence of corrosion on pH and chloride concentration. The in vitro tests conclusively demonstrate that variations in chloride concentration, relevant to biomedical applications, have a negligible effect on corrosion kinetics. The findings identify pH as a critical factor in the corrosion of bioabsorbable Mg alloys. A variationally consistent phase-field model is developed for assessing the degradation of Mg alloys in biological fluids. The model accurately predicts the corrosion performance of Mg alloys observed during the experiments, including their dependence on pH and chloride concentration. The capability of the framework to account for mechano-chemical effects during corrosion is demonstrated in practical orthopaedic applications considering bioabsorbable Mg alloy implants for bone fracture fixation and porous scaffolds for bone tissue engineering. The strategy has the potential to assess the in vitro and in vivo service life of bioabsorbable Mg-based biomedical devices.
2025-03-19 Shap-MeD Nicolás Laverde, Melissa Robles, Johan Rodríguez et.al. 2503.15562
Abstract (click to expand)We present Shap-MeD, a text-to-3D object generative model specialized in the biomedical domain. The objective of this study is to develop an assistant that facilitates the 3D modeling of medical objects, thereby reducing development time. 3D modeling in medicine has various applications, including surgical procedure simulation and planning, the design of personalized prosthetic implants, medical education, the creation of anatomical models, and the development of research prototypes. To achieve this, we leverage Shap-e, an open-source text-to-3D generative model developed by OpenAI, and fine-tune it using a dataset of biomedical objects. Our model achieved a mean squared error (MSE) of 0.089 in latent generation on the evaluation set, compared to Shap-e's MSE of 0.147. Additionally, we conducted a qualitative evaluation, comparing our model with others in the generation of biomedical objects. Our results indicate that Shap-MeD demonstrates higher structural accuracy in biomedical object generation.
2025-03-19 Design for Sensing and Digitalisation (DSD): A Modern Approach to Engineering Design Daniel N. Wilke et.al. 2503.14851 4 pages, conference, SACAM 2025
Abstract (click to expand)This paper introduces Design for Sensing and Digitalisation (DSD), a new engineering design paradigm that integrates sensor technology for digitisation and digitalisation from the earliest stages of the design process. Unlike traditional methodologies that treat sensing as an afterthought, DSD emphasises sensor integration, signal path optimisation, and real-time data utilisation as core design principles. The paper outlines DSD's key principles, discusses its role in enabling digital twin technology, and argues for its importance in modern engineering education. By adopting DSD, engineers can create more intelligent and adaptable systems that leverage real-time data for continuous design iteration, operational optimisation and data-driven predictive maintenance.
2025-03-18 Teaching Artificial Intelligence to Perform Rapid, Resolution-Invariant Grain Growth Modeling via Fourier Neural Operator Iman Peivaste, Ahmed Makradi, Salim Belouettar et.al. 2503.14568 link
Abstract (click to expand)Microstructural evolution, particularly grain growth, plays a critical role in shaping the physical, optical, and electronic properties of materials. Traditional phase-field modeling accurately simulates these phenomena but is computationally intensive, especially for large systems and fine spatial resolutions. While machine learning approaches have been employed to accelerate simulations, they often struggle with resolution dependence and generalization across different grain scales. This study introduces a novel approach utilizing Fourier Neural Operator (FNO) to achieve resolution-invariant modeling of microstructure evolution in multi-grain systems. FNO operates in the Fourier space and can inherently handle varying resolutions by learning mappings between function spaces. By integrating FNO with the phase field method, we developed a surrogate model that significantly reduces computational costs while maintaining high accuracy across different spatial scales. We generated a comprehensive dataset from phase-field simulations using the Fan Chen model, capturing grain evolution over time. Data preparation involved creating input-output pairs with a time shift, allowing the model to predict future microstructures based on current and past states. The FNO-based neural network was trained using sequences of microstructures and demonstrated remarkable accuracy in predicting long-term evolution, even for unseen configurations and higher-resolution grids not encountered during training.
2025-03-17 AI-Driven Rapid Identification of Bacterial and Fungal Pathogens in Blood Smears of Septic Patients Agnieszka Sroka-Oleksiak, Adam Pardyl, Dawid Rymarczyk et.al. 2503.14542
Abstract (click to expand)Sepsis is a life-threatening condition which requires rapid diagnosis and treatment. Traditional microbiological methods are time-consuming and expensive. In response to these challenges, deep learning algorithms were developed to identify 14 bacteria species and 3 yeast-like fungi from microscopic images of Gram-stained smears of positive blood samples from sepsis patients. A total of 16,637 Gram-stained microscopic images were used in the study. The analysis used the Cellpose 3 model for segmentation and Attention-based Deep Multiple Instance Learning for classification. Our model achieved an accuracy of 77.15% for bacteria and 71.39% for fungi, with ROC AUC of 0.97 and 0.88, respectively. The highest values, reaching up to 96.2%, were obtained for Cutibacterium acnes, Enterococcus faecium, Stenotrophomonas maltophilia and Nakaseomyces glabratus. Classification difficulties were observed in closely related species, such as Staphylococcus hominis and Staphylococcus haemolyticus, due to morphological similarity, and within Candida albicans due to high morphotic diversity. The study confirms the potential of our model for microbial classification, but it also indicates the need for further optimisation and expansion of the training data set. In the future, this technology could support microbial diagnosis, reducing diagnostic time and improving the effectiveness of sepsis treatment due to its simplicity and accessibility. Part of the results presented in this publication was covered by a patent application at the European Patent Office EP24461637.1 "A computer implemented method for identifying a microorganism in a blood and a data processing system therefor".
2025-03-18 Tensor-decomposition-based A Priori Surrogate (TAPS) modeling for ultra large-scale simulations Jiachen Guo, Gino Domel, Chanwook Park et.al. 2503.13933
Abstract (click to expand)A data-free, predictive scientific AI model, Tensor-decomposition-based A Priori Surrogate (TAPS), is proposed for tackling ultra large-scale engineering simulations with significant speedup, memory savings, and storage gain. TAPS can effectively obtain surrogate models for high-dimensional parametric problems with equivalent zetta-scale (\(10^{21}\)) degrees of freedom (DoFs). TAPS achieves this by directly obtaining reduced-order models through solving governing equations with multiple independent variables such as spatial coordinates, parameters, and time. The paper first introduces an AI-enhanced finite element-type interpolation function called convolution hierarchical deep-learning neural network (C-HiDeNN) with tensor decomposition (TD). Subsequently, the generalized space-parameter-time Galerkin weak form and the corresponding matrix form are derived. Through the choice of TAPS hyperparameters, an arbitrary convergence rate can be achieved. To show the capabilities of this framework, TAPS is then used to simulate a large-scale additive manufacturing process as an example and achieves around 1,370x speedup, 14.8x memory savings, and 955x storage gain compared to the finite difference method with 3.46 billion spatial DoFs. As a result, the TAPS framework opens a new avenue for many challenging ultra large-scale engineering problems, such as additive manufacturing and integrated circuit design, among others.
2025-03-17 Quantum Dynamics Simulation of the Advection-Diffusion Equation Hirad Alipanah, Feng Zhang, Yongxin Yao et.al. 2503.13729
Abstract (click to expand)The advection-diffusion equation is simulated on a superconducting quantum computer via several quantum algorithms. Three formulations are considered: (1) Trotterization, (2) variational quantum time evolution (VarQTE), and (3) adaptive variational quantum dynamics simulation (AVQDS). These schemes were originally developed for the Hamiltonian simulation of many-body quantum systems. The finite-difference discretized operator of the transport equation is formulated as a Hamiltonian and solved without the need for ancillary qubits. Computations are conducted on a quantum simulator (IBM Qiskit Aer) and on actual quantum hardware (IBM Fez). The former emulates the latter without the noise. The predicted results are compared with direct numerical simulation (DNS) data with infidelities of the order \(10^{-5}\). In the quantum simulator, Trotterization is observed to have the lowest infidelity and is suitable for fault-tolerant computation. The AVQDS algorithm requires the lowest gate count and the lowest circuit depth. The VarQTE algorithm is the next best in terms of gate counts, but the number of its optimization variables is directly proportional to the number of qubits. Due to current hardware limitations, Trotterization cannot be implemented, as it has an overwhelmingly large number of operations. Meanwhile, AVQDS and VarQTE can be executed, but suffer from large errors due to significant hardware noise. These algorithms present a new paradigm for computational transport phenomena on quantum computers.
2025-03-17 Competitive algorithms for calculating the ground state properties of Bose-Fermi mixtures Tomasz Świsłocki, Krzysztof Gawryluk, Mirosław Brewczyk et.al. 2503.13717
Abstract (click to expand)In this work we define, analyze, and compare different numerical schemes that can be used to study the ground state properties of Bose-Fermi systems, such as mixtures of different atomic species under external forces or self-bound quantum droplets. The bosonic atoms are assumed to be condensed and are described by the generalized Gross-Pitaevskii equation. The fermionic atoms, on the other hand, are treated individually, and each atom is associated with a wave function whose evolution follows the Hartree-Fock equation. We solve such a formulated set of equations using a variety of methods, including those based on adiabatic switching of interactions and the imaginary time propagation technique combined with the Gram-Schmidt orthonormalization or the diagonalization of the Hamiltonian matrix. We show how different algorithms compete at the numerical level by studying the mixture in the range of parameters covering the formation of self-bound quantum Bose-Fermi droplets.
2025-03-17 PERC: a suite of software tools for the curation of cryoEM data with application to simulation, modelling and machine learning Beatriz Costa-Gomes, Joel Greer, Nikolai Juraschko et.al. 2503.13329 22 pages, 4 figures
Abstract (click to expand)Ease of access to data, tools and models expedites scientific research. In structural biology there are now numerous open repositories of experimental and simulated datasets. Being able to easily access and utilise these is crucial for allowing researchers to make optimal use of their research effort. The tools presented here are useful for collating existing public cryoEM datasets and/or creating new synthetic cryoEM datasets to aid the development of novel data processing and interpretation algorithms. In recent years, structural biology has seen the development of a multitude of machine-learning-based algorithms for aiding numerous steps in the processing and reconstruction of experimental datasets, and the use of these approaches has become widespread. Developing such techniques in structural biology requires access to large datasets, which can be cumbersome to curate and unwieldy to make use of. In this paper we present a suite of Python software packages which we collectively refer to as PERC (profet, EMPIARreader and CAKED). These are designed to reduce the burden which data curation places upon structural biology research. The protein structure fetcher (profet) package allows users to conveniently download and cleave sequences or structures from the Protein Data Bank or AlphaFold databases. EMPIARreader allows lazy loading of Electron Microscopy Public Image Archive datasets in a machine-learning compatible structure. The Class Aggregator for Key Electron-microscopy Data (CAKED) package is designed to seamlessly facilitate the training of machine learning models on electron microscopy data, including electron-cryo-microscopy-specific data augmentation and labelling. These packages may be utilised independently or as building blocks in workflows. All are available in open source repositories and designed to be easily extensible to facilitate more advanced workflows if required.
2025-03-17 Magneto-thermally Coupled Field Simulation of Homogenized Foil Winding Models Silas Weinert, Jonas Bundschuh, Yvonne Späck-Leigsnering et.al. 2503.13010 6 pages, 8 figures
Abstract (click to expand)Foil windings have, due to their layered structure, different properties than conventional wire windings, which make them advantageous for high frequency applications. Both electromagnetic and thermal analyses are relevant for foil windings. These two physical areas are coupled through Joule losses and temperature dependent material properties. For an efficient simulation of foil windings, homogenization techniques are used to avoid resolving the single turns. Therefore, this paper comprises a coupled magneto-thermal simulation that uses a homogenization method in the electromagnetic and thermal part. A weak coupling with different time step sizes for both parts is presented. The method is validated on a simple geometry and showcased for a pot transformer that uses a foil and a wire winding.
2025-03-17 AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations Quang Trung Truong, Wong Yuk Kwan, Duc Thanh Nguyen et.al. 2503.12828 under review
Abstract (click to expand)Underwater video analysis, hampered by the dynamic marine environment and camera motion, remains a challenging task in computer vision. Existing training-free video generation techniques, learning motion dynamics on a frame-by-frame basis, often produce poor results with noticeable motion interruptions and misalignments. To address these issues, we propose AUTV, a framework for synthesizing marine video data with pixel-wise annotations. We demonstrate the effectiveness of this framework by constructing two video datasets, namely UTV, a real-world dataset comprising 2,000 video-text pairs, and SUTV, a synthetic video dataset including 10,000 videos with segmentation masks for marine objects. UTV provides diverse underwater videos with comprehensive annotations including appearance, texture, camera intrinsics, lighting, and animal behavior. SUTV can be used to improve underwater downstream tasks, as demonstrated in video inpainting and video object segmentation.
2025-03-17 Cohort-attention Evaluation Metric against Tied Data: Studying Performance of Classification Models in Cancer Detection Longfei Wei, Fang Sheng, Jianfei Zhang et.al. 2503.12755
Abstract (click to expand)Artificial intelligence (AI) has significantly improved medical screening accuracy, particularly in cancer detection and risk assessment. However, traditional classification metrics often fail to account for imbalanced data, varying performance across cohorts, and patient-level inconsistencies, leading to biased evaluations. We propose the Cohort-Attention Evaluation Metrics (CAT) framework to address these challenges. CAT introduces patient-level assessment, entropy-based distribution weighting, and cohort-weighted sensitivity and specificity. Key metrics like CATSensitivity (CATSen), CATSpecificity (CATSpe), and CATMean ensure balanced and fair evaluation across diverse populations. This approach enhances predictive reliability, fairness, and interpretability, providing a robust evaluation method for AI-driven medical screening models.
2025-03-16 Discovering uncertainty: Gaussian constitutive neural networks with correlated weights Jeremy A. McCulloch, Ellen Kuhl et.al. 2503.12679 link 10 pages, 5 figures, 1 table
Abstract (click to expand)When characterizing materials, it can be important to not only predict their mechanical properties, but also to estimate the probability distribution of these properties across a set of samples. Constitutive neural networks allow for the automated discovery of constitutive models that exactly satisfy physical laws given experimental testing data, but are only capable of predicting the mean stress response. Stochastic methods treat each weight as a random variable and are capable of learning their probability distributions. Bayesian constitutive neural networks combine both methods, but their weights lack physical interpretability and we must sample each weight from a probability distribution to train or evaluate the model. Here we introduce a more interpretable network with fewer parameters, simpler training, and the potential to discover correlated weights: Gaussian constitutive neural networks. We demonstrate the performance of our new Gaussian network on biaxial testing data, and discover a sparse and interpretable four-term model with correlated weights. Importantly, the discovered distributions of material parameters across a set of samples can serve as priors to discover better constitutive models for new samples with limited data. We anticipate that Gaussian constitutive neural networks are a natural first step towards generative constitutive models informed by physical laws and parameter uncertainty.
2025-03-16 Modeling ice cliff stability using a new Mohr-Coulomb-based phase field fracture model T. Clayton, R. Duddu, T. Hageman et.al. 2503.12481
Abstract (click to expand)Iceberg calving at glacier termini results in mass loss from ice sheets, but the associated fracture mechanics is often poorly represented using simplistic (empirical or elementary mechanics-based) failure criteria. Here, we propose an advanced Mohr-Coulomb failure criterion that drives cracking based on the visco-elastic stress state in ice. This criterion is implemented in a phase field fracture framework, and finite element simulations are conducted to determine the critical conditions that can trigger ice cliff collapse. Results demonstrate that fast-moving glaciers with negligible basal friction are prone to tensile failure causing crevasse propagation far away from the ice front; whilst slow-moving glaciers with significant basal friction are likely to exhibit shear failure near the ice front. Results also indicate that seawater pressure plays a major role in modulating cliff failure. For land-terminating glaciers, full thickness cliff failure is observed if the glacier exceeds a critical height, dependent on cohesive strength \(\tau_\mathrm{c}\) (\(H \approx 120\;\text{m}\) for \(\tau_\mathrm{c}=0.5\;\text{MPa}\)). For marine-terminating glaciers, ice cliff failure occurs if a critical glacier free-board (\(H-h_\mathrm{w}\)) is exceeded, with ice slumping only observed above the ocean-water height; for \(\tau_\mathrm{c} = 0.5\;\text{MPa}\), the model-predicted critical free-board is \(H-h_\mathrm{w} \approx 215\;\text{m}\), which is in good agreement with field observations. While the critical free-board height is larger than that predicted by some previous models, we cannot conclude that marine ice cliff instability is less likely because we do not include other failure processes such as hydrofracture of basal crevasses and plastic necking.
2025-03-16 Development of a Cost-Effective Simulation Tool for Loss of Flow Accident Transients in High-Temperature Gas-cooled Reactors Bo Liu, Wei Wang, Charles Moulinec et.al. 2503.12467
Abstract (click to expand)The aim of this work is to further expand the capability of the coarse-grid Computational Fluid Dynamics (CFD) approach, SubChCFD, to effectively simulate transient and buoyancy-influenced flows, which are critical in accident analyses of High-Temperature Gas-cooled Reactors (HTGRs). It has been demonstrated in our previous work that SubChCFD is highly adaptable to HTGR fuel designs and performs exceptionally well in modelling steady-state processes. In this study, the approach is extended to simulate a Loss of Flow Accident (LOFA) transient, where coolant circulation is disrupted, causing the transition from forced convection to buoyancy-driven natural circulation within the reactor core. To enable SubChCFD to capture the complex physics involved, corrections were introduced to the empirical correlations to account for the effects of flow unsteadiness, property variation and buoyancy. A 1/12th sector of the reactor core, representing the smallest symmetric unit, was modelled using a coarse mesh of approximately 60 million cells. This mesh size is about 6% of that required for a Reynolds Averaged Navier Stokes (RANS) model, where mesh sizes can typically reach the order of 1 billion cells for such configurations. Simulation results show that SubChCFD effectively captures the thermal hydraulic behaviours of the reactor during a LOFA transient, producing predictions in good agreement with RANS simulations while significantly reducing computational cost.
2025-03-14 Adiabatic Flame Temperatures for Oxy-Methane, Oxy-Hydrogen, Air-Methane, and Air-Hydrogen Stoichiometric Combustion using the NASA CEARUN Tool, GRI-Mech 3.0 Reaction Mechanism, and Cantera Python Package Osama A. Marzouk et.al. 2503.11826 8 pages, 8 figures, 8 tables, peer-reviewed journal paper, open access
Abstract (click to expand)The Adiabatic Flame Temperature (AFT) in combustion represents the maximum attainable temperature at which the chemical energy in the reactant fuel is converted into sensible heat in combustion products without heat loss. AFT depends on the fuel, oxidizer, and chemical composition of the products. Computing AFT requires solving either a nonlinear equation or a larger minimization problem. This study obtained the AFTs for oxy-methane (methane and oxygen), oxy-hydrogen (hydrogen and oxygen), air-methane (methane and air), and air-hydrogen (hydrogen and air) for stoichiometric conditions. The reactant temperature was 298.15 K (25 °C), and the pressure was kept constant at 1 atm. Two reaction mechanisms were attempted: a global single-step irreversible reaction for complete combustion and the GRI-Mech 3.0 elementary mechanism (53 species, 325 steps) for chemical equilibrium with its associated thermodynamic data. NASA CEARUN was the main modeling tool used. Two other tools were used for benchmarking: an Excel and a Cantera-Python implementation of GRI-Mech 3.0. The results showed that the AFTs for oxy-methane were 5,166.47 K (complete combustion) and 3,050.12 K (chemical equilibrium), and dropped to 2,326.35 K and 2,224.25 K for air-methane, respectively. The AFTs for oxy-hydrogen were 4,930.56 K (complete combustion) and 3,074.51 K (chemical equilibrium), and dropped to 2,520.33 K and 2,378.62 K for air-hydrogen, respectively. For eight combustion modeling cases, the relative deviation between the AFTs predicted by CEARUN and GRI-Mech 3.0 ranged from 0.064% to 3.503%.
2025-03-14 Unfitted hybrid high-order methods stabilized by polynomial extension for elliptic interface problems Erik Burman, Alexandre Ern, Romain Mottier et.al. 2503.11397
Abstract (click to expand)In this work, we propose the design and the analysis of a novel hybrid high-order (HHO) method on unfitted meshes. HHO methods rely on a pair of unknowns, combining polynomials attached to the mesh faces and the mesh cells. In the unfitted framework, the interface can cut through the mesh cells in a very general fashion, and the polynomial unknowns are doubled in the cut cells and the cut faces. In order to avoid the ill-conditioning issues caused by the presence of small cut cells, the novel approach introduced herein is to use polynomial extensions in the definition of the gradient reconstruction operator. Stability and consistency results are established, leading to optimally decaying error estimates. The theory is illustrated by numerical experiments.
2025-03-14 Corrected Riemann smoothed particle hydrodynamics method for multi-resolution fluid-structure interaction Bo Zhang, Jianfeng Zhu, Xiangyu Hu et.al. 2503.11292 link 47 pages, 19 figures
Abstract (click to expand)As a mesh-free method, smoothed particle hydrodynamics (SPH) has been widely used for modeling and simulating fluid-structure interaction (FSI) problems. While the kernel gradient correction (KGC) method is commonly applied in structural domains to enhance numerical consistency, high-order consistency corrections that preserve conservation remain underutilized in fluid domains despite their critical role in FSI analysis, especially for the multi-resolution scheme where fluid domains generally have a low resolution. In this study, we incorporate the reverse kernel gradient correction (RKGC) formulation, a conservative high-order consistency approximation, into the fluid discretization for solving FSI problems. RKGC has been proven to achieve exact second-order convergence with relaxed particles and improve numerical accuracy while particularly enhancing energy conservation in free-surface flow simulations. By integrating this correction into the Riemann SPH method to solve different typical FSI problems with a multi-resolution scheme, numerical results consistently show improvements in accuracy and convergence compared to uncorrected fluid discretization. Despite these advances, further refinement of correction techniques for solid domains and fluid-structure interfaces remains significant for enhancing the overall accuracy of SPH-based FSI modeling and simulation.
2025-03-13 Predicting Stock Movement with BERTweet and Transformers Michael Charles Albada, Mojolaoluwa Joshua Sonola et.al. 2503.10957 9 pages, 4 figures, 2 tables
Abstract (click to expand)Applying deep learning and computational intelligence to finance has been a popular area of applied research, both within academia and industry, and continues to attract active attention. The inherently high volatility and non-stationarity of the data pose substantial challenges to machine learning models, especially for today's expressive and highly parameterized deep learning models. Recent work that applies natural language processing to social media data, augmenting models based purely on historic price data to improve performance, has received particular attention. Previous work has achieved state-of-the-art performance on this task by combining techniques such as bidirectional GRUs, variational autoencoders, word and document embeddings, self-attention, graph attention, and adversarial training. In this paper, we demonstrate the efficacy of BERTweet, a variant of BERT pre-trained specifically on a Twitter corpus, and the transformer architecture by achieving competitive performance with the existing literature and setting a new baseline for Matthews Correlation Coefficient on the Stocknet dataset without auxiliary data sources.
2025-03-13 Design and Analysis of an Extreme-Scale, High-Performance, and Modular Agent-Based Simulation Platform Lukas Johannes Breitwieser et.al. 2503.10796 PhD Thesis submitted to ETH Zurich
Abstract (click to expand)Agent-based modeling is indispensable for studying complex systems across many domains. However, existing simulation platforms exhibit two major issues: performance and modularity. Low performance prevents simulations with a large number of agents, increases development time, limits parameter exploration, and raises computing costs. Inflexible software designs motivate modelers to create their own tools, diverting valuable resources. This dissertation introduces a novel simulation platform called BioDynaMo and its significant improvement, TeraAgent, to alleviate these challenges via three major works. First, we lay the platform's foundation by defining abstractions, establishing software infrastructure, and implementing a multitude of features for agent-based modeling. We demonstrate BioDynaMo's modularity through use cases in neuroscience, epidemiology, and oncology. We validate these models and show the simplicity of adding new functionality with few lines of code. Second, we perform a rigorous performance analysis and identify challenges for shared-memory parallelism. Provided solutions include an optimized grid for neighbor searching, mechanisms to reduce the memory access latency, and exploiting domain knowledge to omit unnecessary work. These improvements yield up to three orders of magnitude speedups, enabling simulations of 1.7 billion agents on a single server. Third, we present TeraAgent, a distributed simulation engine that allows scaling out the computation of one simulation to multiple servers. We identify and address server communication bottlenecks and implement solutions for serialization and delta encoding to accelerate and reduce data transfer. TeraAgent can simulate 500 billion agents and scales to 84096 CPU cores. BioDynaMo has been widely adopted, including a prize-winning radiotherapy simulation recognized as a top 10 breakthrough in physics in 2024.
2025-03-13 Unifying monitoring and modelling of water concentration levels in surface waters Peter B Sorensen, Anders Nielsen, Peter E Holm et.al. 2503.10285 41 pages, 11 figures, Developed to support the Danish EPA
Abstract (click to expand)Accurate prediction of expected concentrations is essential for effective catchment management, requiring both extensive monitoring and advanced modeling techniques. However, due to limitations in equation-solving capacity, the integration of monitoring and modeling has suffered from suboptimal statistical approaches. This limitation results in models that can only partially leverage monitoring data, which hinders realistic uncertainty assessment because critical correlations between measurements and model parameters are overlooked. This study presents a novel solution that integrates catchment monitoring with unified hierarchical statistical catchment modeling, employing a log-normal distribution for residuals within a left-censored likelihood function to address measurements below detection limits. This enables the estimation of concentrations within sub-catchments in conjunction with a source/fate sub-catchment model and monitoring data. This approach is made possible by the model-building R package RTMB. The proposed approach introduces a statistical paradigm based on a hierarchical structure, capable of accommodating heterogeneous sampling across various sampling locations. The authors suggest that it will also encourage further refinement of other existing modeling platforms within the scientific community to improve synergy with monitoring programs. The application of the method is demonstrated through an analysis of nickel concentrations in Danish surface waters.
2025-03-13 A Neumann-Neumann Acceleration with Coarse Space for Domain Decomposition of Extreme Learning Machines Chang-Ock Lee, Byungeun Ryoo et.al. 2503.10032 21 pages, 6 figures, 6 tables
Abstract (click to expand)Extreme learning machines (ELMs), which preset hidden layer parameters and solve for last layer coefficients via a least squares method, can typically solve partial differential equations faster and more accurately than Physics Informed Neural Networks. However, they remain computationally expensive when high accuracy requires large least squares problems to be solved. Domain decomposition methods (DDMs) for ELMs have allowed parallel computation to reduce training times of large systems. This paper constructs a coarse space for ELMs, which enables further acceleration of their training. By partitioning interface variables into coarse and non-coarse variables, selective elimination introduces a Schur complement system on the non-coarse variables with the coarse problem embedded. Key to the performance of the proposed method is a Neumann-Neumann acceleration that utilizes the coarse space. Numerical experiments demonstrate significant speedup compared to a previous DDM method for ELMs.
2025-03-12 A Deep Reinforcement Learning Approach to Automated Stock Trading, using xLSTM Networks Faezeh Sarlakifar, Mohammadreza Mohammadzadeh Asl, Sajjad Rezvani Khaledi et.al. 2503.09655
Abstract (click to expand)Traditional Long Short-Term Memory (LSTM) networks are effective for handling sequential data but have limitations such as vanishing gradients and difficulty in capturing long-term dependencies, which can impact their performance in dynamic and risky environments like stock trading. To address these limitations, this study explores the usage of the newly introduced Extended Long Short Term Memory (xLSTM) network in combination with a deep reinforcement learning (DRL) approach for automated stock trading. Our proposed method utilizes xLSTM networks in both actor and critic components, enabling effective handling of time series data and dynamic market environments. Proximal Policy Optimization (PPO), with its ability to balance exploration and exploitation, is employed to optimize the trading strategy. Experiments were conducted using financial data from major tech companies over a comprehensive timeline, demonstrating that the xLSTM-based model outperforms LSTM-based methods in key trading evaluation metrics, including cumulative return, average profitability per trade, maximum earning rate, maximum pullback, and Sharpe ratio. These findings highlight the potential of xLSTM for enhancing DRL-based stock trading systems.
2025-03-18 Leveraging LLMS for Top-Down Sector Allocation In Automated Trading Ryan Quek Wei Heng, Edoardo Vittori, Keane Ong et.al. 2503.09647
Abstract (click to expand)This paper introduces a methodology leveraging Large Language Models (LLMs) for sector-level portfolio allocation through systematic analysis of macroeconomic conditions and market sentiment. Our framework emphasizes top-down sector allocation by processing multiple data streams simultaneously, including policy documents, economic indicators, and sentiment patterns. Empirical results demonstrate superior risk-adjusted returns compared to traditional cross momentum strategies, achieving a Sharpe ratio of 2.51 and portfolio return of 8.79% versus -0.61 and -1.39% respectively. These results suggest that LLM-based systematic macro analysis presents a viable approach for enhancing automated portfolio allocation decisions at the sector level.
2025-03-12 AI-based Framework for Robust Model-Based Connector Mating in Robotic Wire Harness Installation Claudius Kienle, Benjamin Alt, Finn Schneider et.al. 2503.09409 6 pages, 6 figures, 4 tables, submitted to the 2025 IEEE 21st International Conference on Automation Science and Engineering
Abstract (click to expand)Despite the widespread adoption of industrial robots in automotive assembly, wire harness installation remains a largely manual process, as it requires precise and flexible manipulation. To address this challenge, we design a novel AI-based framework that automates cable connector mating by integrating force control with deep visuotactile learning. Our system optimizes search-and-insertion strategies using first-order optimization over a multimodal transformer architecture trained on visual, tactile, and proprioceptive data. Additionally, we design a novel automated data collection and optimization pipeline that minimizes the need for machine learning expertise. The framework optimizes robot programs that run natively on standard industrial controllers, permitting human experts to audit and certify them. Experimental validations on a center console assembly task demonstrate significant improvements in cycle times and robustness compared to conventional robot programming approaches. Videos are available under https://claudius-kienle.github.io/AppMuTT.
2025-03-12 Large-scale Thermo-Mechanical Simulation of Laser Beam Welding Using High-Performance Computing: A Qualitative Reproduction of Experimental Results Tommaso Bevilacqua, Andrey Gumenyuk, Niloufar Habibi et.al. 2503.09345
Abstract (click to expand)Laser beam welding is a non-contact joining technique that has gained significant importance in the course of the increasing degree of automation in industrial manufacturing. This process has established itself as a suitable joining tool for metallic materials due to its non-contact processing, short cycle times, and small heat-affected zones. One potential problem, however, is the formation of solidification cracks, which particularly affects alloys with a pronounced melting range. Since solidification cracking is influenced by both temperature and strain rate, precise measurement technologies are of crucial importance. For this purpose, as an experimental setup, a Controlled Tensile Weldability (CTW) test combined with a local deformation measurement technique is used. The aim of the present work is the development of computational methods and software tools to numerically simulate the CTW. The numerical results are compared with those obtained from the experimental CTW. In this study, an austenitic stainless steel sheet is selected. A thermo-elastoplastic material behavior with temperature-dependent material parameters is assumed. The time-dependent problem is first discretized in time and then the resulting nonlinear problem is linearized with Newton's method. For the discretization in space, finite elements are used. In order to obtain a sufficiently accurate solution, a large number of finite elements has to be used. In each Newton step, this yields a large linear system of equations that has to be solved. Therefore, a highly parallel scalable solver framework, based on the software library PETSc, was used to solve this computationally challenging problem on a high-performance computing architecture. Finally, the experimental results and the numerical simulations are compared and shown to be in good qualitative agreement.
2025-03-12 A 3d particle visualization system for temperature management Benoit Lange, Nancy Rodriguez, William Puech et.al. 2503.09198
Abstract (click to expand)This paper deals with a 3D visualization technique proposed to analyze and manage the energy efficiency of a data center. Data are extracted from sensors located in the IBM Green Data Center in Montpellier, France. These sensors measure different quantities such as hygrometry, pressure and temperature. We want to visualize in real time the large amount of data produced by these sensors. A visualization engine has been designed, based on a particle system and a client-server paradigm. In order to solve performance problems, a Level Of Detail solution has been developed. These methods are based on the earlier work introduced by J. Clark in 1976. In this paper we introduce the particle method used for this work and subsequently explain the different simplification methods we have applied to improve our solution.
2025-03-11 Capturing Lifecycle System Degradation in Digital Twin Model Updating Yifan Tang, Mostafa Rahmani Dehaghani, G. Gary Wang et.al. 2503.08953 32 pages, 25 figures
Abstract (click to expand)Digital twin (DT) has emerged as a powerful tool to facilitate monitoring, control, and other decision-making tasks in real-world engineering systems. Online update methods have been proposed to update DT models. Considering the degradation behavior in the system lifecycle, these methods fail to enable DT models to predict the system responses affected by the system degradation over time. To alleviate this problem, degradation models of measurable parameters have been integrated into DT construction. However, identifying the degradation parameters relies on prior knowledge of the system and expensive experiments. To mitigate those limitations, this paper proposes a lifelong update method for DT models to capture the effects of system degradation on system responses without any prior knowledge of, or expensive offline experiments on, the system. The core idea in the work is to represent the system degradation during the lifecycle as the dynamic changes of DT configurations (i.e., model parameters with a fixed model structure) at all degradation stages. During the lifelong update process, an Autoencoder is adopted to reconstruct the model parameters of all hidden layers simultaneously, so that the latent features taking into account the dependencies among hidden layers are obtained for each degradation stage. The dynamic behavior of latent features among successive degradation stages is then captured by a long short-term memory model, which enables prediction of the latent feature at any unseen stage. Based on the predicted latent features, the model configuration at a future degradation stage is reconstructed to determine the new DT model, which predicts the system responses affected by the degradation at the same stage. The test results on two engineering datasets demonstrate that the proposed update method can capture the effects of system degradation on system responses during the lifecycle.
2025-03-11 Towards Efficient Parametric State Estimation in Circulating Fuel Reactors with Shallow Recurrent Decoder Networks Stefano Riva, Carolina Introini, J. Nathan Kutz et.al. 2503.08904 link arXiv admin note: text overlap with arXiv:2409.12550
Abstract (click to expand)The recent developments in data-driven methods have paved the way to new methodologies to provide accurate state reconstruction of engineering systems; nuclear reactors represent particularly challenging applications for this task due to the complexity of the strongly coupled physics involved and the extremely harsh and hostile environments, especially for new technologies such as Generation-IV reactors. Data-driven techniques can combine different sources of information, including computational proxy models and local noisy measurements on the system, to robustly estimate the state. This work leverages the novel Shallow Recurrent Decoder architecture to infer the entire state vector (including neutron fluxes, precursor concentrations, temperature, pressure and velocity) of a reactor from three out-of-core time-series neutron flux measurements alone. In particular, this work extends the standard architecture to treat parametric time-series data, ensuring the possibility of investigating different accidental scenarios and showing the capabilities of this approach to provide an accurate state estimation in various operating conditions. This paper considers as a test case the Molten Salt Fast Reactor (MSFR), a Generation-IV reactor concept, characterised by strong coupling between the neutronics and the thermal hydraulics due to the liquid nature of the fuel. The promising results of this work are further strengthened by the possibility of quantifying the uncertainty associated with the state estimation, due to the considerably low training cost. The accurate reconstruction of every characteristic field in real-time makes this approach suitable for monitoring and control purposes in the framework of a reactor digital twin.
2025-03-11 Nonlinear optimals and their role in sustaining turbulence in channel flow Dario Klingenberg, Rich R. Kerswell et.al. 2503.08283 link
Abstract (click to expand)We investigate the energy transfer from the mean profile to velocity fluctuations in channel flow by calculating nonlinear optimal disturbances, i.e., the initial condition of a given finite energy that achieves the highest possible energy growth during a given fixed time horizon. It is found that for a large range of time horizons and initial disturbance energies, the nonlinear optimal exhibits streak spacing and amplitude consistent with DNS at least at \(Re_\tau = 180\), which suggests that they isolate the relevant physical mechanisms that sustain turbulence. Moreover, the time horizon necessary for a nonlinear disturbance to outperform a linear optimal is consistent with previous DNS-based estimates using eddy turnover time, which offers a new perspective on how some turbulent time scales are determined.
2025-03-11 XAI4Extremes: An interpretable machine learning framework for understanding extreme-weather precursors under climate change Jiawen Wei, Aniruddha Bora, Vivek Oommen et.al. 2503.08163
Abstract (click to expand)Extreme weather events are increasing in frequency and intensity due to climate change. This, in turn, is exacting a significant toll on communities worldwide. While prediction skills are increasing with advances in numerical weather prediction and artificial intelligence tools, extreme weather still presents challenges. More specifically, the precursors of such extreme weather events, and how these precursors may evolve under climate change, remain unclear. In this paper, we propose to use post-hoc interpretability methods to construct relevance weather maps that show the key extreme-weather precursors identified by deep learning models. We then compare this machine view with existing domain knowledge to understand whether deep learning models have identified patterns in data that may enrich our understanding of extreme-weather precursors. We finally bin these relevance maps into different multi-year time periods to understand the role that climate change is having on these precursors. The experiments are carried out on Indochina heatwaves, but the methodology can be readily extended to other extreme weather events worldwide.
2025-03-10 Network Analysis of Uniswap: Centralization and Fragility in the Decentralized Exchange Market Tao Yan, Claudio J. Tessone et.al. 2503.07834
Abstract (click to expand)Uniswap is a Decentralized Exchange (DEX) protocol that facilitates automatic token exchange without the need for traditional order books. Every pair of tokens forms a liquidity pool on Uniswap, and each token can be paired with any other token to create liquidity pools. This characteristic motivates us to employ a complex network approach to analyze the features of the Uniswap market. This research presents a comprehensive analysis of the Uniswap network using complex network methods. The network on October 31, 2023, is built to observe its recent features, showcasing both scale-free and core-periphery properties. By employing node and edge-betweenness metrics, we detect the most important tokens and liquidity pools. Additionally, we construct daily networks spanning from the beginning of Uniswap V2 on May 5, 2020, until October 31, 2023, and our findings demonstrate that the network becomes increasingly fragile over time. Furthermore, we conduct a robustness analysis by simulating the deletion of nodes to estimate the impact of some extreme events such as the Terra collapse. The results indicate that the Uniswap network exhibits robustness, yet it is notably fragile when deleting tokens with high betweenness centrality. This finding highlights that, despite being a decentralized exchange, Uniswap exhibits significant centralization tendencies in terms of token network connectivity and the distribution of TVL across nodes (tokens) and edges (liquidity pools).
2025-03-10 What is missing from existing Lithium-Sulfur models to capture coin-cell behaviour? Miss. Elizabeth Olisa, Monica Marinescu et.al. 2503.07684 27 pages, 7 figures, conferences presented: ModVal 2025, ECS 2025
Abstract (click to expand)Lithium-sulfur (Li-S) batteries offer a promising alternative to current lithium-ion (Li-ion) batteries, with a high theoretical energy density, improved safety, and abundant, low-cost materials. For Li-S to reach commercial application, it is essential to understand how the behaviour scales between cell formats; new material development is predominately completed at coin-cell level, whilst pouch-cells will be used for commercial applications. Differences such as reduced electrolyte-to-sulfur (E/S) ratios and increased geometric size at larger cell formats contribute to the behavioural differences, in terms of achievable capacity, cyclability and potential degradation mechanisms. This work focuses on the steps required to capture and test coin-cell behaviour, building upon the existing models within the literature, which predominately focus on pouch-cells. The areas investigated throughout this study, to improve the capability of the model in terms of scaling ability and causality of predictions, include the cathode surface area, precipitation dynamics and C-rate dependence.
2025-03-10 Simultaneous Energy Harvesting and Bearing Fault Detection using Piezoelectric Cantilevers P. Peralta-Braz, M. M. Alamdari, C. T. Chou et.al. 2503.07462
Abstract (click to expand)Bearings are critical components in industrial machinery, yet their vulnerability to faults often leads to costly breakdowns. Conventional fault detection methods depend on continuous, high-frequency vibration sensing, digitising, and wireless transmission to the cloud, an approach that significantly drains the limited energy reserves of battery-powered sensors, accelerating their depletion and increasing maintenance costs. This work proposes a fundamentally different approach: rather than using instantaneous vibration data, we employ piezoelectric energy harvesters (PEHs) tuned to specific frequencies and leverage the cumulative harvested energy over time as the key diagnostic feature. By directly utilising the energy generated from the machinery's vibrations, we eliminate the need for frequent analog-to-digital conversions and data transmission, thereby reducing energy consumption at the sensor node and extending its operational lifetime. To validate this approach, we use a numerical PEH model and publicly available acceleration datasets, examining various PEH designs with different natural frequencies. We also consider the influence of the classification algorithm, the number of devices, and the observation window duration. The results demonstrate that the harvested energy reliably indicates bearing faults across a range of conditions and severities. By converting vibration energy into both a power source and a diagnostic feature, our solution offers a more sustainable, low-maintenance strategy for fault detection in smart machinery.
2025-03-10 Early signs of stuck pipe detection based on Crossformer Bo Cao, Yu Song, Jin Yang et.al. 2503.07440 33 pages, 9 figures
Abstract (click to expand)Stuck pipe incidents are one of the major challenges in drilling engineering, leading to massive time loss and additional costs. To address the limitations of insufficient long sequence modeling capability, the difficulty in accurately establishing a warning threshold, and the lack of model interpretability in existing methods, we utilize Crossformer for early detection of signs indicating potential stuck events, in order to provide guidance for on-site drilling engineers and prevent stuck pipe incidents. The sliding window technique is integrated into Crossformer to allow it to output and display longer outputs. The improved Crossformer model is trained using normal time series drilling data to generate predictions for various parameters at each time step. The relative reconstruction error of the model is regarded as the risk of stuck pipe, so that data the model cannot predict are treated as anomalies, which represent the early signs of stuck pipe incidents. The multi-step prediction capability of Crossformer and the relative reconstruction error are combined to assess stuck pipe risk at each time step in advance. We partition the reconstruction error into modeling error and error due to anomalous data fluctuations. Furthermore, the dynamic warning threshold and warning time for stuck pipe incidents are determined using the probability density function of reconstruction errors from normal drilling data. The results indicate that our method can effectively detect early signs of stuck pipe incidents during the drilling process. Crossformer exhibits superior modeling and predictive capabilities compared with other deep learning models. Transformer-based models with multi-step prediction capability are more suitable for stuck pipe prediction than the current single-step prediction models.
2025-03-10 An Analytics-Driven Approach to Enhancing Supply Chain Visibility with Graph Neural Networks and Federated Learning Ge Zheng, Alexandra Brintrup et.al. 2503.07231 15 pages, 5 figures, 5 tables, submitted to a journal
Abstract (click to expand)In today's globalised trade, supply chains form complex networks spanning multiple organisations and even countries, making them highly vulnerable to disruptions. These vulnerabilities, highlighted by recent global crises, underscore the urgent need for improved visibility and resilience of the supply chain. However, data-sharing limitations often hinder the achievement of comprehensive visibility between organisations or countries due to privacy, security, and regulatory concerns. Moreover, most existing research studies focused on individual firm- or product-level networks, overlooking the multifaceted interactions among diverse entities that characterise real-world supply chains, thus limiting a holistic understanding of supply chain dynamics. To address these challenges, we propose a novel approach that integrates Federated Learning (FL) and Graph Convolutional Neural Networks (GCNs) to enhance supply chain visibility through relationship prediction in supply chain knowledge graphs. FL enables collaborative model training across countries by facilitating information sharing without requiring raw data exchange, ensuring compliance with privacy regulations and maintaining data security. GCNs empower the framework to capture intricate relational patterns within knowledge graphs, enabling accurate link prediction to uncover hidden connections and provide comprehensive insights into supply chain networks. Experimental results validate the effectiveness of the proposed approach, demonstrating its ability to accurately predict relationships within country-level supply chain knowledge graphs. This enhanced visibility supports actionable insights, facilitates proactive risk management, and contributes to the development of resilient and adaptive supply chain strategies, ensuring that supply chains are better equipped to navigate the complexities of the global economy.
2025-03-10 Simulating programmable morphing of shape memory polymer beam systems with complex geometry and topology Giulio Ferri, Enzo Marino et.al. 2503.07150
Abstract (click to expand)We propose a novel approach to the analysis of programmable geometrically exact shear deformable beam systems made of shape memory polymers. The proposed method combines the viscoelastic Generalized Maxwell model with the Williams, Landel and Ferry relaxation principle, enabling the reproduction of the shape memory effect of structural systems featuring complex geometry and topology. Very high efficiency is pursued by discretizing the differential problem in space through the isogeometric collocation (IGA-C) method. The method, in addition to the desirable attributes of isogeometric analysis (IGA), such as exactness of the geometric reconstruction of complex shapes and high-order accuracy, circumvents the need for numerical integration since it discretizes the problem in the strong form. Other distinguishing features of the proposed formulation are: i) \({\rm SO}(3)\)-consistency for the linearization of the problem and for the time stepping; ii) minimal (finite) rotation parametrization, meaning that only three rotational unknowns are used; iii) no additional unknowns are needed to account for the rate-dependent material compared to the purely elastic case. Through different numerical applications involving challenging initial geometries, we show that the proposed formulation possesses all the sought attributes in terms of programmability of complex systems, geometric flexibility, and high-order accuracy.
2025-03-10 Effect of Selection Format on LLM Performance Yuchen Han, Yucheng Wu, Jeffrey Willard et.al. 2503.06926
Abstract (click to expand)This paper investigates a critical aspect of large language model (LLM) performance: the optimal formatting of classification task options in prompts. Through an extensive experimental study, we compared two selection formats -- bullet points and plain English -- to determine their impact on model performance. Our findings suggest that presenting options via bullet points generally yields better results, although there are some exceptions. Furthermore, our research highlights the need for continued exploration of option formatting to drive further improvements in model performance.
2025-03-09 Modular Photobioreactor Façade Systems for Sustainable Architecture: Design, Fabrication, and Real-Time Monitoring Xiujin Liu et.al. 2503.06769 21 pages, 22 figures, 3 tables
Abstract (click to expand)This paper proposes an innovative solution to the growing issue of greenhouse gas emissions: a closed photobioreactor (PBR) façade system to mitigate greenhouse gas (GHG) concentrations. With digital fabrication technology, this study explores the transition from traditional, single-function building façades to multifunctional, integrated building systems. It introduces a photobioreactor (PBR) façade system to mitigate greenhouse gas (GHG) concentrations while addressing the challenge of transporting large-scale prefabricated components. This research introduces a novel approach by designing the façade system as modular, user-friendly and transportation-friendly bricks, enabling the creation of a user-customized and self-assembled photobioreactor (PBR) system. The single module in the system is proposed to be a "neutralization brick", which is embedded with algae and equipped with an air circulation system, facilitating the photobioreactor (PBR)'s functionality. A connection system between modules allows for easy assembly by users, while a limited variety of brick styles ensures modularity in manufacturing without sacrificing customization and diversity. The system is also equipped with an advanced microalgae status detection algorithm, which allows users to monitor the condition of the microalgae using a monocular camera. This functionality ensures timely alerts and notifications for users to replace the algae, thereby optimizing the operational efficiency and sustainability of the algae cultivation process.
2025-03-09 Energy-Adaptive Checkpoint-Free Intermittent Inference for Low Power Energy Harvesting Systems Sahidul Islam, Wei Wei, Jishnu Banarjee et.al. 2503.06663
Abstract (click to expand)Deep neural network (DNN) inference in energy harvesting (EH) devices poses significant challenges due to resource constraints and frequent power interruptions. These power losses not only increase end-to-end latency, but also compromise inference consistency and accuracy, as existing checkpointing and restore mechanisms are prone to errors. Consequently, the quality of service (QoS) for DNN inference on EH devices is severely impacted. In this paper, we propose an energy-adaptive DNN inference mechanism capable of dynamically transitioning the model into a low-power mode by reducing computational complexity when harvested energy is limited. This approach ensures that end-to-end latency requirements are met. Additionally, to address the limitations of error-prone checkpoint-and-restore mechanisms, we introduce a checkpoint-free intermittent inference framework that ensures consistent, progress-preserving DNN inference during power failures in energy-harvesting systems.

(back to top)

📌 Reinforcement Learning in Finance

📅 Publish Date 📖 Title 👨‍💻 Authors 🔗 PDF 💻 Code 💬 Comment 📜 Abstract
2023-04-29 Systematic Review on Reinforcement Learning in the Field of Fintech Nadeem Malibari, Iyad Katib, Rashid Mehmood et.al. 2305.07466 31 pages, 15 figures, 7 tables
Abstract (click to expand)Applications of Reinforcement Learning in Financial Technology (Fintech) have acquired a lot of admiration lately. Undoubtedly Reinforcement Learning, through its vast competence and proficiency, has aided remarkable results in the field of Fintech. The objective of this systematic survey is to perform an exploratory study on the correlation between reinforcement learning and Fintech to highlight the prediction accuracy, complexity, scalability, risks, profitability and performance. Major uses of reinforcement learning in finance or Fintech include portfolio optimization, credit risk reduction, investment capital management, profit maximization, effective recommendation systems, and better price setting strategies. Several studies have addressed the actual contribution of reinforcement learning to the performance of financial institutions. The latest studies included in this survey are publications from 2018 onward. The survey is conducted using the PRISMA technique, which focuses on the reporting of reviews and is based on a checklist and four-phase flow diagram. The conducted survey indicates that RL-based strategies in Fintech fields perform considerably better than other state-of-the-art algorithms. The present work discusses the use of reinforcement learning algorithms in diverse decision-making challenges in Fintech and concludes that organizations dealing with finance can benefit greatly from robo-advising, smart order channelling, market making, hedging and options pricing, portfolio optimization, and optimal execution.
2022-06-28 Applications of Reinforcement Learning in Finance -- Trading with a Double Deep Q-Network Frensi Zejnullahu, Maurice Moser, Joerg Osterrieder et.al. 2206.14267
Abstract (click to expand)This paper presents a Double Deep Q-Network algorithm for trading single assets, namely the E-mini S&P 500 continuous futures contract. We use a proven setup as the foundation for our environment with multiple extensions. The features of our trading agent are constantly being expanded to include additional assets such as commodities, resulting in four models. We also respond to environmental conditions, including costs and crises. Our trading agent is first trained for a specific time period and tested on new data and compared with the long-and-hold strategy as a benchmark (market). We analyze the differences between the various models and the in-sample/out-of-sample performance with respect to the environment. The experimental results show that the trading agent follows an appropriate behavior. It can adjust its policy to different circumstances, such as more extensive use of the neutral position when trading costs are present. Furthermore, the net asset value exceeded that of the benchmark, and the agent outperformed the market in the test set. We provide initial insights into the behavior of an agent in a financial domain using a DDQN algorithm. The results of this study can be used for further development.
2023-02-28 Recent Advances in Reinforcement Learning in Finance Ben Hambly, Renyuan Xu, Huining Yang et.al. 2112.04553 60 pages, 1 figure
Abstract (click to expand)The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques on data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily rely on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which is the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value and policy based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
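
Reader's note: the Double Deep Q-Network entry in the table above builds on the standard Double DQN update, in which the online network chooses the next action and the target network evaluates it. The snippet below is a minimal sketch of that target computation only, not the authors' implementation; the function name `double_dqn_targets`, the three-action layout (for example short, neutral, long), and the sample numbers are assumptions made for illustration.

```python
import numpy as np


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    rewards:        shape (batch,)          immediate rewards
    next_q_online:  shape (batch, actions)  online-network Q-values at next state
    next_q_target:  shape (batch, actions)  target-network Q-values at next state
    dones:          shape (batch,)          True where the episode ended
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_online = np.asarray(next_q_online, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    dones = np.asarray(dones, dtype=bool)

    # Action selection uses the online network ...
    best_actions = np.argmax(next_q_online, axis=1)
    # ... while action evaluation uses the target network.
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * np.where(dones, 0.0, evaluated)


# Hypothetical batch of two transitions with three actions each.
targets = double_dqn_targets(
    rewards=[1.0, 0.5],
    next_q_online=[[0.2, 0.9, 0.1], [0.3, 0.1, 0.4]],
    next_q_target=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    dones=[False, True],
    gamma=0.5,
)
print(targets)
```

Separating selection from evaluation is what distinguishes this from a plain DQN target, which would take the maximum over `next_q_target` directly.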

(back to top)

📌 Time Series Forecasting

📅 Publish Date 📖 Title 👨‍💻 Authors 🔗 PDF 💻 Code 💬 Comment 📜 Abstract
2025-03-31 Frequency-Aware Attention-LSTM for PM\(_{2.5}\) Time Series Forecasting Jiahui LU, Shuang Wu, Zhenkai Qin et.al. 2503.24043
Abstract (click to expand)To enhance the accuracy and robustness of PM\(_{2.5}\) concentration forecasting, this paper introduces FALNet, a Frequency-Aware LSTM Network that integrates frequency-domain decomposition, temporal modeling, and attention-based refinement. The model first applies STL and FFT to extract trend, seasonal, and denoised residual components, effectively filtering out high-frequency noise. The filtered residuals are then fed into a stacked LSTM to capture long-term dependencies, followed by a multi-head attention mechanism that dynamically focuses on key time steps. Experiments conducted on real-world urban air quality datasets demonstrate that FALNet consistently outperforms conventional models across standard metrics such as MAE, RMSE, and \(R^2\). The model shows strong adaptability in capturing sharp fluctuations during pollution peaks and non-stationary conditions. These results validate the effectiveness and generalizability of FALNet for real-time air pollution prediction, environmental risk assessment, and decision-making support.
2025-03-31 CITRAS: Covariate-Informed Transformer for Time Series Forecasting Yosuke Yamaguchi, Issei Suemitsu, Wenpeng Wei et.al. 2503.24007
Abstract (click to expand)Covariates play an indispensable role in practical time series forecasting, offering rich context from the past and sometimes extending into the future. However, their availability varies depending on the scenario, and situations often involve multiple target variables simultaneously. Moreover, the cross-variate dependencies between them are multi-granular, with some covariates having a short-term impact on target variables and others showing long-term correlations. This heterogeneity and the intricate dependencies arising in covariate-informed forecasting present significant challenges to existing deep models. To address these issues, we propose CITRAS, a patch-based Transformer that flexibly leverages multiple targets and covariates covering both the past and the future forecasting horizon. While preserving the strong autoregressive capabilities of the canonical Transformer, CITRAS introduces two novel mechanisms in patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates future known covariates into the forecasting of target variables based on their concurrent dependencies. Additionally, Attention Score Smoothing transforms locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the past series of attention scores. Experimentally, CITRAS achieves state-of-the-art performance in both covariate-informed and multivariate forecasting, demonstrating its versatile ability to leverage cross-variate dependency for improved forecasting accuracy.
2025-04-01 Time-Series Forecasting via Topological Information Supervised Framework with Efficient Topological Feature Learning ZiXin Lin, Nur Fariha Syaqina Zulkepli et.al. 2503.23757 The experiments are incomplete
Abstract (click to expand)Topological Data Analysis (TDA) has emerged as a powerful tool for extracting meaningful features from complex data structures, driving significant advancements in fields such as neuroscience, biology, machine learning, and financial modeling. Despite its success, the integration of TDA with time-series prediction remains underexplored due to three primary challenges: the limited utilization of temporal dependencies within topological features, computational bottlenecks associated with persistent homology, and the deterministic nature of TDA pipelines restricting generalized feature learning. This study addresses these challenges by proposing the Topological Information Supervised (TIS) Prediction framework, which leverages neural networks and Conditional Generative Adversarial Networks (CGANs) to generate synthetic topological features, preserving their distribution while significantly reducing computational time. We propose a novel training strategy that integrates topological consistency loss to improve the predictive accuracy of deep learning models. Specifically, we introduce two state-of-the-art models, TIS-BiGRU and TIS-Informer, designed to capture short-term and long-term temporal dependencies, respectively. Comparative experimental results demonstrate the superior performance of TIS models over conventional predictors, validating the effectiveness of integrating topological information. This work not only advances TDA-based time-series prediction but also opens new avenues for utilizing topological features in deep learning architectures.
2025-03-30 Simple Feedfoward Neural Networks are Almost All You Need for Time Series Forecasting Fan-Keng Sun, Yu-Cheng Wu, Duane S. Boning et.al. 2503.23621
Abstract (click to expand)Time series data are everywhere -- from finance to healthcare -- and each domain brings its own unique complexities and structures. While advanced models like Transformers and graph neural networks (GNNs) have gained popularity in time series forecasting, largely due to their success in tasks like language modeling, their added complexity is not always necessary. In our work, we show that simple feedforward neural networks (SFNNs) can achieve performance on par with, or even exceeding, these state-of-the-art models, while being simpler, smaller, faster, and more robust. Our analysis indicates that, in many cases, univariate SFNNs are sufficient, implying that modeling interactions between multiple series may offer only marginal benefits. Even when inter-series relationships are strong, a basic multivariate SFNN still delivers competitive results. We also examine some key design choices and offer guidelines on making informed decisions. Additionally, we critique existing benchmarking practices and propose an improved evaluation protocol. Although SFNNs may not be optimal for every situation (hence the "almost" in our title), they serve as a strong baseline that future time series forecasting methods should always be compared against.
2025-03-28 Density-valued time series: Nonparametric density-on-density regression Frédéric Ferraty, Han Lin Shang et.al. 2503.22904 35 pages, 10 figures, 2 tables
Abstract (click to expand)This paper is concerned with forecasting probability density functions. Density functions are nonnegative and have a constrained integral; they thus do not constitute a vector space. Implementing unconstrained functional time-series forecasting methods is problematic for such nonlinear and constrained data. A novel forecasting method is developed based on a nonparametric function-on-function regression, where both the response and the predictor are probability density functions. Through a series of Monte-Carlo simulation studies, we evaluate the finite-sample performance of our nonparametric regression estimator. Using French departmental COVID19 data and age-specific period life tables in the United States, we assess and compare finite-sample forecast accuracy between the proposed and several existing methods.
2025-03-27 LeForecast: Enterprise Hybrid Forecast by Time Series Intelligence Zheng Tan, Yiwen Nie, Wenfa Wu et.al. 2503.22747
Abstract (click to expand)Demand is spiking in industrial fields for multidisciplinary forecasting, where a broad spectrum of sectors needs planning and forecasts to streamline intelligent business management, such as demand forecasting, product planning, inventory optimization, etc. Specifically, these tasks expect intelligent approaches to learn from sequentially collected historical data and then foresee the most likely trend, i.e., time series forecasting. The challenge lies in interpreting complex business contexts and in the efficiency and generalisation of modelling. Motivated by the remarkable success of large pre-trained foundation models across many tasks, we present LeForecast, an enterprise intelligence platform tailored for time series tasks. It integrates advanced interpretations of time series data and multi-source information, and a three-pillar modelling engine combining a large foundation model (Le-TSFM), a multimodal model and a hybrid model to derive insights, predict or infer futures, and then drive optimisation across multiple sectors in enterprise operations. The framework is composed of a model pool, a model profiling module, and two different fusion approaches regarding original model architectures. Experimental results verify the efficiency of our trial fusion concepts: router-based fusion network and coordination of large and small models, resulting in high costs for redundant development and maintenance of models. This work reviews the deployment of LeForecast and its performance in three industrial use cases. Our comprehensive experiments indicate that LeForecast is a profound and practical platform for efficient and competitive performance. We hope that this work can enlighten the research and grounding of time series techniques in accelerating enterprise.
2025-03-26 Adaptive State-Space Mamba for Real-Time Sensor Data Anomaly Detection Alice Zhang, Chao Li et.al. 2503.22743
Abstract (click to expand)State-space modeling has emerged as a powerful paradigm for sequence analysis in various tasks such as natural language processing, time-series forecasting, and signal processing. In this work, we propose an Adaptive State-Space Mamba (ASSM) framework for real-time sensor data anomaly detection. While state-space models have been previously employed for image processing applications (e.g., style transfer [wang2024stylemamba]), our approach leverages the core idea of sequential hidden states to tackle a significantly different domain: detecting anomalies on streaming sensor data. In particular, we introduce an adaptive gating mechanism that dynamically modulates the hidden state update based on contextual and learned statistical cues. This design ensures that our model remains computationally efficient and scalable, even under rapid data arrival rates. Extensive experiments on real-world and synthetic sensor datasets demonstrate that our method achieves superior detection performance compared to existing baselines. Our approach is easily extensible to other time-series tasks that demand rapid and reliable detection capabilities.
2025-03-28 Long-Term Electricity Demand Prediction Using Non-negative Tensor Factorization and Genetic Algorithm-Driven Temporal Modeling Toma Masaki, Kanta Tachibana et.al. 2503.22132 17 pages, 9 figures, 10 tables
Abstract (click to expand)This study proposes a novel framework for long-term electricity demand prediction based solely on historical consumption data, without relying on external variables such as temperature or economic indicators. The method combines Non-negative Tensor Factorization (NTF) to extract low-dimensional temporal features from multi-way electricity usage data, with a Genetic Algorithm that optimizes the hyperparameters of time series models applied to the latent annual factors. We model the dataset as a third-order tensor spanning electric utilities, industrial sectors, and years, and apply canonical polyadic decomposition under non-negativity constraints. The annual component is forecasted using autoregressive models, with hyperparameter tuning guided by the prediction error or reconstruction accuracy on a validation set. Comparative experiments using real-world electricity data from Japan demonstrate that the proposed method achieves lower mean squared error than baseline approaches without tensor decomposition or evolutionary optimization. Moreover, we find that reducing the model's degrees of freedom via tensor decomposition improves generalization performance, and that initialization sensitivity in NTF can be mitigated through multiple runs or ensemble strategies. These findings suggest that the proposed framework offers an interpretable, flexible, and scalable approach to long-term electricity demand prediction and can be extended to other structured time series forecasting tasks.
2025-03-27 Dual-Splitting Conformal Prediction for Multi-Step Time Series Forecasting Qingdi Yu, Zhiwei Cao, Ruihang Wang et.al. 2503.21251 28 pages, 13 figures, 3 tables. Submitted to Applied Soft Computing. With Editor. This is the first public release of the work.
Abstract (click to expand)Time series forecasting is crucial for applications like resource scheduling and risk management, where multi-step predictions provide a comprehensive view of future trends. Uncertainty Quantification (UQ) is a mainstream approach for addressing forecasting uncertainties, with Conformal Prediction (CP) gaining attention due to its model-agnostic nature and statistical guarantees. However, most variants of CP are designed for single-step predictions and face challenges in multi-step scenarios, such as reliance on real-time data and limited scalability. This highlights the need for CP methods specifically tailored to multi-step forecasting. We propose the Dual-Splitting Conformal Prediction (DSCP) method, a novel CP approach designed to capture inherent dependencies within time-series data for multi-step forecasting. Experimental results on real-world datasets from four different domains demonstrate that the proposed DSCP significantly outperforms existing CP variants in terms of the Winkler Score, achieving a performance improvement of up to 23.59% compared to state-of-the-art methods. Furthermore, we deployed the DSCP approach for renewable energy generation and IT load forecasting in power management of a real-world trajectory-based application, achieving an 11.25% reduction in carbon emissions through predictive optimization of data center operations and controls.
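As background for the abstract above, the following is a minimal sketch of plain split conformal prediction and the Winkler score. It is not the DSCP method itself, whose details the abstract does not give. The function names and the symmetric absolute-residual interval are assumptions made for illustration.

```python
import math


def conformal_quantile(residuals, alpha):
    """Return the finite-sample-corrected (1 - alpha) quantile of
    absolute calibration residuals (plain split conformal prediction)."""
    n = len(residuals)
    # Rank ceil((n + 1) * (1 - alpha)); capping at n is a simplification
    # (strictly, the interval is unbounded when the rank exceeds n).
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(residuals)[k - 1]


def prediction_interval(point_forecast, q):
    """Symmetric interval around a point forecast with half-width q."""
    return point_forecast - q, point_forecast + q


def winkler_score(lower, upper, y, alpha):
    """Interval width plus a penalty of 2/alpha per unit of miss."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score


# Hypothetical calibration residuals |y - y_hat| from a held-out split.
calibration_residuals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q = conformal_quantile(calibration_residuals, alpha=0.5)
lower, upper = prediction_interval(10.0, q)
print(q, lower, upper, winkler_score(lower, upper, 17.0, alpha=0.5))
```

A lower Winkler score is better: it rewards narrow intervals and penalizes observations that fall outside them.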
2025-03-26 TS-Inverse: A Gradient Inversion Attack Tailored for Federated Time Series Forecasting Models Caspar Meijer, Jiyue Huang, Shreshtha Sharma et.al. 2503.20952 link
Abstract (click to expand)Federated learning (FL) for time series forecasting (TSF) enables clients with privacy-sensitive time series (TS) data to collaboratively learn accurate forecasting models, for example, in energy load prediction. Unfortunately, privacy risks in FL persist, as servers can potentially reconstruct clients' training data through gradient inversion attacks (GIA). Although GIA is demonstrated for image classification tasks, little is known about time series regression tasks. In this paper, we first conduct an extensive empirical study on inverting TS data across 4 TSF models and 4 datasets, identifying the unique challenges of reconstructing both observations and targets of TS data. We then propose TS-Inverse, a novel GIA that improves the inversion of TS data by (i) learning a gradient inversion model that outputs quantile predictions, (ii) a unique loss function that incorporates periodicity and trend regularization, and (iii) regularization according to the quantile predictions. Our evaluations demonstrate a remarkable performance of TS-Inverse, achieving at least a 2x-10x improvement in terms of the sMAPE metric over existing GIA methods on TS data. Code repository: https://github.com/Capsar/ts-inverse
2025-03-26 Addressing Challenges in Time Series Forecasting: A Comprehensive Comparison of Machine Learning Techniques Seyedeh Azadeh Fallah Mortezanejad, Ruochen Wang et.al. 2503.20148
Abstract (click to expand)The explosion of Time Series (TS) data, driven by advancements in technology, necessitates sophisticated analytical methods. Modern management systems increasingly rely on analyzing this data, highlighting the importance of efficient processing techniques. State-of-the-art Machine Learning (ML) approaches for TS analysis and forecasting are becoming prevalent. This paper briefly describes and compiles suitable algorithms for TS regression tasks. We compare these algorithms against each other and the classic ARIMA method using diverse datasets: complete data, data with outliers, and data with missing values. The focus is on forecasting accuracy, particularly for long-term predictions. This research aids in selecting the most appropriate algorithm based on forecasting needs and data characteristics.
2025-03-25 Towards Reliable Time Series Forecasting under Future Uncertainty: Ambiguity and Novelty Rejection Mechanisms Ninghui Feng, Songning Lai, Xin Zhou et.al. 2503.19656
Abstract (click to expand)In real-world time series forecasting, uncertainty and lack of reliable evaluation pose significant challenges. Notably, forecasting errors often arise from underfitting in-distribution data and failing to handle out-of-distribution inputs. To enhance model reliability, we introduce a dual rejection mechanism combining ambiguity and novelty rejection. Ambiguity rejection, using prediction error variance, allows the model to abstain under low confidence, assessed through historical error variance analysis without future ground truth. Novelty rejection, employing Variational Autoencoders and Mahalanobis distance, detects deviations from training data. This dual approach improves forecasting reliability in dynamic environments by reducing errors and adapting to data changes, advancing reliability in complex scenarios.
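The novelty-rejection idea in the abstract above can be illustrated with a small Mahalanobis-distance check. This sketch assumes two-dimensional feature vectors and omits the Variational Autoencoder that the paper uses. The helper names and the threshold of 3.0 are hypothetical.

```python
import math


def fit_gaussian_2d(points):
    """Estimate the mean and inverse covariance of 2-D training features."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-degenerate data)
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv


def mahalanobis(x, mean, inv):
    """Distance of x from the training distribution, scaled by covariance."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) \
        + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)


def should_reject(x, mean, inv, threshold=3.0):
    """Abstain from forecasting when the input looks out-of-distribution."""
    return mahalanobis(x, mean, inv) > threshold


# Hypothetical training features.
train = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, inv = fit_gaussian_2d(train)
print(should_reject((1.5, 1.0), mean, inv))   # close to the training data
print(should_reject((11.0, 1.0), mean, inv))  # far from the training data
```

In practice the threshold would be tuned on held-out data rather than fixed in advance.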
2025-03-22 A Survey on Structured State Space Sequence (S4) Models Shriyank Somvanshi, Md Monzurul Islam, Mahmuda Sultana Mimi et.al. 2503.18970 30 pages, 8 figures, 3 tables
Abstract (click to expand)Recent advancements in sequence modeling have led to the emergence of Structured State Space Models (SSMs) as an efficient alternative to Recurrent Neural Networks (RNNs) and Transformers, addressing challenges in long-range dependency modeling and computational efficiency. While RNNs suffer from vanishing gradients and sequential inefficiencies, and Transformers face quadratic complexity, SSMs leverage structured recurrence and state-space representations to achieve superior long-sequence processing with linear or near-linear complexity. This survey provides a comprehensive review of SSMs, tracing their evolution from the foundational S4 model to its successors like Mamba, Simplified Structured State Space Sequence Model (S5), and Jamba, highlighting their improvements in computational efficiency, memory optimization, and inference speed. By comparing SSMs with traditional sequence models across domains such as natural language processing (NLP), speech recognition, vision, and time-series forecasting, we demonstrate their advantages in handling long-range dependencies while reducing computational overhead. Despite their potential, challenges remain in areas such as training optimization, hybrid modeling, and interpretability. This survey serves as a structured guide for researchers and practitioners, detailing the advancements, trade-offs, and future directions of SSM-based architectures in AI and deep learning.
2025-03-24 Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems Zhidi Lin, Ying Li, Feng Yin et.al. 2503.18309 13 pages, 6 figures
Abstract (click to expand)Gaussian process state-space models (GPSSMs) have emerged as a powerful framework for modeling dynamical systems, offering interpretable uncertainty quantification and inherent regularization. However, existing GPSSMs face significant challenges in handling high-dimensional, non-stationary systems due to computational inefficiencies, limited scalability, and restrictive stationarity assumptions. In this paper, we propose an efficient transformed Gaussian process state-space model (ETGPSSM) to address these limitations. Our approach leverages a single shared Gaussian process (GP) combined with normalizing flows and Bayesian neural networks, enabling efficient modeling of complex, high-dimensional state transitions while preserving scalability. To address the lack of closed-form expressions for the implicit process in the transformed GP, we follow its generative process and introduce an efficient variational inference algorithm, aided by the ensemble Kalman filter (EnKF), to enable computationally tractable learning and inference. Extensive empirical evaluations on synthetic and real-world datasets demonstrate the superior performance of our ETGPSSM in system dynamics learning, high-dimensional state estimation, and time-series forecasting, outperforming existing GPSSMs and neural network-based methods in both accuracy and computational efficiency.
2025-03-22 Renewable Energy Transition in South America: Predictive Analysis of Generation Capacity by 2050 Triveni Magadum, Sanjana Murgod, Kartik Garg et.al. 2503.17771 13 pages, 5 figures
Abstract (click to expand)In this research, renewable energy expansion in South America up to 2050 is predicted based on machine learning models that are trained on past energy data. The research employs gradient boosting regression and Prophet time series forecasting to make predictions of future generation capacities for solar, wind, hydroelectric, geothermal, biomass, and other renewable sources in South American nations. Model output analysis indicates staggering future expansion in the generation of renewable energy, with solar and wind energy registering the highest expansion rates. Geospatial visualization methods were applied to illustrate regional disparities in the utilization of renewable energy. The results forecast South America to record nearly 3-fold growth in the generation of renewable energy by the year 2050, with Brazil and Chile spearheading regional development. Such projections help design energy policy, investment strategy, and climate change mitigation throughout the region, helping developing economies transition to sustainable energy.
2025-03-22 Sentinel: Multi-Patch Transformer with Temporal and Channel Attention for Time Series Forecasting Davide Villaboni, Alberto Castellini, Ivan Luciano Danesi et.al. 2503.17658
Abstract (click to expand)Transformer-based time series forecasting has recently gained strong interest due to the ability of transformers to model sequential data. Most of the state-of-the-art architectures exploit either temporal or inter-channel dependencies, limiting their effectiveness in multivariate time-series forecasting where both types of dependencies are crucial. We propose Sentinel, a full transformer-based architecture composed of an encoder able to extract contextual information from the channel dimension, and a decoder designed to capture causal relations and dependencies across the temporal dimension. Additionally, we introduce a multi-patch attention mechanism, which leverages the patching process to structure the input sequence in a way that can be naturally integrated into the transformer architecture, replacing the multi-head splitting process. Extensive experiments on standard benchmarks demonstrate that Sentinel, because of its ability to "monitor" both the temporal and the inter-channel dimension, achieves better or comparable performance with respect to state-of-the-art approaches.
2025-03-21 CausalRivers -- Scaling up benchmarking of causal discovery for real-world time-series Gideon Stein, Maha Shadaydeh, Jan Blunk et.al. 2503.17452 10 pages, 8 figures, ICLR2025 main track
Abstract (click to expand)Causal discovery, or identifying causal relationships from observational data, is a notoriously challenging task, with numerous methods proposed to tackle it. Despite this, in-the-wild evaluation of these methods is still lacking, as works frequently rely on synthetic data evaluation and sparse real-world examples under critical theoretical assumptions. Real-world causal structures, however, are often complex, making it hard to decide on a proper causal discovery strategy. To bridge this gap, we introduce CausalRivers, the largest in-the-wild causal discovery benchmarking kit for time-series data to date. CausalRivers features an extensive dataset on river discharge that covers the eastern German territory (666 measurement stations) and the state of Bavaria (494 measurement stations). It spans the years 2019 to 2023 with a 15-minute temporal resolution. Further, we provide additional data from a flood around the Elbe River, as an event with a pronounced distributional shift. Leveraging multiple sources of information and time-series meta-data, we constructed two distinct causal ground truth graphs (Bavaria and eastern Germany). These graphs can be sampled to generate thousands of subgraphs to benchmark causal discovery across diverse and challenging settings. To demonstrate the utility of CausalRivers, we evaluate several causal discovery approaches through a set of experiments to identify areas for improvement. CausalRivers has the potential to facilitate robust evaluations and comparisons of causal discovery methods. Besides this primary purpose, we also expect that this dataset will be relevant for connected areas of research, such as time-series forecasting and anomaly detection. Based on this, we hope to push benchmark-driven method development that fosters advanced techniques for causal discovery, as is the case for many other areas of machine learning.
2025-03-24 DiTEC-WDN: A Large-Scale Dataset of Hydraulic Scenarios across Multiple Water Distribution Networks Huy Truong, Andrés Tello, Alexander Lazovik et.al. 2503.17167 link Submitted to Nature Scientific Data. Huy Truong and Andrés Tello contributed equally to this work. For the dataset, see https://huggingface.co/datasets/rugds/ditec-wdn
Abstract (click to expand)Privacy restrictions hinder the sharing of real-world Water Distribution Network (WDN) models, limiting the application of emerging data-driven machine learning, which typically requires extensive observations. To address this challenge, we propose the dataset DiTEC-WDN that comprises 36,000 unique scenarios simulated over either short-term (24 hours) or long-term (1 year) periods. We constructed this dataset using an automated pipeline that optimizes crucial parameters (e.g., pressure, flow rate, and demand patterns), facilitates large-scale simulations, and records discrete, synthetic but hydraulically realistic states under standard conditions via rule validation and post-hoc analysis. With a total of 228 million generated graph-based states, DiTEC-WDN can support a variety of machine-learning tasks, including graph-level, node-level, and link-level regression, as well as time-series forecasting. This contribution, released under a public license, encourages open scientific research in the critical water sector, eliminates the risk of exposing sensitive data, and fulfills the need for a large-scale water distribution network benchmark for study comparisons and scenario analysis.
2025-03-21 MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering Jialin Chen, Aosong Feng, Ziyu Zhao et.al. 2503.16858 link 14 pages
Abstract (click to expand)Understanding the relationship between textual news and time-series evolution is a critical yet under-explored challenge in applied data science. While multimodal learning has gained traction, existing multimodal time-series datasets fall short in evaluating cross-modal reasoning and complex question answering, which are essential for capturing complex interactions between narrative information and temporal patterns. To bridge this gap, we introduce Multimodal Time Series Benchmark (MTBench), a large-scale benchmark designed to evaluate large language models (LLMs) on time series and text understanding across financial and weather domains. MTBench comprises paired time series and textual data, including financial news with corresponding stock price movements and weather reports aligned with historical temperature records. Unlike existing benchmarks that focus on isolated modalities, MTBench provides a comprehensive testbed for models to jointly reason over structured numerical trends and unstructured textual narratives. The richness of MTBench enables formulation of diverse tasks that require a deep understanding of both text and time-series data, including time-series forecasting, semantic and technical trend analysis, and news-driven question answering (QA). These tasks target the model's ability to capture temporal dependencies, extract key insights from textual context, and integrate cross-modal information. We evaluate state-of-the-art LLMs on MTBench, analyzing their effectiveness in modeling the complex relationships between news narratives and temporal patterns. Our findings reveal significant challenges in current models, including difficulties in capturing long-term dependencies, interpreting causality in financial and weather trends, and effectively fusing multimodal information.
2025-03-19 HQNN-FSP: A Hybrid Classical-Quantum Neural Network for Regression-Based Financial Stock Market Prediction Prashant Kumar Choudhary, Nouhaila Innan, Muhammad Shafique et.al. 2503.15403 11 pages and 11 figures
Abstract (click to expand)Financial time-series forecasting remains a challenging task due to complex temporal dependencies and market fluctuations. This study explores the potential of hybrid quantum-classical approaches to assist in financial trend prediction by leveraging quantum resources for improved feature representation and learning. A custom Quantum Neural Network (QNN) regressor is introduced, designed with a novel ansatz tailored for financial applications. Two hybrid optimization strategies are proposed: (1) a sequential approach where classical recurrent models (RNN/LSTM) extract temporal dependencies before quantum processing, and (2) a joint learning framework that optimizes classical and quantum parameters simultaneously. Systematic evaluation using TimeSeriesSplit, k-fold cross-validation, and predictive error analysis highlights the ability of these hybrid models to integrate quantum computing into financial forecasting workflows. The findings demonstrate how quantum-assisted learning can contribute to financial modeling, offering insights into the practical role of quantum resources in time-series analysis.
2025-03-19 Diffusion-Based Forecasting for Uncertainty-Aware Model Predictive Control Stelios Zarifis, Ioannis Kordonis, Petros Maragos et.al. 2503.15095 5 pages, 3 figures, 3 tables. This version is submitted to the 33rd European Signal Processing Conference (EUSIPCO 2025), to be held in Isola delle Femmine - Palermo - Italy, on September 8-12, 2025
Abstract (click to expand)We propose Diffusion-Informed Model Predictive Control (D-I MPC), a generic framework for uncertainty-aware prediction and decision-making in partially observable stochastic systems by integrating diffusion-based time series forecasting models in Model Predictive Control algorithms. In our approach, a diffusion-based time series forecasting model is used to probabilistically estimate the evolution of the system's stochastic components. These forecasts are then incorporated into MPC algorithms to estimate future trajectories and optimize action selection under the uncertainty of the future. We evaluate the framework on the task of energy arbitrage, where a Battery Energy Storage System participates in the day-ahead electricity market of the New York state. Experimental results indicate that our model-based approach with a diffusion-based forecaster significantly outperforms both implementations with classical forecasting methods and model-free reinforcement learning baselines.
2025-03-18 Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency Jiangxuan Long, Zhao Song, Chiwun Yang et.al. 2503.14076 33 pages
Abstract (click to expand)Recent studies suggest utilizing generative models instead of traditional auto-regressive algorithms for time series forecasting (TSF) tasks. These non-auto-regressive approaches, involving different generative methods including GAN, Diffusion, and Flow Matching for time series, have empirically demonstrated high-quality generation capability and accuracy. However, we still lack an appropriate understanding of how they achieve approximation and generalization. This paper presents the first theoretical framework from the perspective of flow-based generative models to address this knowledge gap. In particular, we provide insights with strict guarantees from three perspectives: Approximation, Generalization, and Efficiency. In detail, our analysis makes the following contributions: (1) Assuming a general data model, the fitting of flow-based generative models is shown to converge to arbitrary error under the universal approximation of the Diffusion Transformer (DiT). (2) Introducing a polynomial-based regularization for flow matching, the generalization error is bounded owing to the generalization of polynomial approximation. (3) Treating sampling for generation as an optimization process, we demonstrate its fast convergence under standard first-order gradient descent updates on some objective.
2025-03-17 Augmented Invertible Koopman Autoencoder for long-term time series forecasting Anthony Frion, Lucas Drumetz, Mauro Dalla Mura et.al. 2503.12930 link
Abstract (click to expand)Following the introduction of Dynamic Mode Decomposition and its numerous extensions, many neural autoencoder-based implementations of the Koopman operator have recently been proposed. This class of methods appears to be of interest for modeling dynamical systems, either through direct long-term prediction of the evolution of the state or as a powerful embedding for downstream methods. In particular, a recent line of work has developed invertible Koopman autoencoders (IKAEs), which provide an exact reconstruction of the input state thanks to their analytically invertible encoder, based on coupling layer normalizing flow models. We identify that the conservation of the dimension imposed by the normalizing flows is a limitation for the IKAE models, and thus we propose to augment the latent state with a second, non-invertible encoder network. This results in our new model: the Augmented Invertible Koopman AutoEncoder (AIKAE). We demonstrate the relevance of the AIKAE through a series of long-term time series forecasting experiments, on satellite image time series as well as on a benchmark involving predictions based on a large lookback window of observations.
2025-03-18 Epidemic Forecasting with a Hybrid Deep Learning Method Using CNN-LSTM With WOA-GWO Parameter Optimization: Global COVID-19 Case Study Mousa Alizadeh, Mohammad Hossein Samaei, Azam Seilsepour et.al. 2503.12813
Abstract (click to expand)Effective epidemic modeling is essential for managing public health crises, requiring robust methods to predict disease spread and optimize resource allocation. This study introduces a novel deep learning framework that advances time series forecasting for infectious diseases, with its application to COVID-19 data as a critical case study. Our hybrid approach integrates Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models to capture spatial and temporal dynamics of disease transmission across diverse regions. The CNN extracts spatial features from raw epidemiological data, while the LSTM models temporal patterns, yielding precise and adaptable predictions. To maximize performance, we employ a hybrid optimization strategy combining the Whale Optimization Algorithm (WOA) and Gray Wolf Optimization (GWO) to fine-tune hyperparameters, such as learning rates, batch sizes, and training epochs, enhancing model efficiency and accuracy. Applied to COVID-19 case data from 24 countries across six continents, our method outperforms established benchmarks, including ARIMA and standalone LSTM models, with statistically significant gains in predictive accuracy (e.g., reduced RMSE). This framework demonstrates its potential as a versatile method for forecasting epidemic trends, offering insights for resource planning and decision making in both historical contexts, like the COVID-19 pandemic, and future outbreaks.
2025-03-15 ChronosX: Adapting Pretrained Time Series Models with Exogenous Variables Sebastian Pineda Arango, Pedro Mercado, Shubham Kapoor et.al. 2503.12107 Accepted at the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Abstract (click to expand)Covariates provide valuable information on external factors that influence time series and are critical in many real-world time series forecasting tasks. For example, in retail, covariates may indicate promotions or peak dates such as holiday seasons that heavily influence demand forecasts. Recent advances in pretraining large language model architectures for time series forecasting have led to highly accurate forecasters. However, the majority of these models do not readily use covariates as they are often specific to a certain task or domain. This paper introduces a new method to incorporate covariates into pretrained time series forecasting models. Our proposed approach incorporates covariate information into pretrained forecasting models through modular blocks that inject past and future covariate information, without necessarily modifying the pretrained model in consideration. In order to evaluate our approach, we introduce a benchmark composed of 32 different synthetic datasets with varying dynamics to evaluate the effectiveness of forecasting models with covariates. Extensive evaluations on both synthetic and real datasets show that our approach effectively incorporates covariate information into pretrained models, outperforming existing baselines.
2025-03-14 Hierarchical Information-Guided Spatio-Temporal Mamba for Stock Time Series Forecasting Wenbo Yan, Shurui Wang, Ying Tan et.al. 2503.11387
Abstract (click to expand)Mamba has demonstrated excellent performance in various time series forecasting tasks due to its superior selection mechanism. Nevertheless, conventional Mamba-based models encounter significant challenges in accurately predicting stock time series, as they fail to adequately capture both the overarching market dynamics and the intricate interdependencies among individual stocks. To overcome these constraints, we introduce the Hierarchical Information-Guided Spatio-Temporal Mamba (HIGSTM) framework. HIGSTM introduces Index-Guided Frequency Filtering Decomposition to extract commonality and specificity from time series. The model architecture features a meticulously designed hierarchical framework that systematically captures both temporal dynamic patterns and global static relationships within the stock market. Furthermore, we propose an Information-Guided Mamba that integrates macro information into the sequence selection process, thereby facilitating more market-conscious decision-making. Comprehensive experimental evaluations conducted on the CSI500, CSI800 and CSI1000 datasets demonstrate that HIGSTM achieves state-of-the-art performance.
2025-03-13 Mamba time series forecasting with uncertainty propagation Pedro Pessoa, Paul Campitelli, Douglas P. Shepherd et.al. 2503.10873 link
Abstract (click to expand)State space models, such as Mamba, have recently garnered attention in time series forecasting due to their ability to capture sequence patterns. However, in electricity consumption benchmarks, Mamba forecasts exhibit a mean error of approximately 8%. Similarly, in traffic occupancy benchmarks, the mean error reaches 18%. This discrepancy leaves us wondering whether the prediction is simply inaccurate or falls within error given the spread in historical data. To address this limitation, we propose a method to quantify the predictive uncertainty of Mamba forecasts. Here, we propose a dual-network framework based on the Mamba architecture for probabilistic forecasting, where one network generates point forecasts while the other estimates predictive uncertainty by modeling variance. We abbreviate our tool, Mamba with probabilistic time series forecasting, as Mamba-ProbTSF and the code for its implementation is available on GitHub (https://github.com/PessoaP/Mamba-ProbTSF). Evaluating this approach on synthetic and real-world benchmark datasets, we find Kullback-Leibler divergence between the learned distributions and the data--which, in the limit of infinite data, should converge to zero if the model correctly captures the underlying probability distribution--reduced to the order of \(10^{-3}\) for synthetic data and \(10^{-1}\) for real-world benchmark, demonstrating its effectiveness. We find that in both the electricity consumption and traffic occupancy benchmark, the true trajectory stays within the predicted uncertainty interval at the two-sigma level about 95% of the time. We end with a consideration of potential limitations, adjustments to improve performance, and considerations for applying this framework to processes for purely or largely stochastic dynamics where the stochastic changes accumulate, as observed for example in pure Brownian motion or molecular dynamics trajectories.
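The two-sigma coverage check mentioned in the abstract above can be sketched as follows, given predicted means and variances from any pair of models. The Gaussian negative log-likelihood is shown as one common training objective for a variance head; the abstract does not state the paper's actual loss, so treat it as an assumption. The function names and sample values are illustrative only.

```python
import math


def gaussian_nll(y, mu, var):
    """Negative log-likelihood of y under a Normal(mu, var) forecast."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)


def two_sigma_coverage(ys, mus, variances):
    """Fraction of observations inside mu +/- 2 * sqrt(var)."""
    inside = sum(
        1 for y, mu, var in zip(ys, mus, variances)
        if abs(y - mu) <= 2.0 * math.sqrt(var)
    )
    return inside / len(ys)


# Hypothetical observations, point forecasts, and predicted variances.
ys = [0.0, 1.0, 5.0]
mus = [0.0, 0.0, 0.0]
variances = [1.0, 1.0, 1.0]

print(sum(gaussian_nll(y, m, v) for y, m, v in zip(ys, mus, variances)) / len(ys))
print(two_sigma_coverage(ys, mus, variances))
```

For a well-calibrated Gaussian forecast, the two-sigma coverage should be close to 95%, which matches the figure the abstract reports.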
2025-03-13 Towards Efficient Large Scale Spatial-Temporal Time Series Forecasting via Improved Inverted Transformers Jiarui Sun, Chin-Chia Michael Yeh, Yujie Fan et.al. 2503.10858 10 pages
Abstract (click to expand)Time series forecasting at scale presents significant challenges for modern prediction systems, particularly when dealing with large sets of synchronized series, such as in a global payment network. In such systems, three key challenges must be overcome for accurate and scalable predictions: 1) emergence of new entities, 2) disappearance of existing entities, and 3) the large number of entities present in the data. The recently proposed Inverted Transformer (iTransformer) architecture has shown promising results by effectively handling variable entities. However, its practical application in large-scale settings is limited by quadratic time and space complexity (\(O(N^2)\)) with respect to the number of entities \(N\). In this paper, we introduce EiFormer, an improved inverted transformer architecture that maintains the adaptive capabilities of iTransformer while reducing computational complexity to linear scale (\(O(N)\)). Our key innovation lies in restructuring the attention mechanism to eliminate redundant computations without sacrificing model expressiveness. Additionally, we incorporate a random projection mechanism that not only enhances efficiency but also improves prediction accuracy through better feature representation. Extensive experiments on the public LargeST benchmark dataset and a proprietary large-scale time series dataset demonstrate that EiFormer significantly outperforms existing methods in both computational efficiency and forecasting accuracy. Our approach enables practical deployment of transformer-based forecasting in industrial applications where handling time series at scale is essential.
2025-03-13 Deep Learning for Time Series Forecasting: A Survey Xiangjie Kong, Zhenghao Chen, Weiyao Liu et.al. 2503.10198
Abstract (click to expand)Time series forecasting (TSF) has long been a crucial task in both industry and daily life. Most classical statistical models may have certain limitations when applied to practical scenarios in fields such as energy, healthcare, traffic, meteorology, and economics, especially when high accuracy is required. With the continuous development of deep learning, numerous new models have emerged in the field of time series forecasting in recent years. However, existing surveys have not provided a unified summary of the wide range of model architectures in this field, nor have they given detailed summaries of works in feature extraction and datasets. To address this gap, in this review, we comprehensively study the previous works and summarize the general paradigms of Deep Time Series Forecasting (DTSF) in terms of model architectures. Besides, we take an innovative approach by focusing on the composition of time series and systematically explain important feature extraction methods. Additionally, we provide an overall compilation of datasets from various domains in existing works. Finally, we systematically emphasize the significant challenges faced and future research directions in this field.
2025-03-12 Minimal Time Series Transformer Joni-Kristian Kämäräinen et.al. 2503.09791 link 8 pages, 8 figures
Abstract (click to expand)Transformer is the state-of-the-art model for many natural language processing, computer vision, and audio analysis problems. Transformer effectively combines information from the past input and output samples in auto-regressive manner so that each sample becomes aware of all inputs and outputs. In sequence-to-sequence (Seq2Seq) modeling, the transformer processed samples become effective in predicting the next output. Time series forecasting is a Seq2Seq problem. The original architecture is defined for discrete input and output sequence tokens, but to adopt it for time series, the model must be adapted for continuous data. This work introduces minimal adaptations to make the original transformer architecture suitable for continuous value time series data.
2025-03-12 LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics Jialiang Tang, Shuo Chen, Chen Gong et.al. 2503.09656
Abstract (click to expand)Time Series Forecasting (TSF) is critical in many real-world domains like financial planning and health monitoring. Recent studies have revealed that Large Language Models (LLMs), with their powerful in-contextual modeling capabilities, hold significant potential for TSF. However, existing LLM-based methods usually perform suboptimally because they neglect the inherent characteristics of time series data. Unlike the textual data used in LLM pre-training, the time series data is semantically sparse and comprises distinctive temporal patterns. To address this problem, we propose LLM-PS to empower the LLM for TSF by learning the fundamental Patterns and meaningful Semantics from time series data. Our LLM-PS incorporates a new multi-scale convolutional neural network adept at capturing both short-term fluctuations and long-term trends within the time series. Meanwhile, we introduce a time-to-text module for extracting valuable semantics across continuous time intervals rather than isolated time points. By integrating these patterns and semantics, LLM-PS effectively models temporal dependencies, enabling a deep comprehension of time series and delivering accurate forecasts. Intensive experimental results demonstrate that LLM-PS achieves state-of-the-art performance in both short- and long-term forecasting tasks, as well as in few- and zero-shot settings.
2025-03-15 Data Driven Decision Making with Time Series and Spatio-temporal Data Bin Yang, Yuxuan Liang, Chenjuan Guo et.al. 2503.08473 This paper is accepted by ICDE 2025
Abstract (click to expand)Time series data captures properties that change over time. Such data occurs widely, ranging from the scientific and medical domains to the industrial and environmental domains. When the properties in time series exhibit spatial variations, we often call the data spatio-temporal. As part of the continued digitalization of processes throughout society, increasingly large volumes of time series and spatio-temporal data are available. In this tutorial, we focus on data-driven decision making with such data, e.g., enabling greener and more efficient transportation based on traffic time series forecasting. The tutorial adopts the holistic paradigm of "data-governance-analytics-decision." We first introduce the data foundation of time series and spatio-temporal data, which is often heterogeneous. Next, we discuss data governance methods that aim to improve data quality. We then cover data analytics, focusing on five desired characteristics: automation, robustness, generality, explainability, and resource efficiency. We finally cover data-driven decision making strategies and briefly discuss promising research directions. We hope that the tutorial will serve as a primary resource for researchers and practitioners who are interested in value creation from time series and spatio-temporal data.
2025-03-11 MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting Liang Yu, Lai Tu, Xiang Bai et.al. 2503.08328 link
Abstract (click to expand)Multivariate time-series forecasting holds immense value across diverse applications, requiring methods to effectively capture complex temporal and inter-variable dynamics. A key challenge lies in uncovering the intrinsic patterns that govern predictability, beyond conventional designs, focusing on network architectures to explore latent relationships or temporal dependencies. Inspired by signal decomposition, this paper posits that time series predictability is derived from periodic characteristics at different frequencies. Consequently, we propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Through spectral analysis on long-term training data, we identify dominant spectral components and their harmonics to design base-pattern reference series. Unlike signal decomposition, which represents the original series as a linear combination of basis signals, our method uses a transformer model to compute cross-attention between the original series and reference series, capturing essential features for forecasting. Experiments on major open and synthetic datasets show state-of-the-art performance. Furthermore, by focusing on attention with a small number of reference series rather than pairwise variable attention, our method ensures scalability and broad applicability. The source code is available at: https://github.com/yuliang555/MFRS
2025-03-11 LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization Wenzhe Niu, Zongxia Xie, Yanru Sun et.al. 2503.08271
Abstract (click to expand)Recent research has shown an increasing interest in utilizing pre-trained large language models (LLMs) for a variety of time series applications. However, there are three main challenges when using LLMs as foundational models for time series forecasting: (1) cross-domain generalization, (2) cross-modality alignment, and (3) error accumulation in autoregressive frameworks. To address these challenges, we propose LangTime, a language-guided unified model for time series forecasting that incorporates cross-domain pre-training with reinforcement learning-based fine-tuning. Specifically, LangTime constructs Temporal Comprehension Prompts (TCPs), which include dataset-wise and channel-wise instructions, to facilitate domain adaptation and condense time series into a single token, enabling LLMs to better understand and align temporal data. To improve autoregressive forecasting, we introduce TimePPO, a reinforcement learning-based fine-tuning algorithm. TimePPO mitigates error accumulation by leveraging a multidimensional rewards function tailored for time series and a repeat-based value estimation strategy. Extensive experiments demonstrate that LangTime achieves state-of-the-art cross-domain forecasting performance, while TimePPO fine-tuning effectively enhances the stability and accuracy of autoregressive forecasting.
2025-03-06 TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster Kanghui Ning, Zijie Pan, Yu Liu et.al. 2503.07649
Abstract (click to expand)Recently, Large Language Models (LLMs) and Foundation Models (FMs) have become prevalent for time series forecasting tasks. However, fine-tuning large language models (LLMs) for forecasting enables the adaptation to specific domains but may not generalize well across diverse, unseen datasets. Meanwhile, existing time series foundation models (TSFMs) lack inherent mechanisms for domain adaptation and suffer from limited interpretability, making them suboptimal for zero-shot forecasting. To this end, we present TS-RAG, a retrieval-augmented generation based time series forecasting framework that enhances the generalization capability and interpretability of TSFMs. Specifically, TS-RAG leverages pre-trained time series encoders to retrieve semantically relevant time series segments from a dedicated knowledge database, incorporating contextual patterns for the given time series query. Next, we develop a learnable Mixture-of-Experts (MoE)-based augmentation module, which dynamically fuses retrieved time series patterns with the TSFM's representation of the input query, improving forecasting accuracy without requiring task-specific fine-tuning. Thorough empirical studies on seven public benchmark datasets demonstrate that TS-RAG achieves state-of-the-art zero-shot forecasting performance, outperforming TSFMs by up to 6.51% across diverse domains and showcasing desired interpretability.
2025-03-10 FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models Yanlong Wang, Jian Xu, Tiantian Gao et.al. 2503.06928 ICLR 2025 Workshop Advances in Financial AI
Abstract (click to expand)With the growing attention to time series forecasting in recent years, many studies have proposed various solutions to address the challenges encountered in time series prediction, aiming to improve forecasting performance. However, effectively applying these time series forecasting models to the field of financial asset pricing remains a challenging issue. There is still a need for a bridge to connect cutting-edge time series forecasting models with financial asset pricing. To bridge this gap, we have undertaken the following efforts: 1) We constructed three datasets from the financial domain; 2) We selected over ten time series forecasting models from recent studies and validated their performance in financial time series; 3) We developed new metrics, msIC and msIR, in addition to MSE and MAE, to showcase the time series correlation captured by the models; 4) We designed financial-specific tasks for these three datasets and assessed the practical performance and application potential of these forecasting models in important financial problems. We hope the developed new evaluation suite, FinTSBridge, can provide valuable insights into the effectiveness and robustness of advanced forecasting models in financial domains.
2025-03-10 Enhancing Time Series Forecasting via Logic-Inspired Regularization Jianqi Zhang, Jingyao Wang, Xingchen Shen et.al. 2503.06867
Abstract (click to expand)Time series forecasting (TSF) plays a crucial role in many applications. Transformer-based methods are one of the mainstream techniques for TSF. Existing methods treat all token dependencies equally. However, we find that the effectiveness of token dependencies varies across different forecasting scenarios, and existing methods ignore these differences, which affects their performance. This raises two issues: (1) What are effective token dependencies? (2) How can we learn effective dependencies? From a logical perspective, we align Transformer-based TSF methods with the logical framework and define effective token dependencies as those that ensure the tokens as atomic formulas (Issue 1). We then align the learning process of Transformer methods with the process of obtaining atomic formulas in logic, which inspires us to design a method for learning these effective dependencies (Issue 2). Specifically, we propose Attention Logic Regularization (Attn-L-Reg), a plug-and-play method that guides the model to use fewer but more effective dependencies by making the attention map sparse, thereby ensuring the tokens as atomic formulas and improving prediction performance. Extensive experiments and theoretical analysis confirm the effectiveness of Attn-L-Reg.
2025-03-08 A Novel Distributed PV Power Forecasting Approach Based on Time-LLM Huapeng Lin, Miao Yu et.al. 2503.06216 23 pages, 8 figures
Abstract (click to expand)Distributed photovoltaic (DPV) systems are essential for advancing renewable energy applications and achieving energy independence. Accurate DPV power forecasting can optimize power system planning and scheduling while significantly reducing energy loss, thus enhancing overall system efficiency and reliability. However, solar energy's intermittent nature and DPV systems' spatial distribution create significant forecasting challenges. Traditional methods often rely on costly external data, such as numerical weather prediction (NWP) and satellite images, which are difficult to scale for smaller DPV systems. To tackle this issue, this study introduces an advanced large language model (LLM)-based time series forecasting framework, Time-LLM, to improve DPV power forecasting accuracy and generalization ability. Through reprogramming, the framework aligns historical power data with natural language modalities, facilitating efficient modeling of time-series data. Then the Qwen2.5-3B model is integrated as the backbone LLM to process input data by leveraging its pattern recognition and inference abilities, achieving a balance between efficiency and performance. Finally, using a flatten and linear projection layer, the LLM's high-dimensional output is transformed into the final forecasts. Experimental results indicate that Time-LLM outperforms leading recent time series forecasting models, such as Transformer-based methods and MLP-based models, achieving superior accuracy in both short-term and long-term forecasting. Time-LLM also demonstrates exceptional adaptability in few-shot and zero-shot learning scenarios. To the best of the authors' knowledge, this study is the first attempt to explore the application of LLMs to DPV power forecasting, which can offer a scalable solution that eliminates reliance on costly external data sources and improves real-world forecasting accuracy.
2025-03-08 Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature Masaki Adachi, Masahiro Fujisawa, Michael A Osborne et.al. 2503.06079 11 pages, 6 figures
Abstract (click to expand)Despite the significance of probabilistic time-series forecasting models, their evaluation metrics often involve intractable integrations. The most widely used metric, the continuous ranked probability score (CRPS), is a strictly proper scoring function; however, its computation requires approximation. We found that popular CRPS estimators--specifically, the quantile-based estimator implemented in the widely used GluonTS library and the probability-weighted moment approximation--both exhibit inherent estimation biases. These biases lead to crude approximations, resulting in improper rankings of forecasting model performance when CRPS values are close. To address this issue, we introduced a kernel quadrature approach that leverages an unbiased CRPS estimator and employs cubature construction for scalable computation. Empirically, our approach consistently outperforms the two widely used CRPS estimators.
2025-03-07 TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting Shibo Feng, Wanjin Feng, Xingyu Gao et.al. 2503.05108
Abstract (click to expand)Spiking Neural Networks (SNNs) offer a promising, biologically inspired approach for processing spatiotemporal data, particularly for time series forecasting. However, conventional neuron models like the Leaky Integrate-and-Fire (LIF) struggle to capture long-term dependencies and effectively process multi-scale temporal dynamics. To overcome these limitations, we introduce the Temporal Segment Leaky Integrate-and-Fire (TS-LIF) model, featuring a novel dual-compartment architecture. The dendritic and somatic compartments specialize in capturing distinct frequency components, providing functional heterogeneity that enhances the neuron's ability to process both low- and high-frequency information. Furthermore, the newly introduced direct somatic current injection reduces information loss during intra-neuronal transmission, while dendritic spike generation improves multi-scale information extraction. We provide a theoretical stability analysis of the TS-LIF model and explain how each compartment contributes to distinct frequency response characteristics. Experimental results show that TS-LIF outperforms traditional SNNs in time series forecasting, demonstrating better accuracy and robustness, even with missing data. TS-LIF advances the application of SNNs in time-series forecasting, providing a biologically inspired approach that captures complex temporal dynamics and offers potential for practical implementation in diverse forecasting scenarios. The source code is available at https://github.com/kkking-kk/TS-LIF.
2025-03-06 Boltzmann convolutions and Welford mean-variance layers with an application to time series forecasting and classification Daniel Andrew Coulson, Martin T. Wells et.al. 2503.04956 40 pages, 7 figures, 11 tables
Abstract (click to expand)In this paper we propose a novel problem called the ForeClassing problem where the loss of a classification decision is only observed at a future time point after the classification decision has to be made. To solve this problem, we propose an approximately Bayesian deep neural network architecture called ForeClassNet for time series forecasting and classification. This network architecture forces the network to consider possible future realizations of the time series, by forecasting future time points and their likelihood of occurring, before making its final classification decision. To facilitate this, we introduce two novel neural network layers, Welford mean-variance layers and Boltzmann convolutional layers. Welford mean-variance layers allow networks to iteratively update their estimates of the mean and variance for the forecasted time points for each inputted time series to the network through successive forward passes, which the model can then consider in combination with a learned representation of the observed realizations of the time series for its classification decision. Boltzmann convolutional layers are linear combinations of approximately Bayesian convolutional layers with different filter lengths, allowing the model to learn multitemporal resolution representations of the input time series, and which resolutions to focus on within a given Boltzmann convolutional layer through a Boltzmann distribution. Through several simulation scenarios and two real world applications we demonstrate ForeClassNet achieves superior performance compared with current state of the art methods including a near 30% improvement in test set accuracy in our financial example compared to the second best performing model.
2025-03-06 Hedging with Sparse Reward Reinforcement Learning Yiheng Ding, Gangnan Yuan, Dewei Zuo et.al. 2503.04218
Abstract (click to expand)Derivatives, as a critical class of financial instruments, isolate and trade the price attributes of risk assets such as stocks, commodities, and indices, aiding risk management and enhancing market efficiency. However, traditional hedging models, constrained by assumptions such as continuous trading and zero transaction costs, fail to satisfy risk control requirements in complex and uncertain real-world markets. With advances in computing technology and deep learning, data-driven trading strategies are becoming increasingly prevalent. This thesis proposes a derivatives hedging framework integrating deep learning and reinforcement learning. The framework comprises a probabilistic forecasting model and a hedging agent, enabling market probability prediction, derivative pricing, and hedging. Specifically, we design a spatiotemporal attention-based probabilistic financial time series forecasting Transformer to address the scarcity of derivatives hedging data. A low-rank attention mechanism compresses high-dimensional assets into a low-dimensional latent space, capturing nonlinear asset relationships. The Transformer models sequential dependencies within this latent space, improving market probability forecasts and constructing an online training environment for downstream hedging tasks. Additionally, we incorporate generalized geometric Brownian motion to develop a risk-neutral pricing approach for derivatives. We model derivatives hedging as a reinforcement learning problem with sparse rewards and propose a behavior cloning-based recurrent proximal policy optimization (BC-RPPO) algorithm. This pretraining-finetuning framework significantly enhances the hedging agent's performance. Numerical experiments in the U.S. and Chinese financial markets demonstrate our method's superiority over traditional approaches.
2025-03-06 TimeFound: A Foundation Model for Time Series Forecasting Congxi Xiao, Jingbo Zhou, Yixiong Xiao et.al. 2503.04118
Abstract (click to expand)We present TimeFound, an encoder-decoder transformer-based time series foundation model for out-of-the-box zero-shot forecasting. To handle time series data from various domains, TimeFound employs a multi-resolution patching strategy to capture complex temporal patterns at multiple scales. We pre-train our model with two sizes (200M and 710M parameters) on a large time-series corpus comprising both real-world and synthetic datasets. Over a collection of unseen datasets across diverse domains and forecasting horizons, our empirical evaluations suggest that TimeFound can achieve superior or competitive zero-shot forecasting performance, compared to state-of-the-art time series foundation models.
2025-03-05 Graph-Augmented LSTM for Forecasting Sparse Anomalies in Graph-Structured Time Series Sneh Pillai et.al. 2503.03729 12 pages
Abstract (click to expand)Detecting anomalies in time series data is a critical task across many domains. The challenge intensifies when anomalies are sparse and the data are multivariate with relational dependencies across sensors or nodes. Traditional univariate anomaly detectors struggle to capture such cross-node dependencies, particularly in sparse anomaly settings. To address this, we propose a graph-augmented time series forecasting approach that explicitly integrates the graph of relationships among time series into an LSTM forecasting model. This enables the model to detect rare anomalies that might otherwise go unnoticed in purely univariate approaches. We evaluate the approach on two benchmark datasets - the Yahoo Webscope S5 anomaly dataset and the METR-LA traffic sensor network - and compare the performance of the Graph-Augmented LSTM against LSTM-only, ARIMA, and Prophet baselines. Results demonstrate that the graph-augmented model achieves significantly higher precision and recall, improving F1-score by up to 10% over the best baseline.
2025-03-09 Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs Haoran Fan, Bin Li, Yixuan Weng et.al. 2503.03594 link 20 pages, 10 figures
Abstract (click to expand)While LLMs have demonstrated remarkable potential in time series forecasting, their practical deployment remains constrained by excessive computational demands and memory footprints. Existing LLM-based approaches typically suffer from three critical limitations: inefficient parameter utilization in handling numerical time series patterns; modality misalignment between continuous temporal signals and discrete text embeddings; and inflexibility for real-time expert knowledge integration. We present SMETimes, the first systematic investigation of sub-3B parameter SLMs for efficient and accurate time series forecasting. Our approach centers on three key innovations: a statistically-enhanced prompting mechanism that bridges numerical time series with textual semantics through descriptive statistical features; an adaptive fusion embedding architecture that aligns temporal patterns with language model token spaces through learnable parameters; and a dynamic mixture-of-experts framework enabled by SLMs' computational efficiency, adaptively combining base predictions with domain-specific models. Extensive evaluations across seven benchmark datasets demonstrate that our 3B-parameter SLM achieves state-of-the-art performance on five primary datasets while maintaining 3.8x faster training and 5.2x lower memory consumption compared to 7B-parameter LLM baselines. Notably, the proposed model exhibits better learning capabilities, achieving 12.3% lower MSE than conventional LLMs. Ablation studies validate that our statistical prompting and cross-modal fusion modules respectively contribute 15.7% and 18.2% error reduction in long-horizon forecasting tasks. By redefining the efficiency-accuracy trade-off landscape, this work establishes SLMs as viable alternatives to resource-intensive LLMs for practical time series forecasting. Code and models are available at https://github.com/xiyan1234567/SMETimes.
2025-03-04 SeqFusion: Sequential Fusion of Pre-Trained Models for Zero-Shot Time-Series Forecasting Ting-Ji Huang, Xu-Yang Chen, Han-Jia Ye et.al. 2503.02836 link
Abstract (click to expand)Unlike traditional time-series forecasting methods that require extensive in-task data for training, zero-shot forecasting can directly predict future values given a target time series without additional training data. Current zero-shot approaches primarily rely on pre-trained generalized models, with their performance often depending on the variety and relevance of the pre-training data, which can raise privacy concerns. Instead of collecting diverse pre-training data, we introduce SeqFusion in this work, a novel framework that collects and fuses diverse pre-trained models (PTMs) sequentially for zero-shot forecasting. Based on the specific temporal characteristics of the target time series, SeqFusion selects the most suitable PTMs from a batch of pre-collected PTMs, performs sequential predictions, and fuses all the predictions while using minimal data to protect privacy. Each of these PTMs specializes in different temporal patterns and forecasting tasks, allowing SeqFusion to select by measuring distances in a shared representation space of the target time series with each PTM. Experiments demonstrate that SeqFusion achieves competitive accuracy in zero-shot forecasting compared to state-of-the-art methods.
2025-03-04 Lightweight Channel-wise Dynamic Fusion Model: Non-stationary Time Series Forecasting via Entropy Analysis Tianyu Jia, Zongxia Xie, Yanru Sun et.al. 2503.02609
Abstract (click to expand)Non-stationarity is an intrinsic property of real-world time series and plays a crucial role in time series forecasting. Previous studies primarily adopt instance normalization to attenuate the non-stationarity of original series for better predictability. However, instance normalization that directly removes the inherent non-stationarity can lead to three issues: (1) disrupting global temporal dependencies, (2) ignoring channel-specific differences, and (3) producing over-smoothed predictions. To address these issues, we theoretically demonstrate that variance can be a valid and interpretable proxy for quantifying non-stationarity of time series. Based on the analysis, we propose a novel lightweight Channel-wise Dynamic Fusion Model (CDFM), which selectively and dynamically recovers intrinsic non-stationarity of the original series, while keeping the predictability of normalized series. First, we design a Dual-Predictor Module, which involves two branches: a Time Stationary Predictor for capturing stable patterns and a Time Non-stationary Predictor for modeling global dynamics patterns. Second, we propose a Fusion Weight Learner to dynamically characterize the intrinsic non-stationary information across different samples based on variance. Finally, we introduce a Channel Selector to selectively recover non-stationary information from specific channels by evaluating their non-stationarity, similarity, and distribution consistency, enabling the model to capture relevant dynamic features and avoid overfitting. Comprehensive experiments on seven time series datasets demonstrate the superiority and generalization capabilities of CDFM.
2025-03-03 Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting Xiaobin Hong, Jiawen Zhang, Wenzhong Li et.al. 2503.01157 20 pages, 12 figures, 8 tables, conference under review
Abstract (click to expand)The rise of foundation models has revolutionized natural language processing and computer vision, yet best practices for applying them to time series forecasting remain underexplored. Existing time series foundation models often adopt methodologies from these fields without addressing the unique characteristics of time series data. In this paper, we identify two key challenges in cross-domain time series forecasting: the complexity of temporal patterns and semantic misalignment. To tackle these issues, we propose the "Unify and Anchor" transfer paradigm, which disentangles frequency components for a unified perspective and incorporates external context as domain anchors for guided adaptation. Based on this framework, we introduce ContexTST, a Transformer-based model that employs a time series coordinator for structured representation and Transformer blocks with a context-informed mixture-of-experts mechanism for effective cross-domain generalization. Extensive experiments demonstrate that ContexTST advances state-of-the-art forecasting performance while achieving strong zero-shot transferability across diverse domains.
