| 1 | Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI<br>Jinhu Qi, Yifan Li, ... (+5 more) | | cs.CL | 0 | 2 months ago |
| 2 | SciZoom: A Large-scale Benchmark for Hierarchical Scientific Summarization across the LLM Era<br>Han Jang, Junhyeok Lee, Kyu Sung Choi | | cs.CL | 0 | 2 months ago |
| 3 | EviCare: Enhancing Diagnosis Prediction with Deep Model-Guided Evidence for In-Context Reasoning<br>Hengyu Zhang, Xuyun Zhang, ... (+6 more) | | cs.CL | 0 | 1 month ago |
| 4 | SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models<br>Han Liu, Haotian Gao, ... (+6 more) | | cs.CL | 0 | 2 months ago |
| 5 | When Does Data Augmentation Help? Evaluating LLM and Back-Translation Methods for Hausa and Fongbe NLP<br>Mahounan Pericles Adjovi, Roald Eiselen, Prasenjit Mitra | | cs.CL | 0 | 1 month ago |
| 6 | End-to-End Learning for Partially-Observed Time Series with PyPOTS<br>Wenjie Du, Yiyuan Yang, ... (+2 more) | | cs.LG | 0 | 1 month ago |
| 7 | On Reasoning Behind Next Occupation Recommendation<br>Shan Dong, Palakorn Achananuparp, ... (+4 more) | | cs.CL | 0 | 1 month ago |
| 8 | DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories<br>Neemesh Yadav, Palakorn Achananuparp, ... (+2 more) | | cs.CL | 0 | 1 month ago |
| 9 | SENSE: Satellite-based ENergy Synthesis for Sustainable Environment<br>Kailai Sun, Mingyi He, ... (+6 more) | | cs.CV | 0 | 23 days ago |
| 10 | TeleCom-Bench: How Far Are Large Language Models from Industrial Telecommunication Applications?<br>Jieting Xiao, Yun Lin, ... (+11 more) | | cs.AI | 0 | 23 days ago |
| 11 | BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting<br>Zhensheng Wang, Wenmian Yang, ... (+4 more) | | cs.CL | 0 | 23 days ago |
| 12 | Uncertainty-Calibrated Recommendations for Low-Active Users<br>Bob Junyi Zou, Sai Li, ... (+3 more) | | cs.IR | 0 | 23 days ago |
| 13 | Text-Guided Visual Representation Learning for Robust Multimodal E-Commerce Recommendation<br>Yufei Guo, Jing Ma, ... (+6 more) | | cs.IR | 0 | 24 days ago |
| 14 | Rethinking Weak Supervision in Anomaly Detection: A Comprehensive Benchmark<br>Xu Yao, Siyuan Zhou, ... (+7 more) | | cs.LG | 0 | 16 days ago |
| 15 | Causal methods for LLM development and evaluation<br>Dennis Frauen, Marie Brockschmidt, ... (+11 more) | | cs.LG | 0 | 16 days ago |
| 16 | NPSolver: Neural Poisson Solver with Iterative Physics Supervision<br>Bocheng Zeng, Rui Zhang, ... (+6 more) | | cs.LG | 0 | 16 days ago |
| 17 | DeGRe: Dense-supervised Generative Reranking for Recommendation<br>Chaotian Song, Jingyao Zhang, ... (+7 more) | | cs.IR | 0 | 16 days ago |
| 18 | Learning Latent Dynamical Causal Processes for Single-Cell Perturbation Prediction<br>Wenkang Jiang, Yuhang Liu, ... (+4 more) | | cs.LG | 0 | 16 days ago |
| 19 | MindAdapter: Few-Shot Parameter-Efficient Residual Calibration of Cross-Subject Brain-to-Visual Decoding Models<br>Jiaxiang Liu, Jiawei Du, ... (+5 more) | | cs.CV | 0 | 18 days ago |
| 20 | VaaWIT: Visual-Aware Adaptation of Large Language Models for Multilingual Web Image Translation<br>Bo Li, Ronghao Chen, ... (+4 more) | | cs.CV | 0 | 18 days ago |
| 21 | Treatment Effect Estimation with Differentiated Networked Effect on Graph Data<br>Xiaofeng Lin, Han Bao, Hisashi Kashima | | cs.LG | 0 | 18 days ago |
| 22 | Dynamic Spectral Denoising with Global-Context Attention for Multi-Behavior Recommendation<br>Miaomiao Cai, Yunshan Ma, ... (+6 more) | | cs.IR | 0 | 9 days ago |
| 23 | TimeBlocks: Foundational and Continual Time-Series Blockbase -- Extended Version<br>David Campos, Bin Yang, ... (+4 more) | | cs.LG | 0 | 9 days ago |
| 24 | G2LoRA: Gradient Orthogonal Low-Rank Adaptation Framework for Graph Continual Learning on Text-Attributed Graphs<br>Yuhan Wang, Yibo Ding, ... (+5 more) | | cs.LG | 0 | 9 days ago |
| 25 | Scalable Counterfactual Risk Estimation for Rare Events in Longitudinal Data<br>Xiaohui Yin, Avijit Mitra, ... (+3 more) | | stat.ME | 0 | 9 days ago |
| 26 | When Hard Negatives Hurt: Bridging the Generative-Discriminative Gap in Hard Negative Synthesis for Retrieval<br>Zhicheng Zhang, Jiwei Tang, ... (+10 more) | | cs.LG | 0 | 10 days ago |
| 27 | TravelEval: A Comprehensive Benchmarking Framework for Evaluating LLM-Powered Travel Planning Agents<br>Weiyi Chen, Shuaixiong Wang, ... (+6 more) | | cs.AI | 0 | 10 days ago |
| 28 | ProductWebGen: Benchmarking Multimodal Product Webpage Generation<br>Zhihong Liu, Siqi Kou, ... (+6 more) | | cs.CV | 0 | 10 days ago |
| 29 | PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects<br>Sicheng Yang, Shulan Ruan, ... (+5 more) | | cs.CL | 0 | 10 days ago |
| 30 | NBQ: Next-Best-Question for Dynamic Profiling<br>Yimin Shi, Clarice Wang, ... (+2 more) | | cs.AI | 0 | 11 days ago |
| 31 | SCOPE: Cost-Efficient Model Selection for Compound AI Systems under Quality Constraints<br>Yiqian Huang, Shiqi Zhang, ... (+2 more) | | cs.DB | 0 | 11 days ago |
| 32 | One Model, Multiple Goals: Adaptive Multi-Objective Learning for E-commerce Dialogue Systems<br>Mingzhe Li, Jing Xiang, ... (+6 more) | | cs.CL | 0 | 2 days ago |
| 33 | Cross-Source Reasoning-based Correction for Author Name Disambiguation<br>Fanjin Zhang, Yunhe Pang, ... (+5 more) | | cs.CL | 0 | 3 days ago |
| 34 | Explaining Black-Box Language Models: Learning to Optimize Linguistically-Structured Word Subsets<br>Minyoung Hwang, Seokhyun Lee, Changhee Lee | | cs.AI | 0 | 3 days ago |
| 35 | Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction<br>Yi Duan, Zhao Yang, ... (+4 more) | | q-bio.GN | 0 | 4 days ago |
| 36 | SafeECGMatch: Calibration-Aware Joint Frequency and Time Space Semi-Supervised Learning for Open-Set ECG Classification<br>Hongkyu Koh, Ikbeom Jang | | cs.LG | 0 | 4 days ago |
| 37 | Unsupervised Continual Clustering via Forward-Backward Knowledge Distillation<br>Mohammadreza Sadeghi, Sareh Soleimani, ... (+2 more) | | cs.LG | 0 | 5 days ago |
| 38 | Video-Based Prediction of In-Flight Particle Characteristics in Atmospheric Plasma Spraying<br>Abhijeet Praveen, Sareh Soleimani, ... (+5 more) | | cs.LG | 0 | 5 days ago |
| 39 | DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios<br>Tongzhou Yu, Mingjia Li, ... (+7 more) | | cs.LG | 0 | 5 days ago |
| 40 | The Sim-to-Real Gap of Foundation Model Agents: A Unified MDP Perspective<br>Xiaoou Liu, Tiejin Chen, ... (+3 more) | | cs.AI | 0 | 5 days ago |
| 41 | SVHighlights: Towards Extremely Long Sport Video Highlight Detection<br>Donggyu Lee, Youngbin Ki, ... (+2 more) | | cs.CV | 0 | 5 days ago |
| 42 | A Sliced-Wasserstein Framework on Correlation Matrices for EEG Decoding<br>Chen Hu, Rui Wang, ... (+5 more) | | cs.LG | 0 | 6 days ago |
| 43 | Causal Scaffolding for Physical Reasoning: A Benchmark for Causally-Informed Physical World Understanding in VLMs<br>Tianyi Tang, Zhuoyi Lin, ... (+5 more) | | cs.DB | 0 | 6 days ago |
| 44 | CausalPOI: Spatio-Temporal Graph-Based Causal Modeling for Cold-Start POI Check-in Forecasting<br>Zhaoqi Zhang, Miao Xie, ... (+4 more) | | cs.LG | 0 | 7 days ago |
| 45 | AdaKoop: Efficient Modeling of Nonlinear Dynamics from Nonstationary Data Streams with Koopman Operator Regression<br>Naoki Chihara, Ren Fujiwara, ... (+2 more) | | cs.LG | 0 | 7 days ago |
| 46 | ALINC: Active Learning for Inductive Node Classification via Graph Sampling<br>Pascal Plettenberg, Denis Huseljic, ... (+3 more) | | cs.LG | 0 | 7 days ago |
| 47 | SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification<br>Xiangyu Zhao, Hengyuan Zhao, ... (+8 more) | | cs.AI | 0 | 7 days ago |
| 48 | MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation<br>Deguo Xia, Zihan Li, ... (+7 more) | | cs.AI | 0 | 7 days ago |
| 49 | Expectations vs. Realities: The Cost of MSE-Optimal Forecasting Under Conditional Uncertainty<br>Riku Green, Zahraa S. Abdallah, Telmo M Silva Filho | | cs.LG | 0 | 7 days ago |
| 50 | Stationarity-Aware Retrieval-Augmented Time Series Forecasting<br>Shiqiao Zhou, Holger Schöner, ... (+4 more) | | cs.LG | 0 | 8 days ago |