2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Hyatt Regency Waikiki Beach Resort and Spa, 2424 Kalākaua Ave, Honolulu
Week of Events
IEEE Hawaii YP December ExCom
733 Bishop Street, Suite 2000, Honolulu, Hawaii, United States; Virtual: https://events.vtools.ieee.org/m/485247
Uncanny Discourse: A new perspective on reality-bending AI's impact on computer-human interaction
Ever had a frustrating or even bizarre conversation online? A helpdesk, a chatbot, a scammer: any of these can make you feel uncomfortable. Not knowing whether you are talking to a human can be uncanny. This kind of interaction, or discourse, structures power and control. We introduce a new concept, Uncanny Discourse, that frames computer-human interaction in light of the reality-bending nature of AI. Learn how to face the challenge of designing interactive systems in the age of AI by grappling with Uncanny Discourse: mitigate it, or lean in and run with it. But know that it is out there, weird and waiting. Co-sponsored by: Mantech. Speaker(s): David Conner. Room: Suite 103, Bldg: SALT at Our Kaka'ako, 680 Ala Moana Blvd, #609, Honolulu, Hawaii, United States, 96822
High Performance Inferencing for LLMs
Inferencing has become ubiquitous across cloud, regional, edge, and device environments, powering a wide spectrum of AI use cases spanning vision, language, and traditional machine learning applications. In recent years, Large Language Models (LLMs), initially developed for natural language tasks, have expanded to multimodal applications including vision, speech, reasoning, and planning, each demanding distinct service-level objectives (SLOs). Achieving high-performance inferencing for such diverse workloads requires both model-level and system-level optimizations. This talk focuses on system-level optimization techniques that maximize token throughput, meet user-experience metrics, and improve inference service-provider efficiency. We review several recent innovations, including KV caching, Paged/Flash/Radix Attention, Speculative Decoding, P/D Disaggregation, KV Routing, and Parallelism, and explain how these mechanisms enhance performance by reducing latency, memory footprint, and compute overhead. These techniques are implemented in leading open-source inference frameworks such as vLLM, SGLang, Hugging Face TGI, and NVIDIA's TensorRT-LLM, which form the backbone of large-scale public and private LLM serving platforms. Attendees will gain a practical understanding of the challenges in delivering scalable, low-latency LLM inference, and of the architectural and algorithmic innovations driving next-generation high-performance inference systems. Virtual: https://events.vtools.ieee.org/m/516797
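Of the techniques listed in the abstract, KV caching is the most broadly applied: during autoregressive decoding, the key and value projections of already-generated tokens are stored and reused, so each new token needs only one fresh K/V computation rather than reprocessing the whole prefix. The sketch below is a minimal single-head illustration with toy dimensions; all names (`attend_step`, `d_model`, the caches) are illustrative and not taken from any framework named above.

```python
import numpy as np

d_model = 8
rng = np.random.default_rng(0)
# Toy projection matrices for queries, keys, and values.
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))

def attend_step(x, k_cache, v_cache):
    """Attend for one new token embedding x, reusing cached K/V from prior steps."""
    q = x @ Wq                    # query computed for the new token only
    k_cache.append(x @ Wk)        # K/V computed once per token, then cached
    v_cache.append(x @ Wv)
    K = np.stack(k_cache)         # shape (t, d_model): all tokens so far
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d_model)
    w = np.exp(scores - scores.max())
    w /= w.sum()                  # softmax over the t cached positions
    return w @ V                  # attention output for the new token

k_cache, v_cache = [], []
tokens = rng.standard_normal((4, d_model))   # stand-in for 4 decoded embeddings
outputs = [attend_step(t, k_cache, v_cache) for t in tokens]
```

After four steps the cache holds exactly one K and one V vector per token, so step t costs O(t) work instead of recomputing all projections from scratch; paged attention (also mentioned in the abstract) refines this by storing the cache in fixed-size blocks to reduce memory fragmentation.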