PITTI - Article - Fast JSON Decoding for Local LLMs with Compressed Finite State Machine

Artificial Intelligence, Information Processing | Computing

Date: 2024-02-05

Description

This summary was drafted with mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf

Liangsheng Yin, Ying Sheng, and Lianmin Zheng (LMSYS) present an optimization for constrained decoding of JSON or YAML output from local large language models (LLMs). The method builds a compressed finite state machine (FSM) that can be derived from any regular expression, and therefore from any JSON or YAML schema. By analyzing the FSM of a regular expression and compressing its singular transition paths (chains of states that each allow only one next step), the decoder emits several tokens in a single step whenever the output is forced, instead of generating them one at a time. As a result, constrained decoding can become even faster than normal, unconstrained decoding. Compared with existing systems such as guidance + llama.cpp and outlines + vLLM, the authors report up to a 2x reduction in latency and up to a 2.5x increase in throughput.
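The idea of compressing singular transition paths can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes a hand-built, character-level DFA for the hypothetical pattern `{"grade": "(A|B|C)"}`, and the names `build_example_dfa`, `compress` and `constrained_decode` are made up for illustration. A `choose` callback stands in for one LLM step restricted to the legal next characters. Real systems work on tokens rather than characters and have to reconcile jumped-forward strings with the model's tokenizer, which this sketch ignores.

```python
from typing import Callable, Dict, List, Set, Tuple

# A character-level DFA: state -> {character: next_state}.
Dfa = Dict[int, Dict[str, int]]


def build_example_dfa() -> Tuple[Dfa, int, Set[int]]:
    """Hand-built DFA for the regex {"grade": "(A|B|C)"} (illustrative assumption)."""
    dfa: Dfa = {}

    def new_state() -> int:
        state = len(dfa)
        dfa[state] = {}
        return state

    def add_literal(src: int, text: str) -> int:
        # Chain one state per character; each state on the chain has a single exit.
        cur = src
        for ch in text:
            nxt = new_state()
            dfa[cur][ch] = nxt
            cur = nxt
        return cur

    start = new_state()
    branch = add_literal(start, '{"grade": "')
    join = new_state()
    for letter in "ABC":  # the only real decision the model has to make
        dfa[branch][letter] = join
    accept = add_literal(join, '"}')
    return dfa, start, {accept}


def compress(dfa: Dfa, accepting: Set[int]) -> Dict[int, Tuple[str, int]]:
    """For each state, follow its singular transition path (exactly one outgoing
    edge, not accepting) and record the forced string and the state it ends in."""
    compressed: Dict[int, Tuple[str, int]] = {}
    for state in dfa:
        text, cur, seen = "", state, {state}
        while len(dfa[cur]) == 1 and cur not in accepting:
            ch, nxt = next(iter(dfa[cur].items()))
            if nxt in seen:  # stop on a cycle of forced edges
                break
            text += ch
            cur = nxt
            seen.add(nxt)
        if text:
            compressed[state] = (text, cur)
    return compressed


def constrained_decode(
    dfa: Dfa,
    start: int,
    accepting: Set[int],
    choose: Callable[[int, List[str]], str],
) -> Tuple[str, int]:
    """Decode until the first accepting state; returns (output, number_of_steps)."""
    compressed = compress(dfa, accepting)
    out, state, steps = "", start, 0
    while state not in accepting:
        if state in compressed:
            # Jump forward: append the whole forced string in a single step.
            text, state = compressed[state]
        else:
            # Real choice: one model step masked to the legal next characters.
            text = choose(state, sorted(dfa[state]))
            state = dfa[state][text]
        out += text
        steps += 1
    return out, steps


dfa, start, accepting = build_example_dfa()
output, steps = constrained_decode(dfa, start, accepting, lambda s, allowed: "B")
print(output, steps)  # {"grade": "B"} 3
```

Here the 14 output characters are produced in 3 steps, and only one of them needs a real decision; a decoder that advances one character at a time would need 14 steps. Because the compressed paths depend only on the regular expression, they can be computed once up front, so each decoding step is just a dictionary lookup.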