VIDEO | Nathan Labenz sits down with Ronen Eldan and Yuanzhi Li of Microsoft Research to discuss TinyStories, the small synthetic dataset of simple stories they created.
EURACTIV - Just as the timeline of the world’s first AI treaty begins to align with the EU legislative agenda, an American-led push to exclude private companies could make it not worth the paper it is written on.
Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi and Ahmed Awadallah introduce Orca, a 13-billion-parameter model that learns to imitate the reasoning process of large foundation models (LFMs). Orca learns from rich signals from GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions.
VIDEO | Yannic Kilcher takes a look at RWKV, a highly scalable architecture that combines Transformer-style parallel training with RNN-style efficient inference.
Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein and Dilip Krishnan (Google Research) present StyleDrop, which enables the generation of images that faithfully follow a specific style, powered by Muse, a text-to-image generative vision transformer.
Stella Biderman shares resources where she documents key features of LLMs.
Paul S. Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman and Tanishq Mathew Abraham present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity.
Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot and Ross Anderson find that using model-generated content in training causes irreversible defects in the resulting models, where the tails of the original content distribution disappear, an effect they call model collapse.
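A toy way to see why the tails vanish: if each generation of a model is fit to samples drawn from the previous generation, estimation error compounds and the learned distribution contracts. The Gaussian resampling loop below is a minimal sketch of that feedback loop under invented parameters, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(200)  # generation 0: "human" data, N(0, 1)

# Each generation fits a Gaussian to the previous generation's samples,
# then generates its own training data for the next generation.
for gen in range(1, 201):
    mu, sigma = data.mean(), data.std()
    data = rng.normal(mu, sigma, size=200)
    if gen % 50 == 0:
        tail = (np.abs(data) > 2).mean()
        print(f"gen {gen:3d}: sigma={sigma:.3f}  P(|x|>2)={tail:.3f}")
# sigma tends to shrink generation over generation, so the
# distribution's tails (rare events) progressively disappear.
```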
Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis and Anshumali Shrivastava propose Scissorhands, a system that maintains the memory usage of the KV cache at a fixed budget without finetuning the model.
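The idea leans on the observation that tokens important to attention tend to stay important, so a cache over budget can evict the tokens that have attracted the least attention so far. The snippet below is only a sketch of such budget-capped eviction; the shapes, the always-kept recent window, and the function names are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def evict_to_budget(keys, values, attn_mass, budget, recent=32):
    """Sketch: cap the KV cache at `budget` entries by evicting old
    tokens with the least cumulative attention. `attn_mass[i]` is the
    attention mass token i has received so far; the `recent` newest
    tokens are never evicted. Shapes: keys/values are (seq_len, d)."""
    seq_len = keys.shape[0]
    if seq_len <= budget:
        return keys, values, attn_mass
    old = seq_len - recent
    # Rank older tokens by historical attention; keep only the top ones.
    keep_old = np.sort(np.argsort(attn_mass[:old])[-(budget - recent):])
    keep = np.concatenate([keep_old, np.arange(old, seq_len)])
    return keys[keep], values[keep], attn_mass[keep]

k = v = np.random.randn(100, 64)
mass = np.random.rand(100)
k2, v2, m2 = evict_to_budget(k, v, mass, budget=48)
print(k2.shape)  # (48, 64): cache held to the fixed budget
```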
Yao Yao, Zuchao Li and Hai Zhao propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph. By representing thought units as nodes and connections between them as edges, the approach captures the non-sequential nature of human thinking and allows for a more realistic modeling of thought processes.
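As a purely illustrative sketch of that representation (the example problem and node labels are invented, not from the paper): thought units become nodes in a DAG, and a later thought can draw on several earlier thoughts at once rather than only its immediate predecessor.

```python
from collections import defaultdict

# Hypothetical thought units for a word problem; edges mark which
# thoughts feed into which conclusions (a DAG rather than a chain).
thoughts = {
    "t1": "Alice has 3 apples.",
    "t2": "Bob has twice as many apples as Alice.",
    "t3": "Bob has 6 apples.",             # derived from t1 and t2
    "t4": "Together they have 9 apples.",  # derived from t1 and t3
}
edges = [("t1", "t3"), ("t2", "t3"), ("t1", "t4"), ("t3", "t4")]

parents = defaultdict(list)
for src, dst in edges:
    parents[dst].append(src)

def context_for(node):
    """Gather every premise reachable upstream of `node`, mirroring how
    a graph lets one thought aggregate several earlier thoughts."""
    seen, stack = [], list(parents[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.append(p)
            stack.extend(parents[p])
    return [thoughts[p] for p in sorted(seen)]

print(context_for("t4"))  # t1, t2 and t3 all feed the final answer
```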
Daniel Miessler offers a first attempt at a framework for thinking about how to attack AI systems.
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan and Anima Anandkumar introduce VOYAGER, the first LLM-powered embodied lifelong learning agent in Minecraft, which continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention.
Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine and Dawn Song critically analyze the emerging practice of cheaply improving a weaker language model by finetuning it on outputs from a stronger model, such as a proprietary system like ChatGPT (e.g., Alpaca, Self-Instruct, and others).
About six-in-ten U.S. adults (58%) are familiar with ChatGPT, though relatively few have tried it themselves, according to a Pew Research Center survey conducted in March. Among those who have tried ChatGPT, a majority report it has been at least somewhat useful.
Nature - Chia-Yen Chen, Ruoyu Tian, Tian Ge, Max Lam, Gabriela Sanchez-Andrade, Tarjinder Singh, Lea Urpa, Jimmy Z. Liu, Mark Sanderson, Christine Rowley, Holly Ironfield, Terry Fang, Biogen Biobank Team, The SUPER-Finland study, The Northern Finland Intellectual Disability study, Mark Daly, Aarno Palotie, Ellen A. Tsai, Hail...
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman and Luke Zettlemoyer present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance.
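In practice the recipe is to load the frozen base model with 4-bit NF4 quantization (with double quantization and bf16 compute) and train 16-bit low-rank adapters on top. A minimal sketch using the Hugging Face transformers and peft libraries follows; the model name and LoRA hyperparameters are illustrative, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization with double quantization, computing in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",  # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters are trained in 16-bit on top of the frozen 4-bit base.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapters are trainable
```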
Fuzhao Xue, Yao Fu, Wangchunshu Zhou, Zangwei Zheng and Yang You empirically investigate three key aspects of repeating the pre-training data for additional epochs.
Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón and Sumit Sanghai introduce grouped-query attention (GQA), a generalization of multi-query attention that uses an intermediate number of key-value heads (more than one, but fewer than the number of query heads).
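A toy sketch of the mechanism, with shapes invented for illustration: each key-value head is shared by a group of query heads, so one KV head recovers multi-query attention and as many KV heads as query heads recovers standard multi-head attention.

```python
import torch

def grouped_query_attention(q, k, v, num_kv_heads):
    """Sketch of GQA: q has H query heads, k/v have G < H heads; each
    group of H // G query heads shares one key-value head.
    Shapes: q is (B, H, T, D); k and v are (B, G, T, D)."""
    B, H, T, D = q.shape
    G = num_kv_heads
    assert H % G == 0
    # Broadcast each KV head to its group of query heads.
    k = k.repeat_interleave(H // G, dim=1)
    v = v.repeat_interleave(H // G, dim=1)
    scores = q @ k.transpose(-2, -1) / D ** 0.5
    causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 4, 16)      # 8 query heads
k = v = torch.randn(1, 2, 4, 16)  # 2 shared key-value heads
out = grouped_query_attention(q, k, v, num_kv_heads=2)  # (1, 8, 4, 16)
```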
Yoshua Bengio, best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of Computing,” together with Geoffrey Hinton and Yann LeCun, analyzes AI risk.
Hritvik Taneja, Jason Kim, Jie Jeff Xu, Stephan van Schaik, Daniel Genkin and Yuval Yarom investigate the susceptibility of Arm SoCs and GPUs to information leakage via power, temperature and frequency readings from internal sensors.
Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka and Christian Theobalt propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points.
VIDEO | Aza Raskin, co-founder of Earth Species Project, shares how the latest advances in AI help us better understand and learn from other species.
Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.
Zijiao Chen, Jiaxin Qing and Juan Helen Zhou propose Mind-Video, which learns spatiotemporal information from continuous fMRI data of the cerebral cortex progressively through masked brain modeling, multimodal contrastive learning with spatiotemporal attention, and co-training with an augmented Stable Diffusion model that incorporates network temporal inflation.
Joseph Thacker (rez0) breaks down and explains the best self-contained proof of concept for how indirect prompt injection can lead to plugin hijacking with severe consequences.