OpenAssistant – an Open Alternative to ChatGPT: Conversational AI for Everyone
The LAION community, together with Andreas Köpf and Yannic Kilcher, has built an open-source alternative to ChatGPT: OpenAssistant is now available, along with its datasets.
- Silke Hahn
(This article is also available in German.)
OpenAssistant has been released under the slogan "Conversational AI for everyone". According to its publishers, it is the first fully open-source, instruction-tuned model trained on human data (a qualifying note on this below). The open-source AI chatbot enters the race as an open alternative to ChatGPT. Behind the project is LAION e.V. (Large-scale Artificial Intelligence Open Network), an association for open-source AI whose datasets made possible, among other things, Stable Diffusion, an open AI system for image synthesis that has been available since August 2022. In addition to the chatbot, the publishers are releasing the dataset used for training and several pre-trained models to the public as open source. The code and data are freely available on Hugging Face.
Freely available dataset as a tool for AI developers
The driving forces behind the project are developer Andreas Köpf and tech influencer Yannic Kilcher, known for his YouTube channel. Together with the LAION community, they collected text-based input and feedback over the past months to create a high-quality training dataset. According to the publishers, it is a tool that developers can use to create state-of-the-art (SOTA) models.
The data covers a wide range of topics and writing styles. According to Kilcher, over 600,000 human-generated data points went into the dataset and the model training. Like LAION's large image datasets, the conversation-oriented dataset is intended as a starting point for training further language models and AI applications, and developer projects can access it freely. All models are licensed under the Apache 2.0 licence, except for those based on LLaMA, which the project has not yet published due to licensing issues.