Are LLMs good at multi-label classification?
Published:
In machine learning, multi-label classification is a challenging task. It involves assigning multiple labels to a single instance, which is a common requirement in many real-world scenarios. One such application is classifying research papers into multiple categories based on their content. This exploratory project looks at using LLMs for this task, with a specific focus on the Mistral LLM.
What are LLMs?
Large Language Models (LLMs) are a type of AI/NLP model trained on a diverse range of internet text. They can generate human-like text from the input provided to them, and they have shown remarkable performance on a variety of tasks, including translation, question answering, and even writing code.
Multi-label Classification with LLMs
The use of LLMs for multi-label classification is a relatively new approach. The idea is to provide the LLM with a prompt that includes the content to be classified, and then use the model's output to determine the appropriate labels. This approach can be particularly effective when the LLM is fine-tuned on a specific task or domain. However, I am interested in the zero-shot classification abilities of the LLM, since skipping fine-tuning would make the approach cheaper and faster to put to use.
In the case of classifying research papers, the prompt provided to the LLM includes the title and abstract of the paper. The LLM then generates a response naming the relevant category or categories for the paper.
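As a rough illustration of the prompt-and-parse idea, here is a minimal sketch in Python. The category list, the prompt wording, and the function names `build_zero_shot_prompt` and `parse_labels` are all assumptions made for this example; they are not taken from the experiment's code.

```python
from typing import Set

# Hypothetical label set for illustration; the post does not list its categories.
CATEGORIES = ["Computer Science", "Mathematics", "Physics", "Statistics"]


def build_zero_shot_prompt(title: str, abstract: str) -> str:
    """Build a zero-shot prompt that asks for a comma-separated list of labels."""
    options = ", ".join(CATEGORIES)
    return (
        "Classify the research paper below into one or more of these "
        f"categories: {options}.\n"
        "Reply with the matching category names only, separated by commas.\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        "Categories:"
    )


def parse_labels(response: str) -> Set[str]:
    """Map free-text model output back onto the known label set."""
    lookup = {name.lower(): name for name in CATEGORIES}
    found = set()
    # Treat both commas and newlines as separators between candidate labels.
    for piece in response.replace("\n", ",").split(","):
        key = piece.strip().strip(".").lower()
        if key in lookup:
            found.add(lookup[key])
    return found
```

Parsing against a fixed label set means anything the model invents outside that set is silently dropped, which keeps the predictions comparable to the ground-truth labels.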
Case Study: Classifying Research Papers with Mistral LLM
I ran an experiment using the Mistral LLM for multi-label classification of research papers. The model was given a prompt that included the title and abstract of each paper. I used chain-of-thought (CoT) prompting, in which the model is prompted to reason through intermediate steps before giving its answer. Here the prompt also included a series of examples, which helps the model generate more accurate and contextually relevant responses.
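To make the prompting style concrete, here is a minimal sketch of a chain-of-thought prompt with a worked example, plus a helper that pulls the final labels out of the response. The example paper, the `Reasoning:`/`Answer:` layout, and the names `build_cot_prompt` and `extract_answer` are hypothetical; the prompt actually used in the experiment may look quite different.

```python
import re
from typing import List, Tuple

# Hypothetical worked example (title, abstract, reasoning, labels);
# the examples used in the actual experiment are not shown in this post.
EXAMPLES: List[Tuple[str, str, str, str]] = [
    (
        "Bayesian inference for galaxy cluster masses",
        "We fit a hierarchical Bayesian model to weak-lensing measurements.",
        "The paper studies galaxy clusters, which is physics. Its main "
        "contribution is a hierarchical Bayesian model, which is statistics.",
        "Physics, Statistics",
    ),
]


def build_cot_prompt(title: str, abstract: str) -> str:
    """Assemble instructions, worked examples, and the new paper to classify."""
    parts = [
        "Classify each paper into one or more categories. Think step by "
        "step, then give the final labels on a line starting with 'Answer:'.\n"
    ]
    for ex_title, ex_abstract, reasoning, labels in EXAMPLES:
        parts.append(
            f"Title: {ex_title}\nAbstract: {ex_abstract}\n"
            f"Reasoning: {reasoning}\nAnswer: {labels}\n"
        )
    # End on "Reasoning:" so the model writes its reasoning before the answer.
    parts.append(f"Title: {title}\nAbstract: {abstract}\nReasoning:")
    return "\n".join(parts)


def extract_answer(response: str) -> List[str]:
    """Return the labels on the last 'Answer:' line, or [] if none is found."""
    matches = re.findall(r"^Answer:\s*(.+)$", response, flags=re.MULTILINE)
    if not matches:
        return []
    return [label.strip() for label in matches[-1].split(",") if label.strip()]
```

Asking for the labels on a dedicated `Answer:` line keeps the free-form reasoning separate from the part of the output that gets scored.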
Inference was done using vLLM, a fast and easy-to-use library for LLM inference and serving. vLLM is flexible and works seamlessly with many Hugging Face models. The results of this experiment were impressive, with the model achieving an accuracy of around 80%. This demonstrates the potential of LLMs for multi-label classification tasks, while leaving room to improve the accuracy.
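For readers who want to try something similar, here is a minimal sketch of batched inference with vLLM, followed by two ways of scoring multi-label predictions. The checkpoint name, the sampling settings, and the choice of metrics are assumptions made for illustration. This post does not say which Mistral checkpoint was used or how the accuracy figure was computed, so this is not a reconstruction of the experiment.

```python
from typing import List, Set


def generate_responses(
    prompts: List[str],
    model_name: str = "mistralai/Mistral-7B-Instruct-v0.2",  # assumed checkpoint
) -> List[str]:
    """Run batched offline inference with vLLM; one completion per prompt."""
    # Imported lazily so the scoring helpers below work without vLLM or a GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    # Greedy decoding (temperature 0) keeps the labels as repeatable as possible.
    params = SamplingParams(temperature=0.0, max_tokens=256)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]


def exact_match_accuracy(predicted: List[Set[str]], gold: List[Set[str]]) -> float:
    """Fraction of papers whose predicted label set equals the gold set exactly."""
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


def micro_f1(predicted: List[Set[str]], gold: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (paper, label) decisions."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))  # correct labels
    fp = sum(len(p - g) for p, g in zip(predicted, gold))  # spurious labels
    fn = sum(len(g - p) for p, g in zip(predicted, gold))  # missed labels
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Exact-match accuracy is strict, since one extra or missing label makes the whole paper count as wrong, while micro-F1 gives partial credit. Note also that instruction-tuned checkpoints generally expect their chat template around the prompt, which this sketch leaves out.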
Code: Repo
Conclusion
This case study highlights the potential of LLMs for multi-label classification tasks. While the approach is still relatively new, the results are promising, suggesting that LLMs could play a significant role in the future of machine learning and AI.
It’s important to note that while LLMs can be powerful tools, they are not a one-size-fits-all solution. The effectiveness of an LLM for a particular task will depend on a variety of factors, including the quality of the input data, the specifics of the task, and the way the model is used. Nevertheless, the success of the Mistral LLM in this experiment is a positive sign for the future of LLMs in multi-label classification.