FINE-TUNING LARGE LANGUAGE MODELS
- September 7, 2023
- 5 Min Read
- Arun Krishnan
INTRODUCTION
One of the benefits of large language models (LLMs) is that they apply to a wide range of tasks. Traditional machine learning approaches require a model to be trained separately for each type of dataset. LLMs, because of their architecture and the vast corpus of data they have been trained on, can often be used on a new task without any further training.
However, there are still situations where LLMs need to be trained on an organization's own data. Retrieval Augmented Generation (RAG), which suffices for most applications, is still constrained by the small context window that LLMs provide. At the time of writing, open source LLMs typically have 4K-token windows, while GPT models have windows ranging from 4K to 32K tokens. For organizations with large amounts of domain-specific data, this means that responses generated using RAG may be incomplete, because only part of the relevant material fits in the context.
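The context limit can be made concrete with a minimal sketch. It assumes a 4K-token window, a hypothetical `pack_context` helper, and a crude word-count stand-in for a real tokenizer. None of these come from a specific RAG library.

```python
def pack_context(chunks, question, window_tokens=4096, reserved_for_answer=512):
    """Greedily keep retrieved chunks until the token budget runs out."""

    def count_tokens(text):
        # Assumption: one token per whitespace-separated word.
        # A real tokenizer would give different counts.
        return len(text.split())

    budget = window_tokens - reserved_for_answer - count_tokens(question)
    kept, dropped = [], []
    for chunk in chunks:  # chunks are assumed to be sorted by relevance
        cost = count_tokens(chunk)
        if cost <= budget:
            kept.append(chunk)
            budget -= cost
        else:
            dropped.append(chunk)
    return kept, dropped


# Ten hypothetical retrieved passages of 1,000 words each.
chunks = [("passage%d " % i) * 1000 for i in range(10)]
kept, dropped = pack_context(chunks, "What is our refund policy?")
print(len(kept), "chunks fit;", len(dropped), "were left out")
```

With these illustrative numbers only three of the ten passages reach the model. The other seven never influence the answer, however relevant they are.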
FINE TUNING
This is where model fine-tuning comes in. Another benefit of LLMs is their capacity for transfer learning: a pretrained model is fine-tuned on a smaller, task-specific dataset to achieve high performance on that task. However, many fine-tuning approaches are still computationally intensive or pose a trade-off between efficiency and model quality.
To reduce the time, compute and memory cost of fine-tuning, Houlsby et al. proposed an approach to Parameter Efficient Fine Tuning (PEFT). Their approach keeps the original transformer architecture and adds a small number of new components to it. The authors introduced the concept of adapters, which are interleaved with the layers of the transformer as shown in the figure below:
Only the adapter weights are trained. All previously trained weights of the original transformer model remain fixed, which ensures that the previous learning is not lost.
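The adapter idea can be sketched in a few lines of plain Python. This is an illustration and not the authors' implementation: the `Adapter` class, the ReLU nonlinearity, the zero-initialized up-projection and the sizes `d=768`, `m=64` are all assumptions chosen for the example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class Adapter:
    """Bottleneck adapter: project down to m dims, apply ReLU, project up, add skip."""

    def __init__(self, d, m, seed=0):
        rng = random.Random(seed)
        self.d, self.m = d, m
        # Down-projection (m x d) with small random weights.
        self.down = [[rng.gauss(0.0, 0.01) for _ in range(d)] for _ in range(m)]
        self.down_bias = [0.0] * m
        # Assumption: the up-projection (d x m) starts at zero, so the
        # adapter initially leaves its input unchanged.
        self.up = [[0.0] * m for _ in range(d)]
        self.up_bias = [0.0] * d

    def __call__(self, h):
        z = [max(0.0, a + b) for a, b in zip(matvec(self.down, h), self.down_bias)]
        delta = [a + b for a, b in zip(matvec(self.up, z), self.up_bias)]
        return [x + dx for x, dx in zip(h, delta)]  # skip connection

    def num_trainable(self):
        # Two projection matrices plus their bias vectors.
        return 2 * self.m * self.d + self.d + self.m


adapter = Adapter(d=8, m=2)
h = [0.5] * 8
print(adapter(h) == h)                       # True: identity at initialization
print(Adapter(d=768, m=64).num_trainable())  # 99136
```

For comparison, a single dense 768 x 768 weight matrix in the frozen model has 589,824 weights, so an adapter of this size adds roughly a sixth of that per insertion point.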
LoRA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS
To make this process more efficient, Hu et al. proposed an approach called LoRA that “freezes the pretrained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks”. In other words, the trainable part is restricted to pairs of small low-rank matrices, which reduces the number of trainable parameters further. They also change where the trainable part sits. Whereas Houlsby and team used a serial approach, interleaving their adapters within the transformer modules, Hu and co-workers take a different approach, as shown below:
Figure credit: https://arxiv.org/pdf/2304.01933.pdf
The low-rank matrices A and B (on the far right) are placed in parallel with the frozen pretrained weights, and their output is added to the output of those weights. This leads to higher validation accuracies at much lower numbers of trainable parameters, as shown below:
The pink triangles show the performance of LoRA. As you can see, it achieves higher validation accuracies with significantly fewer trainable parameters. This also implies that the costs associated with fine-tuning can be significantly reduced.
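The parallel arrangement of A and B can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the `LoRALinear` class and the `lora_param_counts` helper are hypothetical names, the matrices are plain Python lists, and the 4096 x 4096 layer with rank 8 is an example size. The zero initialization of B and the `alpha / r` scaling follow the description in the LoRA paper.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W0 (d x k) plus a trainable low-rank update B @ A."""

    def __init__(self, W0, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d, k = len(W0), len(W0[0])
        self.W0 = W0  # frozen: never updated during fine-tuning
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]  # r x k
        self.B = [[0.0] * r for _ in range(d)]  # d x r, zero so the update starts at 0
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.W0, x)                   # frozen path
        update = matvec(self.B, matvec(self.A, x))  # parallel low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


def lora_param_counts(d, k, r):
    """Return (weights in the full matrix, weights in the low-rank pair)."""
    return d * k, r * (d + k)


W0 = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
layer = LoRALinear(W0, r=2)
x = [1.0, 2.0, 3.0, 4.0]
print(layer(x) == matvec(W0, x))        # True: B is zero at initialization
print(layer.trainable_params())         # 14
print(lora_param_counts(4096, 4096, 8)) # (16777216, 65536)
```

In the last line, the low-rank pair has 256 times fewer weights than the matrix it adapts, which is where the reduction in trainable parameters comes from.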
CONCLUSION
For nearly 95% of the use cases currently being observed, the RAG approach suffices to provide solutions that work remarkably well. Even so, LLMs of any size have an inherent limitation in their context size. In certain cases, it is much better to fine-tune a model on data specific to the task. This can become quite expensive, but researchers have been working on techniques to make the process more efficient. Approaches like PEFT and LoRA reduce the time, computation and memory costs of fine-tuning.
REFERENCES
Hu et al., LoRA: Low-Rank Adaptation of Large Language Models. https://doi.org/10.48550/arXiv.2106.09685