Large Language Models: Fight Them or Join Them?

The Internet is abuzz with keywords like ChatGPT, Bard, LLaMA, etc. A major chunk of the discussion is focused on how smart conversational artificial intelligence (AI) models can make people obsolete and take over tasks like programming.

This post is an attempt to understand what happens behind the scenes and make some inferences on how these models will impact processes as we see them today.

A Quick Background

Text and voice inputs are considered revolutionary in user interaction (UI) designs since they improve the usability and accessibility of the application by adopting spoken language rather than predefined input patterns. In other words, instead of users having to conform to specific pre-established formats or commands, they can now interact with applications using natural language, making the process more intuitive and user-friendly.

Since spoken language cannot be exhaustively represented by a series of patterns, the functionality of such mechanisms has limitations. The remedy is the use of machine learning (ML) models that perform Natural Language Processing (NLP) in order to create conversational AI tools (commonly called chatbots).

According to IBM, “Conversational artificial intelligence (AI) refers to technologies, like chatbots or virtual agents, which users can talk to. They use large volumes of data, machine learning, and natural language processing to help imitate human interactions, recognizing speech and text inputs, and translating their meanings across various languages.”

The conversational AI process includes four steps.

Step 1: Input generation: Input statement given by the user as text or voice via a suitable UI.

Step 2: Input analysis: Perform Natural Language Understanding (NLU) to understand the input sentence by extracting the lexemes and their semantics. If the input is by voice, speech recognition is also required. This step results in identifying the intention of the user.

Step 3: Dialogue management: Formulate an appropriate response using Natural Language Generation (NLG).

Step 4: Reinforcement learning: Improve the model by continuously taking feedback.

NLU and NLG have nowadays moved beyond regular ML into the domain of Large Language Models (LLMs) that denote complex models trained on massive volumes of language data. LLMs are now defined for conversations, technical support, or even simple reassurance.

LLMs are now defined for conversations, technical support, or even simple
reassurance.

Large Language Models: What Are They?

As mentioned before, LLMs denote complex NLU+NLG models that have been trained on massive volumes of language data. NVIDIA considers it as “a major advancement in AI, with the promise of transforming domains through learned knowledge.” It lists the general scope of such LLMs as:

Text generation for content copyright
Summarization
Image generation
Chatbots
Generating code in different programming languages for function templating
Translation.

NVIDIA estimates that LLM sizes have been growing 10x annually for the last few years. Some popular models in use are GPT, LaMDA, LLaMA, Chinchilla, Codex, and Sparrow. Nowadays, LLMs are being further refined using domain-specific training for better contextual alignment. Examples of the same are NVIDIA’s BioNeMo and Microsoft’s BioGPT.

LLMs work predominantly by understanding the underlying context within sentences. Nowadays the implementation is dominated by the use of transformers, which extract context using a process called “attention.” The importance of generating the context is illustrated below, where it is critical to identify the meaning of the word “bank.”

The importance of generating the context

In the upcoming sections, we will explore four models that changed the public landscape of LLMs, followed by a training framework provided by NVIDIA.

Five Game-Changing Models Revolutionizing the Public Landscape of LLMs

In this section, we will explore five remarkable models that have reshaped the public landscape of LLMs, ushering in a new era of conversational artificial intelligence. These models have made significant strides in advancing the capabilities and possibilities of LLMs. Through their unique features and functionalities, they have captivated the attention of researchers, developers, and enthusiasts alike.

1. GPT 3.5: Powering ChatGPT

GPT stands for Generative Pre-trained Transformer, an NLG model based on deep learning. Given an initial text as a prompt, it will keep predicting the next token in the sequence, thereby generating a human-like output. Transformers use attention mechanisms to maintain contextual information for the input text.

GPT 3 is a decoder-only transformer network with a 2048-token context and 175 billion training parameters. It needs a total of 800 GB to store. GPT 3.5 is an enhancement of GPT 3 with word-level filters to remove toxic words.

ChatGPT is an initiative of OpenAI (backed by Elon Musk and invested in by Microsoft) that forms a chat-based conversational frontend to GPT 3.5. The model used for this purpose is named Davinci, which was trained using Reinforcement Learning from Human Feedback (RLHF ), tuned by Proximal Policy Optimization (PPO). The process followed by RLHF+PPO is given in the illustration below.

The process followed by RLHF+PPO is given in the illustration

The capability of transformer models is based on judging the significance of each input part (or token). In order to do that, the entire input is taken together (which is possible in platforms like ChatGPT since the input is completely available in the prompt). An example is given below, where the term “it” has to choose between the monkey and the banana to set the context.

An example is given below, where the term “it” has to choose between the monkey and the banana to set the context.

A detailed explanation of the model is available in “Improving Language Understanding by Generative Pre-Training” by Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. As mentioned before, GPT 3 works on a 2048-token context. Hence, it is necessary to pad the input appropriately towards the right.

OpenAPI is releasing its APIs for ChatGPT for further development and use.

2. LaMDA: The Model Behind Bard

The language Model for Dialogue Applications (LaMDA) is another model that is based on the concept of transformers. (The concept of transformers itself was developed by Google back in 2017, but OpenAI released it as an application first!). The project was announced by Google in 2021.

Contrary to regular transformer models, Google trained its model entirely on dialogue. Hence it is potentially able to create a better context for conversational purposes, rather than limiting itself to fully structured sentences.

The LaMDA training process is split into two phases: pre-training and fine-tuning. Pre-training was done on a corpus comprised of more than 1.56 trillion words, leveraging the vast search query database available within Google. This data allows LaMDA to create a number of candidate answers for each input dialogue, which are then ranked in the fine-tuning phase to create the best answer.

The entire model is monitored based on the aspects of quality (similarity to human language output), safety (causes no harm or bias), and groundedness (answers verifiable with openly available facts).

The entire background of LaMDA is available in “LaMDA: Language Models for Dialog Applications” by Romal Thoppilan, et al. A simple example extracted from the paper is given below.

LaMDA is gearing up for a closed-group release for further fine-tuning.

3. Amazon Joins the Bandwagon

Amazon developed an LLM using the “chain of thought” (CoT) philosophy and has been successful in exceeding the performance of GPT 3.5 by more than 15%. The CoT prompting approach integrates language and vision through a two-stage architecture that distinguishes between generating rationalities and inferring answers.

Given the inputs in different modalities, Multimodal-CoT decomposes multi-step problems into intermediate reasoning steps (rationale) and then infers the answer. It is demonstrated in “Multimodal Chain-of-Thought Reasoning in Language Models” by Zhuosheng Zhang, et al.

It is demonstrated in “Multimodal Chain-of-Thought Reasoning in Language Models” by Zhuosheng Zhang, et al. — Image Source: Multimodal Chain-of-Thought Reasoning in Language Models

This model is awaiting public implementation release and is expected in the near future.

4. LLaMA: Meta’s answer

Large Language Model Meta AI (LLaMA) is a highly flexible and adaptable LLM introduced by Meta. It has four versions (7B, 13B, 33B, and 65B parameters), which can be trained and deployed with a smaller infrastructure, making it lightweight compared to others. The 7B model has been trained on 1 trillion tokens, while the 65B model is trained on 1.4 trillion tokens.

The current release is still in its foundational stage and requires more effort on word filtering and context awareness. It has been released for limited access. The advantage posed by LLaMA is that its training has been done exclusively on publicly available data like GitHub, Wikipedia, Gutenberg journals, ArXiv, and Stack Exchange.

LLaMA is fully open source, and even its 13B model is performing on par with the likes of GPT 3.

5. NVIDIA’s NeMo Megatron

NVIDIA has developed an LLM service of its own, called NVIDIA NeMo. It runs on NVIDIA’s own AI platform to customize and deploy LLM services on the cloud, packaged with API access.

NeMo Megatron provides an end-to-end framework for training and deploying LLMs with up to trillions of parameters. It is deployed as containers with access to GPU-backed and MPI-enabled multi-node compute power provided by NVIDIA Triton Inference Server, enabling massively parallel training and tuning.

Making the Big Decision

The likes of ChatGPT have created a revolution by which AI models are getting more publicly accessible, thereby democratizing the ecosystem. The impact has been so large that ChatGPT was recently featured on the cover page of TIME Magazine.

With the arrival of virtually unlimited computation and storage capabilities, these models are poised to get better with time, increasing our dependency on them. With this great power comes great responsibility to use them ethically. While the models can potentially replace redundant tasks, they are not expected to overpower human thought in any current timeline, primarily since they are only as good or as bad as the data we feed into them.

Hence it may be good to join forces with such models rather than fighting them. Some purposes are:

Generating code templates for coding tasks
Code reviews
Reducing ramp-up time for new technologies during training
A general learning tool
Language practice and translation.

The adaptation curve now falls on the users to identify use cases where it is worth using these tools and to make sure all of us level up our skills and requirements to make the tools worth their existence.

Ready to Build
Something
Extraordinary?

Join 300+ companies who trust us to turn their biggest ideas into market-leading solutions.

Our Global Team
500+ Engineers Worldwide
SOC 2 Certified

Get in Touch with Us

Our Global Team
500+ Engineers Worldwide
SOC 2 Certified

Get a Free Consultation

Our Offices Worldwide

Our Solutions

Services

Industries

Technologies

Our Solutions

Services

Industries

Technologies

About Us

InApp India Office

121 Nila, Technopark Campus Trivandrum, Kerala 695581
+91 (471) 277 -1800
mktg@inapp.com

InApp USA Office

999 Commercial St. Ste 210 Palo Alto, CA 94303
+1 (650) 283-7833
mktg@inapp.com

InApp Japan Office

6-12 Misuzugaoka, Aoba-ku
Yokohama,225-0016
+81-45-978-0788
mktg@inapp.com

Drive measurable business impact with intelligent systems.

RECOMMENDED BLOGS

Featured Insights

Modernize your system incrementally to improve performance.

RECOMMENDED BLOGS

Featured Insights

Transform workflows into scalable, product-ready solutions.

RECOMMENDED BLOGS

Featured Insights

Turn ideas into intelligent solutions with automation.

RECOMMENDED BLOGS

Featured Insights

Custom Software Development

We build tailor-made applications that enable seamless integration and scalability.

Mobile App Development

We develop user-friendly and feature-rich mobile apps using advanced technologies.

Data Analytics Services

We offer customized data analytics solutions that add business value.

Software Product Development

We build market-ready software products that are powered by advanced technologies.

Blockchain Development Services

We offer Blockchain solutions that drive innovation and accelerate business growth.

Cloud Application Development

We build applications on leading Cloud platforms to streamline operations.

Software Testing Services

Our customized software testing services help you mitigate risks and elevate performance.

AI & ML Solutions

We offer bespoke AL & ML solutions that drive lasting transformation.

AR/VR Development Services

Our AR/VR development services help you gain a competitive advantage.

Empowering Businesses Through Artificial Intelligence

From AI chaos to clarity in Just 6 Weeks

Our team digitalizes workflows to improve project delivery.

RECOMMENDED BLOGS

Featured Insights

We modernize systems to provide reliable digital services.

RECOMMENDED BLOGS

Featured Insight

We develop software that simplifies workflows

RECOMMENDED BLOGS

Featured Insights

We help you modernize systems to boost operational efficiency.

RECOMMENDED BLOGS

Featured Insights

Cloud Technology Services

Our Cloud services to modernize your network and help your business migrate to the Cloud.

Java Development Services

Our tailor-made services can help you build high-performance, interactive applications.

JavaScript Frameworks

We create responsive and interactive applications tailored to your specific business needs.

Test Automation Tools

Our testers use advanced automation frameworks to detect and minimize deployment errors.

Microsoft Technology Services

We use robust technologies to build new applications, driving digital transformation.

PHP Development Services

Our team builds high-performance website solutions optimized for scalability.

iOS & Android App Development

We design and develop custom applications using scalable technologies.

Open-Source Technologies

We use advanced open-source platforms to develop high-quality applications.

Python Development Services

Our Python services power innovation through high-quality, feature-rich application capabilities.

DevOps Services

Our solutions streamline pipelines, improve system stability, and mitigate vulnerabilities.

Customer Testimonials

Our leaders set the vision and guide our team. They use their technical skills and strategic thinking to turn challenges into opportunities and deliver results that last.

We create solutions that grow with your business, whether you are updating, trying new ideas, or expanding.

For 25 years, we have built strong partnerships, created real impact, and delivered solutions that matter.

We address significant challenges, celebrate achievements, and pursue continuous learning. Our work is purposeful and fosters curiosity and creativity.

Explore career growth opportunities with us. Join a team that supports you at every stage with guidance and mentorship.

Join our expert-led webinars for insights into emerging technologies and best practices.

Discover how we help businesses overcome challenges and achieve measurable results.

Explore our in-depth research and expert perspectives on key industry trends.

Access videos featuring thought leaders, demonstrations, and expert discussions.

Gain clear insights with our visually appealing infographics. ​

Explore our blogs for insights that drive smarter decisions.

Drive measurable business impact with intelligent systems.

RECOMMENDED BLOGS

Gain clear insights with our visually appealing infographics.