Large Language Models (LLMs): What, Where, How, and When Not!
Executive Summary
The realm of generative AI, LLMs, and chatbots is no longer exclusively governed by OpenAI. Other prominent players such as Google, Meta, and AWS have joined the fray, accompanied by a burgeoning number of startups around the globe.
LLMs are the powerful models behind generative AI’s chatbots. These models, trained on massive datasets of text and code, can generate text, translate languages, write different types of creative content, and answer your questions in an informative way.
LLMs can do more than create content; they can orchestrate interactions with search engines, databases, knowledge bases, and external tools, enabling the creation of intelligent agents capable of perceiving, understanding, and executing tasks.
LLMs come in different shapes and forms. They are accessible under different licenses, aimed at different purposes, and are of different sizes.
Implementing LLMs involves choosing between established cloud-based solutions or open-source models, each offering distinct advantages in reliability, customization, and data handling capabilities. A critical factor in LLM projects is data governance to ensure responsible and ethical usage, with data quality being paramount.
While LLMs hold immense promise, their implementation incurs costs such as fine-tuning, prompt engineering, inference, cloud expenses, and transformation costs.
Not every problem needs an LLM solution. In scenarios requiring transparency and fairness, symbolic decision models might be preferred over LLMs due to their interpretable nature.
LLMs: Engine of Generative AI
The generative AI landscape is undergoing a rapid transformation, as OpenAI’s pioneering role is being challenged by a surge of new entrants. Major tech companies like Microsoft, Google, and Meta are pouring resources into generative AI, while innovative startups are emerging across the globe. This globalized landscape is poised to drive fierce competition and foster exciting innovations that will shape the future of this transformative technology.
Behind the scenes of the generative AI revolution are LLMs — sophisticated AI engines that fuel chatbots. Much like cloud computing empowers web and mobile applications, these AI powerhouses provide the computational strength and linguistic finesse necessary for chatbots to engage users in natural conversations. LLMs aren’t new[1], but their recent leap forward is due to a deep learning architecture released by Google’s research team and named the Transformer[2]. The model was enhanced by other big tech companies, and it now powers chatbots such as ChatGPT by OpenAI, Bard by Google, Claude by Anthropic, Pi by Inflection AI, Grok by xAI, and more recently, Q by AWS.
The internal functioning of LLMs
The ability of LLMs to generate coherent and relevant text is based on a process called “embeddings” (see illustration below). This process transforms words into numerical representations that only machines understand. These representations allow LLMs to discern the relationships between words and to predict new words and expressions, much like an exceptionally intelligent autocomplete. This is what allows LLMs to generate text that is not only coherent, but also relevant to the user’s query.
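To make the idea of embeddings concrete, here is a minimal sketch in Python. It assumes the open-source sentence-transformers library and the all-MiniLM-L6-v2 model, which are just one possible choice among many; the point is simply that related words end up with numerically closer vectors.

```python
# A minimal embeddings sketch, assuming the open-source sentence-transformers
# library is installed (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

# Load a small, general-purpose embedding model (one choice among many).
model = SentenceTransformer("all-MiniLM-L6-v2")

words = ["king", "queen", "apple"]
vectors = model.encode(words)  # each word becomes a vector of numbers

# Cosine similarity: semantically related words get closer vectors.
print(util.cos_sim(vectors[0], vectors[1]))  # king vs. queen: relatively high
print(util.cos_sim(vectors[0], vectors[2]))  # king vs. apple: relatively low
```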
However, due to their reliance on statistics and probabilities, LLMs can produce inconsistent responses to the same query, generate erroneous information, or even hallucinate. This limitation is inherent in their design, hence the emergence of prompt engineering.
Prompt engineering is a new approach to guided input that aims to address these challenges. This technique involves effectively translating high-level human requests into precise machine instructions, much like BI tools translate business analyst requests into database queries. Prompt engineering thus guides LLMs towards specific and accurate results, minimizing the likelihood of ambiguous interpretations or inaccuracies.
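As an illustration, the sketch below wraps a business question inside a structured prompt before sending it to a model. It assumes OpenAI’s Python SDK and a GPT-4-class model, but any chat-style API would follow the same pattern, and the template wording is only an example: constraining the model to the supplied context and to a fixed output format is what turns a vague request into a precise machine instruction.

```python
# A minimal prompt-engineering sketch, assuming OpenAI's Python SDK
# (pip install openai) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = """You are a financial analyst assistant.
Answer using only the context below. If the answer is not in the context,
say "I don't know" instead of guessing.

Context:
{context}

Question: {question}
Answer in at most three sentences."""

def ask(question: str, context: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # any chat-capable model would do
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(context=context, question=question),
        }],
        temperature=0,  # a low temperature reduces run-to-run variation
    )
    return response.choices[0].message.content
```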
Different forms of LLMs
LLMs come in different shapes and forms. They are accessible under different licenses which, much like other software, essentially make the distinction between closed-source LLMs owned by companies — like OpenAI’s GPT-4 and Google’s Gemini — and open-source LLMs freely available to individuals and organizations — such as Meta’s Llama 2, TII’s Falcon 180B, and Mistral AI’s 7B model.
Both closed-source and open-source LLMs share the same internal workings, but they differ in terms of user-friendliness, the technical expertise required, and intended purpose.
Closed-source LLMs often provide a more user-friendly experience and require less technical expertise. They are commonly used for commercial purposes, such as providing customer service, generating marketing materials, or creating personalized experiences for users.
On the other hand, open-source LLMs offer more flexibility and customization but may require more technical knowledge to use effectively. They are typically used for research purposes, such as exploring new techniques for natural language processing or building custom applications.
LLMs also come in different sizes. Initially, the perception was that bigger LLMs, such as Falcon 180B, were better, but the trend has recently shifted towards smaller models, such as Mistral 7B, that can perform just as well as their larger counterparts. This development has the potential to make LLMs far less resource-intensive to deploy and use.
Beyond content creation
LLMs can do more than just respond to prompts: they can be programmed to interact with databases, search engines, and other applications. This orchestration capability allows LLMs to receive user requests, dispatch them to other applications, and assemble the results into a coherent response for the user. Tools like LangChain have carved a new frontier in LLM technology, opening possibilities for the creation of intelligent agents[3]: software that can perceive and understand their environment, and seamlessly accomplish tasks through planning and reasoning (see illustration below).
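Because agent frameworks such as LangChain evolve quickly, the sketch below illustrates the orchestration pattern in plain Python rather than with any specific library: the LLM chooses a tool, the program executes it outside the model, and the result is folded into the final answer. The tool functions and the call_llm helper are placeholders to be replaced with real search, database, and chat-completion calls.

```python
# A library-agnostic sketch of LLM orchestration; tool functions and
# call_llm are placeholders, not any real framework's API.
import json

def call_llm(prompt: str) -> str:
    """Placeholder: wrap your chat-completion API of choice here."""
    raise NotImplementedError

def search_web(query: str) -> str:
    """Placeholder for a real search-engine API call."""
    return f"(web results for: {query})"

def query_database(request: str) -> str:
    """Placeholder for a real database query."""
    return f"(rows matching: {request})"

TOOLS = {"search_web": search_web, "query_database": query_database}

def run_agent(user_request: str) -> str:
    # 1. Ask the LLM which tool to use and with what input.
    plan = call_llm(
        "Choose one tool among " + ", ".join(TOOLS)
        + ' and reply as JSON {"tool": ..., "input": ...} for this request:\n'
        + user_request
    )
    choice = json.loads(plan)
    # 2. Execute the chosen tool outside the model.
    observation = TOOLS[choice["tool"]](choice["input"])
    # 3. Let the LLM assemble the raw result into a user-facing answer.
    return call_llm(
        f"Request: {user_request}\nTool result: {observation}\n"
        "Write a concise answer for the user."
    )
```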
The Road Ahead
Predicting the evolution of technology remains a complex task, yet insights from industry leaders such as OpenAI/Microsoft, Meta, and Google highlight ongoing trends:
- Affordable models: One promising trend anticipates reduced costs for learning and usage. This stems from improved model optimizations and the potential development of new chips designed for LLMs[4]. These advancements could enhance efficiency and make language models more accessible.
- Specialized models: The future might witness LLMs venturing into specific fields, leveraging diverse data and expert documents to craft specialized business models. This suggests tailored solutions and innovative approaches for various industries[5],[6],[7].
- Intelligent agents: Looking forward, LLMs will integrate sophisticated capabilities like reasoning and planning. All you will have to do is define your problem, and an intelligent agent will seamlessly handle the orchestration of the online services required to address your needs.
Clearly, LLMs are paving the way for a new software fabric, to the point that a novel breed of LLMs is anticipated to facilitate interactions with databases, knowledge bases, search engines, and external tools[8].
LLM Strategy: Where to Start
Within a company, generative AI will enhance productivity through chatbots like ChatGPT for text or code, and foster creativity with tools such as DALL-E for image generation, in both core and support functions:
- In design and innovation, it has the potential to create new, surprising, and valuable visual designs in less time. This potential appeals to multiple industries such as automotive, electronics, furniture, toys, fashion, luxury, and cosmetics.
- In research and development, it can help developers write code and documentation, generate synthetic data, and test scenarios. It can also identify anomalies and defects in the development process.
- In data engineering, science, and governance, it can be used to generate datasets when real data is rare or too sensitive to share, as in the pharmaceutical and healthcare sectors.
- In marketing and sales, it can assist in creating marketing and sales materials that incorporate text, images, and videos. It can also help organizations generate product user guides and analyze customer feedback.
- In finance, legal, and human resources, it can streamline the drafting and review process for financial statements, annual reports, and legal documents. It can automatically summarize large bodies of regulatory documents, answer questions from many legal documents, and even create interview questionnaires for candidate assessment.
Yet, the profound impact of generative AI truly lies in the use of LLMs as knowledge-powered assistants, as demonstrated by Morgan Stanley’s innovative use case.
As part of its wealth management content library, Morgan Stanley owns a repository of over 100,000 pages of knowledge spread across multiple internal sites and written in different formats. This heterogeneity used to impede client advisors trying to retrieve information for investment sales or client inquiries, to the point where complex requests often necessitated involvement from multiple document authors, resulting in high risks and time-consuming communications.
With OpenAI’s GPT-4, Morgan Stanley developed an internal-facing chatbot that provides its advisors with actionable knowledge sourced from internal documents. These knowledge databases have evolved beyond mere document repositories, granting advisors access to the expertise of the foremost authority in wealth management[9].
What this illustrates is that even for a company outside the technology, entertainment, or media sectors, the benefits of LLMs can be enormous, and that in this case the main value lies in using LLMs as knowledge-powered assistants.
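The pattern behind such knowledge-powered assistants is commonly called retrieval-augmented generation: documents are embedded once, the passages closest to each question are retrieved, and the LLM answers from those passages only. The sketch below is a simplified illustration of that pattern, reusing the embedding model from the earlier example and the hypothetical ask() helper from the prompt-engineering sketch; it is in no way Morgan Stanley’s actual implementation.

```python
# A simplified retrieval-augmented generation (RAG) sketch, not any vendor's
# actual implementation. Assumes sentence-transformers is installed and that
# ask(question, context) wraps a chat-completion call (see earlier sketch).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Index: embed the internal documents once (toy one-line "documents" here).
documents = [
    "Fund A targets long-term growth through global equities.",
    "Fund B is a low-risk bond fund suited to short investment horizons.",
]
doc_vectors = model.encode(documents)

def answer(question: str) -> str:
    # 2. Retrieve: find the document closest to the question.
    scores = util.cos_sim(model.encode(question), doc_vectors)[0]
    best_passage = documents[int(scores.argmax())]
    # 3. Generate: let the LLM answer using only the retrieved passage.
    return ask(question, context=best_passage)
```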
LLM Implementation: What to Plan
Once you have identified a use case for generative AI, two primary pathways for implementation emerge. The first consists of leveraging established vendors like OpenAI/Microsoft, Google, or AWS, who offer a comprehensive suite of AI services through cloud-based solutions. This approach provides a broad spectrum of capabilities and established reliability.
The second consists of using open-source LLMs like Falcon, Llama, and Mistral, which all offer the flexibility to create custom or vertical models tailored to specific needs, such as processing confidential data in sensitive industries like defense. This approach prioritizes adaptability and customization.
The strategic decision between these two paths hinges on the balance between established reliability and cutting-edge adaptability. For projects requiring a robust foundation and proven performance, cloud-based solutions from established vendors are an ideal choice. Conversely, for projects that demand customization, agility, and the ability to handle sensitive data, open-source LLMs present a more suitable option.
The table below offers an overview of the benefits and constraints of different types of LLMs regarding this decision: general purpose versus custom and closed versus open source. This breakdown aids in understanding the diverse types of LLMs and their optimal applications.
Finally, it’s crucial not to overlook the foundational aspect of data governance to consistently manage and protect data as part of an LLM project implementation. No LLM remains immune to environmental changes, as data, documents, and regulations evolve. Only through data governance can we stay abreast of these changes and ensure responsible and ethical data usage[10]. The success of Morgan Stanley in its usage of GPT-4 is partly owed to the quality of the LLM, but mostly to the quality of the training data obtained after preprocessing their 100,000 documents[11].
LLM Costs
LLMs hold promise, but their implementation and operations come with expenses. Five key costs arise in running an LLM:
- Fine-tuning cost: This is the cost of training a general LLM on a specific dataset of text and code to make it better suited for a particular task or domain. This can be a significant cost, as it can take a lot of time and compute resources to train a large language model.
- Prompt engineering cost: This is the cost of creating prompts that are specifically engineered for an LLM to respond well and with the intended answer to your business needs. This can be a time-consuming process, but it is important to get right to ensure that the LLM is able to provide accurate and useful responses.
- Inference cost: This is the cost of processing queries and generating responses. This cost can be minimized by architecting the system to offload some of the processing to other components, and by using prompts that are clear, concise, and easy for the LLM to understand.
- Cloud expense: This is the cost of hosting the LLM on a cloud platform, such as Microsoft Azure, Google Cloud Platform, or Amazon Web Services. This includes the upfront cost of the GPUs that LLMs use.
- Transformation costs: These are the costs associated with changing your business processes to take advantage of the capabilities of an LLM. This can include employee training, workflow changes, and application and user interface development to integrate the LLM into your existing systems[12].
When LLMs Are Not the Solution
While LLMs boast immense power in processing vast amounts of data, their inherent opacity poses a challenge in guaranteeing fairness and transparency. These systems, reliant on intricate patterns within training data, might inadvertently perpetuate biases or errors, potentially affecting pivotal decisions[13].
In instances demanding utmost transparency and traceability, such as credit evaluation, insurance underwriting, and candidate hiring, symbolic decision models emerge as a preferable alternative to LLMs[14].
Symbolic decision models offer a structured and interpretable approach to decision-making. They operate on explicit rules and logical representations, providing a clear trace of how conclusions are derived. This transparency becomes crucial in domains where decisions heavily impact individuals’ lives or business outcomes.
Can we use LLMs and symbolic models together? The answer is yes: for example, LLMs could be used to generate rules directly from documents, preserving links to the original sources for clear explanations. They can also be used the other way around, to generate a narrative of the decision-making process[15].
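As a hedged illustration of this hybrid approach, the sketch below makes the decision with explicit, auditable rules and only calls an LLM afterwards to narrate the outcome. The rules, thresholds, and the call_llm placeholder are invented for the example and are not drawn from any real credit policy.

```python
# A hybrid symbolic + LLM sketch: explicit rules decide, the LLM only narrates.
# Rules, thresholds, and call_llm are invented for illustration.
def call_llm(prompt: str) -> str:
    """Placeholder: wrap your chat-completion API of choice here."""
    raise NotImplementedError

RULES = [
    ("income below 20,000", lambda a: a["income"] < 20_000),
    ("debt ratio above 40%", lambda a: a["debt_ratio"] > 0.40),
]

def decide(applicant: dict) -> tuple[str, list[str]]:
    fired = [name for name, test in RULES if test(applicant)]
    decision = "reject" if fired else "approve"
    return decision, fired  # the rule trace keeps the decision explainable

def explain(applicant: dict) -> str:
    decision, fired = decide(applicant)
    # The LLM turns the symbolic trace into plain language; it does not
    # make or change the decision itself.
    return call_llm(
        f"Decision: {decision}. Rules triggered: {fired or 'none'}. "
        "Write a short, plain-language explanation for the applicant."
    )
```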
About The Author
As a consultant and interim executive specializing in data & AI transformation, I assist scaleups, growing ventures, and established firms in their strategic initiatives, such as managing a complex project, developing a go-to-market strategy, or carrying out an audit. With over 15 years of experience, I help clients leverage the capabilities of data and AI across marketing, sales, CRM, and supply chain management, in retail, luxury, cosmetics, finance, and insurance. Complementing this practical experience, I hold a PhD, a post-doc, and two patents in AI. You can reach me on LinkedIn[16].
Acknowledgments
This article follows up on my previous piece about generative AI, introducing the concept of text-to-X chatbots[17]. While the earlier article gave an overview of this new technology, this piece dives deeper into LLMs — the powerful engines propelling conversational user interface chatbots.
It gathers insights from interviews with early adopters of LLMs across the USA, Switzerland, France, and Singapore. It also incorporates a thorough analysis of over 100 scientific, technical, and business articles published since the launch of OpenAI’s ChatGPT in November 2022. Valuable input from discussions on my LinkedIn posts has also enriched its content.
Both articles aim to provide executives with a concise yet comprehensive understanding of this technology, representing the next phase in data-driven digital transformation.
Bonus Q&A: Top 5 Questions
- How do LLMs handle multilingual conversations or queries in languages they were not specifically trained on? Handling multilingual conversations or queries in languages not specifically trained on is a complex challenge for LLMs. While these models are typically trained on vast datasets that include multiple languages, their proficiency in languages they were not explicitly trained on may vary. LLMs may struggle with accuracy and coherence when processing queries in unfamiliar languages, potentially leading to misinterpretations or errors in responses. However, ongoing research and development efforts are focused on improving multilingual capabilities to enhance the performance of LLMs across diverse linguistic contexts.
- Are there any concerns regarding the environmental impact of training and running large language models, and what efforts are being made to mitigate these concerns? Concerns regarding the environmental impact of training and running large language models have gained attention in discussions about AI sustainability. The computational resources required for training and inference processes contribute to significant energy consumption and carbon emissions. Efforts to mitigate these concerns include optimizing model architectures and algorithms to improve efficiency, exploring renewable energy sources for data centers, and developing hardware specifically designed for AI workloads with lower energy consumption. Additionally, initiatives promoting responsible AI usage advocate for ethical considerations in AI development to balance innovation with environmental sustainability.
- What are the potential risks associated with prompt engineering, and how can organizations ensure that prompts are designed ethically and responsibly? Prompt engineering poses challenges related to ensuring that prompts effectively guide LLMs toward accurate and relevant responses while minimizing the risk of biased or inappropriate outputs. Organizations must carefully design prompts to align with ethical principles and desired outcomes, considering factors such as clarity, inclusivity, and cultural sensitivity. Additionally, ongoing monitoring and evaluation of prompt performance are essential to identify and address any issues or biases that may arise during LLM interactions. Collaborative efforts between domain experts, AI researchers, and ethicists can help establish guidelines and best practices for responsible prompt engineering.
- Can LLMs be used in highly regulated industries such as healthcare or finance, and if so, what are the specific challenges and considerations? LLMs have the potential to be utilized in highly regulated industries such as healthcare and finance, but their deployment involves unique challenges and considerations. Regulatory requirements, data privacy concerns, and security considerations present significant hurdles for integrating LLMs into these sectors. Organizations must adhere to strict compliance standards, such as HIPAA in healthcare or GDPR for personal data, to safeguard sensitive information and ensure legal compliance. Furthermore, LLMs used in regulated industries may require additional validation, auditing, and transparency measures to meet regulatory expectations and maintain trust among stakeholders.
- Are there any ongoing efforts or initiatives aimed at improving the interpretability and explainability of LLM-generated outputs, especially in critical decision-making scenarios? Ongoing efforts and initiatives are underway to enhance the interpretability and explainability of LLM-generated outputs, particularly in critical decision-making scenarios where transparency is essential. Researchers are exploring various techniques, such as attention mechanisms, saliency maps, and model introspection methods, to provide insights into how LLMs arrive at their predictions or responses. Additionally, interdisciplinary collaborations between AI researchers, ethicists, and domain experts aim to develop frameworks and tools for evaluating and interpreting LLM outputs in a manner that promotes accountability and trust. These efforts seek to address concerns related to bias, fairness, and accountability in LLM-based decision-making processes, ultimately advancing the responsible deployment of AI technologies.
References
[1] https://www.linkedin.com/pulse/brief-history-large-language-models-bob/
[2] https://www.ft.com/content/37bb01af-ee46-4483-982f-ef3921436a50
[3] https://lilianweng.github.io/posts/2023-06-23-agent/
[5] https://venturebeat.com/ai/walmart-emerging-tech-team-transforming-retail-with-conversational-ai/
[6] https://jingdaily.com/posts/generative-ai-luxury-brands-moncler-valentino-ai-generated-campaigns
[7] https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/
[8] https://openai.com/blog/introducing-the-gpt-store
[9] https://openai.com/customer-stories/morgan-stanley
[10] https://towardsdatascience.com/no-data-governance-no-data-intelligence-9b45bf819638
[11] https://m.soundcloud.com/ux-magazine/podcast-im-s2e22-release
[12] https://hbr.org/2023/11/what-ceos-need-to-know-about-the-costs-of-adopting-genai?
[13] https://www.investopedia.com/terms/e/ecoa.asp
[16] https://www.linkedin.com/in/hassanlaasri/
[17] https://hassan-laasri.medium.com/chatgpt-and-generative-ai-the-new-era-of-text-to-x-52d70839e5bd