For decades, machine learning models have predominantly been employed to make certain types of predictions.
A relatively new way of utilizing machine learning has emerged (chatbots have been around since the 1960s), spurred on by increases in computational power (GPUs) and new model architectures (Goodfellow et al.'s GANs in 2014, Vaswani et al.'s Transformers in 2017). This subset of artificial intelligence, known as Generative AI, is arguably the most disruptive artificial intelligence technology the world has seen since the advent of the internet.
Generative AI is a subset of artificial intelligence that focuses on creating models capable of generating new content. These models are trained on large datasets and learn patterns and data distributions, enabling them to generate original content such as text, images, videos, or audio. Newer multi-modal models have the ability to receive input and generate output in the aforementioned data types.
There are different types and styles of generative AI, including GANs (Generative Adversarial Networks) and LLMs (Large Language Models) like OpenAI’s GPT, Meta’s LLaMA, and The Technology Innovation Institute's Falcon family of models. These models aim to produce output that is indistinguishable from human-generated content, whether image, video, or text.
GANs work by taking a competitive (adversarial) approach to new data generation, in which a generator model creates instances of new data that a discriminator model seeks to classify as real or fake. Over repeated iterations the generator and discriminator take turns training one another until they converge to a point where the discriminator can no longer classify the generator's output any better than a coin flip.
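This adversarial equilibrium can be illustrated with a toy sketch (a minimal stand-in, not a real GAN: there are no neural networks or gradients here; a one-parameter "generator" nudged toward the real data distribution plays the generator's role, and an optimal threshold classifier plays the discriminator's):

```python
import random

random.seed(0)

REAL_MEAN, STD = 5.0, 1.0   # the "real" data distribution the generator must learn

def discriminator_accuracy(gen_mean, n=2000):
    """Accuracy of the best threshold discriminator between real and fake samples."""
    # Optimal cut between the two Gaussians (valid here since gen_mean <= REAL_MEAN)
    threshold = (REAL_MEAN + gen_mean) / 2
    real = [random.gauss(REAL_MEAN, STD) for _ in range(n)]
    fake = [random.gauss(gen_mean, STD) for _ in range(n)]
    correct_real = sum(x > threshold for x in real)   # real samples called "real"
    correct_fake = sum(x <= threshold for x in fake)  # fake samples called "fake"
    return (correct_real + correct_fake) / (2 * n)

gen_mean = 0.0   # generator starts far from the real distribution: easy to detect
for _ in range(20):
    # Stand-in for a gradient step: move toward whatever fools the discriminator.
    gen_mean += 0.5 * (REAL_MEAN - gen_mean)

# Once the generator matches the real distribution, the discriminator's
# accuracy falls from near-certainty to roughly 0.5 -- a coin flip.
```

Before the updates, the discriminator separates the two distributions almost perfectly; afterward, with the generator's output statistically indistinguishable from the real data, no classifier can do better than chance.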
Large Language Models like GPT are transformer neural networks trained on vast portions of the internet to probabilistically and contextually predict the next word in a sequence of words.
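At a toy scale, that next-word objective can be sketched with bigram counts; real LLMs use transformer attention over subword tokens rather than word pairs, and the corpus here is made up:

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; an LLM's "corpus" is a large slice of the internet.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word.
bigram = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram[w1][w2] += 1

def predict_next(word):
    """Probability distribution over the next word, given the current one."""
    counts = bigram[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(predict_next("the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

An LLM does the same thing in spirit, but conditions on the entire preceding context rather than a single word, which is what makes its predictions contextual rather than merely statistical.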
LLMs are very good at “few-shot” or even “zero-shot” learning, in which the model is able to answer a question despite having few or no examples of the task in its training data to draw upon. This has made LLMs closer to general-purpose computers than to question-and-answer chatbots. LLMs are still prone to “hallucinating”: making definitive-sounding but very incorrect statements.
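A few-shot prompt simply packs a handful of worked examples into the model's input so it can infer the task format; the policy clauses and output format below are hypothetical:

```python
# Hypothetical worked examples teaching the model an extraction format.
examples = [
    ("Policy covers flood damage up to $1M.", "coverage: flood; limit: $1,000,000"),
    ("Policy excludes earthquake losses.",    "exclusion: earthquake"),
]
query = "Policy covers wind damage up to $250K."

prompt = "Extract the coverage terms from each policy clause.\n\n"
for clause, answer in examples:
    prompt += f"Clause: {clause}\nTerms: {answer}\n\n"
prompt += f"Clause: {query}\nTerms:"   # the model completes from here
```

In the zero-shot case the examples are omitted entirely and only the instruction and query remain; a capable LLM can often still infer the intended task.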
Generative AI has the potential to immediately affect and improve various aspects of the (re)insurance industry workflow, and additional applications will emerge as these models evolve and grow. Some initial applications include:
Named Entity Recognition (NER): LLMs can help with named entity recognition to reconcile entities across both policy/submission data and claims.
Synthetic Data for Risk Modeling: LLMs can assist in generating synthetic data to simulate and model various risk scenarios within a single business vertical or across multiple disparate lines. These models can generate realistic risk scenarios and/or think of new potential emerging risk profiles, allowing (re)insurers to better understand potential risks and improve risk assessment accuracy.
Synthetic Data to Feed Machine Learning: Synthetic data may also be generated to aid machine learning model training and testing of discriminative models.
Natural Language Processing Applications: LLMs can be utilized to automate and enhance any task involving a text corpus (PDF Q&A, comparison, summarization, or extraction of relevant data), including policy submissions, claims, and legal documents.
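For the entity-reconciliation step, a minimal sketch using only Python's standard library (the entity names are invented; a production pipeline would combine an LLM or NER model for extraction with fuzzy matching like this for reconciliation):

```python
from difflib import SequenceMatcher

# Hypothetical entity names extracted from submissions and from a claims system.
submission_entities = ["Acme Insurance Co.", "Globex Reinsurance Ltd"]
claims_entities = ["ACME Insurance Company", "Globex Re Ltd.", "Initech LLC"]

def best_match(name, candidates):
    """Return the candidate whose case-insensitive similarity to `name` is highest."""
    def score(candidate):
        return SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
    return max(candidates, key=score)

# Reconcile each submission entity against the claims-side names.
matches = {entity: best_match(entity, claims_entities)
           for entity in submission_entities}
```

This resolves “Acme Insurance Co.” to “ACME Insurance Company” despite the differing abbreviation and capitalization, which is exactly the kind of cross-dataset reconciliation the NER application above targets.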
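On the synthetic-data side, a classical frequency/severity simulation of the kind a generative model might help parameterize or extend, with hypothetical Poisson and lognormal parameters:

```python
import math
import random

random.seed(42)

# Hypothetical parameters for one line of business.
ANNUAL_CLAIM_RATE = 3.0                  # expected claims per year (Poisson frequency)
SEVERITY_MU, SEVERITY_SIGMA = 11.0, 1.5  # lognormal loss severity (log-dollars)

def poisson(lam):
    """Sample a Poisson count using Knuth's multiplication algorithm (stdlib only)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def simulate_year():
    """One synthetic year: a Poisson number of lognormally distributed losses."""
    n_claims = poisson(ANNUAL_CLAIM_RATE)
    return [random.lognormvariate(SEVERITY_MU, SEVERITY_SIGMA) for _ in range(n_claims)]

scenarios = [simulate_year() for _ in range(10_000)]
annual_totals = [sum(year) for year in scenarios]
```

Each simulated year yields a list of individual losses, so the same synthetic dataset can feed both aggregate risk assessment (annual totals) and the training or testing of discriminative machine learning models on claim-level records.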
Alongside these applications come new risks:
Data Leakage: Loading proprietary or non-public information from your company onto another company's servers while using a third-party LLM interface such as ChatGPT.
Hallucinations: Using or citing LLM output without checking its veracity.
Data Bias: This is a general machine learning risk, but it applies specifically to generative AI: fine-tuning an LLM with data that represents only a subset of the overall population can skew predictions and output, because the training data does not adequately represent the distribution of the whole population, e.g. building models with the 10 cedents a company does business with when 50 cedents transact in that line of business.
Copyright or Patent Risks: Generative AI creates new instances from a large corpus of data. Users who create new images, videos, or text and seek to monetize them may find that the model was trained on, and has taken “pieces” from, copyrighted or patented material.
Any technological improvement or innovation attracts bad actors seeking to exploit it. On the morning of May 22, 2023, a picture showing an explosion near the Pentagon appeared on several verified Twitter accounts. The picture was quickly proven to be an AI-generated fake, but it caused a temporary stock market drop of over $500B. Deepfake video and audio will continue to improve and will lead to new methods of extracting money from victims. This past January, a mother in Arizona received a call in her daughter's voice explaining that she had been kidnapped, with a demand for a $1M ransom. The call was identified as a deepfake voice hoax that had used a sample of the daughter's voice to generate the kidnapping script.
The deployment of generative AI in the reinsurance industry brings many of the same regulatory considerations that apply to machine learning models in general:
Data Privacy: Generative AI models such as LLMs often require access to large amounts of data to test, train, and fine-tune the models for varied specific use cases. (Re)insurance companies must adhere to privacy regulations, such as GDPR, to ensure proper handling, storage, and protection of personal and confidential information.
Explainability: Regulatory bodies may require increased transparency and more intuitively understandable model output. A growing subset within artificial intelligence is explainable AI, or XAI, which seeks to better explain machine learning model output. LLMs are in the early stages of explainability; their rapid growth in popularity and utilization is spurring companies like OpenAI to build tools that seek to identify which neurons are activated for a given text sequence.
Data Quality and Provenance: Regulatory bodies will ask users to prove out the quality and provenance of the datasets that the model was trained on.
Human-Generated Watermark: As more users begin to use LLMs to generate content, there will be a need to prove output/content has been created without any algorithmic aid, requiring a type of watermark to prove that the content was generated by a human.
Any prognostication on the future of generative AI moves quickly into the realm of science fiction: omnipresent AI assistants, autonomous AI agents solving seemingly impossible tasks, new model architectures that allow for more model parameters and less computationally expensive training. As generative AI evolves and becomes more deeply integrated, so too will its applications within the (re)insurance industry.
Otakar G. Hubschmann leads TransRe’s Applied Data team, which researches, develops and deploys artificial intelligence and machine learning applications and datasets to help underwriters, actuaries and claims teams expand their understanding of client information, and improve client service. Contact Otakar with questions, comments, or ways the Applied Data team may help you.