Why You Should Think Twice Before Relying on ChatGPT for Financial Advice

A study led by Washington State University assessed the performance of AI models, including ChatGPT, on financial exam questions. While ChatGPT excelled in multiple-choice questions and synthesized broad concepts well, it struggled with nuanced tasks like determining insurance coverage or evaluating merger deals. Credit: SciTechDaily.com

A WSU-led study found ChatGPT effective at broad financial tasks but weak in nuanced areas. While ChatGPT 4.0 performed best, a fine-tuned ChatGPT 3.5 came close to its accuracy. AI is seen as a tool to assist professionals rather than replace them, but it could affect entry-level finance roles that involve repetitive tasks.

Large language models, such as ChatGPT, demonstrate strong performance on multiple-choice questions from financial licensing exams but struggle with more nuanced tasks.

A study led by Washington State University evaluated more than 10,000 responses from the AI models Bard, Llama, and ChatGPT to financial exam questions. The researchers went beyond assessing the models’ ability to select correct answers, also asking them to explain their choices. These explanations were then compared with those written by human professionals.

Among the models tested, two versions of ChatGPT performed the best overall. However, even these models displayed significant inaccuracies when addressing more complex and advanced topics.

“It’s far too early to be worried about ChatGPT taking finance jobs completely,” said study author DJ Fairhurst of WSU’s Carson College of Business. “For broad concepts where there have been good explanations on the internet for a long time, ChatGPT can do a very good job at synthesizing those concepts. If it’s a specific, idiosyncratic issue, it’s really going to struggle.”

The Scope and Findings of the Study

For this study, published in the Financial Analysts Journal, Fairhurst and co-author Daniel Greene of Clemson University used questions from licensing exams including the Securities Industry Essentials exam as well as the Series 6, 7, 65, and 66.

To move beyond the AI models’ ability to simply pick the right answer, the researchers asked the models to provide written explanations. They also chose questions based on specific job tasks financial professionals might actually perform.

“Passing certification exams is not enough. We really need to dig deeper to get to what these models can really do,” said Fairhurst.

Of all the models, the paid version of ChatGPT, version 4.0, performed the best, providing answers that were the most similar to those of human experts. Its accuracy was also 18 to 28 percentage points higher than that of the other models. However, this changed when the researchers fine-tuned the earlier, free version, ChatGPT 3.5, by feeding it examples of correct responses and explanations. After this tuning, it came close to ChatGPT 4.0 in accuracy and even surpassed it in providing answers that were similar to those of human professionals.

AI’s Weaknesses in Specialized Financial Tasks

Both models still fell short, though, on certain types of questions. While they did well at reviewing securities transactions and monitoring financial market trends, they gave less accurate answers in specialized situations such as determining clients’ insurance coverage and tax status.

Fairhurst and Greene, along with WSU doctoral student Adam Bozman, are now exploring other ways to determine what ChatGPT can and cannot do, through a project that asks it to evaluate potential merger deals. For this, they are taking advantage of the fact that ChatGPT was trained on data up to September 2021, using deals made after that date whose outcomes are known. Preliminary findings suggest that, so far, the AI model isn’t very good at this task.

Overall, the researchers said that ChatGPT is still probably better used as a tool to assist rather than as a replacement for an established financial professional. On the other hand, AI may change the way some investment banks employ entry-level analysts.

“The practice of bringing a bunch of people on as junior analysts, letting them compete and keeping the winners – that becomes a lot more costly,” said Fairhurst. “So it may mean a downturn in those types of jobs, but it’s not because ChatGPT is better than the analysts, it’s because we’ve been asking junior analysts to do tasks that are more menial.”

Reference: “How Much Does ChatGPT Know about Finance?” by Douglas (DJ) Fairhurst and Daniel Greene, 18 November 2024, Financial Analysts Journal.
DOI: 10.1080/0015198X.2024.2411941