Stop Treating LLMs As Knowledgeable
6/27/2025

Quick Rant

I want to preface this article by saying that I am writing from the perspective of a user.
Although I am more knowledgeable than the average person about LLMs, and I do understand at least the basics of how they work, I am by no means an expert on the subject. Some of the things I say might be inaccurate, or out of date by the time you are reading this.
What I am an expert in is being bitter. And god am I bitter about AI stuff.

The other day, I started scaffolding a new website I am working on (more details eventually™️). The tech stack was one I am mostly familiar with: Next.js, TypeScript, Tailwind, and Drizzle. Usually, I like to use Theo's create-t3-app template to get things started, as it ships all of these by default, with a few extra bonuses. But I did add a few technologies that are new to me into the mix.

I've been looking for an alternative to AuthJS and I've been meaning to try Better Auth for quite a while now, so this was a great opportunity to do so. I also wanted to try oRPC, a tRPC alternative with built-in OpenAPI support (which deserves a whole separate blog post eventually™️). These two additions, alongside shadcn, meant I had to open way more Chrome tabs for this setup than I would like. And that would totally ruin the feng shui of my screen.

I needed a way to condense all of the different packages I needed to install, and files I needed to create, into a neatly organized step-by-step list. Luckily, my intern, Chad Gippity, is great at summarizing and organizing information. After all, that is part of what Chad was trained to do. And oh boy, did I end up opening those Chrome tabs after all.

Gatekeep, Gaslight, Girlboss.

Surprisingly enough, Chad nailed the oRPC setup, even though it's a very new library without a lot of users. But Better Auth, I was told, doesn't actually exist, despite being one of the only Node libraries to ship a file in Chad's native language (an llms.txt). No, apparently I was looking for a better (i.e. of a more excellent or effective type or quality) auth library.
Kind of like AuthJS! Wow! Thanks Chad!

ChatGPT recommends using AuthJS in its BetterAuth guide.
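For the record, Better Auth very much exists. Going off its docs at the time of writing, a minimal setup looks roughly like this; the Drizzle adapter and the Postgres provider are assumptions from my own stack, so double check against the actual documentation:

```typescript
// lib/auth.ts - a minimal Better Auth setup, roughly as its docs
// describe at the time of writing (verify against the real docs).
import { betterAuth } from "better-auth";
import { drizzleAdapter } from "better-auth/adapters/drizzle";
import { db } from "./db"; // your Drizzle instance

export const auth = betterAuth({
  database: drizzleAdapter(db, { provider: "pg" }), // assuming Postgres
  emailAndPassword: { enabled: true },
});
```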

Here's a fun experiment. Try asking your favorite LLM to explain how to use any technology that has had a major update in the last few years. Some of my favorites are Tailwind with v4 and Tauri with v2. Now see if the guide it provides you is still up to date. Even with access to the web and reasoning, ChatGPT still insists that I make a tailwind.config.js file, even though I explicitly mentioned that the guide should be up to date with v4, which moved configuration into CSS and no longer recommends a JavaScript-based config. In fact, JS config files are disabled by default. Sometimes, even explaining this very fact to it would result in ChatGPT insisting I enable one in my project.

ChatGPT gives the setup instructions for tailwind v3, even when explicitly asked for v4.

Okay... why the heck is the robot lying to me?
Although these are a few really small and cherry-picked examples, they demonstrate what I believe to be the biggest fault with LLMs as a whole. The model thinks it's more knowledgeable than it actually is. If you've worked with LLMs before, especially while coding, you'd know that this leads to three major problems:

  1. Gatekeep - The model ignores a detail in the program unless explicitly asked about it, because it has a higher chance of getting it wrong.
  2. Gaslight - The model pretends a mistake was intentional, and looks for reasons why it's correct instead of fixing it.
  3. Girlboss - Fake it till you make it. Make up a solution that feels correct. Hallucinate a library that doesn't exist.

The Tesla Effect

One day we'll have fully self-driving cars. It's an undeniable truth. Tesla has, by far, more data that could power such a car than any other company. Such a car, of course, doesn't really exist yet. And yet Tesla is worth over a trillion dollars, even though its self-driving technology is currently way worse than that of some competitors worth much, much less.

Men judge generally more by the eye than by the hand, for everyone can see and few can feel. 
- Niccolò Machiavelli

Let's give credit where it's due. AI nowadays is really impressive. The fact that it can answer so many complex questions, sometimes even better than a human, is absolutely mind-blowing. My goal is not to discredit any of the incredible work done by so many individuals, nor is it to downplay the intelligence that LLMs do possess nowadays. But I do fear where the direction some AI companies are headed will take us, and our planet. And I especially fear the really shitty apps and AI slop that will be made along the way.

LLMs are not knowledgeable.

They're like a brain without a hippocampus: nothing new ever gets committed to long-term memory, and everything they "know" was baked in during training. And that's not necessarily a bad thing. In fact, it might be the biggest gift to humankind in terms of AI safety, if used correctly.

The trend right now, it seems, is to optimize LLMs for benchmarks. Make them seem as smart as possible, and have them answer as many questions as possible in a way that makes them look like they possess actual knowledge. Of course, in a very simplified way, all of that knowledge is just hardcoded into the LLM's probability weights. That is exactly what causes them to be so unreliable, and so annoying to work with. And the most unfortunate side effects are experienced by our planet.

Do You Know The Way?

Much like a lot of other things in life, it's a game of balance.

An AI needs to be confident in its knowledge. Otherwise, it'll spend more time using the search tool, which takes time and costs money. But make it too confident, and once its data becomes outdated, so does the AI. Worse, it'll start hallucinating knowledge that looks correct, but isn't grounded in any truth. And re-training costs time and money, and carries a massive environmental impact. Pre-program (train on) too much knowledge, and the model becomes slow. Not enough knowledge, and it becomes stupid.

Developers and users keep treating LLMs as if they're a complete suite of tools by themselves. And in my opinion, this pushes the companies training them to optimize LLMs to seem like they're the entire machine, when they're only a part of it. What we should be doing is making AI models a lot smaller, and instead focusing on improving the tools they use: making them faster, and better integrated. Put the LLM in the driver's seat instead of making it the entire machine.
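To make that concrete, here's a rough sketch of what "the driver's seat" means in practice, using the OpenAI SDK's function calling as it exists at the time of writing. The getLatestVersion tool is something I made up for illustration; the point is that current facts come from a tool, and the model only decides when to call it:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Illustrative tool: the npm registry knows the current version,
// so the model never has to "remember" it.
async function getLatestVersion(pkg: string): Promise<string> {
  const res = await fetch(`https://registry.npmjs.org/${pkg}/latest`);
  const data = (await res.json()) as { version: string };
  return data.version;
}

const completion = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Write me a Tailwind v4 setup guide." }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_latest_version",
        description: "Get the latest published version of an npm package",
        parameters: {
          type: "object",
          properties: { pkg: { type: "string" } },
          required: ["pkg"],
        },
      },
    },
  ],
});

// The model drives: if it decided it needs fresh data, run the tool
// and feed the result back in a follow-up request, instead of letting
// it answer from stale training data.
const toolCall = completion.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const { pkg } = JSON.parse(toolCall.function.arguments);
  console.log(await getLatestVersion(pkg));
}
```

A smaller model that reliably decides "I should go check" will beat a bigger one that confidently misremembers.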

Well, What Should I Do?

If you make a product that relies on AI - don't. assume. it's. knowledgeable.

LLMs are ridiculously good at understanding the semantics of human language, and that is the most powerful and reliable tool they possess. That is what they were trained to do. They are given a bunch of examples of human speech, learn the relationships between different words, and generate entirely new speech from those examples. But they do not have logic or facts coded into them. They only have knowledge they can recall from the examples they were trained on. They do not understand your code base without help. They cannot diagnose a patient without a database of symptoms and previous cases. They are not the brain, they're just a part of it.

Make the amount of knowledge generated by the LLM minimal. Use it only for processing existing information where possible. The best LLM projects I've encountered use it only for the following (see the sketch after this list):

  • Human language processing
  • Formatting complex or varied data
  • Summarizing information
  • Creative work, if you're into that, I guess
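Here's what the summarizing case looks like as a minimal sketch, assuming the OpenAI SDK and GitHub's public releases endpoint. The boring, reliable tool does the knowing; the LLM only reshapes text it was handed:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// A reliable tool fetches the actual, current release notes.
const res = await fetch(
  "https://api.github.com/repos/tailwindlabs/tailwindcss/releases/latest"
);
const release = (await res.json()) as { body?: string };

// The LLM does the one thing it's genuinely good at: language.
// It summarizes text it was given, instead of recalling "facts".
const summary = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content:
        "Summarize the release notes you are given. Do not add anything that is not in them.",
    },
    { role: "user", content: release.body ?? "" },
  ],
});

console.log(summary.choices[0].message.content);
```

The model can't hallucinate a changelog it was never asked to remember.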

Some of the worst AI projects just slap an LLM on a problem and let it figure everything out by itself. And while it might be tempting to do so, using existing, reliable, well-tested tools for scraping the web, performing calculations, and generating data, then letting the AI work with the data those tools provide, leads to much more consistent, and overall better, results.

I think it's our responsibility, as consumers and developers, to not give in to the hype AI companies try to create around LLM intelligence and "achieving AGI internally". We should pressure them to stop the massive environmental damage they are causing just to achieve higher numbers on a benchmark. And we should use AI responsibly, and safely. And who knows, maybe it'll lead to even smarter AI in the future.

And please. Stop replacing your workers with AI.