Unit testing AI apps


How do you evaluate whether your software’s doing what it’s supposed to do?

Do you test all your app’s possible cases, branches and states? I don’t, at least not manually. Ain’t nobody got time to click through all the edge cases by hand. QA’ing even a simple login form takes time, let alone testing complex applications.

Having robots do that helps a ton, and I recommend writing automated tests to help you sleep well at night (and release fewer bugs)!

Ignoring the burden of writing and maintaining tests, testing a “normal” web application is straightforward because it’s predictable. Throw the same input at your app, and you’ll get the same result, every single time. Most apps are CRUD apps anyway. Easy peasy.
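To make “predictable” concrete, here’s what a test for that kind of code could look like. Vitest and the `slugify` helper are just my picks for illustration; any test runner and any pure function will do.

```ts
import { describe, it, expect } from "vitest";

// A made-up helper: same input, same output, every single time.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

describe("slugify", () => {
  it("always returns the same slug for the same title", () => {
    expect(slugify("Unit testing AI apps!")).toBe("unit-testing-ai-apps");
  });
});
```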

But what if there are unpredictable parts in your app’s core?

If you’re riding the AI buzzword wave, you probably implemented an “I know everything” smart-ass right in your app’s core that’s known for lying and spreading fake news. (Yes, I mean some sort of LLM.)

How would you test your app’s quality if you’re building software on top of software you probably don’t understand?

Here’s Hamel Husain’s recommendation:

There are three levels of evaluation to consider:

  • Level 1: Unit Tests
  • Level 2: Model & Human Eval (this includes debugging)
  • Level 3: A/B testing
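Hamel’s Level 1 is the part that translates most directly into code you already know how to write. Here’s a minimal sketch of the idea, assuming a hypothetical `draftReply` function wrapping your LLM call: the output changes on every run, but you can still assert properties that must hold for every response.

```ts
import { describe, it, expect } from "vitest";

// Hypothetical wrapper around your LLM call. Swap in your real one.
async function draftReply(question: string): Promise<string> {
  // Imagine a call to your model provider here.
  return "Hi! You can reset your password in your account settings.";
}

describe("draftReply (Level 1 assertions)", () => {
  it("never leaks unfilled template placeholders", async () => {
    const reply = await draftReply("How do I reset my password?");

    // The wording changes on every run, but properties like
    // "no {{placeholders}} left in the text" should always hold.
    expect(reply).not.toMatch(/\{\{.*?\}\}/);
  });

  it("stays within the UI's length budget", async () => {
    const reply = await draftReply("How do I reset my password?");

    expect(reply.length).toBeLessThan(1000);
  });
});
```

Assertions like these won’t catch a lying model, but they catch the cheap, embarrassing failures before a human ever sees the output.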

I’m not planning to get into serious AI work or LLM programming anytime soon, but unit testing software sitting on top of LLMs is fascinating and worth more than a bookmark!
