I’ve recently been exploring the use of AI agents in healthcare (particularly billing and revenue cycle management) and ran straight into the buzzsaw of handling authentication. I had assumed this would have been well solved historically by bots/RPA platforms (like UiPath), but from what I can tell the solutions out there are abysmal and patched together, which is surprising given how much airtime AI agents get. Having dug in, I *think* what needs to exist (at least in the enterprise context) diverges quite a bit from today’s password manager paradigm, but I’m curious what I’m missing (or if there are good tools out there that I just haven’t found). Here’s how:
The basics
Today’s password managers already do a good job of storing and retrieving usernames, passwords, and (more recently) MFA codes for human users. I’ve mostly used 1Password, and I find their browser UX to be fairly slick. But their tooling for developers building agents or RPA[1] feels like an afterthought.
Simply making it possible to easily access token-based authentication via an API would be phenomenal, but looking through their docs this isn’t really well supported unless you’re willing to spin up your own service. Without some tooling for this, you can’t really use RPA or AI agents in trusted environments, and it’s even harder to have automations run in the background.
The edges
Once you have something that enables basic password management and 2FA, there are a ton of edges that current tools don’t handle at all. As a result, particularly for high volume use cases, you’ll kind of need a human to babysit.
Handling all MFA methods & backups
There’s no standard way for enterprise webapps (think of this as any web application built by a 3rd party that business users have to use in their day to day) to handle MFA & backup codes. Google is way ahead of the curve; Gmail and Google Workspace allow users to configure multiple mechanisms for signing in, including multiple types of MFA and recovery mechanisms.
In addition to this, Google is one of the early adopters of passkeys. I personally can never tell which login method I’ll need (or be able to use) when I’m logging into a Google app on a new device, but you can tell it’s somewhat thoughtful. I think in the future, enterprises (and their security leaders) deploying any kind of browser automation, whether bots or agents, will need services that handle all these methods for their bots or agents in a tier-1 way, and that give them high-quality visibility into how those credentials are being used.
In practice, in the context of bots and agents, I *think* this means that we should think of the password manager for agents as a password manager + authenticator + email inbox + SMS inbox. In healthcare, I’ll use Cigna’s provider portal (https://cignaforhcp.cigna.com) as an example; the auth model includes:
A username
A password
One time passcodes sent over email
Even if you could have your AI agent use something like 1Password or LastPass, you’d still need a human monitoring the email inbox every time your agent needs to log in. If a password manager could programmatically generate an inbox specifically for this, bound to your enterprise domain, this problem is mitigated. This model could also handle magic links (similar to passing along a one-time code, just passing along a sign-in URL).
Doing this creates another issue - what happens if Cigna starts sending OTHER (non-MFA) notifications to that email address? I *think* these are easy/okay to forward on to a human/group inbox, and setting up forwarding at least means your automation doesn’t break simply because the person in charge of monitoring it took the day off. [2]
The same dynamic is true for phone numbers, and from what I understand, lots of sites have constraints that prevent users from setting up SMS-based 2FA with VoIP numbers. I think in this case the innovation is to actually be able to a) generate non-VoIP numbers that can be used and b) monitor the message inboxes for those numbers when one-time passcodes come in. In addition, the telephony restrictions mean there will need to be a physical deployment component, as provisioning non-VoIP, SMS-capable phone numbers will typically require hardware and at least 24-48 hours of lead time. You could probably reduce this lead time with some clever BD with the telephony operators (eg Verizon). This physical device constraint is one reason this functionality makes sense for a 3rd-party password manager like 1Password or an enterprise auth provider like Okta to build.[3]
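One way to picture the routing for both the email and SMS inboxes: classify each inbound message, hold OTPs for the agent, and forward everything else to a human group inbox so the automation never silently swallows a real notification. This is a toy classifier to show the shape of the logic, not any provider’s API:

```python
import re
from dataclasses import dataclass

OTP_RE = re.compile(r"\b\d{6,8}\b")
# Heuristic keywords that suggest a message is a login code
OTP_HINTS = ("code", "verification", "one-time", "passcode", "otp")


@dataclass
class Inbound:
    sender: str
    body: str


def triage(msg: Inbound) -> str:
    """Route a message on the agent's dedicated address/number.

    OTPs stay with the agent's login flow; everything else forwards to a
    human group inbox (eg an ops@ distribution list).
    """
    text = msg.body.lower()
    if OTP_RE.search(msg.body) and any(hint in text for hint in OTP_HINTS):
        return "agent"        # hold for the agent's login flow
    return "human-forward"    # forward to the group inbox
```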
Tracking
The other pretty material thing you’d want is the ability to log:
All agent sessions
Anytime the vault was accessed by a human (or by an entity that is not the agent the vault was designed for)
Any logins/activities/sessions with behavior that deviates from what the agent was designed to do
This is kind of similar to the audit trail functionality that exists in SSO products like Okta, with an added layer of intelligence: it’s likely your agents will be executing at such high volumes that the most interesting thing your security team will want is insight into anomalous behavior - both when a non-agent is using the agent’s privileges, and when the agent is doing things it shouldn’t.
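A minimal sketch of what that anomaly flagging could look like - the event fields and the allowed-action list are hypothetical, but the two flags map directly to the two cases above:

```python
from dataclasses import dataclass


@dataclass
class AuthEvent:
    actor: str     # who touched the vault: the agent's identity or a human's
    agent_id: str  # which agent this vault belongs to
    action: str    # eg "login", "credential_read", "claim_submit"


# Hypothetical allow-list of what this agent was designed to do
ALLOWED_ACTIONS = {"login", "credential_read", "claim_status_check"}


def flags(event: AuthEvent) -> list[str]:
    """Return the anomaly flags a security team would want surfaced."""
    out = []
    if event.actor != event.agent_id:
        # A human (or other entity) used the agent's vault
        out.append("non-agent-access")
    if event.action not in ALLOWED_ACTIONS:
        # The agent did something outside what it was designed for
        out.append("out-of-scope-action")
    return out
```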
Why is this all necessary
In an ideal world, every webapp/portal out there would be like Google and use passkeys. In the world we live in, enterprises (and any automations they deploy) will need to interface with aged portals built by underfunded teams, and this will be true for a long time. As such, a modern authentication platform will have to be backwards compatible with all the varying security paradigms enterprise apps have deployed over the last two decades.
Assuming this works (or something exists that serves this purpose) there’s an additional layer of authorization or permissioning that might need to be built. If you model an agent as an executive assistant or personal assistant, you can let it largely be autonomous, but have it request permission from you under certain general conditions (eg ask before spending more than $300). Not super sure whether this permissioning layer should be centralized (ie I have one area where I manage all my permissions), embedded (ie in the agent control surface area) or just via messaging/SMS. But it will need to exist, and is a natural extension of the authentication layer.
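The policy check itself can be tiny; the hard question is where it lives. As a sketch of the “ask before spending more than $300” rule from above (the names and threshold are illustrative):

```python
from dataclasses import dataclass

# Hypothetical general condition: escalate spends above this amount
SPEND_LIMIT_USD = 300.0


@dataclass
class Action:
    kind: str
    amount_usd: float = 0.0


def decide(action: Action) -> str:
    """Executive-assistant model: autonomous by default,
    permission-seeking when a general condition trips."""
    if action.kind == "spend" and action.amount_usd > SPEND_LIMIT_USD:
        return "ask-human"
    return "allow"
```

Whether `"ask-human"` routes to a central permissions dashboard, the agent’s own UI, or an SMS to the owner is exactly the open design question.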
How you’d build this
The fullstack way to build this would be to build a password manager from scratch. A “lite” version would be to wrap something around an open source password manager like KeePass. The main benefit of the fullstack path is that you have full control and can ensure a consistently high-quality experience for all end users + buyers. The main downside is that building a high-quality, performant password manager is probably quite expensive, and the odds of screwing it up rise sharply if you’re doing it from scratch. In addition, no enterprise will trust you for a very long time.
The path I think makes the most sense is to build a layer that sits on top of corporate/enterprise password managers (like Okta, Bitwarden, Passworks, 1Password etc). This way, the enterprise maintains control over their core password management system (which is probably deeply integrated into their security/IT stack), and can give you access to a vault for their bot or agent workflows. Assuming the APIs are good enough for entering, managing and retrieving credentials, and for the session logging/tracking, this would work just fine, and enables you to focus on safely handling credential access (vs. storing, enterprise sale, etc). I’m just not sure the developer access for these services is there yet. And lastly, if you screw up or anything happens to those credentials, the enterprise can easily revoke your access.
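The shape of that layer: a thin interface over whatever vault the enterprise already runs, with every credential read audited. The method names below are illustrative, not any vendor’s actual API - the point is that the wrapper only ever needs read access plus an audit hook, which is exactly what makes it revocable:

```python
from typing import Protocol


class VaultBackend(Protocol):
    """The thin surface the wrapper needs from any enterprise password
    manager (1Password, Bitwarden, Okta, ...). Hypothetical method names."""

    def get_item(self, vault_id: str, item_id: str) -> dict: ...
    def log_access(self, vault_id: str, item_id: str, actor: str) -> None: ...


class AgentCredentialLayer:
    """Sits on top of the enterprise's own vault: the enterprise keeps the
    system of record and can revoke this layer's access at any time."""

    def __init__(self, backend: VaultBackend, agent_id: str):
        self.backend = backend
        self.agent_id = agent_id

    def credentials_for(self, vault_id: str, item_id: str) -> dict:
        item = self.backend.get_item(vault_id, item_id)
        # Every read is audited under the agent's identity
        self.backend.log_access(vault_id, item_id, actor=self.agent_id)
        # Hand back only what the login flow needs, nothing else
        return {"username": item["username"], "password": item["password"]}
```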
I started down this path exploring authentication for AI agents, but in the process have realized that every Robotic Process Automation (RPA) service would heavily use similar tooling. After a pretty extensive investigation and conversations with several companies doing RPA, their default solution to this problem is to ask customers to turn off MFA...
Open Questions
Having not built security products in the past I can’t tell if there are fundamental, first principles things I’m missing for why this doesn’t exist. The questions in my mind are:
Does having your password manager handle SMS leave you somewhat vulnerable to socially engineered phone number switches at the phone carrier?
Is there a similar attack vector for email?
Are enterprise bot/RPA/agent use cases high volume enough that something needs to exist to handle the edges? (I am pretty sure this is true in healthcare, but is it true in other domains as well?)
Why didn’t RPA giants like UiPath build this? Looking up their approach to MFA (here) isn’t super confidence-inspiring.
How do you solve Captchas? Captchas enable a website to know if you’re human, but a) if an agent needs access, is a Captcha necessary? And b) we’re probably close to a world where LLMs can accurately solve a Captcha - in that world, what’s the point of a Captcha? In addition, there are existing providers (eg https://2captcha.com/) with humans in the background who resolve these with SLAs of a few seconds.
Could a passkey obviate the need for all this? Would you need your agent(s) to run on a specific, unique hardware device with its own vault?
For the provider of an enterprise webapp (like a payor portal), there’s a meta question of what exactly they think “authentication” means. Does it mean a specific human logged in? Does it mean any authorized entity logged in (regardless of whether that entity is human)? My instinct is that the answer depends on the enterprise context - certain contexts (like a lot of financial services back-office work) might just not care as long as the work is done. In more adversarial environments like healthcare (where higher provider efficiency translates into more cost of goods sold for payers) I suspect that easier, better authentication for agents and bots is actually NOT desirable at all.
Thanks to Femi Omojola, Owen LaCava, Ali Hussain, Dami Omojola, Daniele Perito, Joshua Levine and Inderpal Singh for reading this in draft form.
[1] Referring to both AI agents and bots here because the problem is the same for the two, and while AI agents are getting lots of airtime today, traditional RPA is already fairly widespread and I think will continue to grow.
[2] Considering email in particular, there are a few alternatives. I think it’s better to have a dedicated inbox for your bot or AI agent (vs giving it access to a human user’s full inbox) for a few reasons. First, full inbox access doesn’t improve the agent’s performance in any way. Second, you likely want visibility into every single time the agent auths, or accesses credentials, to tie that directly to it performing its tasks. Giving a human access to the agent’s inbox means a bad actor could wreak havoc in your enterprise application under the guise of being an agent, and you’d just assume your agent was broken. You’ll also want to log every time a human views the agent’s authentication credentials.
Another alternative would be to create the credentials with an email that’s an enterprise-domain mailing/distribution list. This way the password manager doesn’t have to run an inbox users can see; it just needs to be allowed to join the mailing list. This also means any number of other users & agents can be added and removed (which again means a human can view the credentials without the enterprise really knowing). With that said, it’s possible that humans misusing agent credentials is a made-up attack vector that’s not worth optimizing for. Just not sure.
[3] It’s possible that we might have to just abandon SMS 2FA as a construct. Semantically (I think) the objective of SMS 2FA is to tie a login to a human that has been “authenticated” in a few ways (eg when you get a cell phone, you provide a driver’s license, credit card, etc). If you create a phone number for an agent or bot (like if you give a server a SIM card) it obviously doesn’t have its own driver’s license (or other ID). So is the SMS 2FA authenticating the human who’s building/running/employing the bot/agent? Is there a different “signature” or authentication method that more closely ensures a) that the agent can log in this time and b) that the entity logging in is the right agent (ie the one that is actually authorized by the enterprise, not a malicious one)?
FWIW, 2captcha has worked well in the past, very economically too, and removes captchas as a blocker to at-scale web crawling or, for our purposes here, automating login.