Plugging Into the Customer's IdP: SAML and SCIM at the Trust Boundary

September 14, 2022

The internal multi-account model ("one organization, many accounts, one human moving between them") is the subject of the previous post in this series. That post is about how a single human stays a single human across acme-dev, acme-staging, and acme-prod once they've authenticated.

This post is about the authentication itself when an enterprise customer brings their own identity provider. AcmeCo doesn't want their employees to maintain a platform-specific password. They want Okta to be the source of truth: anyone who can log in to Okta can log in to the platform, anyone removed from Okta is gone from the platform within seconds, and anyone who shouldn't have access never gets in.

That's a federation problem. The work is in the bridge between "the IdP says this is Anita" and "the platform's organization_user record for Anita." Get that bridge right and everything downstream — multi-account sessions, role mapping, audit — keeps working. Get it wrong and the multi-account layer is propagating a lie.

What sits between the customer and the platform

The bridge has four moving parts. Each does one thing and stays out of the others' business:

    flowchart TB
        IDP["Okta / Azure AD / Google Workspace<br/>(customer's IdP)"]
        IDP -- "SAML assertion at login" --> SAML["SAML verifier"]
        IDP -- "SCIM provisioning events" --> SCIM["SCIM endpoint"]
        SAML --> BRIDGE["Identity bridge<br/>(NameID → org_user_id)"]
        SCIM --> BRIDGE
        BRIDGE --> ORG[("organization_users<br/>(Mongo collection)")]
        BRIDGE --> ACC[("account_users<br/>(per-account membership)")]

  • SAML verifier. Verifies the assertion's signature, freshness, audience, and recipient. It does not care about who Anita is; it only cares whether the document is authentic.
  • SCIM endpoint. Receives lifecycle events — create, update, deactivate, reactivate, delete — pushed by the IdP. It does not authenticate users; it just keeps the local mirror of "who exists" in sync with the IdP's truth.
  • Identity bridge. The mapping logic that turns an IdP-asserted identity (NameID, email, group claims) into an organization_users record and the right set of account_users rows. Both SAML and SCIM funnel into this.
  • The platform's identity layer underneath. Already covered in the previous post. The bridge writes through it; it doesn't bypass it.

The trust boundary is between the first two boxes and the rest of the picture. Above the boundary, we trust the IdP's signatures and provisioning calls. Below it, we own the data and decide what it means.
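
The non-cryptographic half of the SAML verifier's job (freshness, audience, recipient) can be sketched in a few lines. This is a minimal illustration, not the platform's verifier: `AssertionConditions`, `check_conditions`, the example URLs, and the 60-second clock-skew allowance are all assumptions made for the example. Signature verification is deliberately left out and should be delegated to a vetted SAML library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AssertionConditions:
    """Fields a SAML library would hand back after parsing (hypothetical shape)."""
    not_before: datetime
    not_on_or_after: datetime
    audience: str
    recipient: str


def check_conditions(
    cond: AssertionConditions,
    *,
    expected_audience: str,
    expected_recipient: str,
    now: datetime,
    clock_skew: timedelta = timedelta(seconds=60),  # assumed tolerance
) -> list[str]:
    """Return the names of the failed checks; an empty list means all passed."""
    failures = []
    if now + clock_skew < cond.not_before:
        failures.append("not_yet_valid")
    if now - clock_skew >= cond.not_on_or_after:
        failures.append("expired")
    if cond.audience != expected_audience:
        failures.append("wrong_audience")
    if cond.recipient != expected_recipient:
        failures.append("wrong_recipient")
    return failures


# Hypothetical service-provider URLs for the "acme" organization.
AUDIENCE = "https://platform.example/saml/acme"
RECIPIENT = "https://platform.example/saml/acme/acs"

issued = datetime(2022, 9, 14, 12, 0, tzinfo=timezone.utc)
conditions = AssertionConditions(
    not_before=issued,
    not_on_or_after=issued + timedelta(minutes=5),
    audience=AUDIENCE,
    recipient=RECIPIENT,
)
```

Returning the list of failures rather than a bare boolean keeps the denial reason available for the audit trail.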

NameID is what crosses the boundary

A SAML assertion from Okta carries a NameID. It's the IdP's stable, opaque identifier for the human:

    <saml:Subject>
      <saml:NameID Format="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent">
        okta|acme|7c4f9a18-3b66-4f0d-8b1a-19e7f1c0f0cc
      </saml:NameID>
    </saml:Subject>

That string identifies Anita the human across every login Okta issues for her, forever. It does not change when she changes her email. It does not collide with another Anita in another company. It is the closest thing to a primary key for a person that the IdP gives us.

When NameID arrives at the bridge, the job is to find — or create — the organization_user it corresponds to. The bridge stores idp_subject_id on the organization_users document and uses that as the lookup key:

    // db.organization_users (additions for federation)
    {
      _id: ObjectId("..."),
      organization_id: "acme",
      email: "anita.rao@acme.com",
      email_verified: true,
      idp_subject_id: "okta|acme|7c4f9a18-3b66-4f0d-8b1a-19e7f1c0f0cc",
      idp_subject_active: true,
      // ...
    }

    db.organization_users.createIndex({ idp_subject_id: 1 });

This is the design choice that anchors everything else. Email is not the join key. Emails get aliased, change with marriage, and get reused after off-boarding. Anchoring on email is a future incident. NameID is opaque, stable, and authoritative — it's the thing the IdP commits to never reusing.
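
To make the join-key argument concrete, here is a small sketch that uses an in-memory dict in place of db.organization_users. The helper name `find_or_create_org_user` and the second email address are hypothetical; the point is only that a lookup keyed on NameID returns the same record even when the asserted email differs.

```python
# In-memory stand-in for db.organization_users, keyed by idp_subject_id.
org_users_by_subject: dict[str, dict] = {}


def find_or_create_org_user(name_id: str, email: str) -> dict:
    """Look up by NameID only; email is stored as an attribute, never matched on."""
    user = org_users_by_subject.get(name_id)
    if user is None:
        user = {
            "idp_subject_id": name_id,
            "email": email,
            "idp_subject_active": True,
        }
        org_users_by_subject[name_id] = user
    return user


NAME_ID = "okta|acme|7c4f9a18-3b66-4f0d-8b1a-19e7f1c0f0cc"

# First login, then a later login after a (hypothetical) email change at the IdP.
first = find_or_create_org_user(NAME_ID, "anita.rao@acme.com")
second = find_or_create_org_user(NAME_ID, "anita.sharma@acme.com")
```

Both calls resolve to one record. A lookup keyed on email would have missed on the second call and created a duplicate human.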

Once the bridge has the organization_user, the rest is exactly the multi-account flow from the previous post: list the account_users, present the picker, mint the account session. The federation hasn't changed the model; it's only changed how the human got authenticated in the first place.

The login path, end to end

    def login_with_saml(assertion_xml: bytes) -> LoginResult:
        # 1. Verify the assertion (signature, freshness, audience).
        verified = saml.verify(assertion_xml, organization_id="acme")
        if not verified.subject_active:
            # The IdP is asserting that the user has been disabled. Even
            # if SCIM hasn't told us yet, the assertion itself says no.
            return LoginResult.denied("user_disabled_at_idp")

        # 2. Find the org_user by NameID. (The bridge.)
        org_user = db.organization_users.find_one(
            {"idp_subject_id": verified.name_id}
        )

        # 3. First-time federated sign-in for this human → JIT provisioning.
        if org_user is None:
            org_user = jit_provision(verified)

        # 4. Hand off to the existing multi-account flow: pick an account,
        #    mint an account session. No federation-specific logic here.
        org_session = sessions.mint_org_session(org_user)
        account_users = list(db.account_users.find(
            {"org_user_id": org_user["_id"], "status": "active"}
        ))
        return LoginResult.ok(org_session, account_users)

The two new responsibilities introduced by federation are SAML verification (step 1) and JIT provisioning (step 3). Steps 2 and 4 are the same code paths the platform already had; federation reuses them rather than building a parallel implementation.

jit_provision creates the organization_user and the initial account_user rows the IdP says the human should have access to. Which leads to the next problem: how does the IdP say that?

Group claims map to roles, per account

A SAML assertion can carry the IdP's group memberships:

    <saml:AttributeStatement>
      <saml:Attribute Name="groups">
        <saml:AttributeValue>eng-leads</saml:AttributeValue>
        <saml:AttributeValue>platform-admins</saml:AttributeValue>
      </saml:Attribute>
    </saml:AttributeStatement>

The IdP knows about its groups (eng-leads, platform-admins). It does not know what they mean inside the platform. The mapping has to live somewhere, and the question is whose somewhere.

The mapping is deliberately per-account, owned by each account's admin:

    // db.idp_group_mappings
    {
      _id: ObjectId("..."),
      account_id: "acme-prod",
      idp_group: "eng-leads",
      role: "approver"   // a role defined inside acme-prod
    }

The IdP sends "Anita is in eng-leads". The federation evaluates the mapping for each account she's a member of:

    Account        eng-leads maps to
    acme-dev       admin
    acme-staging   designer
    acme-prod      approver

The reason this is per-account: the admin of acme-prod does not want IT changing what "Approver" means in her account, and the admin of acme-dev is happy giving Anita full admin rights for development. The federation respects the existing per-account autonomy from the multi-account model — it doesn't override it. IdP claims are inputs; the meaning is local.

JIT provisioning consults this table for each account the human's groups grant access to, and writes the right account_user row with the right role.
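
One way the lookup could work is sketched below, with a plain list standing in for db.idp_group_mappings. The function name `resolve_account_roles` is hypothetical, and two behaviors are assumptions rather than anything the design above specifies: a group with no mapping row grants nothing, and a user may end up with more than one role in an account.

```python
# Rows shaped like db.idp_group_mappings; values follow the eng-leads example.
GROUP_MAPPINGS = [
    {"account_id": "acme-dev", "idp_group": "eng-leads", "role": "admin"},
    {"account_id": "acme-staging", "idp_group": "eng-leads", "role": "designer"},
    {"account_id": "acme-prod", "idp_group": "eng-leads", "role": "approver"},
]


def resolve_account_roles(
    groups: list[str], mappings: list[dict]
) -> dict[str, set[str]]:
    """Return {account_id: {roles}} for every mapping row that matches a group claim."""
    asserted = set(groups)
    roles_by_account: dict[str, set[str]] = {}
    for row in mappings:
        if row["idp_group"] in asserted:
            roles_by_account.setdefault(row["account_id"], set()).add(row["role"])
    return roles_by_account


# Group claims from the example assertion; platform-admins has no mapping row here.
roles = resolve_account_roles(["eng-leads", "platform-admins"], GROUP_MAPPINGS)
```

JIT provisioning would then write one account_user row per entry in the result. The same group claim yields a different role in each account because the mapping rows, not the IdP, carry the meaning.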

SCIM keeps the mirror in sync

SAML handles authentication at login time. SCIM handles lifecycle: who exists, who is active, who has changed groups. Without SCIM, the platform only learns about lifecycle events when the user next tries to log in. With SCIM, the IdP pushes the change immediately.

The SCIM endpoint is a small REST surface the IdP calls into:

    POST   /scim/v2/Users        # create
    GET    /scim/v2/Users/{id}   # read
    PATCH  /scim/v2/Users/{id}   # update (incl. active=false)
    DELETE /scim/v2/Users/{id}   # full delete

The deactivation path is the most operationally important one:

    @scim_route.patch("/Users/{scim_id}")
    def patch_user(scim_id: str, op: ScimPatchOp) -> dict:
        if op.is_deactivation():
            org_user = db.organization_users.find_one_and_update(
                {"idp_subject_id": scim_id},
                {"$set": {
                    "idp_subject_active": False,
                    "deactivated_at": datetime.utcnow(),
                }},
                return_document=ReturnDocument.AFTER,
            )
            if org_user is None:
                return scim_404(scim_id)

            # Cascade: mark every account_user inactive, kill sessions, revoke tokens.
            db.account_users.update_many(
                {"org_user_id": org_user["_id"]},
                {"$set": {"status": "deactivated"}},
            )
            sessions.invalidate_by_org_user(org_user["_id"])
            refresh_tokens.revoke_by_org_user(org_user["_id"])
            return scim_ok(org_user)

        # Other patch ops: email change, name change, group sync...
        return apply_other_patch(scim_id, op)

Within seconds of IT toggling Anita off in Okta, every session in acme-dev, acme-staging, and acme-prod is gone, and every refresh token is dead. The cascade through org_user_id is the same join the multi-account model already uses; SCIM just supplies the trigger.
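
The handler above leans on `op.is_deactivation()` without showing it. The sketch below is one possible version that works on the raw JSON body of a SCIM PatchOp request (RFC 7644), written here as a standalone function. It is an assumption about how such a check might look, not the platform's code. It accepts both the path form and the path-less form of a replace, and tolerates a capitalized op name or a string "False", since IdP connectors vary in how strictly they serialize these.

```python
def _is_false(value: object) -> bool:
    """True for JSON false, or for the string 'false' in any casing."""
    if value is False:
        return True
    return isinstance(value, str) and value.strip().lower() == "false"


def is_deactivation(patch_body: dict) -> bool:
    """True if any operation in a SCIM PatchOp body sets `active` to false."""
    for op in patch_body.get("Operations", []):
        if str(op.get("op", "")).lower() != "replace":
            continue
        path = op.get("path")
        value = op.get("value")
        # Form 1: {"op": "replace", "path": "active", "value": false}
        if path == "active" and _is_false(value):
            return True
        # Form 2: {"op": "replace", "value": {"active": false}}
        if path is None and isinstance(value, dict) and _is_false(value.get("active")):
            return True
    return False


PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"

with_path = {
    "schemas": [PATCH_SCHEMA],
    "Operations": [{"op": "replace", "path": "active", "value": False}],
}
without_path = {
    "schemas": [PATCH_SCHEMA],
    "Operations": [{"op": "Replace", "value": {"active": False}}],
}
rename_only = {
    "schemas": [PATCH_SCHEMA],
    "Operations": [{"op": "replace", "path": "name.givenName", "value": "Anita"}],
}
```

Misclassifying a deactivation as "some other patch" would leave sessions alive, so it is worth testing this function against payloads captured from each IdP a customer actually uses.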

Pull-on-login is the safety net

SCIM is a push from the IdP. Pushes can fail: IdPs go offline, customers misconfigure connectors, network paths drop. So the bridge does not rely on SCIM alone. It also re-checks at every login.

The check is implicit and free: a SAML assertion is only valid if the IdP issued it. A user who has been disabled at the IdP cannot produce a fresh assertion, no matter what SCIM has or hasn't told us. So even if SCIM is silent, the next login fails.

This belt-and-braces design is what makes the off-boarding promise defensible: "any login attempt after off-boarding will fail, regardless of session state, regardless of SCIM state." SCIM gets the fast cases (existing sessions and tokens dropped within seconds); pull-on-login covers the slow ones (the case where SCIM never arrived). The customer doesn't have to choose which to trust; both are running at all times.

Audit: one assertion, multiple correlated events

A federated login that ends up in two accounts should be traceable as one event by an auditor, not as two unrelated events. Every audit row written downstream of a federated assertion carries a federation_event_id minted at the moment the SAML verifier accepts the assertion:

    def login_with_saml(assertion_xml: bytes) -> LoginResult:
        verified = saml.verify(assertion_xml, organization_id="acme")
        federation_event_id = ObjectId()   # one ID for this login, end to end

        audit.write({
            "federation_event_id": federation_event_id,
            "event": "saml.assertion_accepted",
            "organization_id": "acme",
            "idp_subject_id": verified.name_id,
            "ts": datetime.utcnow(),
        })
        # ... rest of the flow, with federation_event_id propagated into
        # the org session and every account session minted from it ...

When Anita switches from acme-dev to acme-staging later in the same browser session, those switches inherit the same federation_event_id. An auditor querying "trace this SAML assertion end to end" gets a list of events ordered by timestamp, all linked, spanning every account she touched.
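
The auditor's query can be sketched against an in-memory list of audit rows. The row shape follows the audit.write call above; the `account.session_minted` event name, the string IDs, the timestamps, and the `trace` helper are hypothetical and exist only for the example.

```python
from datetime import datetime, timezone


def at(hour: int, minute: int) -> datetime:
    """Build a UTC timestamp on the example day."""
    return datetime(2022, 9, 14, hour, minute, tzinfo=timezone.utc)


# Rows are deliberately out of order, and one belongs to a different login.
audit_rows = [
    {"federation_event_id": "fed-1", "event": "account.session_minted",
     "account_id": "acme-staging", "ts": at(9, 30)},
    {"federation_event_id": "fed-1", "event": "saml.assertion_accepted",
     "account_id": None, "ts": at(9, 0)},
    {"federation_event_id": "fed-2", "event": "saml.assertion_accepted",
     "account_id": None, "ts": at(9, 10)},
    {"federation_event_id": "fed-1", "event": "account.session_minted",
     "account_id": "acme-dev", "ts": at(9, 1)},
]


def trace(federation_event_id: str, rows: list[dict]) -> list[dict]:
    """Return every audit row for one federated login, oldest first."""
    matching = [r for r in rows if r["federation_event_id"] == federation_event_id]
    return sorted(matching, key=lambda r: r["ts"])


timeline = trace("fed-1", audit_rows)
```

Against a real collection this would be a find on federation_event_id sorted by ts, which suggests indexing that field if these traces are run often.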

For "who had access to acme-prod on March 14th?", the audit model from the multi-account post already has the answer: temporal valid_from / valid_to on every role assignment, answered by a single indexed Mongo lookup. Federation didn't change that; it just made the IdP, not a platform-managed password, the source of truth about Anita's existence.
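
For readers who haven't seen that post, a point-in-time lookup over valid_from / valid_to might look like the sketch below. The field names come from the sentence above; everything else is an assumption for illustration: the half-open interval convention, `None` meaning "still valid", the user IDs, the year, and the helper names.

```python
from datetime import datetime, timezone
from typing import Optional


def was_valid_at(
    valid_from: datetime, valid_to: Optional[datetime], moment: datetime
) -> bool:
    """Half-open interval [valid_from, valid_to); valid_to=None means still valid."""
    return valid_from <= moment and (valid_to is None or moment < valid_to)


def who_had_access(
    assignments: list[dict], account_id: str, moment: datetime
) -> list[str]:
    """Return the sorted org_user_ids holding any role in the account at `moment`."""
    return sorted({
        a["org_user_id"]
        for a in assignments
        if a["account_id"] == account_id
        and was_valid_at(a["valid_from"], a["valid_to"], moment)
    })


def day(month: int, d: int) -> datetime:
    """Build a UTC midnight timestamp in the (assumed) year 2022."""
    return datetime(2022, month, d, tzinfo=timezone.utc)


role_assignments = [
    {"org_user_id": "user-anita", "account_id": "acme-prod", "role": "approver",
     "valid_from": day(1, 10), "valid_to": None},
    {"org_user_id": "user-b", "account_id": "acme-prod", "role": "approver",
     "valid_from": day(2, 1), "valid_to": day(3, 1)},   # revoked before the 14th
    {"org_user_id": "user-c", "account_id": "acme-dev", "role": "admin",
     "valid_from": day(1, 1), "valid_to": None},        # different account
]

on_march_14 = who_had_access(role_assignments, "acme-prod", day(3, 14))
```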

What changes for the customer

After the federation lands, AcmeCo's IT director sees:

  • One IdP connection at the organization level, applied to every account under it. New accounts the organization creates inherit the federation automatically.
  • One off-boarding action in Okta that propagates within seconds to every account, every session, every token.
  • Group-based access that they already maintain in Okta — "members of eng-leads get these roles in these accounts" — without any per-user provisioning steps.
  • An audit trail that ties every IdP-asserted login to the downstream account-level events under a single correlation ID, queryable by their auditors directly.

What didn't change is everything below the trust boundary. The multi-account model is identical. Account autonomy is identical. The picker, the account session, the per-account role definitions — all unchanged. Federation is a layer that produces organization_user records and group-based account_user rows; everything that consumed those records before still consumes them now.

The shape of the bridge stayed small on purpose: a SAML verifier, a SCIM endpoint, a single idp_subject_id field on organization_users, and a per-account group-mapping table. Anything more would be the bridge picking sides between the IdP and the platform — and the entire design rests on the bridge picking neither.
