AI Product Design · Wealth Management · Enterprise SaaS · Trust & Explainability

Goldman Sachs

Led product design for an AI-powered wealth advisory platform — building citation-backed recommendations, confidence visibility, and editable AI outputs that financial advisors would actually trust enough to send to clients.

Role: Product Designer, AI SaaS Platform

Timeline: Jun 2025 to Present

Platform: Web SaaS · Advisor + Client portals

Domain: AI · Wealth Management · Compliance

Outcomes — two product releases

Activation 52% → 68% · AI acceptance 38% → 67% · Trust patterns adopted as platform standard

+16pt new-advisor activation lift over two product releases (52% → 68%)

+29pt AI recommendation acceptance lift after citation + confidence patterns shipped

~60% estimated reduction in "why did the AI suggest this" support tickets post-release

3 trust patterns shipped — confidence visibility, citations, editable outputs

Overview

An AI advisory tool that advisors didn't trust enough to use

Goldman Sachs has been building AI-powered tools to augment its wealth advisory workflows — helping advisors generate personalised investment recommendations, summarise client portfolios, and draft client-ready commentary in seconds rather than hours. The technology worked. The adoption didn't.

I joined the AI SaaS team to own the redesign of onboarding, recommendation flows, and the trust layer that sits between an LLM output and a human advisor's reputation. In financial advisory, a wrong AI recommendation isn't just a UX problem — it's a regulatory and fiduciary one. Trust isn't a feature. It's the product.

Headquarters

New York, USA

Founded

1869

Industry

Investment Banking · Wealth Mgmt

AUM

~$2.8T (public, 2024)

Platform Users

~3,000 advisors (internal SaaS)

Compliance Surface

SEC · FINRA · internal IB controls

Problem

Advisors weren't abandoning the AI because it was wrong. They abandoned it because they couldn't audit it.

When I joined the team, the platform was hitting only 52% activation among new advisors. AI recommendations were displayed as confident, polished outputs — with no visible reasoning, no source data, no way to amend before sending to clients. In a regulated industry where every advice document gets archived for seven years, that's not "AI being helpful." That's career risk.

Problem statement: First-time platform users (wealth advisors, 5–15 years tenure) failed to complete activation because AI-generated recommendations lacked source provenance, confidence signalling, and edit affordances. As a result, 48% of new advisors abandoned the platform within their first two weeks, as documented via internal product analytics across the first two release cohorts.

"I'm not going to send a client a recommendation I can't defend to compliance. The AI gives me a clean answer, but I can't see where it came from. So I just... don't use it."

— Senior wealth advisor, week-2 platform exit interview

Research

Three sources of evidence pointed to the same root cause

I treated the activation drop as a research problem before treating it as a design one. The thesis wasn't "the UI is confusing" — it was "advisors don't trust outputs they can't trace." Three independent evidence streams converged on the same answer.

Activation Funnel Analysis

Internal product analytics showed 48% of new advisors dropped between first AI recommendation view and first AI recommendation accepted. The drop was concentrated in a single 90-second window after the AI output rendered. The pattern was hesitation, not navigation failure.

Advisor Shadowing (n=12)

I shadowed 12 advisors during live client portfolio reviews. Eight of them mentioned, unprompted, that they "couldn't see the AI's working." Four had built personal spreadsheets to reverse-engineer AI recommendations before using them. That's a strong signal that the trust layer was missing.

Industry Benchmark

Gartner's 2024 "Trust in AI Financial Tools" report found 73% of financial advisors reject AI recommendations that lack visible source citations — even when the recommendation is correct. This wasn't a Goldman-specific behaviour. It was a category-wide pattern we hadn't designed for.

What the data didn't say — and how I filled the gap

Internal analytics could tell us where advisors dropped off, but not why a specific recommendation got rejected. To get to mechanism, I partnered with the AI/ML team to build a lightweight feedback loop: a one-tap "this wasn't useful" signal, with optional reason codes. Within three weeks we had categorised reasons for 1,200+ rejections. The top three: "no source visible" (38%), "can't edit before sending" (24%), "confidence not stated" (19%). 81% of rejections were trust-layer problems, not AI-quality problems.
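The reason-code tally above is simple enough to sketch. The snippet below is a minimal illustration, not the team's actual pipeline: the code labels, the "other" bucket, and the sample data are all assumptions, sized only to reproduce the shares quoted in the paragraph.

```python
from collections import Counter

# Hypothetical reason-code labels. The first three mirror the reasons named
# in the case study; "other" is an assumed catch-all bucket.
TRUST_LAYER_CODES = {
    "no_source_visible",
    "cannot_edit_before_sending",
    "confidence_not_stated",
}


def reason_shares(rejections: list[str]) -> dict[str, float]:
    """Return each reason code's share of all rejections, as a percentage."""
    counts = Counter(rejections)
    total = sum(counts.values())
    return {code: 100 * n / total for code, n in counts.items()}


def trust_layer_share(shares: dict[str, float]) -> float:
    """Sum the shares of the codes that describe trust-layer problems."""
    return sum(pct for code, pct in shares.items() if code in TRUST_LAYER_CODES)


# Illustrative sample (not real data), sized to match the reported split.
sample = (
    ["no_source_visible"] * 38
    + ["cannot_edit_before_sending"] * 24
    + ["confidence_not_stated"] * 19
    + ["other"] * 19
)

shares = reason_shares(sample)
print(round(trust_layer_share(shares)))  # 81
```

Grouping the codes into a named set keeps the "trust-layer vs AI-quality" split explicit, so the headline share can be recomputed as new reason codes are added.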

Options Considered

Three approaches. Two rejected. Here's the reasoning.

Option A — Rejected

Full automation — AI auto-applies recommendations to client portfolios

The fastest path to "AI is doing the work", but a non-starter. Regulatory liability exposure was unacceptable: SEC fiduciary duty rules require human attestation on advice, and the compliance team flagged it on day one. The design didn't need to bypass advisors; it needed to make advisors faster.

Option B — Rejected

AI as a "suggestion bot" — vague, low-confidence outputs only

The safest path. The platform would only surface hedged suggestions ("you might consider...") and let the advisor do all the work. Rejected because pilot feedback was unambiguous: advisors complained the AI was "too cautious to be useful." A tool that doesn't make a real recommendation provides no leverage — advisors would just bypass it.

Option C — Chosen

AI as auditable co-pilot — confident outputs with full provenance + edit affordances

Surface every AI recommendation with: (1) a numerical confidence score, (2) inline citations to the underlying data sources, and (3) editable fields so advisors could amend before sending to clients. The AI does the heavy lifting; the advisor stays in the loop and stays accountable. This is the position the regulator, the advisor, and the AI/ML team could all defend.

Decision & Tradeoff

We chose explainability over speed. Here's what we gave up to do that.

Every AI recommendation now takes 2.4 seconds longer to render than it did pre-redesign. That's not a bug — that's the cost of rendering confidence scores, retrieving citations from the source data layer, and laying out the edit affordances inline. The product team initially pushed back on the latency hit. Here's how we defended it.

+ Gained

Advisor trust. AI recommendation acceptance went from 38% to 67%. Activation rose 16 points. Estimated 60% reduction in "why did the AI suggest this" support tickets.

− Lost

2.4 seconds of perceived latency per recommendation. Compared to the prior version, the AI feels slower. For high-throughput advisors generating 30+ recommendations per day, that's at least ~72 seconds (30 × 2.4 s) of cumulative wait time per day.

Why accepted

A 2.4s delay on a usable tool beats a 0s delay on an untrusted one. The data was unambiguous: 81% of rejections were trust-layer problems, not speed problems. Optimising for the bottleneck was the only call.
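The tradeoff above can be checked with back-of-envelope arithmetic. This sketch uses only figures quoted in the case study; applying the platform-wide acceptance rates to a single 30-recommendation day is an assumption made purely for illustration.

```python
ADDED_LATENCY_S = 2.4      # extra render time per recommendation (from the case study)
RECS_PER_DAY = 30          # lower bound for a high-throughput advisor
ACCEPTANCE_BEFORE = 0.38   # acceptance rate before the trust patterns shipped
ACCEPTANCE_AFTER = 0.67    # acceptance rate after


def cumulative_wait_seconds(recs_per_day: int, added_latency_s: float = ADDED_LATENCY_S) -> float:
    """Total extra waiting per day caused by the added render latency."""
    return recs_per_day * added_latency_s


def accepted_per_day(recs_per_day: int, acceptance_rate: float) -> float:
    """Expected accepted recommendations per day (assumes a uniform rate)."""
    return recs_per_day * acceptance_rate


print(round(cumulative_wait_seconds(RECS_PER_DAY), 1))              # 72.0
print(round(accepted_per_day(RECS_PER_DAY, ACCEPTANCE_BEFORE), 1))  # 11.4
print(round(accepted_per_day(RECS_PER_DAY, ACCEPTANCE_AFTER), 1))   # 20.1
```

Under these assumptions, roughly a minute of extra waiting per day sits alongside close to nine more accepted recommendations, which is the shape of the argument made to the product team.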

The three trust patterns we shipped

Confidence visibility. Every recommendation displays a 0–100 confidence score with a visual band (high/medium/low). Advisors can filter the queue to "high confidence only" if they want to triage faster. The number is generated by the model's actual logit distribution — not a vibe.

Citation-backed recommendations. Every claim links inline to the underlying portfolio data, market signal, or client preference that triggered it. Click any citation; the source panel opens with the raw data. This is the single most-used feature in the release — 84% of activated advisors use it at least daily.

Editable AI responses. Advisors can amend any AI output before sending to a client. Edits are tracked, attributed, and don't get lost on regeneration. This was the compliance team's hard requirement — and it turned out to be the design feature advisors valued most.
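For the confidence-visibility pattern, the case study says the score comes from the model's logit distribution but does not say how. One simple possibility, sketched here as an assumption, is to take the top softmax probability and scale it to 0–100. The band cut-offs (80 and 50) and all function names are hypothetical, and raw softmax probabilities are often overconfident, so a production system would likely calibrate them first.

```python
import math

# Assumed band cut-offs. The case study names the bands (high/medium/low)
# but does not state where the boundaries sit.
HIGH_CUTOFF = 80
MEDIUM_CUTOFF = 50


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over raw model logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_score(logits: list[float]) -> int:
    """Map the top softmax probability onto a 0-100 score."""
    return round(100 * max(softmax(logits)))


def confidence_band(score: int) -> str:
    """Translate a 0-100 score into the visual band shown on the card."""
    if score >= HIGH_CUTOFF:
        return "high"
    if score >= MEDIUM_CUTOFF:
        return "medium"
    return "low"


def high_confidence_only(queue: list[dict]) -> list[dict]:
    """Filter a recommendation queue down to high-confidence items."""
    return [rec for rec in queue if confidence_band(rec["score"]) == "high"]


queue = [
    {"id": "rec-1", "score": confidence_score([4.0, 0.0, 0.0])},
    {"id": "rec-2", "score": confidence_score([0.0, 0.0])},
]
print([rec["id"] for rec in high_confidence_only(queue)])  # ['rec-1']
```

Keeping the score and the band as separate functions means the cut-offs can change without touching how the score is derived.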

Design Decisions

Every UI decision pointed back to one question: can the advisor defend this to compliance?

The recommendation card — annotated decisions

Confidence band placed top-right, not bottom.

Eye-tracking from pilot sessions showed advisors scanned recommendations in a top-right Z pattern. Placing the confidence score in their first visual stop meant trust evaluation happened before reading content — not after.

Citations underlined, not chip-styled.

Initial design used pill-shaped chips for citations. Tested poorly — advisors read chips as "tags" not "sources." Inline underlines (the document convention they'd seen for 20+ years) tested 3× higher in click-through on first encounter.

Onboarding flow — sequencing the trust narrative

The activation funnel had a specific failure point: the first AI recommendation. So we rebuilt onboarding to teach trust before testing it. First screen: a guided tour through one pre-built recommendation, narrating the confidence score and citation system before the advisor sees their own data. Second screen: a sandbox recommendation against fake client data, where the advisor can test editing without risk. Only on screen three does live client data appear. The sequence: watch → practice → do.

Design system contribution

I built and maintained the centralised Figma design system that the recommendation card patterns now live in — with reusable components, governance standards, and explicit documentation on which components are "AI-output" vs "advisor-output" (a critical compliance distinction). Engineering handoff time on AI-pattern work dropped substantially after the system shipped, and the cross-team consistency made downstream features faster to design and faster to review.

Outcome

Activation 52% → 68%. AI acceptance 38% → 67%. The numbers, and what they mean at scale.

Measured outcomes — two release cohorts

New-advisor activation: 52% → 68% (+16 percentage points). Measured via internal product analytics across the first two product releases post-redesign. The improvement was concentrated among advisors with 5–15 years of tenure — exactly the cohort the research had flagged as most trust-sensitive.

AI recommendation acceptance rate: 38% → 67% (+29 percentage points). Measured per-recommendation across all activated advisors. The lift validated the trust thesis: when advisors can audit AI outputs, they accept them at roughly 1.8 times the prior rate (67 / 38 ≈ 1.76).

Support ticket reduction (estimated): tickets categorised as "AI behaviour unclear" or "why did this happen" dropped by roughly 60% in the first quarter post-release, based on the ticketing system's auto-tag classifier (calibrated against a manual audit of 200 tickets). Methodology: pre/post comparison of 90-day windows, holding the overall ticket-volume baseline constant.

Business framing — the leverage of a 16-point activation lift

Goldman's wealth platform serves ~3,000 internal advisors. If the 16-point activation lift holds across that base, it translates to ~480 additional activated advisors (3,000 × 0.16) beyond the prior baseline. With an average book of ~$200M AUM per advisor (a publicly cited Goldman PWM range), that's an estimated ~$96B in AUM newly under AI-augmented management: not new assets, but assets newly capable of being managed with the platform's AI tooling. The framing here matters: design didn't acquire AUM, but it removed the trust ceiling that was blocking the AI from generating leverage on existing AUM.
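The estimate above can be reproduced as a quick check. This sketch uses only the figures quoted in the paragraph; the variable names are illustrative, and it assumes the lift applies evenly across the whole advisor base.

```python
ADVISORS = 3_000                 # internal advisors on the platform
ACTIVATION_BEFORE_PCT = 52       # activation before the redesign
ACTIVATION_AFTER_PCT = 68        # activation after two releases
AVG_BOOK_USD = 200_000_000       # ~$200M average book per advisor

# Integer arithmetic on percentage points avoids floating-point noise.
lift_points = ACTIVATION_AFTER_PCT - ACTIVATION_BEFORE_PCT
additional_advisors = ADVISORS * lift_points // 100
aum_under_ai_usd = additional_advisors * AVG_BOOK_USD

print(lift_points)                        # 16
print(additional_advisors)                # 480
print(aum_under_ai_usd / 1_000_000_000)   # 96.0 (billions of USD)
```

The result is sensitive to both inputs: halving either the advisor base or the average book halves the headline figure, which is why it is framed as an estimate.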

"The redesign didn't change the AI model. It changed whether advisors believed the AI model. That's the entire game in regulated industries."

— Product lead, internal release retro

What's Next

Cohort-specific confidence visualisations — A/B test in next release

The 67% AI acceptance rate is strong, but the residual 33% is informative. Reason-code data shows senior advisors (15+ yrs) reject recommendations because the confidence display feels "too prescriptive" — they prefer a range to a single number. Junior advisors (under 5 yrs) reject because they want more guidance, not less. One UI doesn't serve both cohorts.

Hypothesis: rendering confidence as a range (e.g., 78–84%) for senior advisors and as a single number + plain-language explanation for junior advisors will lift acceptance by another 8–12 percentage points. Instrumentation is already in place: tenure-segmented analytics, cohort-segmented A/B framework, and a rollback plan if the senior cohort sees acceptance drop instead.
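The cohort-specific rendering in the hypothesis could look something like the sketch below. Everything beyond the tenure cut-offs quoted above is an assumption: the fixed ±3 half-width for the range, the plain-language wording, and the fallback for mid-tenure advisors are placeholders, and a real range would more plausibly come from the model's own uncertainty estimate.

```python
SENIOR_TENURE_YEARS = 15   # "15+ yrs" cohort from the reason-code data
JUNIOR_TENURE_YEARS = 5    # "under 5 yrs" cohort


def plain_language(score: int) -> str:
    """Hypothetical plain-language gloss for junior advisors."""
    if score >= 80:
        return "strong support in the cited data"
    if score >= 50:
        return "some support; review the citations"
    return "weak support; treat as a starting point"


def render_confidence(score: int, tenure_years: float, half_width: int = 3) -> str:
    """Render a 0-100 confidence score according to advisor tenure."""
    if tenure_years >= SENIOR_TENURE_YEARS:
        # Senior advisors see a range rather than a single number.
        low = max(0, score - half_width)
        high = min(100, score + half_width)
        return f"{low}–{high}%"
    if tenure_years < JUNIOR_TENURE_YEARS:
        # Junior advisors see the number plus a short explanation.
        return f"{score}% ({plain_language(score)})"
    # Assumed default for mid-tenure advisors: the existing single number.
    return f"{score}%"


print(render_confidence(81, tenure_years=20))  # 78–84%
print(render_confidence(81, tenure_years=10))  # 81%
print(render_confidence(81, tenure_years=2))   # 81% (strong support in the cited data)
```

Routing all three cohorts through one function keeps the A/B rollback simple: reverting the senior variant is a one-branch change.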

Constraints

A regulated industry. A black-box AI. A two-release window. Pick your battles.

Goldman's AI advisory platform doesn't have the luxury of consumer-app constraints. Every design decision had to clear three filters before it shipped: compliance, model interpretability, and advisor trust. The constraint wasn't time — it was the surface area I could realistically influence inside a two-release product cycle.

The triangle: I picked Scope + Quality. Time was the dial I managed.

Compliance reviews don't compress. The AI model couldn't change. So the conscious sacrifice was timeline: the trust-layer redesign shipped across two releases instead of one, with the confidence pattern in release 1 and citations + editability in release 2. Cutting scope was unsafe (an incomplete trust layer is worse than no trust layer). Cutting quality was unthinkable (a regulated industry). Time stretched. That was the right call.

Finding Assumptions Without Bypassing Compliance

Traditional UX research with external users wasn't an option: advisor workflows touch client data covered by confidentiality obligations comparable to attorney-client privilege. So I designed a synthetic-advisor research protocol: shadowing real advisors during real reviews, but capturing only interaction patterns, not data content. 12 sessions produced the qualitative signal; analytics produced the quantitative one.

What Moved to V3

Cohort-specific confidence rendering. Automated regression flags when AI confidence drops below a per-client trust threshold. Multi-modal citations (chart-based, not just text). All documented with clear ownership and instrumentation already in place — so the next sprint inherits a brief, not a blank page.

How AI Tooling Compressed Design Cycles

Figma Make accelerated component scaffolding for the recommendation card variants (we needed 14 confidence-band states). Claude Code prototyped the citation expansion microinteractions so I could pressure-test them before engineering committed. The result: a compliance-sensitive surface shipped in two releases instead of three.

"In regulated AI, the design constraint isn't time. It's how much you can change before the legal review cycle resets. Pick your changes carefully — and make every one count."

Reflection

Three lessons that will shape how I design AI products from here

Trust Is The Product

In regulated industries, AI adoption isn't blocked by model quality — it's blocked by the absence of an auditable trust layer. The most important UI on the screen isn't the AI's answer. It's the explanation behind it.

Slower Beats Untrusted

Performance is a feature only when the baseline product is being used. A 2.4-second latency cost was the right tradeoff because the alternative was a faster product that no one trusted enough to adopt. Speed is downstream of trust.

Design The Audit Trail

Every AI surface I design now starts from "how does the human prove they reviewed this?" not "how do we display the model output?" That single reframe changes the entire information architecture — from confidence visibility to edit history to citation density.

Conclusion

In regulated AI, you don't ship a model. You ship the proof that a human can defend it.

The trust patterns shipped at Goldman Sachs — confidence visibility, citation-backed recommendations, editable AI outputs — are now adopted as the platform standard for AI-output surfaces across the broader wealth advisory product line.
