Led product design for an AI-powered wealth advisory platform — building citation-backed recommendations, confidence visibility, and editable AI outputs that financial advisors would actually trust enough to send to clients.
The Stakes
Outcomes — two product releases
Goldman Sachs has been building AI-powered tools to augment its wealth advisory workflows — helping advisors generate personalised investment recommendations, summarise client portfolios, and draft client-ready commentary in seconds rather than hours. The technology worked. The adoption didn't.
I joined the AI SaaS team to own the redesign of onboarding, recommendation flows, and the trust layer that sits between an LLM output and a human advisor's reputation. In financial advisory, a wrong AI recommendation isn't just a UX problem — it's a regulatory and fiduciary one. Trust isn't a feature. It's the product.
Headquarters: New York, USA
Founded: 1869
Industry: Investment Banking · Wealth Mgmt
AUM: ~$2.8T (public, 2024)
Platform Users: ~3,000 advisors (internal SaaS)
Compliance Surface: SEC · FINRA · internal IB controls
When I joined the team, the platform was hitting only 52% activation among new advisors. AI recommendations were displayed as confident, polished outputs — with no visible reasoning, no source data, no way to amend before sending to clients. In a regulated industry where every advice document gets archived for seven years, that's not "AI being helpful." That's career risk.
"I'm not going to send a client a recommendation I can't defend to compliance. The AI gives me a clean answer, but I can't see where it came from. So I just... don't use it."
– Senior wealth advisor, week-2 platform exit interview
I treated the low activation rate as a research problem before treating it as a design one. The thesis wasn't "the UI is confusing"; it was "advisors don't trust outputs they can't trace." Three independent evidence streams converged on the same answer.
Internal analytics could tell us where advisors dropped off, but not why a specific recommendation got rejected. To get at the mechanism, I partnered with the AI/ML team to build a lightweight feedback loop: a one-tap "this wasn't useful" signal, with optional reason codes. Within three weeks we had categorised reasons for 1,200+ rejections. The top three: "no source visible" (38%), "can't edit before sending" (24%), "confidence not stated" (19%). Together, 81% of rejections were trust-layer problems, not AI-quality problems.
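The 81% figure is the sum of the three trust-layer reason codes' shares. A minimal Python sketch of that aggregation, assuming each rejection event carries at most one optional reason code; the code identifiers and the sample data are hypothetical, chosen only to mirror the percentages above rather than the real 1,200+ events.

```python
from collections import Counter

# Hypothetical identifiers for the three trust-layer reason codes named in the text.
TRUST_LAYER_CODES = {"no_source_visible", "cannot_edit", "confidence_not_stated"}


def rejection_shares(reason_codes):
    """Percentage share of each reason code among rejections that carry one."""
    coded = [code for code in reason_codes if code is not None]  # reason codes are optional
    if not coded:
        return {}
    counts = Counter(coded)
    return {code: 100.0 * n / len(coded) for code, n in counts.items()}


def trust_layer_share(reason_codes):
    """Combined percentage of coded rejections attributed to trust-layer problems."""
    shares = rejection_shares(reason_codes)
    return sum(share for code, share in shares.items() if code in TRUST_LAYER_CODES)


# Illustrative data: 100 coded rejections plus 5 one-tap signals with no reason code.
sample = (["no_source_visible"] * 38 + ["cannot_edit"] * 24
          + ["confidence_not_stated"] * 19 + ["other"] * 19 + [None] * 5)
print(round(trust_layer_share(sample), 1))  # 81.0
```

Uncoded taps are excluded from the denominator here, which is one reasonable reading of "optional reason codes"; counting them would lower every share.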
Every AI recommendation now takes 2.4 seconds longer to render than it did pre-redesign. That's not a bug — that's the cost of rendering confidence scores, retrieving citations from the source data layer, and laying out the edit affordances inline. The product team initially pushed back on the latency hit. Here's how we defended it.
Confidence visibility. Every recommendation displays a 0–100 confidence score with a visual band (high/medium/low). Advisors can filter the queue to "high confidence only" to triage faster. The number is derived from the model's actual logit distribution, not a vibe.
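The text says the score comes from the logit distribution but not how it is aggregated. One common approach, sketched here in Python as an assumption rather than the team's method, is the geometric mean of the probabilities the model assigned to the tokens it actually emitted, scaled to 0–100. The band cut-offs (80 and 50) are hypothetical; the source names the bands but not their thresholds.

```python
import math

# Hypothetical band cut-offs; the text names high/medium/low bands but not thresholds.
HIGH_MIN = 80
MEDIUM_MIN = 50


def log_softmax(logits):
    """Numerically stable log-softmax over the logits at one generation step."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def confidence_score(step_logits, chosen_ids):
    """0-100 score: geometric mean probability of the tokens the model emitted."""
    if not chosen_ids or len(step_logits) != len(chosen_ids):
        raise ValueError("need one chosen token id per generation step")
    total = sum(log_softmax(logits)[tok] for logits, tok in zip(step_logits, chosen_ids))
    return round(100 * math.exp(total / len(chosen_ids)))


def confidence_band(score):
    """Map a 0-100 score onto the high/medium/low visual band."""
    if score >= HIGH_MIN:
        return "high"
    if score >= MEDIUM_MIN:
        return "medium"
    return "low"


# Two generation steps over a toy three-token vocabulary; token 0 was emitted both times.
score = confidence_score([[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [0, 0])
print(score, confidence_band(score))  # 79 medium
```

Averaging log-probabilities rather than multiplying raw probabilities keeps long recommendations from being penalised purely for their length.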
Citation-backed recommendations. Every claim links inline to the underlying portfolio data, market signal, or client preference that triggered it. Click any citation; the source panel opens with the raw data. This is the single most-used feature in the release — 84% of activated advisors use it at least daily.
Editable AI responses. Advisors can amend any AI output before sending to a client. Edits are tracked, attributed, and don't get lost on regeneration. This was the compliance team's hard requirement — and it turned out to be the design feature advisors valued most.
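Keeping edits alive across regeneration is mostly a data-model decision: store advisor edits apart from the AI text instead of overwriting it. A minimal Python sketch under that assumption; the class and field names are hypothetical, and the `ai-output` / `advisor-output` provenance labels borrow the distinction the design system draws between the two.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AdvisorEdit:
    text: str
    advisor_id: str  # attribution: who changed the AI wording
    edited_at: datetime


@dataclass
class Recommendation:
    ai_sections: dict  # section_id -> latest AI-generated text
    edits: dict = field(default_factory=dict)  # section_id -> list[AdvisorEdit], oldest first

    def edit(self, section_id, text, advisor_id):
        """Append an attributed advisor edit; earlier edits stay in the history."""
        if section_id not in self.ai_sections:
            raise KeyError(section_id)
        stamp = datetime.now(timezone.utc)
        self.edits.setdefault(section_id, []).append(AdvisorEdit(text, advisor_id, stamp))

    def regenerate(self, new_ai_sections):
        """Replace only the AI text; advisor edits live in a separate store and survive."""
        self.ai_sections = dict(new_ai_sections)

    def render(self):
        """Return section_id -> (text, provenance); the latest advisor edit wins."""
        rendered = {}
        for section_id, ai_text in self.ai_sections.items():
            history = self.edits.get(section_id)
            if history:
                rendered[section_id] = (history[-1].text, "advisor-output")
            else:
                rendered[section_id] = (ai_text, "ai-output")
        return rendered


rec = Recommendation({"summary": "Portfolio drifted 4% overweight equities.",
                      "action": "Rebalance to 60/40."})
rec.edit("action", "Rebalance to 55/45 per client liquidity needs.", "advisor-17")
rec.regenerate({"summary": "Portfolio drifted 5% overweight equities.",
                "action": "Rebalance to 65/35."})
print(rec.render()["action"])  # ('Rebalance to 55/45 per client liquidity needs.', 'advisor-output')
```

Because regeneration only replaces `ai_sections`, the advisor's wording and its attribution both survive. An edit to a section that a regeneration removes stays in the audit history but is no longer rendered, which is one policy choice among several.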
Confidence band placed top-right, not bottom.
Eye-tracking from pilot sessions showed advisors scanned recommendations in a top-right Z pattern. Placing the confidence score in their first visual stop meant trust evaluation happened before reading content — not after.
Citations underlined, not chip-styled.
Initial design used pill-shaped chips for citations. They tested poorly: advisors read chips as "tags," not "sources." Inline underlines (the document convention they'd seen for 20+ years) drew 3× the click-through rate on first encounter.
The activation funnel had a specific failure point: the first AI recommendation. So we rebuilt onboarding to teach trust before testing it. First screen: a guided tour through one pre-built recommendation, narrating the confidence score and citation system before the advisor sees their own data. Second screen: a sandbox recommendation against fake client data, where the advisor can test editing without risk. Only on screen three does live client data appear. The sequence: watch → practice → do.
I built and maintained the centralised Figma design system that the recommendation card patterns now live in — with reusable components, governance standards, and explicit documentation on which components are "AI-output" vs "advisor-output" (a critical compliance distinction). Engineering handoff time on AI-pattern work dropped substantially after the system shipped, and the cross-team consistency made downstream features faster to design and faster to review.
New-advisor activation: 52% → 68% (+16 percentage points). Measured via internal product analytics across the first two product releases post-redesign. The improvement was concentrated among advisors with 5–15 years of tenure — exactly the cohort the research had flagged as most trust-sensitive.
AI recommendation acceptance rate: 38% → 67% (+29 percentage points). Measured per recommendation across all activated advisors. The lift validated the trust thesis: when advisors can audit AI outputs, they accept them at roughly 1.8 times the rate.
Support ticket reduction (estimated): tickets categorised as "AI behaviour unclear" or "why did this happen" dropped an estimated 60% in the first quarter post-release, based on the ticketing system's auto-tag classifier (calibrated against a manual audit of 200 tickets). Methodology: pre/post comparison of 90-day windows, holding ticket volume baseline constant.
Goldman's wealth platform serves ~3,000 internal advisors. If the 16-point activation lift holds across that base, it translates to ~480 additional activated advisors beyond the prior baseline. With an average book of ~$200M AUM per advisor (a publicly cited Goldman PWM range), that's an estimated ~$96B in AUM newly under AI-augmented management. These are not new assets; they are existing assets newly capable of being managed with the platform's AI tooling. The framing here matters: design didn't acquire AUM, but it removed the trust ceiling that was blocking the AI from generating leverage on existing AUM.
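The estimate chains two multiplications. The Python sketch below reproduces it from the figures in the text (~3,000 advisors, a 16-point lift, a ~$200M average book); the function names are illustrative, and, like the estimate itself, it assumes the lift measured on new advisors holds across the full advisor base.

```python
def additional_activated_advisors(advisor_base, lift_points):
    """Advisors newly activated by a percentage-point lift, assuming it applies to the whole base."""
    return advisor_base * lift_points / 100


def aum_newly_ai_augmented(extra_advisors, avg_book_usd):
    """Existing AUM now reachable by the AI tooling (not new assets)."""
    return extra_advisors * avg_book_usd


extra = additional_activated_advisors(3_000, 16)   # ~3,000 advisors, +16 points
aum = aum_newly_ai_augmented(extra, 200e6)         # ~$200M average book per advisor
print(f"{extra:.0f} advisors, ${aum / 1e9:.0f}B")  # 480 advisors, $96B
```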
"The redesign didn't change the AI model. It changed whether advisors believed the AI model. That's the entire game in regulated industries."
– Product lead, internal release retro
The 67% AI acceptance rate is strong, but the residual 33% is informative. Reason-code data shows that senior advisors (15+ yrs) reject recommendations because the confidence display feels "too prescriptive"; they prefer a range to a single number. Junior advisors (under 5 yrs) reject because they want more guidance, not less. One UI doesn't serve both cohorts.
Hypothesis: rendering confidence as a range (e.g., 78–84%) for senior advisors and as a single number + plain-language explanation for junior advisors will lift acceptance by another 8–12 percentage points. Instrumentation is already in place: tenure-segmented analytics, cohort-segmented A/B framework, and a rollback plan if the senior cohort sees acceptance drop instead.
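The proposed cohort split and its rollback guardrail can be sketched in a few lines of Python. The tenure cut-offs come from the cohorts named above; the fixed ±3-point half-width, the 2-point rollback threshold, and all function names are hypothetical placeholders, since the source does not say how the range would be computed or what drop would trigger a rollback.

```python
# Tenure cut-offs from the cohorts named in the text; other constants are hypothetical.
SENIOR_MIN_YEARS = 15
JUNIOR_MAX_YEARS = 5
RANGE_HALF_WIDTH = 3    # placeholder: a real range would come from a model uncertainty estimate
MAX_DROP_POINTS = 2.0   # placeholder rollback threshold, in acceptance-rate points


def render_confidence(score, tenure_years, explanation=""):
    """Cohort-specific confidence display for a 0-100 score."""
    if tenure_years >= SENIOR_MIN_YEARS:
        low = max(0, score - RANGE_HALF_WIDTH)
        high = min(100, score + RANGE_HALF_WIDTH)
        return f"{low}–{high}"  # seniors see a range
    if tenure_years < JUNIOR_MAX_YEARS and explanation:
        return f"{score}: {explanation}"  # juniors see the number plus plain language
    return str(score)  # 5-15 years (or no explanation available): current single number


def should_roll_back(control_acceptance, treatment_acceptance):
    """Guardrail: roll back if treatment acceptance falls too far below control."""
    return control_acceptance - treatment_acceptance > MAX_DROP_POINTS


print(render_confidence(81, 20))  # 78–84
print(render_confidence(81, 3, "strong match with the client's stated risk profile"))
print(should_roll_back(67.0, 63.5))  # True
```

Keeping the 5–15-year cohort on the current display gives the A/B test an untouched reference group alongside the two treated cohorts.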
Goldman's AI advisory platform doesn't have the latitude of a consumer app. Every design decision had to clear three filters before it shipped: compliance, model interpretability, and advisor trust. The binding constraint wasn't time; it was the surface area I could realistically influence inside a two-release product cycle.
Compliance reviews don't compress. The AI model couldn't change. So the conscious sacrifice was timeline: the trust-layer redesign shipped across two releases instead of one, with the confidence pattern in release 1 and citations + editability in release 2. Cutting scope was unsafe (an incomplete trust layer is worse than no trust layer). Cutting quality was unthinkable in a regulated industry. Time stretched. That was the right call.
"In regulated AI, the design constraint isn't time. It's how much you can change before the legal review cycle resets. Pick your changes carefully — and make every one count."
Conclusion
In regulated AI, you don't ship a model. You ship the proof that a human can defend it.
The trust patterns shipped at Goldman Sachs — confidence visibility, citation-backed recommendations, editable AI outputs — are now adopted as the platform standard for AI-output surfaces across the broader wealth advisory product line.