AMD’s next Radeon might go 512‑bit: 96 CUs and 32–48GB VRAM? I’m shocked (in a good way)

GAIA · August 28, 2025 · Tech

**New RDNA 5 leaks point to a flagship Radeon with 96 CUs, 16 memory controllers (potentially 512‑bit), and 32-48GB VRAM. If true, AMD finally has a realistic shot at Nvidia’s next-gen “80-class.” Treat all details as unconfirmed but intriguing.**

My gut reaction: 16 memory controllers on a gaming Radeon? Hello, 512-bit nostalgia

I did a double take when I saw the block diagrams. Sixteen unified memory controllers ringing the die like a stadium crowd? If those are 32-bit each, you’re staring at a 512-bit bus on a mainstream gaming Radeon again. That’s the kind of spec that makes GPU nerds like me grin and immediately reach for a napkin to scribble bandwidth math. Sure, these are leaks and not gospel, but the shape of this rumor hits a nerve: AMD might be gearing up to bulldoze its bandwidth bottlenecks, feed a lot more compute, and actually push high-end ray tracing without wheezing. About time.

The source is AnandTech forum regular Kepler_L2, who has a solid track record (they pegged core counts and memory on the RDNA 4 top chip, what became the RX 9070 XT, months before it was public). This time, the diagrams show four chips under an “AT” codename umbrella (Alpha Trion, allegedly): a 96‑CU beast at the top (likely AT0), and three mortals beneath it: AT2 (40 CUs), AT3 (24 CUs), and AT4 (12 CUs). There’s even chatter that the flagship could eclipse the 9070 XT by 50%+, positioning it to square up against Nvidia’s rumored RTX 6080. All unconfirmed, obviously, but the architecture choices they’re hinting at are the most interesting part anyway.

My first impression wasn’t “ooh, big number go brrr.” It was more, “wait, is AMD actually prioritizing feeding the cores this time?” Because a fat memory bus changes the kind of GPU you can build. It’s not the only path to performance, but it’s one of the most bluntly effective ones—especially if you want to keep ray tracing and AI workloads from choking at high resolutions.

Specifications

**AT0 (Flagship RDNA 5)**
  • Compute Units (CUs): 96
  • Unified Memory Controllers (UMCs): 16
  • Possible bus width: up to 512-bit (if 32-bit per UMC)
  • VRAM possibilities: 32GB (16×2GB) or 48GB (16×3GB) if densities align
  • Positioning: “80-class” contender; >50% over RX 9070 XT suggested
  • My take: The headline is the bandwidth. 96 CUs only shine if you feed them; 16 controllers suggests AMD knows it.

**AT2**
  • Compute Units (CUs): 40
  • Unified Memory Controllers (UMCs): 6
  • Possible bus width: ~192-bit
  • VRAM possibilities: 12GB/24GB depending on memory type/density
  • Positioning: Upper midrange; between RX 9060 XT (32 CUs) and RX 9070 (56 CUs)
  • My take: Odd duck: more compute than budget, but a narrower bus. Feels like the 1440p sweet spot if clocks/IPC land.

**AT3**
  • Compute Units (CUs): 24 (conflicting chatter up to 48 due to dual-issue confusion)
  • Unified Memory Controllers (UMCs): 8
  • Possible bus width: ~256-bit
  • VRAM possibilities: 8–16GB on GDDR; rumor mill even floated LPDDR5X pairings
  • Positioning: Bread-and-butter 1080p/1440p at sane prices
  • My take: Wider bus than AT2 with fewer CUs = bandwidth-biased design. Could be surprisingly punchy if priced right.

**AT4**
  • Compute Units (CUs): 12
  • Unified Memory Controllers (UMCs): 4
  • Possible bus width: ~128-bit
  • VRAM possibilities: 8GB typical
  • Positioning: Entry-level, eSports and HTPC territory
  • My take: The kind of card that lives or dies by efficiency and price. Please, please skip 6GB configs forever.

Context note: All specs are unconfirmed and based on leaked block diagrams and forum discussion. Bus width depends on controller width; performance depends on clocks, architecture, and memory type (GDDR6/7 or others).

The detail that hooked me: bandwidth-first design

I’ve been daily-driving a 7900 XTX in one rig and a 4070 Ti Super in a small-form-factor build. The biggest difference I feel in modern games isn’t raw raster, it’s how quickly each card runs out of breath when the VRAM and bandwidth demands spike—think path tracing in Cyberpunk 2077, heavy texture packs, or modded Starfield. You can muscle through a lot with smarts (caches, compression, frame gen), but there are moments where raw throughput wins. A 512-bit bus is a sledgehammer.

Do some back-of-the-napkin math: even with last-gen GDDR6 at 24Gbps, 512-bit nets ~1.5TB/s of bandwidth (512 x 24 / 8 = 1536 GB/s). If GDDR7 at 28-32Gbps shows up on this card, you’re eyeballing ~1.8–2.0TB/s. That’s daft for a gaming GPU—daft in a very good way—especially if AMD keeps some form of Infinity Cache onboard. The cache reduces trips to VRAM; the fat bus slams the door on worst-case misses. It’s the “belt and suspenders” approach to memory starvation.
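If you want to redo that napkin math for other configurations, here’s a tiny sketch. The bus widths follow the rumored controller counts (assuming 32-bit per controller), and the data rates are illustrative GDDR6/GDDR7 speeds I picked, not confirmed specs for any of these chips:

```python
# Back-of-the-napkin GPU memory bandwidth: bus_bits * data_rate_gbps / 8 = GB/s.
# Bus widths follow the rumored controller counts (x 32-bit assumed); the data rates
# are illustrative GDDR6/GDDR7 speeds, not confirmed for any of these chips.

def bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak theoretical bandwidth in GB/s."""
    return bus_bits * data_rate_gbps / 8

rumored_buses = {"AT0": 512, "AT3": 256, "AT2": 192, "AT4": 128}
data_rates = {"GDDR6 24Gbps": 24, "GDDR7 28Gbps": 28, "GDDR7 32Gbps": 32}

for chip, bits in rumored_buses.items():
    cells = ", ".join(f"{name}: {bandwidth_gbs(bits, rate):.0f} GB/s"
                      for name, rate in data_rates.items())
    print(f"{chip} ({bits}-bit): {cells}")
# AT0 (512-bit): GDDR6 24Gbps: 1536 GB/s, GDDR7 28Gbps: 1792 GB/s, GDDR7 32Gbps: 2048 GB/s
```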

It took me a minute to understand why I’m so excited about the memory controllers rather than the 96-CU headline. CUs aren’t a universal yardstick. AMD’s been running dual-issue compute units since RDNA 3, and on paper you can double a lot of internal numbers without seeing double the frame rate. What clicks here is pairing more (and hopefully smarter) compute with a plan to keep them busy at 4K with lots of ray queries and big textures. If AMD wants to go toe-to-toe with Nvidia’s “80-class,” this is where you start.

Real-world context: the high end is AMD’s missing chapter

Right now, AMD’s top RDNA 4 gaming card—the RX 9070 XT—trades punches with Nvidia’s 70-class and is great value in the right build. But there’s a void above it. Nvidia’s been dictating the halo narrative since the 4090, and even the 80-class has felt untouchable price-wise. Rumors that AMD’s RDNA 5 is targeting an “80-class” fight rather than a direct 90-class slugfest sound… sane. Win back mindshare where it’s still brutally competitive, not where every extra 5% performance costs a kidney in die area and power.

We’ve also seen this dance before: a generation lands big raster gains, RT still lags, AI upscaling moves the goalposts. If AMD’s flagship really is 50%+ over 9070 XT at stock—and there’s a rumored ~10% IPC/efficiency bump between like-for-like RDNA 4 and RDNA 5—that’s a recipe for comfortable 4K raster and “actually playable” RT settings without falling off a cliff. Whether it’s enough to match Nvidia’s latest RT hardware is another story, but the bandwidth and VRAM story would finally be unambiguously in AMD’s favor if these numbers are right.

Deep dive: what 96 CUs really implies (and what it doesn’t)

Let’s decode CUs in practical terms. Each compute unit houses SIMD arrays for shaders, dedicated ray traversal hardware, and those AI/Matrix accelerators AMD’s been tucking in since RDNA 3. Ninety-six of them is 50% more than the 64-CU 9070 XT. But a simple 96 vs 64 comparison is misleading unless we know three things:

  • Clock targets: If the flagship clocks higher and sustains it (big cooler, sane power), your effective throughput rises fast.
  • Front-end improvements: Dispatch, scheduling (the HWS block), and geometry/command throughput can bottleneck “paper TFLOPs.”
  • Cache/memory behavior: If the L2 and Infinity Cache are bigger/faster and you’ve got that wide bus, shader bubbles shrink.
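
To see why clocks move “paper TFLOPs” so much, here’s the usual FP32 napkin math, a sketch assuming RDNA 3-style CUs (64 shaders each, dual-issue); the clock figures are placeholders I picked, not leaked numbers:

```python
# Paper FP32 throughput for an RDNA-style part:
# TFLOPS = CUs * shaders_per_CU * 2 (FMA) * dual_issue_factor * clock_GHz / 1000.
# Shaders-per-CU and dual-issue follow RDNA 3 conventions; the clocks are assumptions.

def paper_tflops(cus: int, clock_ghz: float, shaders_per_cu: int = 64,
                 dual_issue: int = 2) -> float:
    return cus * shaders_per_cu * 2 * dual_issue * clock_ghz / 1000

for cus, clock in [(64, 2.97), (96, 2.97), (96, 3.2)]:
    print(f"{cus} CUs @ {clock} GHz -> {paper_tflops(cus, clock):.1f} paper TFLOPS")
# None of this translates 1:1 into frames if the front end or memory can't keep up,
# which is exactly why the 16 controllers interest me more than the CU count.
```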

The diagram places six CUs behind each Render Backend (RB), which loosely maps to ROP throughput—the bits that finalize pixels. That ratio matters for 4K fill rate and post-processing. If AMD balanced ROPs better this gen, you’re less likely to find yourself memory and ROP-bound at high resolutions with heavy AA and post effects. I’ve hit those walls on both my 7900 XTX and 4070 Ti Super, just in different games for different reasons.

Ray tracing and AI: the elephant, meet the room

AMD’s RT hardware has improved generation-over-generation, but Nvidia still enjoys a comfortable lead in many titles—especially those with heavy path tracing or denoising pipelines tuned for CUDA. More CUs means more RT blocks, but RT scaling isn’t linear with CU count; traversal efficiency, cache behavior, and BVH builder performance all matter. The bright side of a 512-bit bus is that RT’s worst-case behavior—lots of random memory fetches—hurts less. If RDNA 5 also brings a smarter RT pipeline and better denoisers in drivers, this could be the first Radeon in a while where turning on RT doesn’t feel like dropping a gear on the highway.

On the AI side, AMD’s “AI accelerators” per CU are real, but the game ecosystem largely optimizes for DLSS today. FSR has come a long way, and the newer iterations are much improved, but DLSS still wins in consistency and image stability in many titles. If RDNA 5 adds stronger matrix throughput and more robust software support, great—that can help FSR’s quality ceiling and future AI-assisted features. But this is the one area where I’ll keep my skepticism hat on until I see frame-time charts and image comparisons. AI features are as much about software shaders and model quality as they are about silicon.

VRAM: 32–48GB on a gaming card isn’t overkill anymore (for some of us)

My biggest personal bias: I like headroom. I run texture mods. I poke at Blender. I tinker with Stable Diffusion and local LLMs when the gaming backlog guilt gets too loud. On my 7900 XTX (24GB), I can load stupidly high-res assets without worrying; on my 4070 Ti Super (16GB), I sometimes have to dial back mods or watch frametime spikes as the driver shuffles memory. A 32GB (or even 48GB) Radeon is catnip for that kind of mixed workload.

For pure gaming at 1440p, 16GB is still fine. But path tracing, 4K with high-quality RT, and emerging texture standards are merciless. If AMD puts 32GB on its flagship and keeps bandwidth high, it changes the vibe of “max everything and go” builds. It also opens the door for prosumer crossover—content creators and AI hobbyists who balk at workstation pricing but need more than 16–20GB. I’ve hit 20–22GB in some Stable Diffusion graphs; 24GB works, but 32GB would be a no-brainer comfort zone.
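For the mixed-workload crowd, the usual rule of thumb is weights (parameters times bytes per parameter) plus working overhead. Here’s a rough sketch; every number in it is a ballpark assumption for illustration, not a measurement:

```python
# Crude VRAM estimate for local AI workloads:
# weights_GB ~= params_in_billions * bytes_per_param, plus overhead for activations,
# latents / KV cache, and framework buffers. All figures below are rough assumptions.

def vram_estimate_gb(params_billion: float, bytes_per_param: float,
                     overhead_gb: float) -> float:
    return params_billion * bytes_per_param + overhead_gb

examples = [
    ("~3.5B image model, fp16, hi-res pass", 3.5, 2.0, 8.0),
    ("13B LLM, 8-bit quant", 13.0, 1.0, 4.0),
    ("30B LLM, 4-bit quant", 30.0, 0.5, 5.0),
]
for name, params, bpp, overhead in examples:
    print(f"{name}: ~{vram_estimate_gb(params, bpp, overhead):.0f} GB")
# Stack a couple of these (or add ControlNets, upscalers, bigger batches) and 16GB
# stops being comfortable fast -- which is the whole appeal of a 32GB consumer card.
```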

Why the midrange diagrams are weird (and why I kind of love it)

AT2 allegedly has 40 CUs and just six memory controllers (so ~192-bit), while AT3 has fewer CUs (24) but more controllers (eight, so ~256-bit). That’s not the typical “more compute = wider bus” we see in clean product stacks. My read: AMD is experimenting with bandwidth-to-compute ratios per price tier to avoid the failure mode we’ve seen too often—midrange parts getting crushed at higher resolutions because their VRAM and bandwidth are an afterthought.

If AT3 really gets a 256-bit bus at the 24‑CU tier, it could be a sleeper pick for 1440p ultra with better minimums than you’d expect, especially in newer RT-forward titles. And if the rumor about LPDDR5X ever materialized (say, for mobile or ultra-budget desktop variants), a wide bus could offset LPDDR’s latency/bandwidth characteristics enough to make it viable. I’m not betting my build on that, but I can squint and see the product planning logic.
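The AT2/AT3 oddity is easiest to see as bandwidth per CU. A quick sketch, assuming 32-bit controllers and the same illustrative 20Gbps GDDR6 on both chips (both assumptions are mine; only the CU and controller counts come from the leak):

```python
# Bandwidth-to-compute ratio for the rumored midrange chips. Controller width (32-bit)
# and the 20 Gbps data rate are illustrative assumptions; CU and controller counts
# come from the leaked diagrams.

def gbs_per_cu(controllers: int, cus: int, data_rate_gbps: float = 20.0,
               bits_per_controller: int = 32) -> float:
    bandwidth_gbs = controllers * bits_per_controller * data_rate_gbps / 8
    return bandwidth_gbs / cus

for chip, (ctrls, cus) in {"AT2": (6, 40), "AT3": (8, 24)}.items():
    print(f"{chip}: ~{gbs_per_cu(ctrls, cus):.1f} GB/s per CU")
# AT2: ~12.0 GB/s per CU vs AT3: ~26.7 GB/s per CU -- AT3 is the bandwidth-biased one.
```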

Could this be chiplet or monolithic? Does it matter?

I’ve learned to care less about the packaging religion and more about what the product lets me do. Chiplets can help with yield and cost; monolithic can help with latency and routing. The diagram calling out “unified memory controllers” doesn’t answer the packaging question either way. But the presence of 16 controllers implies a beefy board with lots of memory packages around the die—think 16 modules for 32GB (if 2GB chips) or 16 higher-density packages for 48GB. That’s a big-boy PCB and cooler no matter how you slice it.

Practical concern: 16 chips means more power and more heat spread across the board. Add in the likely 2.0–2.5GHz+ core clocks and you’re in triple-slot, heavy heatsink land, maybe with vapor chamber and a backplate that actually needs thermal pads. I don’t mind that; I just hope AMD nails fan acoustics and coil whine. Comparing my two rigs, the 7900 XTX I use is quieter under load, but the coil whine roulette is real on both sides of the aisle.

Power, connectors, and the “please just pick a standard” rant

I will die on this hill: I want AMD’s next high-end card to adopt the 12V‑2×6 connector cleanly, not a mix of daisy-chained 8-pins with adapters. The cable drama era needs to end, and tidy cable management in modern cases is way easier with a single, well-engineered power plug. If this rumored flagship is genuinely “80-class competitive,” it’ll pull enough power that proper connectors and quality cables matter for both safety and aesthetics. Less spaghetti, more sanity.

Driver and software: performance is half the battle

I’ve had solid experiences with AMD drivers the last couple of years, especially for raster performance and features like HYPR‑RX and Anti‑Lag. The AV1 encoder is finally where it needs to be for streaming, and ReLive is fine for most creators. But Nvidia still has the better “it just works” story in a few pro workflows and in RT-heavy games at launch. If AMD really brings a bandwidth monster to market, they need to back it with killer day-one profiles, consistent frame pacing, and FSR quality that closes the remaining gap to DLSS in the tough scenes (vegetation, high-frequency detail, thin geometry).

The other piece is game dev relations. A 512-bit bus and 32GB VRAM can let engines pursue higher-res textures and more ambitious RT techniques. But engines don’t auto-magically scale. If AMD seeds hardware and tooling early and loudly, we might see a wave of titles where Radeon owners get more than just “it runs.” I want to see AMD showing up in patch notes with explicit optimizations called out—less marketing, more measurable wins.

What this means for buyers: who should wait, who should buy now

If you’re sitting on an RX 6800 XT, RTX 3080, or newer and you’re happy at 1440p, I wouldn’t slam the brakes on your upgrade plans solely because of this rumor. Today’s 70/70‑Ti class cards are excellent at that resolution and often cheaper than launch. If you’re targeting 4K with RT enabled and you’re allergic to compromises, this is a “maybe wait” moment. A 32GB, bandwidth-rich Radeon could finally make 4K RT settings feel less like flipping a self-destruct switch, and it might put genuine price pressure on Nvidia’s next-gen stack.

For creators and AI hobbyists: if your work actually uses >16GB VRAM (you know who you are), the possibility of a 32GB flagship at a consumer price point is huge. Even if you don’t buy AMD, the existence of that card can move the whole market. I’ve done enough Blender cycles and SDXL workflows to know that extra VRAM doesn’t just make things “faster,” it makes them “possible” without resorting to tiling or CPU fallbacks.

Pricing and positioning: the only prediction I’ll risk

I won’t pretend to know AMD’s BOM or margins. But given the parts list implied by 16 memory controllers and 32–48GB VRAM, the flagship won’t be cheap. My gut says AMD aims at the “80-class” price tier rather than 90-class—think a premium price, but not moonshot. The midrange AT2 and AT3 feel like the make-or-break value plays. If AT3 ships with a 256-bit bus at mainstream pricing, a lot of gamers who are sick of VRAM roulette will flock there. AT4 is the volume card that needs to be efficient, quiet, and “just run eSports without drama.”

Potential pitfalls I’m watching for

  • Power draw creep: If the flagship chases Nvidia’s top TDPs, we’re back to diminishing returns and hot boxes. Bandwidth helps you avoid that—use it wisely.
  • RT still lagging: More CUs won’t fix everything. If RT hardware or denoisers don’t jump, Nvidia keeps the RT narrative.
  • Supply and yields: A big die (or complex chiplet) plus 16 high-speed memory chips is a recipe for launch scarcity if planning isn’t airtight.
  • Marketing vs. reality: Don’t turn “up to 48GB” into the default expectation. If it’s an expensive variant, say so early to avoid backlash.
  • Software fit: FSR quality and motion stability have to keep rising. Frame generation needs to be optional sweetener, not required medicine.

A note on the AT3 “24 vs 48 CU” confusion

This rumor thread gets tangled because AMD’s recent CUs are dual-issue. Some people see “48” when the diagram shows “24,” because they’re thinking about two instructions per clock per CU—functionally “doubling” parts of the CU. The clean way to think about it: count CUs as blocks, not issue lanes. Doubling issue width is great for throughput if the workload feeds it, but it’s not the same as doubling CUs with their own fixed-function units and caches. If AT3 truly has 24 CUs behind a 256-bit bus, it could still punch above its weight in bandwidth-bound games.
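If the 24-versus-48 thing is still fuzzy, here’s the counting distinction spelled out, a sketch using RDNA 3-style conventions (shader counts and dual-issue behavior are assumptions carried forward, not confirmed RDNA 5 details):

```python
# Two ways people "count" the same silicon. Dual-issue doubles peak FP32 ops per clock,
# not the number of physical CUs (each of which has its own RT block, scheduler, caches).

physical_cus = 24          # what the leaked diagram shows for AT3
shaders_per_cu = 64        # RDNA 3 convention, assumed here
dual_issue = 2             # second FP32 op issued per clock when the workload allows

peak_fp32_ops_per_clock = physical_cus * shaders_per_cu * 2 * dual_issue  # *2 for FMA
print(f"Physical CUs: {physical_cus}")
print(f"Peak FP32 ops per clock: {peak_fp32_ops_per_clock}")
print(f"'CU' count if you mistake issue width for blocks: {physical_cus * dual_issue}")
# RT units, texture units, and caches scale with the 24, not the 48 -- count blocks.
```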

What I’d personally build if these leaks land

If the AT0 flagship really ships with 32GB VRAM and a 512-bit bus, that’s my next 4K build anchor—no question. I’d pair it with a high-airflow case (think front mesh, at least two 140mm intakes), a quality 850–1000W PSU with native 12V‑2×6, and a CPU that won’t bottleneck in simulation-heavy games (a modern 8C/16T+ chip). If I’m targeting the best price-to-sanity ratio at 1440p, AT3 looks compelling on paper: fewer CUs but a wider bus for stable minimums, ideally packing 12–16GB VRAM. I’d throw that into a compact case with a 650–750W PSU and call it a day.
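For the PSU line in those builds, here’s the quick power budget I’d run, a sketch where every wattage is an assumption on my part (nothing about RDNA 5 board power is confirmed):

```python
# Quick-and-dirty PSU sizing: sum assumed component draw, add headroom for transients
# and aging, round to a common PSU size. All wattages here are illustrative guesses.

def recommended_psu_watts(gpu_w: int, cpu_w: int, rest_w: int = 100,
                          headroom: float = 1.5) -> int:
    """rest_w covers motherboard, RAM, storage, and fans."""
    return round((gpu_w + cpu_w + rest_w) * headroom / 50) * 50  # nearest 50W

print("Assumed 4K flagship build:", recommended_psu_watts(gpu_w=375, cpu_w=150), "W")
print("Assumed 1440p AT3 build: ", recommended_psu_watts(gpu_w=200, cpu_w=120), "W")
# Lands right around the 850-1000W and 650-750W ranges I'd shop for anyway.
```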

Why this leak feels different from the usual “big number” rumor

It’s the coherence. A lot of leaks scream “more cores!” and stop. This one whispers “more cores, yes—but also the memory system to feed them, and the backend to finish the pixels.” The diagram’s block-level balance—CUs per RB, central command/scheduler, the L2, the ring of memory controllers—reads like a team optimizing for practical next-gen gaming workloads, not just chasing synthetic benchmarks. That’s why I’m optimistically skeptical rather than eyeroll skeptical.

Memory types and the GDDR7 question (without overpromising)

Do I think this flagship needs GDDR7 to shine? Not necessarily. GDDR6 at 24Gbps on a 512-bit bus is already bonkers. GDDR7 would push it into “why are we even arguing” territory, but it’s not a requirement for a great card. What matters more is how the cache hierarchy, compression, and memory timings play together. If AMD nails the balance, you won’t be able to tell which memory type it uses from the seat of your chair—you’ll just notice the frame-time graphs are flat and the settings sliders stay to the right.

What this could do to the used market

If AMD shows up with a credible “80-class” challenger and aggressive midrange cards that don’t skimp on VRAM, the used market is going to get spicy. Expect a wave of 24GB 7900-series cards at better prices, which is great news for creators and modders. Nvidia owners hovering on the fence might wait to see how the RT story shakes out, but even then, competition tends to nudge prices and bundles. I still remember when 8GB was “enough” for 1440p and how quickly that changed once games had room to stretch.

The wishlist: little things that would make this launch feel different

  • A clear, honest naming scheme—no “XT of an XT” confusion.
  • FSR defaults that prioritize image stability, with advanced toggles for tinkerers.
  • Partner coolers that focus on acoustics first, RGB second.
  • Day-one, well-communicated RT improvements with real game examples.
  • Explicit VRAM configs per SKU. No stealth 8GB where 12GB should be the floor.


PROS

  • Potential 512-bit bus with 16 memory controllers
  • 32–48GB VRAM headroom
  • 96 CUs with rumored IPC uplift
  • More balanced midrange (wider bus on lower-CU parts)
  • Realistic shot at Nvidia’s next “80-class” if priced right

CONS

  • All unconfirmed; risk of overhype
  • Possible high power and large coolers
  • RT and AI software ecosystem may still favor Nvidia
  • Cost and supply complexity with 16 memory packages
  • Confusing CU counts (dual-issue vs physical) could muddy expectations

FAQ: fast answers to the obvious questions

Is the 512-bit bus confirmed?
Not at all. The diagrams show 16 memory controllers; if they’re 32-bit each, that’s 512-bit. Controller width isn’t confirmed.

Will it use GDDR7?
Unknown. Even fast GDDR6 would be monstrous on a 512-bit bus. GDDR7 would be icing.

Is 32–48GB VRAM realistic?
Yes, with 16 memory packages it’s plausible depending on chip densities. Whether that’s the default SKU or a premium variant is the real question.

How does 96 CUs compare to Nvidia’s cores?
Different architectures, different math. The 50%+ claim vs RX 9070 XT is the meaningful takeaway—not a 1:1 core count duel.

Should I wait?
If you want 4K with RT and lots of headroom, and you’re not in a rush, waiting makes sense. For 1440p today at a good price, current 70/70‑Ti class cards remain great buys.

Final thought

I’m not crowning a paper champion. But if these block diagrams are even directionally accurate, AMD is finally building the kind of GPU I’ve wanted from them since the first RDNA: compute that’s properly fed, VRAM that’s future-proof, and a product stack that looks tuned for the way we actually play and create in 2025. The specs are spicy; the execution is what matters. Your move, Radeon.


VERDICT

Pending (leak-based). Confidence in leak direction: Medium; in exact numbers: Low.

Cautiously optimistic. A 96‑CU, bandwidth-heavy RDNA 5 flagship with 32–48GB VRAM could put AMD back in the “80-class” fight and force Nvidia to sharpen its pencils. The midrange shapes up to be the real value story if the wider buses stick. All eyes on power, RT/software polish, and pricing.
