Nvidia’s Nemotron License U-Turn: What the Removal of ‘Rug-Pull’ Clauses Means for Open-Source AI

*After community pushback and mounting competition from Chinese models, Nvidia quietly updated the license for its flagship open-weight model—removing provisions that made production deployment legally risky.*

The Problem No One Wanted to Talk About

When Nvidia released Nemotron Super 3 122B A12B earlier this year, the model landed near the top of open-weight leaderboards. A 122-billion-parameter Mixture-of-Experts architecture with only 12 billion active parameters per forward pass, it delivered serious performance at a fraction of the inference cost of dense models.

But buried in the fine print was a problem: the Nvidia Open Model License.

The license included clauses that gave Nvidia broad rights to revoke usage. One provision terminated your license if you filed patent litigation. Another restricted how you could modify the model and its guardrails. For companies considering production deployment, these “rug-pull” clauses were a non-starter. No sane engineering team ships production workloads under a license that could evaporate overnight.

So while the model impressed on benchmarks, it sat largely unused in serious applications. The community wanted to adopt it. The license said otherwise.

What Changed This Week

On March 15, 2026, Nvidia quietly pushed an update to the Nemotron license page. The new “Nvidia Nemotron Open Model License” replaces the original “Nvidia Open Model License,” and the differences are substantive:

Removed from the new license:

  • Modification restrictions that made it unclear whether fine-tuning or model merging was permitted
  • Guardrail requirements that tied your hands on safety layer modifications
  • Branding and attribution demands that complicated derivative model releases
  • Termination clauses triggered by patent litigation

What remains:

  • Basic use restrictions for prohibited applications
  • Standard warranty disclaimers
  • Traditional copyright terms

In practical terms, the new license is much closer to Apache 2.0 than the restrictive custom license it replaced. You can fine-tune the model. You can abliterate safety layers. You can merge it with other models. You can ship it in production without a legal team holding their breath.

Why This Matters: The Production Question

The Reddit thread that broke this story drew a sharp response from one engineer: “The rug-pull clause removal is the part that actually matters for production. The old license basically said Nvidia could revoke your permission to use the weights if they decided you violated some vague terms—nobody sane ships production workloads under those conditions.”

This captures the core issue. Open weights aren’t truly open if the license contains time bombs. Teams evaluating infrastructure investments need certainty. A model that performs well but carries legal risk gets passed over.

The new license removes that risk. For the first time, Nemotron becomes a viable option for:

Fine-tuned variants: Companies can now train custom versions without legal ambiguity about whether modifying the model violates the license.

Abliterated models: The safety layer restrictions are gone, enabling the creation of uncensored variants for legitimate use cases like creative writing or research.

Model merging: The community can combine Nemotron with other open weights to create hybrid models—an increasingly popular technique for improving performance.

Production deployment: Engineering teams can ship with confidence that their license won’t be terminated over a contract dispute or changing corporate strategy.

The Competition Factor

The timing is hard to ignore. Qwen 3.5 has been dominating the open-weight leaderboards for months, with no such license restrictions attached. GLM-5 and Kimi 2.5 are climbing the ranks. Chinese model labs have embraced permissive licensing as a competitive advantage—and it’s working.

One community member put it bluntly: “Competition works. If Llama 4 ships with similar restrictions I expect the same outcome.”

This reads as a concession to market pressure. Nvidia likely saw adoption stalling while unrestricted alternatives gained ground. The license update isn’t charity—it’s a strategic correction. But that doesn’t make it less valuable for the community.

What’s particularly notable is the speed of the reversal. The original license drew criticism within days of the model’s release. Within weeks, community discussions highlighted production deployment concerns. Nvidia’s response—a complete license rewrite rather than minor clarifications—suggests they were paying attention to the feedback loop.

This creates an interesting precedent. When enough potential users flag licensing concerns, vendors may actually listen. It’s a reminder that adoption is the currency that matters in the open-weights ecosystem. Models without users, regardless of benchmark performance, become irrelevant.

Nemotron Super 3 122B: Technical Deep Dive

For teams now reconsidering Nemotron, here’s what you’re working with:

Architecture: Mixture-of-Experts with 122 billion total parameters, 12 billion active per inference. This MoE design means you get the capacity of a massive model while only paying the compute cost of a much smaller one during inference.

Quantization support: The model ships in BF16, FP8, and NVFP4 variants. The NVFP4 version weighs in at approximately 61GB—still requiring serious hardware, but accessible to professional workstations.

Performance positioning: Community benchmarks place it competitively with Qwen-3.5-27B and GPT-5 mini in coding and reasoning tasks. It’s not the absolute top of the charts, but it’s in the conversation.

Hardware requirements: You’ll need at least 61GB of combined memory to run the NVFP4 variant. That translates to either multiple high-end GPUs (RTX 6000 Pro, A6000) or creative memory management solutions (more on that below).
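The ~61GB figure for the NVFP4 variant falls out of simple arithmetic on the parameter count. Here’s a rough sketch of the weight footprint at each shipped precision — raw weight bytes only, ignoring quantization scale metadata, KV cache, and runtime overhead, all of which add to the real requirement:

```python
# Back-of-envelope weight footprint for Nemotron Super 3 122B at the
# precisions it ships in. Raw weight bytes only; KV cache, activations,
# and runtime overhead come on top.

TOTAL_PARAMS = 122e9  # total parameters (MoE; all experts stored)

BYTES_PER_PARAM = {
    "BF16": 2.0,
    "FP8": 1.0,
    "NVFP4": 0.5,  # 4 bits/param, ignoring scale metadata
}

def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Raw weight bytes, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

for fmt, bpp in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{weight_footprint_gb(TOTAL_PARAMS, bpp):.0f} GB")
# BF16 ~244 GB, FP8 ~122 GB, NVFP4 ~61 GB
```

Note that MoE sparsity helps compute cost, not storage: all 122B parameters must be resident even though only 12B are active per forward pass.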

Parallel Innovation: GreenBoost and the vRAM Problem

While Nvidia sorted out its licensing, the open-source community was tackling a related bottleneck: running large models on consumer hardware.

On the same day the license news broke, another project called GreenBoost emerged. It’s an open-source Linux kernel module that extends GPU memory with system RAM and NVMe storage—a CUDA caching layer that makes larger models practical on smaller GPUs.

How it works: The kernel module allocates pinned DDR4 pages using the buddy allocator and exports them as DMA-BUF file descriptors. The GPU imports these as CUDA external memory. From the CUDA runtime’s perspective, those pages look like device-accessible memory—it doesn’t know they live in system RAM. The PCIe 4.0 x16 link handles data movement at roughly 32 GB/s.

The use case: The developer wanted to run a 31.8GB model (GLM-4.7-flash:q8_0) on a GeForce RTX 5070 with 12GB of VRAM. Traditional offloading worked but tanked token throughput because system memory lacked CUDA coherence. Smaller quantization maintained speed but sacrificed quality.
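To see why naive offloading tanks throughput, consider the worst case: every offloaded byte must cross PCIe once per generated token. Using the article’s numbers (31.8GB model, 12GB card, ~32 GB/s link) and an assumed ~10GB of VRAM actually usable for weights after KV cache and runtime overhead, a quick estimate of the decode ceiling:

```python
# Rough upper bound on decode throughput when part of a model lives in
# system RAM and must cross PCIe each token. Assumes every offloaded
# byte is read once per token (dense decode, no reuse) -- the worst
# case that GreenBoost-style caching layers try to soften.

PCIE_GBPS = 32.0       # PCIe 4.0 x16, roughly
MODEL_GB = 31.8        # GLM-4.7-flash:q8_0 weight size
VRAM_USABLE_GB = 10.0  # assumption: 12GB card minus KV cache/overhead

offloaded_gb = max(MODEL_GB - VRAM_USABLE_GB, 0.0)
ceiling_tok_s = PCIE_GBPS / offloaded_gb if offloaded_gb else float("inf")
print(f"offloaded: {offloaded_gb:.1f} GB, ceiling: ~{ceiling_tok_s:.1f} tok/s")
```

Under those assumptions the ceiling is roughly 1.5 tokens/second, regardless of how fast the GPU itself is — which is why anything that keeps hot pages closer to the GPU, or avoids re-reading cold ones, pays off so quickly.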

GreenBoost sits between those options—using system memory without the performance cliff of naive offloading. It’s experimental and GPLv2 licensed, available on GitLab.

This matters because it’s part of a broader pattern: the community filling gaps that hardware vendors leave open. Nvidia builds the GPUs. The community builds the software to use them efficiently. The combination is what makes local LLM deployment practical.

What This Means For Your Stack

If you’re building AI infrastructure, these developments shift the calculus in several ways:

1. Re-evaluate Nemotron for production. The license risk is gone. If you passed on it before, take another look—especially if you’re already invested in the Nvidia ecosystem. The model’s MoE architecture makes it particularly attractive for latency-sensitive applications where you want the quality of a large model without the inference cost.

2. Budget for memory, not just compute. MoE models like Nemotron are efficient at inference, but they’re still large. Factor in system RAM and storage when sizing deployments. The gap between “can run” and “runs well” is often 2-3x the model size in practice.

3. Watch the open-source tooling ecosystem. Projects like GreenBoost are force multipliers—they let you run bigger models on the hardware you already have. The best infrastructure investments aren’t always in newer GPUs; sometimes they’re in software that unlocks more value from existing hardware.

4. Pay attention to license changes. The trend is toward more permissive terms as companies compete for adoption. Don’t assume today’s restrictions are permanent. Set up monitoring for license updates on any model you depend on.

5. Consider the hybrid deployment model. With better licensing and tools for memory expansion, hybrid approaches become more viable—running inference locally for sensitive workloads while using cloud APIs for burst capacity. Nemotron’s new license makes this architecture legally straightforward.
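A minimal sketch of what such a routing policy can look like. Everything here is illustrative — the `sensitive` flag and queue-depth threshold are assumptions, not part of any real API:

```python
# Sketch of a hybrid routing policy: sensitive requests stay on the
# local deployment, everything else may burst to a cloud API when the
# local queue is saturated. Names and thresholds are illustrative.

from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    sensitive: bool  # e.g. contains PII or proprietary data

def route(req: Request, local_queue_depth: int, max_local_queue: int = 8) -> str:
    if req.sensitive:
        return "local"   # sensitive data never leaves the box
    if local_queue_depth < max_local_queue:
        return "local"   # prefer local capacity when it's available
    return "cloud"       # burst overflow to a cloud API

print(route(Request("summarize this contract", sensitive=True), local_queue_depth=20))
```

The key property the new license buys you is the first branch: you can keep the sensitive path entirely on self-hosted Nemotron without legal ambiguity.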

Remaining Caveats

This isn’t a clean victory. A few points worth noting:

The model is still large. Even with the license fixed, Nemotron requires serious hardware. The 61GB NVFP4 variant isn’t running on your gaming laptop.

Nvidia’s motivations are strategic. This isn’t altruism; it’s a response to competitive pressure. The company could have held the line.

Other models still have restrictive licenses. This sets a precedent, but it doesn’t solve the broader problem of open-weight models with custom licenses that contain hidden landmines.

GreenBoost is experimental. It’s a promising direction, but not production-ready. Test thoroughly before depending on it.

Implementation Checklist

If you’re considering deploying Nemotron under the new license:

– [ ] Review the license yourself. Read both versions (old and new) to understand what changed. Links are available in the original Reddit thread.

– [ ] Verify your hardware. Calculate whether your GPU(s) + system memory can handle the model variant you need.

– [ ] Test with your actual workload. Benchmark on your specific use case—general leaderboard rankings may not reflect your domain.

– [ ] Check derivative model compatibility. If you’re planning fine-tunes or merges, confirm the license terms support your intended use.

– [ ] Document your compliance. Keep records of the license version in effect when you downloaded the weights.

– [ ] Monitor for future changes. Nvidia could update the license again. Stay subscribed to relevant channels.

The Bigger Picture

What we’re seeing is the open-source AI ecosystem maturing. Two years ago, “open weights” meant Meta releasing Llama and hoping for the best. Today, there’s a sophisticated understanding of what real openness requires: not just downloadable weights, but licenses that permit actual use.

Nvidia’s course correction is evidence that community pressure works. The original license drew criticism. Adoption lagged. Competitors with better terms gained ground. Nvidia responded. This is how the system should function.

The combination of better licensing and better tooling (like GreenBoost) points toward a future where running frontier-quality models locally becomes routine. We’re not there yet, but we’re closer than we were last week.

FAQ

Can I use Nemotron for commercial products now?

Yes. The new license removes the most problematic restrictions for production use. Standard corporate due diligence still applies.

What’s the difference between the old and new licenses?

The old license included modification restrictions, guardrail requirements, branding demands, and termination clauses. The new license removes these while retaining basic use restrictions and disclaimers.

Can I create uncensored/abliterated variants?

The modification restrictions are gone, which removes the legal ambiguity around safety layer modifications. However, you’re still responsible for how you deploy such variants.

Will this license apply to future Nvidia models?

Unknown. This is a model-specific license update. Nvidia could release future models under different terms.

Is GreenBoost ready for production?

No. It’s marked as experimental. Use it for testing and development, but validate thoroughly before production deployment.

What hardware do I need to run Nemotron?

For the NVFP4 variant (~61GB), you need at least that much combined GPU memory. Options include multiple RTX 6000 Pro/A6000 GPUs or memory expansion solutions like GreenBoost.

References

  • Reddit r/LocalLLaMA: Nvidia updated the Nemotron Super 3 122B A12B license — https://reddit.com/r/LocalLLaMA/comments/1rue6tn/
  • Nvidia Open Model License (original) — https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
  • Nvidia Nemotron Open Model License (new) — https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/
  • Phoronix: Open-Source GreenBoost Driver — https://www.phoronix.com/news/Open-Source-GreenBoost-NVIDIA
  • GreenBoost GitLab Repository — https://gitlab.com/IsolatedOctopi/nvidia_greenboost
  • NVIDIA Developer Forums: GreenBoost announcement — https://forums.developer.nvidia.com/t/nvidia-greenboost-kernel-modules-opensourced/363486