DeepSeek-R1: does it conform to OSAID?

I would like to share an article from @markcollier about DeepSeek, highlighting in particular DeepSeek’s Open Source Week and how it’s an infrastructure success story.

While DeepSeek-R1 is not Open Source AI, the company behind it has open-sourced many components and tools. Some highlights from the article:

  • FlashMLA: An efficient Multi-head Latent Attention decoding kernel optimized for Hopper GPUs, delivering up to 3000 GB/s in memory-bound configurations and 580 TFLOPS in computation-bound configurations on the H800 SXM5.
  • DeepEP: An open-source library optimizing expert-parallel (EP) communication for Mixture of Experts (MoE) model training and inference, supporting efficient all-to-all communication for MoE models (a minimal sketch of this all-to-all pattern follows the list).
  • DeepGEMM: A GEMM library focused on FP8 precision, JIT compilation, and minimal dependencies, rivaling expert-tuned solutions for matrix operations.
  • 3FS: A high-performance parallel file system designed to supercharge data access for AI and big data applications, eliminating bottlenecks in data-intensive workflows.
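
To make the DeepEP bullet concrete, here is a minimal sketch of the MoE dispatch pattern written with plain torch.distributed. This is not DeepEP's API: the function and argument names (moe_dispatch, tokens, send_counts) are hypothetical, and it assumes a process group is already initialized with tensors on the device the backend expects.

```python
# Hypothetical sketch of MoE all-to-all dispatch using plain torch.distributed.
# This is NOT DeepEP's API; it only illustrates the communication pattern
# (sending tokens to the ranks that host their assigned experts) that
# DeepEP's kernels optimize.
import torch
import torch.distributed as dist

def moe_dispatch(tokens, send_counts, group=None):
    """tokens: (num_tokens, hidden), already grouped by destination rank.
    send_counts: how many tokens this rank sends to each peer rank."""
    world_size = dist.get_world_size(group)

    # Step 1: exchange counts so every rank knows how many tokens to expect.
    recv_counts = torch.empty(world_size, dtype=torch.long)
    dist.all_to_all_single(
        recv_counts,
        torch.tensor(send_counts, dtype=torch.long),
        group=group,
    )
    recv_counts = recv_counts.tolist()

    # Step 2: exchange the token payloads themselves -- the bandwidth-critical
    # step that an optimized EP library accelerates.
    recv_tokens = tokens.new_empty((sum(recv_counts), tokens.shape[1]))
    dist.all_to_all_single(
        recv_tokens,
        tokens,
        output_split_sizes=recv_counts,
        input_split_sizes=send_counts,
        group=group,
    )
    return recv_tokens, recv_counts
```

After the local experts process the received tokens, the same exchange runs in reverse (the "combine" step) to return outputs to their originating ranks; those two all-to-alls are the steps DeepEP targets with its NVLink- and RDMA-aware kernels.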

Today, we also saw the release of DeepSeek-V3 under the MIT license (just like R1). More details about this release are coming…

3 Likes

For those interested, we’ve dug into the DeepSeek family recently at the EU OSAI index: How Open is DeepSeek? | European Open Source AI Index

TL;DR: DeepSeek represents interesting technical developments, but it is at best “open weights”, not open source: crucial elements of the training process, pre-training data and instruction-tuning data remain behind closed doors. In our openness index, DeepSeek’s models end up somewhere in the middle of the pack: more open than some big name rivals, but too closed to enable serious auditing or scientific scrutiny. Read on for details.

The family is somewhat complex; here is our visual:

5 Likes

Updating this topic with an article from Matt Asay:

Interesting to learn about OpenSeek by the Beijing Academy of Artificial Intelligence (BAAI):

From their GitHub profile:

OpenSeek is an open source project initiated by the Beijing Academy of Artificial Intelligence (BAAI), aiming to unite the global open source communities to drive collaborative innovation in algorithms, data and systems to develop next-generation models that surpass DeepSeek. Drawing inspiration from large model initiatives like Bigscience and OPT, the project is dedicated to building an independent open source algorithmic innovation system. Since the open sourcing of the DeepSeek model, academia has seen numerous algorithmic improvements and breakthroughs, but these innovations often lack complete code implementations, necessary computational resources, and high-quality data support. The OpenSeek project hopes to explore high-quality dataset construction mechanisms through uniting the open source community, promote open sourcing of the entire large model training pipeline, build innovative training and inference code to support various AI chips besides Nvidia, and promote independent technological innovation and application development.

Objectives of OpenSeek:

  • Innovative data synthesis technology: Address the challenge of acquiring high-quality data and break through data barriers.
  • Support for multiple AI chips: Reduce dependency on specific chips and improve model universality and adaptability.
  • Build an independent open source algorithmic innovation system: Promote independent algorithmic innovation and technology sharing through open source collaboration.

License Agreement:

  • Code is licensed under Apache 2.0
  • Model weights are licensed under Apache 2.0
  • Data is licensed under CC BY-SA 4.0

Seems like OpenSeek would meet the OSAID. What are your thoughts?

Interesting initiative. There is no model yet; pretraining has only just begun. It’s unclear whether the dataset sharing or accompanying information will be up to standard. If you drill down into the repo, this is all they share at this point:

2 Likes

Speaking of DeepSeek infrastructure, they just published a few more details… Unsurprisingly, they say:

While we initially considered open-sourcing our full internal inference engine, we identified several challenges

which is what I’ve heard from other groups. It’s a pattern we’ve seen in software, too: groups that work in private repositories and release only at the end of their process will always have a hard time making their code fit for public use and collaboration.

The fact that DeepSeek recognizes the issue and is willing to work to reconcile its forks and release more of its code is a good sign, though.

1 Like