
DeepSeek-V3 Technical Report

Authors:
DeepSeek-AI
Liu, Aixin
Feng, Bei
Xue, Bing
Wang, Bingxuan
Wu, Bochao
Lu, Chengda
Zhao, Chenggang
Deng, Chengqi
Zhang, Chenyu
Ruan, Chong
Dai, Damai
Guo, Daya
Yang, Dejian
Chen, Deli
Ji, Dongjie
Li, Erhang
Lin, Fangyun
Dai, Fucong
Luo, Fuli
Hao, Guangbo
Chen, Guanting
Li, Guowei
Zhang, H.
Bao, Han
Xu, Hanwei
Wang, Haocheng
Zhang, Haowei
Ding, Honghui
Xin, Huajian
Gao, Huazuo
Li, Hui
Qu, Hui
Cai, J. L.
Liang, Jian
Guo, Jianzhong
Ni, Jiaqi
Li, Jiashi
Wang, Jiawei
Chen, Jin
Chen, Jingchang
Yuan, Jingyang
Qiu, Junjie
Li, Junlong
Song, Junxiao
Dong, Kai
Hu, Kai
Gao, Kaige
Guan, Kang
Huang, Kexin
Yu, Kuai
Wang, Lean
Zhang, Lecong
Xu, Lei
Xia, Leyi
Zhao, Liang
Wang, Litong
Zhang, Liyue
Li, Meng
Wang, Miaojun
Zhang, Mingchuan
Zhang, Minghua
Tang, Minghui
Li, Mingming
Tian, Ning
Huang, Panpan
Wang, Peiyi
Zhang, Peng
Wang, Qiancheng
Zhu, Qihao
Chen, Qinyu
Du, Qiushi
Chen, R. J.
Jin, R. L.
Ge, Ruiqi
Zhang, Ruisong
Pan, Ruizhe
Wang, Runji
Xu, Runxin
Zhang, Ruoyu
Chen, Ruyi
Li, S. S.
Lu, Shanghao
Zhou, Shangyan
Chen, Shanhuang
Wu, Shaoqing
Ye, Shengfeng
Ma, Shirong
Wang, Shiyu
Zhou, Shuang
Yu, Shuiping
Zhou, Shunfeng
Pan, Shuting
Wang, T.
Yun, Tao
Pei, Tian
Sun, Tianyu
Xiao, W. L.
Zeng, Wangding
Zhao, Wanjia
An, Wei
Liu, Wen
Liang, Wenfeng
Gao, Wenjun
Yu, Wenqin
Zhang, Wentao
Li, X. Q.
Jin, Xiangyue
Wang, Xianzu
Bi, Xiao
Liu, Xiaodong
Wang, Xiaohan
Shen, Xiaojin
Chen, Xiaokang
Zhang, Xiaokang
Chen, Xiaosha
Nie, Xiaotao
Sun, Xiaowen
Wang, Xiaoxiang
Cheng, Xin
Liu, Xin
Xie, Xin
Liu, Xingchao
Yu, Xingkai
Song, Xinnan
Shan, Xinxia
Zhou, Xinyi
Yang, Xinyu
Li, Xinyuan
Su, Xuecheng
Lin, Xuheng
Li, Y. K.
Wang, Y. Q.
Wei, Y. X.
Zhu, Y. X.
Zhang, Yang
Xu, Yanhong
Huang, Yanping
Li, Yao
Zhao, Yao
Sun, Yaofeng
Li, Yaohui
Wang, Yaohui
Yu, Yi
Zheng, Yi
Zhang, Yichao
Shi, Yifan
Xiong, Yiliang
He, Ying
Tang, Ying
Piao, Yishi
Wang, Yisong
Tan, Yixuan
Ma, Yiyang
Liu, Yiyuan
Guo, Yongqiang
Wu, Yu
Ou, Yuan
Zhu, Yuchen
Wang, Yuduan
Gong, Yue
Zou, Yuheng
He, Yujia
Zha, Yukun
Xiong, Yunfan
Ma, Yunxian
Yan, Yuting
Luo, Yuxiang
You, Yuxiang
Liu, Yuxuan
Zhou, Yuyang
Wu, Z. F.
Ren, Z. Z.
Ren, Zehui
Sha, Zhangli
Fu, Zhe
Xu, Zhean
Huang, Zhen
Zhang, Zhen
Xie, Zhenda
Zhang, Zhengyan
Hao, Zhewen
Gou, Zhibin
Ma, Zhicheng
Yan, Zhigang
Shao, Zhihong
Xu, Zhipeng
Wu, Zhiyu
Zhang, Zhongyu
Li, Zhuoshu
Gu, Zihui
Zhu, Zijia
Liu, Zijun
Li, Zilin
Xie, Ziwei
Song, Ziyang
Gao, Ziyi
Pan, Zizheng
Publication Year:
2024

Abstract

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and adopts a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse, high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. Its training process is also remarkably stable: throughout the entire run, we experienced no irrecoverable loss spikes and performed no rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
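The two ideas the abstract highlights — sparse MoE routing (only a fraction of parameters active per token) and auxiliary-loss-free load balancing via per-expert bias terms — can be illustrated with a minimal toy sketch. This is a hypothetical simplification, not the paper's implementation: the expert count, step size, and function names (`top_k_route`, `update_bias`) are illustrative, and real routers operate on learned affinity scores rather than random draws. The key point it shows is that the bias only steers *which* experts are selected, while the unbiased scores still produce the gating weights, and the bias is nudged from observed load instead of a balancing loss term.

```python
import math
import random

def top_k_route(scores, bias, k=4):
    """Pick k experts per token from bias-adjusted scores; the unbiased
    scores still produce the gating weights that mix expert outputs."""
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = order[:k]
    exp_s = [math.exp(scores[i]) for i in chosen]
    total = sum(exp_s)
    return chosen, [e / total for e in exp_s]

def update_bias(bias, load, target, step=0.001):
    """Auxiliary-loss-free balancing: nudge an overloaded expert's bias
    down and an underloaded expert's bias up, with no extra loss term."""
    return [b - step * (1 if l > target else -1 if l < target else 0)
            for b, l in zip(bias, load)]

random.seed(0)
n_experts, k = 16, 4
bias = [0.0] * n_experts          # per-expert routing bias
counts = [0] * n_experts          # cumulative tokens routed to each expert

for _ in range(2000):             # simulated token stream
    scores = [random.gauss(0, 1) for _ in range(n_experts)]  # stand-in affinities
    chosen, gates = top_k_route(scores, bias, k)
    for i in chosen:
        counts[i] += 1
    total = sum(counts)
    bias = update_bias(bias, [c / total for c in counts], 1.0 / n_experts)
```

Because each token touches only `k` of `n_experts` experts, compute scales with the activated subset — the same reason DeepSeek-V3 activates 37B of its 671B parameters per token.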

Details

Database:
arXiv
Publication Type:
Report
Accession Number:
edsarx.2412.19437
Document Type:
Working Paper