This guide breaks down the universal core-to-detail structure behind AI model names and defines the mainstream tags (capability, training, quantization, language). It compares naming standards across open-source platforms, quantized model types and major vendors, and walks through typical model names in practice, so you can select models and deploy them locally with confidence.
In daily scenarios like local AI model deployment and open-source project collaboration, we often encounter a bewildering array of model names: google_gemma-3-270m-it-qat-Q4_K_M.gguf, Qwen3-VL-8B-Instruct, DeepSeek-R1-Distill-Llama-8B-UD-IQ1_M.gguf...
Tags such as it, qat, Instruct and Q4_K_M in these names are not arbitrary; they follow naming conventions that have evolved across the AI industry. Open-source models on Hugging Face and ModelScope, as well as quantized models in the llama.cpp community, all adhere to the same core naming logic.
All AI model names, regardless of vendor or format, follow a core-to-detail hierarchical structure—the earlier the field, the more critical it is; the later the field, the more it leans toward technical implementation details.
Vendor/Series - Version - Parameter Scale - Modality/Capability - Fine-tuning Type - Training Feature - Quantization/File Format
| Position | Core Field | Meaning | Examples |
|---|---|---|---|
| 1 | Vendor/Series | Model developer/ product series | Qwen (Alibaba Tongyi), Gemma (Google), Llama (Meta) |
| 2 | Version | Model iteration version | 3, 3.1, R1, Base |
| 3 | Parameter Scale | Core parameter size of the model | 270m (270 million), 8B (8 billion), 70B (70 billion) |
| 4 | Modality/Capability | Core function/ applicable scenario of the model | VL (Vision-Language), Chat (Conversation), RAG (Retrieval-Augmented Generation) |
| 5 | Fine-tuning Type | Model alignment method/ optimization direction | Instruct (Instruction Tuned), Function (Function Calling) |
| 6 | Training Feature | Training/ compression technology | Distill (Distilled), MoE (Mixture of Experts), QAT (Quantization-Aware Training) |
| 7 | Quantization/File Format | Model compression method/ file type | Q4_K_M (4-bit quantization), GGUF (file format) |
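As a minimal sketch of this hierarchy, a model name can be split into its fields with a few regular expressions. The tag sets below are illustrative samples drawn from the tables in this guide, not an exhaustive registry, and real-world names use many more variants:

```python
import re

# Illustrative tag sets (assumption: a small sample, not exhaustive).
CAPABILITY = {"vl", "chat", "code", "math", "vision", "rag"}
FINETUNE = {"instruct", "it", "function", "tool", "base"}
TRAINING = {"distill", "moe", "qat", "merge", "sft", "dpo", "orpo"}
PARAM_RE = re.compile(r"^\d+(\.\d+)?[bm]$", re.IGNORECASE)
# Quantization tag anchored at the end of the stem, e.g. Q4_K_M, IQ1_M, FP16.
QUANT_RE = re.compile(r"(iq\d[_a-z0-9]*|q\d[_a-z0-9]*|fp16|fp32)$", re.IGNORECASE)

def parse_model_name(name: str) -> dict:
    """Split a model (file) name into the core-to-detail fields."""
    stem = re.sub(r"\.(gguf|bin|pth)$", "", name, flags=re.IGNORECASE)
    fields = {"params": None, "capability": [], "finetune": [],
              "training": [], "quant": None, "other": []}
    # Pull the quantization tag off the end first, since it contains underscores.
    m = QUANT_RE.search(stem)
    if m:
        fields["quant"] = m.group(0)
        stem = stem[: m.start()].rstrip("-_")
    for token in re.split(r"[-_/]", stem):
        t = token.lower()
        if PARAM_RE.match(t):
            fields["params"] = token
        elif t in CAPABILITY:
            fields["capability"].append(token)
        elif t in FINETUNE:
            fields["finetune"].append(token)
        elif t in TRAINING:
            fields["training"].append(token)
        elif token:
            fields["other"].append(token)  # vendor, series, version, etc.
    return fields

info = parse_model_name("google_gemma-3-270m-it-qat-Q4_K_M.gguf")
print(info["params"], info["finetune"], info["training"], info["quant"])
# -> 270m ['it'] ['qat'] Q4_K_M
```

Tokens the parser does not recognize (vendor, series, version) land in `other`, which mirrors how humans read these names: the unfamiliar parts are almost always at the front, and the standardized tags come later.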
Parameter units are unambiguous: K (thousand), M (million), B (billion). Capability tags such as Instruct, Chat and VL are placed early in the name to reflect the model's core value. The tags below cover mainstream open-source and quantized models, grouped by usage scenario with meanings, applicable scenarios and typical cases attached, so beginners can refer to them directly.
These tags determine what a model can do and serve as the core basis for model selection, applicable across all vendors.
| Tag | Full Name | Core Meaning | Applicable Scenarios | Typical Cases |
|---|---|---|---|---|
| Base | Base Model | Pre-trained base model without instruction/conversation alignment | Secondary fine-tuning, domain adaptation | Qwen3-8B-Base, Llama-3-70B-Base |
| Instruct | Instruction Tuned | Instruction-fine-tuned model, adapted to execute human natural language instructions | General Q&A, tool calling, lightweight assistants | Qwen3-VL-8B-Instruct, Gemma-3-IT |
| IT | Instruction Tuned | Abbreviation for Instruct, with identical semantics | Google/Gemma series, open-source community quantized models | gemma-3-270m-it, gemma-2-9b-it |
| Chat | Chat Model | Optimized for multi-turn conversations, focusing on fluency and naturalness | Casual chat, customer service, companion assistants | Llama-3-8B-Chat, Qwen3-7B-Chat |
| Function | Function Calling | Optimized for tool/function calling, supporting API/code execution | AI agents, automation tools, API integration | Qwen-7B-Function, DeepSeek-Function |
| Tool | Tool Use | Synonymous with Function, focusing on tool usage capability | Plug-in ecosystem, cross-system collaboration | DeepSeek-Tool-LLM |
| RAG | Retrieval-Augmented Generation | Optimized for retrieval-augmented generation, adapted to knowledge base Q&A | Enterprise knowledge bases, document Q&A, information retrieval | Phi-3-Context-Obedient-RAG |
| VL | Vision-Language | Multimodal model supporting image-text understanding/generation | Visual Q&A, image captioning, visual tasks | Qwen3-VL-8B-Instruct, BLIP-2-VL |
| Vision | Vision Model | Pure vision model focusing on image/video understanding | Image classification, object detection, visual analysis | CLIP-Vision, ViT-B/32 |
| Code | Code Optimized | Specialized model for code generation/understanding | Programming assistance, code debugging, algorithm development | CodeLlama-7B-Code, DeepSeek-Coder |
| Math | Math Reasoning | Model optimized for mathematical/logical reasoning | Calculations, logical deduction, academic computing | DeepSeek-Math-7B, Qwen-Math-14B |
| Reasoning | Enhanced Reasoning | Strengthened reasoning capability, focusing on logical chain generation | Complex problem decomposition, multi-step reasoning | Llama-3-70B-Reasoning |
These tags reflect the model's training methods and architectural features, affecting model performance, size and inference efficiency.
| Tag | Full Name | Core Meaning | Features | Typical Cases |
|---|---|---|---|---|
| SFT | Supervised Fine-Tuning | Supervised fine-tuning, basic alignment method | Most commonly used, wide scenario adaptability | Qwen-7B-SFT, Llama-3-8B-SFT |
| DPO | Direct Preference Optimization | Preference alignment algorithm, optimizing generation quality | More natural than SFT, fewer hallucinations | DeepSeek-DPO-7B, Qwen-DPO-14B |
| ORPO | Odds Ratio Preference Optimization | Lightweight preference alignment with low training cost | Resource-friendly, performance close to DPO | Llama-3-8B-ORPO, Phi-3-ORPO |
| Distill | Distilled | Model distillation, compressing large models into small ones | Smaller size, faster inference, slight accuracy loss | DeepSeek-R1-Distill-Llama-8B |
| MoE | Mixture of Experts | Mixture of Experts architecture, high-efficiency large models | Large parameter scale but high inference efficiency, low cost | Qwen-14B-MoE, DeepSeek-MoE-32B |
| Merge | Merged Model | Model merging, fusing multiple models by communities/vendors | Integrates advantages of multiple models, adapted to diverse scenarios | Llama-3-Merge-8B, Qwen-Merge-14B |
| Context / 8k/32k/128k | Context Window | Context window length, maximum supported text length | Longer length enables processing more text | Phi-3-Context-128k, Llama-3-70B-32k |
These tags only exist in quantized model files (.gguf/.bin), following universal conventions of the llama.cpp community, and directly determine the cost and performance of local model deployment.
| Tag | Bit Width | Core Positioning | Precision Performance | Recommended Scenarios |
|---|---|---|---|---|
| FP32 | 32bit | Original full-precision model | Highest, no loss | Scientific research, benchmark testing; not recommended for local deployment |
| FP16 | 16bit | Half-precision model | High, slight loss | Inference on high-performance devices, benchmark comparison |
| Q8_0 | 8bit | High-fidelity quantization | Extremely high, close to FP16 | High-performance local deployment (sufficient VRAM/RAM) |
| Q6_K | 6bit | High-quality quantization | High, faster than Q8_0 | Mid-to-high-end devices, pursuing balance |
| Q5_K_M | 5bit | Optimal comprehensive quantization | Good, balanced speed and precision | Main choice for local deployment (16G+ RAM) |
| Q4_K_M | 4bit | Cost-effective quantization | Good, small size and fast speed | Mainstream local deployment (8G/16G RAM, first choice) |
| Q3_K_M | 3bit | Lightweight quantization | Medium, significantly compressed size | Low-config devices (4G/8G RAM) |
| IQ2_XXS / IQ1_M | 1-2bit | Extreme compression quantization | Low, obvious accuracy loss | Ultra-low-config devices (within 4G RAM), emergency use |
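The practical effect of bit width is file size and memory footprint: a rough rule of thumb is size ≈ parameters × bits-per-weight / 8. The bits-per-weight figures below are approximate averages (K-quants mix precisions per block, so the effective value differs slightly from the nominal bit width):

```python
# Approximate average bits-per-weight for common llama.cpp quantization
# levels (assumption: community ballpark figures, not exact spec values).
APPROX_BPW = {
    "FP16": 16.0, "Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7,
    "Q4_K_M": 4.8, "Q3_K_M": 3.9, "IQ2_XXS": 2.1, "IQ1_M": 1.75,
}

def estimate_size_gb(params_billion: float, quant: str) -> float:
    """Rough on-disk size in GB: parameters * bits-per-weight / 8."""
    bits = APPROX_BPW[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

for q in ("FP16", "Q8_0", "Q4_K_M", "IQ1_M"):
    print(f"8B @ {q}: ~{estimate_size_gb(8, q):.1f} GB")
```

For an 8B model this works out to roughly 16 GB at FP16 versus under 2 GB at IQ1_M, which is exactly why Q4_K_M is the sweet spot for 8G/16G RAM machines: most of the size reduction with only modest accuracy loss.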
| Tag | Full Name | Core Meaning | Precision Advantages | Typical Cases |
|---|---|---|---|---|
| QAT | Quantization-Aware Training | Quantization-aware training, adapting to quantization during training | 10%-30% higher precision than regular quantization | gemma-3-270m-it-qat, Qwen-8B-qat-Q4_K_M |
| PTQ | Post-Training Quantization | Post-training quantization, quantizing after training completion | Regular precision, no additional training cost | Most open-source quantized models |
| Tag | Full Name | Core Meaning | Applicable Scenarios |
|---|---|---|---|
| GGUF | GGML Universal File | Current standard format of llama.cpp, replacing the old GGML | Local inference (llama.cpp, ollama, etc.) |
| GGML | Legacy llama.cpp/ggml format | Obsolete quantized format, superseded by GGUF | Only compatible with old tools; not recommended for new deployments |
| .bin / .pth | PyTorch Model Format | Native PyTorch model format | Secondary development, fine-tuning; not suitable for direct local inference |
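File extension and actual format can disagree, and GGUF files conveniently start with the ASCII magic bytes `GGUF`, so a quick sanity check is possible before handing a file to a loader. A minimal sketch (the extension fallback is a heuristic, not a guarantee):

```python
import os

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file

def detect_format(path: str) -> str:
    """Best-effort check that a file really is what its extension claims."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == GGUF_MAGIC:
        return "GGUF"
    # No magic match: fall back to the extension (heuristic only).
    ext = os.path.splitext(path)[1].lower()
    if ext in (".bin", ".pth"):
        return "PyTorch (probably)"
    return "unknown"

# Demo with a tiny synthetic file (a real check would point at a downloaded model).
with open("demo.gguf", "wb") as f:
    f.write(GGUF_MAGIC + b"\x03\x00\x00\x00")  # magic + a version field
print(detect_format("demo.gguf"))  # -> GGUF
os.remove("demo.gguf")
```

This catches the common pitfall of renaming a legacy GGML or PyTorch file to `.gguf` and then wondering why llama.cpp refuses to load it.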
These tags mark the model's language adaptability and domain adaptation direction, enabling rapid screening of scenario-specific models.
| Tag | Meaning | Typical Cases |
|---|---|---|
| zh / Chinese | Chinese-optimized model | Qwen3-8B-Instruct-zh, Llama-3-8B-Chinese |
| en | English-optimized model | Llama-3-70B-en, Gemma-2-9B-en |
| Multi / Multilingual | Multilingual model | Qwen3-14B-Multi, Mistral-7B-Multilingual |
| General | General-domain model | Qwen-7B-General |
| UD | Unsloth Dynamic quantization tag (not a domain tag) | DeepSeek-R1-Distill-Llama-8B-UD-IQ1_M |
| Domain / Law/Med/Fina | Vertical-domain model (Law/Medical/Finance) | Qwen-Law-7B, Med-Alpaca-13B |
| Abbreviation | Full Name | Meaning |
|---|---|---|
| IT | Instruction Tuned | Fine-tuned to follow instructions |
| SFT | Supervised Fine-Tuning | Alignment via labeled instruction-response examples |
| DPO | Direct Preference Optimization | Preference-based alignment |
| ORPO | Odds Ratio Preference Optimization | Lightweight preference alignment |
| QAT | Quantization-Aware Training | Quantization simulated during training |
| PTQ | Post-Training Quantization | Quantization applied after training |
| MoE | Mixture of Experts | Sparse expert architecture |
| VL | Vision-Language | Image-text multimodal capability |
| RAG | Retrieval-Augmented Generation | Generation grounded in retrieved documents |
| UD | Unsloth Dynamic | Unsloth's dynamic GGUF quantization scheme |
Naming standards vary slightly across different platforms and model types, but core tags are universal. The following is a comparison of standards for mainstream scenarios to help you avoid pitfalls in model selection across platforms.
| Standard Dimension | Hugging Face | ModelScope | Common Ground |
|---|---|---|---|
| Core Structure | Vendor/Author/Project - Version - Parameter Scale - Capability - Fine-tuning Type | Vendor/Series - Version - Parameter Scale - Capability - Fine-tuning Type | Adheres to core-to-detail structure; universal parameter scale and capability tags |
| Naming Separator | Mostly -, some _ | Mostly _, compatible with - | No mandatory separator rules; no impact on semantic understanding |
| Quantization Tags | Only in quantized models (.gguf), placed at the end | Only in quantized models (.gguf), placed at the end | Identical position and meaning of quantization tags |
| Special Tags | Many community-customized tags (e.g., -chatml) | Unified Alibaba-series tags (e.g., -instruct) | Universal core function tags (Instruct/VL) |
| Example | meta-llama/Llama-3-8B-Instruct | qwen/Qwen3-VL-8B-Instruct | Identical structure and core tags |
| Standard Dimension | GGUF Quantized Models (llama.cpp) | Regular Quantized Models (.bin/.pth) | Common Ground |
|---|---|---|---|
| Core Structure | Model Name-Capability-Quantization Method-Quantization Level.gguf | Model Name-Capability-Quantization Type.bin | Universal core function tags; quantization tags exclusive to quantized models |
| Quantization Tags | Includes llama.cpp-exclusive tags (e.g., Q4_K_M/IQ1_M) | Includes training quantization tags (e.g., QAT/PTQ) | Universal quantization training tags (QAT/PTQ) |
| Separator | Unified - for all modules | Mostly -, some _ | Separator has no impact on semantics; unambiguous core tags |
| Example | google_gemma-3-270m-it-qat-Q4_K_M.gguf | Qwen-8B-qat-Instruct.bin | Identical semantics of core function tags (it/Instruct) |
Vendors differ slightly in naming habits, but the core tags are universal, enabling quick adaptation when selecting models.
| Vendor | Core Series | Naming Habits | Typical Cases |
|---|---|---|---|
| Alibaba (Tongyi) | Qwen | Mostly uses Instruct/VL; parameter scale in B/M | Qwen3-VL-8B-Instruct, Qwen-7B-Function |
| Google | Gemma | Mostly uses IT instead of Instruct; concise version numbers | gemma-3-270m-it, gemma-2-9b-it-Q5_K_M |
| Meta | Llama | Clear version numbers (3/3.1); distinct Chat/Instruct tags | Llama-3-8B-Chat, Llama-3-70B-Instruct |
| DeepSeek | DeepSeek | Mostly uses Distill/UD/MoE; focuses on inference optimization | DeepSeek-R1-Distill-Llama-8B, DeepSeek-MoE-32B |
| Microsoft | Phi | Mostly uses Context/Obedient/RAG; lightweight design | Phi-3-Context-Obedient-RAG-Q4_K_M |
Combined with the above standards, we analyze 3 high-frequency model names to help you quickly master the logic of naming interpretation—understand at a glance.
google_gemma-3-270m-it-qat-Q4_K_M.gguf — Conclusion: a 270M-parameter instruction-tuned model from Google's Gemma 3 series, optimized with quantization-aware training and 4-bit (Q4_K_M) quantization, ideal for low-config local deployment.
Qwen3-VL-8B-Instruct — Conclusion: an 8B-parameter vision-language instruction model from Alibaba's Tongyi Qwen3 series, supporting visual Q&A and image captioning, ready for direct deployment.
DeepSeek-R1-Distill-Llama-8B-UD-IQ1_M.gguf — Conclusion: an 8B-parameter model distilled from DeepSeek R1 into a Llama architecture, with Unsloth Dynamic (UD) IQ1_M extreme quantization at roughly 1.75 bits per weight, ultra-compact for lightweight local deployment.
