This guide breaks down the universal core-to-detail structure behind AI model names and defines the mainstream tags (capability, training, quantization, language). It compares naming standards across open-source platforms, quantized model types and major vendors, and walks through typical model names in practice, so you can select models and deploy them locally with confidence.
In daily scenarios like local AI model deployment and open-source project collaboration, we often encounter a bewildering array of model names: google_gemma-3-270m-it-qat-Q4_K_M.gguf, Qwen3-VL-8B-Instruct, DeepSeek-R1-Distill-Llama-8B-UD-IQ1_M.gguf...
Tags such as it, qat, Instruct and Q4_K_M in these names are not arbitrary; they follow naming conventions that have evolved across the AI industry. Open-source models on Hugging Face and ModelScope, as well as quantized models in the llama.cpp community, all adhere to the same core naming logic.
All AI model names, regardless of vendor or format, follow a core-to-detail hierarchical structure—the earlier the field, the more critical it is; the later the field, the more it leans toward technical implementation details.
Vendor/Series - Version - Parameter Scale - Modality/Capability - Fine-tuning Type - Training Feature - Quantization/File Format
| Position | Core Field | Meaning | Examples |
|---|---|---|---|
| 1 | Vendor/Series | Model developer/ product series | Qwen (Alibaba Tongyi), Gemma (Google), Llama (Meta) |
| 2 | Version | Model iteration version | 3, 3.1, R1, Base |
| 3 | Parameter Scale | Core parameter size of the model | 270m (270 million), 8B (8 billion), 70B (70 billion) |
| 4 | Modality/Capability | Core function/ applicable scenario of the model | VL (Vision-Language), Chat (Conversation), RAG (Retrieval-Augmented Generation) |
| 5 | Fine-tuning Type | Model alignment method/ optimization direction | Instruct (Instruction Tuned), Function (Function Calling) |
| 6 | Training Feature | Training/ compression technology | Distill (Distilled), MoE (Mixture of Experts), QAT (Quantization-Aware Training) |
| 7 | Quantization/File Format | Model compression method/ file type | Q4_K_M (4-bit quantization), GGUF (file format) |
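As a minimal sketch of this hierarchy, a model name can be split into its fields with a few regular expressions. The tag sets below are illustrative samples drawn from the tables in this guide, not an exhaustive registry, and real-world names use many more variants:

```python
import re

# Illustrative tag sets (assumption: a small sample, not exhaustive).
CAPABILITY = {"vl", "chat", "code", "math", "vision", "rag"}
FINETUNE = {"instruct", "it", "function", "tool", "base"}
TRAINING = {"distill", "moe", "qat", "merge", "sft", "dpo", "orpo"}
PARAM_RE = re.compile(r"^\d+(\.\d+)?[bm]$", re.IGNORECASE)
# Quantization tag anchored at the end of the stem, e.g. Q4_K_M, IQ1_M, FP16.
QUANT_RE = re.compile(r"(iq\d[_a-z0-9]*|q\d[_a-z0-9]*|fp16|fp32)$", re.IGNORECASE)

def parse_model_name(name: str) -> dict:
    """Split a model (file) name into the core-to-detail fields."""
    stem = re.sub(r"\.(gguf|bin|pth)$", "", name, flags=re.IGNORECASE)
    fields = {"params": None, "capability": [], "finetune": [],
              "training": [], "quant": None, "other": []}
    # Pull the quantization tag off the end first, since it contains underscores.
    m = QUANT_RE.search(stem)
    if m:
        fields["quant"] = m.group(0)
        stem = stem[: m.start()].rstrip("-_")
    for token in re.split(r"[-_/]", stem):
        t = token.lower()
        if PARAM_RE.match(t):
            fields["params"] = token
        elif t in CAPABILITY:
            fields["capability"].append(token)
        elif t in FINETUNE:
            fields["finetune"].append(token)
        elif t in TRAINING:
            fields["training"].append(token)
        elif token:
            fields["other"].append(token)  # vendor, series, version, etc.
    return fields

info = parse_model_name("google_gemma-3-270m-it-qat-Q4_K_M.gguf")
print(info["params"], info["finetune"], info["training"], info["quant"])
# -> 270m ['it'] ['qat'] Q4_K_M
```

Tokens the parser does not recognize (vendor, series, version) land in `other`, which mirrors how humans read these names: the unfamiliar parts are almost always at the front, and the standardized tags come later.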
Parameter units are unambiguous: K (thousand), M (million), B (billion). Capability tags such as Instruct, Chat and VL are placed early in the name to reflect the model's core value. The tags below cover mainstream open-source and quantized models, grouped by usage scenario with meanings, applicable scenarios and typical cases attached, so beginners can refer to them directly.
These tags determine what a model can do and serve as the core basis for model selection, applicable across all vendors.
| Tag | Full Name | Core Meaning | Applicable Scenarios | Typical Cases |
|---|---|---|---|---|
| Base | Base Model | Pre-trained base model without instruction/conversation alignment | Secondary fine-tuning, domain adaptation | Qwen3-8B-Base, Llama-3-70B-Base |
| Instruct | Instruction Tuned | Instruction-fine-tuned model, adapted to execute human natural language instructions | General Q&A, tool calling, lightweight assistants | Qwen3-VL-8B-Instruct, Gemma-3-IT |
| IT | Instruction Tuned | Abbreviation for Instruct, with identical semantics | Google/Gemma series, open-source community quantized models | gemma-3-270m-it, gemma-2-9b-it |
| Chat | Chat Model | Optimized for multi-turn conversations, focusing on fluency and naturalness | Casual chat, customer service, companion assistants | Llama-3-8B-Chat, Qwen3-7B-Chat |
| Function | Function Calling | Optimized for tool/function calling, supporting API/code execution | AI agents, automation tools, API integration | Qwen-7B-Function, DeepSeek-Function |
| Tool | Tool Use | Synonymous with Function, focusing on tool usage capability | Plug-in ecosystem, cross-system collaboration | DeepSeek-Tool-LLM |
| RAG | Retrieval-Augmented Generation | Optimized for retrieval-augmented generation, adapted to knowledge base Q&A | Enterprise knowledge bases, document Q&A, information retrieval | Phi-3-Context-Obedient-RAG |
| VL | Vision-Language | Multimodal model supporting image-text understanding/generation | Visual Q&A, image captioning, visual tasks | Qwen3-VL-8B-Instruct, BLIP-2-VL |
| Vision | Vision Model | Pure vision model focusing on image/video understanding | Image classification, object detection, visual analysis | CLIP-Vision, ViT-B/32 |
| Code | Code Optimized | Specialized model for code generation/understanding | Programming assistance, code debugging, algorithm development | CodeLlama-7B-Code, DeepSeek-Coder |
| Math | Math Reasoning | Model optimized for mathematical/logical reasoning | Calculations, logical deduction, academic computing | DeepSeek-Math-7B, Qwen-Math-14B |
| Reasoning | Enhanced Reasoning | Strengthened reasoning capability, focusing on logical chain generation | Complex problem decomposition, multi-step reasoning | Llama-3-70B-Reasoning |
These tags reflect the model's training methods and architectural features, affecting model performance, size and inference efficiency.
| Tag | Full Name | Core Meaning | Features | Typical Cases |
|---|---|---|---|---|
| SFT | Supervised Fine-Tuning | Supervised fine-tuning, basic alignment method | Most commonly used, wide scenario adaptability | Qwen-7B-SFT, Llama-3-8B-SFT |
| DPO | Direct Preference Optimization | Preference alignment algorithm, optimizing generation quality | More natural than SFT, fewer hallucinations | DeepSeek-DPO-7B, Qwen-DPO-14B |
| ORPO | Odds Ratio Preference Optimization | Lightweight preference alignment with low training cost | Resource-friendly, performance close to DPO | Llama-3-8B-ORPO, Phi-3-ORPO |
| Distill | Distilled | Model distillation, compressing large models into small ones | Smaller size, faster inference, slight accuracy loss | DeepSeek-R1-Distill-Llama-8B |
| MoE | Mixture of Experts | Mixture of Experts architecture, high-efficiency large models | Large parameter scale but high inference efficiency, low cost | Qwen-14B-MoE, DeepSeek-MoE-32B |
| Merge | Merged Model | Model merging, fusing multiple models by communities/vendors | Integrates advantages of multiple models, adapted to diverse scenarios | Llama-3-Merge-8B, Qwen-Merge-14B |
| Context / 8k/32k/128k | Context Window | Context window length, maximum supported text length | Longer length enables processing more text | Phi-3-Context-128k, Llama-3-70B-32k |
These tags only exist in quantized model files (.gguf/.bin), following universal conventions of the llama.cpp community, and directly determine the cost and performance of local model deployment.
| Tag | Bit Width | Core Positioning | Precision Performance | Recommended Scenarios |
|---|---|---|---|---|
| FP32 | 32bit | Original full-precision model | Highest, no loss | Scientific research, benchmark testing; not recommended for local deployment |
| FP16 | 16bit | Half-precision model | High, slight loss | Inference on high-performance devices, benchmark comparison |
| Q8_0 | 8bit | High-fidelity quantization | Extremely high, close to FP16 | High-performance local deployment (sufficient VRAM/RAM) |
| Q6_K | 6bit | High-quality quantization | High, faster than Q8_0 | Mid-to-high-end devices, pursuing balance |
| Q5_K_M | 5bit | Optimal comprehensive quantization | Good, balanced speed and precision | Main choice for local deployment (16G+ RAM) |
| Q4_K_M | 4bit | Cost-effective quantization | Good, small size and fast speed | Mainstream local deployment (8G/16G RAM, first choice) |
| Q3_K_M | 3bit | Lightweight quantization | Medium, significantly compressed size | Low-config devices (4G/8G RAM) |
| IQ2_XXS / IQ1_M | 1-2bit | Extreme compression quantization | Low, obvious accuracy loss | Ultra-low-config devices (within 4G RAM), emergency use |
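The practical effect of bit width is file size and memory footprint: a rough rule of thumb is size ≈ parameters × bits-per-weight / 8. The bits-per-weight figures below are approximate averages (K-quants mix precisions per block, so the effective value differs slightly from the nominal bit width):

```python
# Approximate average bits-per-weight for common llama.cpp quantization
# levels (assumption: community ballpark figures, not exact spec values).
APPROX_BPW = {
    "FP16": 16.0, "Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7,
    "Q4_K_M": 4.8, "Q3_K_M": 3.9, "IQ2_XXS": 2.1, "IQ1_M": 1.75,
}

def estimate_size_gb(params_billion: float, quant: str) -> float:
    """Rough on-disk size in GB: parameters * bits-per-weight / 8."""
    bits = APPROX_BPW[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

for q in ("FP16", "Q8_0", "Q4_K_M", "IQ1_M"):
    print(f"8B @ {q}: ~{estimate_size_gb(8, q):.1f} GB")
```

For an 8B model this works out to roughly 16 GB at FP16 versus under 2 GB at IQ1_M, which is exactly why Q4_K_M is the sweet spot for 8G/16G RAM machines: most of the size reduction with only modest accuracy loss.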
| Tag | Full Name | Core Meaning | Precision Advantages | Typical Cases |
|---|---|---|---|---|
| QAT | Quantization-Aware Training | Quantization-aware training, adapting to quantization during training | 10%-30% higher precision than regular quantization | gemma-3-270m-it-qat, Qwen-8B-qat-Q4_K_M |
| PTQ | Post-Training Quantization | Post-training quantization, quantizing after training completion | Regular precision, no additional training cost | Most open-source quantized models |
| Tag | Full Name | Core Meaning | Applicable Scenarios |
|---|---|---|---|
| GGUF | GGML Universal File | Current standard format of llama.cpp, replacing the old GGML | Local inference (llama.cpp, ollama, etc.) |
| GGML | Legacy llama.cpp/ggml format | Obsolete quantized format, superseded by GGUF | Only compatible with old tools; not recommended for new deployments |
| .bin / .pth | PyTorch Model Format | Native PyTorch model format | Secondary development, fine-tuning; not suitable for direct local inference |
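File extension and actual format can disagree, and GGUF files conveniently start with the ASCII magic bytes `GGUF`, so a quick sanity check is possible before handing a file to a loader. A minimal sketch (the extension fallback is a heuristic, not a guarantee):

```python
import os

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file

def detect_format(path: str) -> str:
    """Best-effort check that a file really is what its extension claims."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == GGUF_MAGIC:
        return "GGUF"
    # No magic match: fall back to the extension (heuristic only).
    ext = os.path.splitext(path)[1].lower()
    if ext in (".bin", ".pth"):
        return "PyTorch (probably)"
    return "unknown"

# Demo with a tiny synthetic file (a real check would point at a downloaded model).
with open("demo.gguf", "wb") as f:
    f.write(GGUF_MAGIC + b"\x03\x00\x00\x00")  # magic + a version field
print(detect_format("demo.gguf"))  # -> GGUF
os.remove("demo.gguf")
```

This catches the common pitfall of renaming a legacy GGML or PyTorch file to `.gguf` and then wondering why llama.cpp refuses to load it.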
These tags mark the model's language adaptability and domain adaptation direction, enabling rapid screening of scenario-specific models.
| Tag | Meaning | Typical Cases |
|---|---|---|
| zh / Chinese | Chinese-optimized model | Qwen3-8B-Instruct-zh, Llama-3-8B-Chinese |
| en | English-optimized model | Llama-3-70B-en, Gemma-2-9B-en |
| Multi / Multilingual | Multilingual model | Qwen3-14B-Multi, Mistral-7B-Multilingual |
| General | General-domain model | Qwen-7B-General |
| UD | Unsloth Dynamic quantization tag (not a domain tag) | DeepSeek-R1-Distill-Llama-8B-UD-IQ1_M |
| Domain / Law/Med/Fina | Vertical-domain model (Law/Medical/Finance) | Qwen-Law-7B, Med-Alpaca-13B |
| Abbreviation | Full Name | Meaning |
|---|---|---|
| IT | Instruction Tuned | Fine-tuned to follow instructions |
| SFT | Supervised Fine-Tuning | Alignment via labeled instruction-response examples |
| DPO | Direct Preference Optimization | Preference-based alignment |
| ORPO | Odds Ratio Preference Optimization | Lightweight preference alignment |
| QAT | Quantization-Aware Training | Quantization simulated during training |
| PTQ | Post-Training Quantization | Quantization applied after training |
| MoE | Mixture of Experts | Sparse expert architecture |
| VL | Vision-Language | Image-text multimodal capability |
| RAG | Retrieval-Augmented Generation | Generation grounded in retrieved documents |
| UD | Unsloth Dynamic | Unsloth's dynamic GGUF quantization scheme |
Naming standards vary slightly across different platforms and model types, but core tags are universal. The following is a comparison of standards for mainstream scenarios to help you avoid pitfalls in model selection across platforms.
| Standard Dimension | Hugging Face | ModelScope | Common Ground |
|---|---|---|---|
| Core Structure | Vendor/Author/Project - Version - Parameter Scale - Capability - Fine-tuning Type | Vendor/Series - Version - Parameter Scale - Capability - Fine-tuning Type | Adheres to core-to-detail structure; universal parameter scale and capability tags |
| Naming Separator | Mostly -, some _ | Mostly _, compatible with - | No mandatory separator rules; no impact on semantic understanding |
| Quantization Tags | Only in quantized models (.gguf), placed at the end | Only in quantized models (.gguf), placed at the end | Identical position and meaning of quantization tags |
| Special Tags | Many community-customized tags (e.g., -chatml) | Unified Alibaba-series tags (e.g., -instruct) | Universal core function tags (Instruct/VL) |
| Example | meta-llama/Llama-3-8B-Instruct | qwen/Qwen3-VL-8B-Instruct | Identical structure and core tags |
| Standard Dimension | GGUF Quantized Models (llama.cpp) | Regular Quantized Models (.bin/.pth) | Common Ground |
|---|---|---|---|
| Core Structure | Model Name-Capability-Quantization Method-Quantization Level.gguf | Model Name-Capability-Quantization Type.bin | Universal core function tags; quantization tags exclusive to quantized models |
| Quantization Tags | Includes llama.cpp-exclusive tags (e.g., Q4_K_M/IQ1_M) | Includes training quantization tags (e.g., QAT/PTQ) | Universal quantization training tags (QAT/PTQ) |
| Separator | Unified - for all modules | Mostly -, some _ | Separator has no impact on semantics; unambiguous core tags |
| Example | google_gemma-3-270m-it-qat-Q4_K_M.gguf | Qwen-8B-qat-Instruct.bin | Identical semantics of core function tags (it/Instruct) |
Vendors differ slightly in naming habits, but the core tags are universal, enabling quick adaptation when selecting models.
| Vendor | Core Series | Naming Habits | Typical Cases |
|---|---|---|---|
| Alibaba (Tongyi) | Qwen | Mostly uses Instruct/VL; parameter scale in B/M | Qwen3-VL-8B-Instruct, Qwen-7B-Function |
| Google | Gemma | Mostly uses IT instead of Instruct; concise version numbers | gemma-3-270m-it, gemma-2-9b-it-Q5_K_M |
| Meta | Llama | Clear version numbers (3/3.1); distinct Chat/Instruct tags | Llama-3-8B-Chat, Llama-3-70B-Instruct |
| DeepSeek | DeepSeek | Mostly uses Distill/UD/MoE; focuses on inference optimization | DeepSeek-R1-Distill-Llama-8B, DeepSeek-MoE-32B |
| Microsoft | Phi | Mostly uses Context/Obedient/RAG; lightweight design | Phi-3-Context-Obedient-RAG-Q4_K_M |
Combined with the above standards, we analyze 3 high-frequency model names to help you quickly master the logic of naming interpretation—understand at a glance.
google_gemma-3-270m-it-qat-Q4_K_M.gguf — Conclusion: a 270M-parameter instruction-tuned model from Google's Gemma 3 series, optimized with quantization-aware training and 4-bit (Q4_K_M) quantization, ideal for low-config local deployment.
Qwen3-VL-8B-Instruct — Conclusion: an 8B-parameter vision-language instruction model from Alibaba's Tongyi Qwen3 series, supporting visual Q&A and image captioning, ready for direct deployment.
DeepSeek-R1-Distill-Llama-8B-UD-IQ1_M.gguf — Conclusion: an 8B-parameter model distilled from DeepSeek R1 into a Llama architecture, with Unsloth Dynamic (UD) IQ1_M extreme quantization at roughly 1.75 bits per weight, ultra-compact for lightweight local deployment.
