
AI Model Naming Conventions: A Complete Guide to Tags, Standards and Industry Best Practices

2026-03-17 · 15 min read

This guide unlocks the logic of AI model naming, breaking down the universal core-to-detail structure and defining the mainstream tags (capability, training, quantization, language). It compares naming standards across open-source platforms, quantized model types and major vendors, and walks through typical model names in practice, enabling confident model selection and local deployment.

In daily scenarios like local AI model deployment and open-source project collaboration, we often encounter a bewildering array of model names: google_gemma-3-270m-it-qat-Q4_K_M.gguf, Qwen3-VL-8B-Instruct, DeepSeek-R1-Distill-Llama-8B-UD-IQ1_M.gguf...

Tags such as it, qat, Instruct and Q4_K_M in these names are not randomly assembled; they are universal naming conventions that have evolved across the AI industry. Open-source models on Hugging Face and ModelScope and quantized models in the llama.cpp community alike follow the same core naming logic.

1. Core Consensus: Universal Naming Structure of AI Models

All AI model names, regardless of vendor or format, follow a core-to-detail hierarchical structure: the earlier a field appears, the more critical it is; the later it appears, the more it concerns technical implementation details.

Universal Structure Formula

Vendor/Series - Version - Parameter Scale - Modality/Capability - Fine-tuning Type - Training Feature - Quantization/File Format

Structure Breakdown (Prioritized)

| Position | Core Field | Meaning | Examples |
|---|---|---|---|
| 1 | Vendor/Series | Model developer / product series | Qwen (Alibaba Tongyi), Gemma (Google), Llama (Meta) |
| 2 | Version | Model iteration version | 3, 3.1, R1 |
| 3 | Parameter Scale | Core parameter count of the model | 270m (270 million), 8B (8 billion), 70B (70 billion) |
| 4 | Modality/Capability | Core function / applicable scenario of the model | VL (Vision-Language), Chat (Conversation), RAG (Retrieval-Augmented Generation) |
| 5 | Fine-tuning Type | Model alignment method / optimization direction | Instruct (Instruction Tuned), Function (Function Calling) |
| 6 | Training Feature | Training / compression technology | Distill (Distilled), MoE (Mixture of Experts), QAT (Quantization-Aware Training) |
| 7 | Quantization/File Format | Model compression method / file type | Q4_K_M (4-bit quantization), GGUF (file format) |

Key Rules

  1. Parameter-scale units are globally unified: K (thousand), M (million), B (billion), with no ambiguity;
  2. Core function tags take precedence: Instruct/Chat/VL appear early to reflect the model's core value;
  3. Quantization tags appear only at the end of filenames: they are exclusive to quantized models (.gguf/.bin) and absent from unquantized models.
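As a rough illustration of the structure above, here is a minimal Python sketch that splits a model filename into these fields. The `parse_model_name` helper and its tag sets are my own illustration, not an official parser, and the tag lists are far from exhaustive; real names vary, so treat this as best-effort.

```python
import re

# Illustrative tag sets drawn from the conventions above (not exhaustive).
FINETUNE = {"instruct", "it", "chat", "base", "function", "tool"}
CAPABILITY = {"vl", "vision", "code", "math", "rag", "reasoning"}
TRAINING = {"qat", "ptq", "distill", "moe", "sft", "dpo", "orpo", "merge"}
PARAM = re.compile(r"^\d+(\.\d+)?[kmb]$", re.IGNORECASE)
# Quantization levels like Q4_K_M, Q8_0, IQ1_M, FP16 are matched before
# splitting, since their internal underscores double as separators elsewhere.
QUANT = re.compile(r"i?q\d(?:_[a-z0-9]+)*|fp16|fp32", re.IGNORECASE)

def parse_model_name(filename: str) -> dict:
    """Best-effort split of a model name into naming-convention fields."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:                       # no file extension at all
        stem, ext = filename, None
    out = {"format": ext, "params": None, "quant": None,
           "capability": [], "finetune": [], "training": [], "other": []}
    m = QUANT.search(stem)
    if m:
        out["quant"] = m.group(0)
        stem = stem.replace(m.group(0), "")
    for tok in filter(None, re.split(r"[-_/]", stem)):
        low = tok.lower()
        if PARAM.match(low):
            out["params"] = tok
        elif low in FINETUNE:
            out["finetune"].append(tok)
        elif low in TRAINING:
            out["training"].append(tok)
        elif low in CAPABILITY:
            out["capability"].append(tok)
        else:
            out["other"].append(tok)   # vendor, series, version, etc.
    return out
```

For example, `parse_model_name("google_gemma-3-270m-it-qat-Q4_K_M.gguf")` picks out the format `gguf`, the quantization level `Q4_K_M`, the parameter scale `270m`, the fine-tuning tag `it` and the training tag `qat`.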

2. Comprehensive Explanation of Universal Tags (Industry Consensus Version)

The following tags cover mainstream open-source and quantized models, organized by usage scenario with meanings, applicable scenarios and typical cases attached, so beginners can consult them directly.

2.1 Core Capability/Scenario Tags (Most Critical)

These tags determine what a model can do and serve as the core basis for model selection, applicable across all vendors.

| Tag | Full Name | Core Meaning | Applicable Scenarios | Typical Cases |
|---|---|---|---|---|
| Base | Base Model | Pre-trained base model without instruction/conversation alignment | Secondary fine-tuning, domain adaptation | Qwen3-8B-Base, Llama-3-70B-Base |
| Instruct | Instruction Tuned | Instruction-fine-tuned model, adapted to follow natural-language instructions | General Q&A, tool calling, lightweight assistants | Qwen3-VL-8B-Instruct, Gemma-3-IT |
| IT | Instruction Tuned | Abbreviation for Instruct, identical semantics | Google Gemma series, open-source community quantized models | gemma-3-270m-it, gemma-2-9b-it |
| Chat | Chat Model | Optimized for multi-turn conversation, focusing on fluency and naturalness | Casual chat, customer service, companion assistants | Llama-2-7B-Chat, Qwen3-7B-Chat |
| Function | Function Calling | Optimized for tool/function calling, supporting API/code execution | AI agents, automation tools, API integration | Qwen-7B-Function, DeepSeek-Function |
| Tool | Tool Use | Synonymous with Function, focusing on tool-usage capability | Plug-in ecosystems, cross-system collaboration | DeepSeek-Tool-LLM |
| RAG | Retrieval-Augmented Generation | Optimized for retrieval-augmented generation, suited to knowledge-base Q&A | Enterprise knowledge bases, document Q&A, information retrieval | Phi-3-Context-Obedient-RAG |
| VL | Vision-Language | Multimodal model supporting image-text understanding/generation | Visual Q&A, image captioning, visual tasks | Qwen3-VL-8B-Instruct, BLIP-2-VL |
| Vision | Vision Model | Pure vision model focused on image/video understanding | Image classification, object detection, visual analysis | CLIP-Vision, ViT-B/32 |
| Code | Code Optimized | Specialized model for code generation/understanding | Programming assistance, code debugging, algorithm development | CodeLlama-7B-Code, DeepSeek-Coder |
| Math | Math Reasoning | Optimized for mathematical/logical reasoning | Calculation, logical deduction, academic computing | DeepSeek-Math-7B, Qwen-Math-14B |
| Reasoning | Enhanced Reasoning | Strengthened reasoning capability, focused on logical-chain generation | Complex problem decomposition, multi-step reasoning | Llama-3-70B-Reasoning |

2.2 Training/Compression Technology Tags

These tags reflect the model's training methods and architectural features, affecting model performance, size and inference efficiency.

| Tag | Full Name | Core Meaning | Features | Typical Cases |
|---|---|---|---|---|
| SFT | Supervised Fine-Tuning | Supervised fine-tuning, the basic alignment method | Most commonly used; broad scenario adaptability | Qwen-7B-SFT, Llama-3-8B-SFT |
| DPO | Direct Preference Optimization | Preference-alignment algorithm optimizing generation quality | More natural than SFT alone, fewer hallucinations | DeepSeek-DPO-7B, Qwen-DPO-14B |
| ORPO | Odds Ratio Preference Optimization | Lightweight preference alignment with low training cost | Resource-friendly, performance close to DPO | Llama-3-8B-ORPO, Phi-3-ORPO |
| Distill | Distilled | Knowledge distillation, compressing large models into smaller ones | Smaller size, faster inference, slight accuracy loss | DeepSeek-R1-Distill-Llama-8B |
| MoE | Mixture of Experts | Mixture-of-Experts architecture for efficient large models | Large parameter count but high inference efficiency, low cost | Qwen-14B-MoE, DeepSeek-MoE-32B |
| Merge | Merged Model | Model merging, fusing multiple models (by communities/vendors) | Combines strengths of several models for diverse scenarios | Llama-3-Merge-8B, Qwen-Merge-14B |
| 8k / 32k / 128k | Context Window | Maximum supported context length | Longer windows allow processing more text | Phi-3-Context-128k, Llama-3-70B-32k |
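Context-window tags such as 8k/32k/128k are usually read as multiples of 1,024 tokens (an assumption on my part; some vendors round to 1,000). A one-line converter:

```python
import re

def context_tokens(tag: str) -> int:
    """Convert a context-window tag like '128k' into a token count,
    assuming the common 1k = 1024-token convention."""
    m = re.fullmatch(r"(\d+)k", tag.strip().lower())
    if not m:
        raise ValueError(f"not a context-window tag: {tag!r}")
    return int(m.group(1)) * 1024

# context_tokens("128k") -> 131072
```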

2.3 Quantization/File Format Tags (Exclusive to GGUF/llama.cpp)

These tags only exist in quantized model files (.gguf/.bin), following universal conventions of the llama.cpp community, and directly determine the cost and performance of local model deployment.

2.3.1 Quantization Level Tags (From Highest to Lowest Precision)

| Tag | Bit Width | Core Positioning | Precision | Recommended Scenarios |
|---|---|---|---|---|
| FP32 | 32-bit | Original full-precision model | Highest, no loss | Research, benchmark testing; not recommended for local deployment |
| FP16 | 16-bit | Half-precision model | High, slight loss | Inference on high-performance devices, benchmark comparison |
| Q8_0 | 8-bit | High-fidelity quantization | Extremely high, close to FP16 | High-performance local deployment (ample VRAM/RAM) |
| Q6_K | 6-bit | High-quality quantization | High, faster than Q8_0 | Mid-to-high-end devices seeking balance |
| Q5_K_M | 5-bit | Best overall quantization | Good, balanced speed and precision | Primary choice for local deployment (16 GB+ RAM) |
| Q4_K_M | 4-bit | Cost-effective quantization | Good, small size and fast | Mainstream local deployment (8/16 GB RAM, first choice) |
| Q3_K_M | 3-bit | Lightweight quantization | Medium, significantly compressed | Low-spec devices (4/8 GB RAM) |
| IQ2_XXS / IQ1_M | ~1-2-bit | Extreme compression | Low, obvious accuracy loss | Ultra-low-spec devices (under 4 GB RAM), emergency use |
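The bit widths above translate almost directly into file size: bytes ≈ parameters × bits-per-weight ÷ 8. A minimal sketch (the effective bits per weight for K-quants are approximate, since some tensors are kept at higher precision, so treat the result as a rough floor):

```python
def approx_model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough size estimate: parameters * bits-per-weight / 8 bytes.

    Ignores file metadata and mixed-precision tensors, so the real
    GGUF file is usually somewhat larger.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# An 8B model: FP16 ~16 GB, Q8_0 ~8 GB, Q4_K_M (assuming ~4.85
# effective bits per weight) ~4.85 GB.
```

This is why Q4_K_M is the mainstream choice: an 8B model drops from roughly 16 GB at FP16 to around 5 GB, which fits comfortably in 8 GB of RAM with room for the context cache.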

2.3.2 Quantization Training Tags

| Tag | Full Name | Core Meaning | Precision Advantages | Typical Cases |
|---|---|---|---|---|
| QAT | Quantization-Aware Training | The model adapts to quantization during training | Typically noticeably better accuracy than post-training quantization at the same bit width | gemma-3-270m-it-qat, Qwen-8B-qat-Q4_K_M |
| PTQ | Post-Training Quantization | Quantization applied after training completes | Standard precision, no extra training cost | Most open-source quantized models |

2.3.3 File Format Tags

| Tag | Description | Core Meaning | Applicable Scenarios |
|---|---|---|---|
| GGUF | llama.cpp model format | Standard llama.cpp format, successor to the older GGML | Local inference (llama.cpp, Ollama, etc.) |
| GGML | Legacy llama.cpp format | Obsolete quantized format, phased out in favor of GGUF | Only compatible with old tools; not recommended for new deployments |
| .bin / .pth | PyTorch model formats | Native PyTorch checkpoint formats | Secondary development, fine-tuning; not suitable for direct local inference with llama.cpp |
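One reliable way to tell these formats apart, independent of the filename: GGUF files begin with the 4-byte ASCII magic `GGUF`. A minimal check (the helper name is mine):

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```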

2.4 Language/Domain Tags

These tags mark the model's language adaptability and domain adaptation direction, enabling rapid screening of scenario-specific models.

| Tag | Meaning | Typical Cases |
|---|---|---|
| zh / Chinese | Chinese-optimized model | Qwen3-8B-Instruct-zh, Llama-3-8B-Chinese |
| en | English-optimized model | Llama-3-70B-en, Gemma-2-9B-en |
| Multi / Multilingual | Multilingual model | Qwen3-14B-Multi, Mistral-7B-Multilingual |
| General | General-domain model | Qwen-7B-General |
| Law / Med / Fin | Vertical-domain model (legal/medical/financial) | Qwen-Law-7B, Med-Alpaca-13B |
| UD | Unsloth Dynamic, a community quantization variant seen in GGUF uploads (not a domain tag) | DeepSeek-R1-Distill-Llama-8B-UD-IQ1_M.gguf |

2.5 Essential Quick Reference for High-Frequency Industry Abbreviations

| Abbreviation | Full Name |
|---|---|
| IT | Instruction Tuned |
| SFT | Supervised Fine-Tuning |
| DPO | Direct Preference Optimization |
| ORPO | Odds Ratio Preference Optimization |
| QAT | Quantization-Aware Training |
| PTQ | Post-Training Quantization |
| MoE | Mixture of Experts |
| VL | Vision-Language |
| RAG | Retrieval-Augmented Generation |
| UD | Unsloth Dynamic (community quantization variant) |

3. Core Scenarios: Comparison of Universal Naming Standards

Naming standards vary slightly across different platforms and model types, but core tags are universal. The following is a comparison of standards for mainstream scenarios to help you avoid pitfalls in model selection across platforms.

3.1 Open-Source Platform Standard Comparison (Hugging Face vs ModelScope)

| Standard Dimension | Hugging Face | ModelScope | Common Ground |
|---|---|---|---|
| Core Structure | Vendor/Author/Project - Version - Parameter Scale - Capability - Fine-tuning Type | Vendor/Series - Version - Parameter Scale - Capability - Fine-tuning Type | Core-to-detail structure; universal parameter-scale and capability tags |
| Separator | Mostly `-`, some `_` | Mostly `_`, compatible with `-` | No mandatory separator rules; no impact on semantics |
| Quantization Tags | Only in quantized models (.gguf), placed at the end | Only in quantized models (.gguf), placed at the end | Identical position and meaning of quantization tags |
| Special Tags | Many community-customized tags (e.g., -chatml) | Unified Alibaba-series tags (e.g., -instruct) | Universal core function tags (Instruct/VL) |
| Example | meta-llama/Llama-3-8B-Instruct | qwen/Qwen3-VL-8B-Instruct | Identical structure and core tags |

3.2 Quantized Model Standard Comparison (GGUF vs Regular Quantization)

| Standard Dimension | GGUF Quantized Models (llama.cpp) | Regular Quantized Models (.bin/.pth) | Common Ground |
|---|---|---|---|
| Core Structure | Model Name-Capability-Quantization Method-Quantization Level.gguf | Model Name-Capability-Quantization Type.bin | Universal core function tags; quantization tags exclusive to quantized models |
| Quantization Tags | llama.cpp-specific tags (e.g., Q4_K_M, IQ1_M) | Training-quantization tags (e.g., QAT, PTQ) | Universal quantization-training tags (QAT/PTQ) |
| Separator | Unified `-` across modules | Mostly `-`, some `_` | Separator has no impact on semantics; core tags unambiguous |
| Example | google_gemma-3-270m-it-qat-Q4_K_M.gguf | Qwen-8B-qat-Instruct.bin | Identical semantics of core function tags (it/Instruct) |

3.3 Vendor-Specific Naming Standard Comparison (Mainstream Manufacturers)

Different vendors have minor naming habits, but core tags are universal for quick adaptation in model selection.

| Vendor | Core Series | Naming Habits | Typical Cases |
|---|---|---|---|
| Alibaba (Tongyi) | Qwen | Mostly Instruct/VL; parameter scale in B/M | Qwen3-VL-8B-Instruct, Qwen-7B-Function |
| Google | Gemma | Uses IT instead of Instruct; concise version numbers | Gemma-3-270m-it, Gemma-2-9B-it-Q5_K_M |
| Meta | Llama | Clear version numbers (2/3/3.1); Chat in Llama 2, Instruct in Llama 3 | Llama-2-7B-Chat, Llama-3-70B-Instruct |
| DeepSeek | DeepSeek | Mostly Distill/MoE; focused on reasoning optimization | DeepSeek-R1-Distill-Llama-8B, DeepSeek-MoE-32B |
| Microsoft | Phi | Lightweight design; community fine-tunes add tags like Context/RAG | Phi-3-Context-Obedient-RAG-Q4_K_M |

4. Practical Analysis: Full Interpretation of Typical Model Names

Using the standards above, we analyze three high-frequency model names so you can master the interpretation logic at a glance.

Case 1: google_gemma-3-270m-it-qat-Q4_K_M.gguf

  • Vendor/Series: Google Gemma 3
  • Parameter Scale: 270m (270 million parameters)
  • Fine-tuning Type: it (Instruction Tuned, ready for direct conversation)
  • Quantization Training: qat (Quantization-Aware Training, higher precision)
  • Quantization Level: Q4_K_M (4-bit quantization, first choice for local deployment)
  • File Format: GGUF (standard llama.cpp format)

Conclusion: A 270M parameter instruction-tuned quantized model from Google's Gemma 3 series, optimized with Quantization-Aware Training and 4-bit quantization—ideal for low-config local deployment.

Case 2: Qwen3-VL-8B-Instruct

  • Vendor/Series: Alibaba Tongyi Qwen 3
  • Modality: VL (Vision-Language multimodal)
  • Parameter Scale: 8B (8 billion parameters)
  • Fine-tuning Type: Instruct (Instruction Tuned, ready for direct image-text Q&A)

Conclusion: An 8B parameter image-text multimodal instruction model from Alibaba's Tongyi Qwen 3 series, supporting visual Q&A and image captioning—ready for direct deployment.

Case 3: DeepSeek-R1-Distill-Llama-8B-UD-IQ1_M.gguf

  • Vendor/Series: DeepSeek R1
  • Training Feature: Distill (distilled from DeepSeek-R1 onto a Llama architecture)
  • Parameter Scale: 8B (8 billion parameters)
  • Quantization Variant: UD (Unsloth Dynamic, a community quantization scheme)
  • Quantization Level: IQ1_M (extreme sub-2-bit quantization, ultra-small size)
  • File Format: GGUF

Conclusion: An 8B-parameter model distilled from DeepSeek R1 onto a Llama backbone, then compressed with Unsloth Dynamic sub-2-bit quantization (IQ1_M): ultra-compact for lightweight local deployment.
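The three walkthroughs above can be mechanized with a small glossary lookup. A sketch (the glossary here is a tiny illustrative subset of the tag tables in Section 2):

```python
import re

# Tiny illustrative subset of the tag glossary from Section 2.
GLOSSARY = {
    "it": "Instruction Tuned", "instruct": "Instruction Tuned",
    "qat": "Quantization-Aware Training", "distill": "Distilled",
    "vl": "Vision-Language", "chat": "Chat Model", "base": "Base Model",
    "gguf": "llama.cpp file format", "moe": "Mixture of Experts",
}

def explain(name: str) -> list:
    """Annotate every recognized tag in a model name."""
    tokens = re.split(r"[-_./]", name)
    return [f"{t}: {GLOSSARY[t.lower()]}" for t in tokens if t.lower() in GLOSSARY]
```

For example, `explain("Qwen3-VL-8B-Instruct")` yields `["VL: Vision-Language", "Instruct: Instruction Tuned"]`; extending `GLOSSARY` with the full tables above turns it into a handy model-name decoder.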

