TrAIner Beta
Efficiently train and run inference on transformer models with your personal computer
TrAIner is a lightweight yet powerful platform designed to democratize AI development by enabling efficient transformer model training on consumer-grade hardware. With advanced memory optimization techniques and a focus on performance, TrAIner makes it possible to build, train, and deploy sophisticated language models without requiring enterprise-level infrastructure.
Whether you're a researcher, developer, or AI enthusiast, TrAIner provides the tools you need to experiment with state-of-the-art model architectures while maintaining complete control over your training pipeline and data.
import torch
from modeling import ChatModel, ChatConfig, ChatTokenizer
# Note: load_model_and_tokenizer and chat are assumed to be project helpers
# (e.g. defined in or alongside chat.py); adjust the import to match your checkout.

# Load model and tokenizer
model_path = "./model/trained_model"
model, tokenizer, device = load_model_and_tokenizer(model_path)

# Generate text
response = chat("Explain the concept of neural networks",
                model, tokenizer, device,
                max_new_tokens=512,
                temperature=0.7)
Key Features
Efficient Training on Consumer Hardware
TrAIner is specifically optimized to run on consumer-grade hardware, making AI model development accessible to everyone. Through advanced memory optimization techniques, gradient accumulation, and mixed precision training, you can train sophisticated models on standard GPUs.
Advanced Model Architecture
TrAIner implements state-of-the-art transformer architectures with performance-enhancing features such as Mixture of Experts, Grouped Query Attention, and more. These advanced techniques allow for larger, more capable models that can still run on limited hardware.
Fast Inference
TrAIner includes highly optimized inference capabilities with support for advanced techniques like speculative decoding, KV-caching, and quantization. This enables responsive, real-time AI interactions even on modest hardware.
Flexible Data Handling
Train models on a variety of data formats with minimal preprocessing. TrAIner supports both raw text files and structured JSONL formats, with built-in tokenization and data augmentation capabilities.
Installation and Setup
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 8GB | 16GB+ |
| GPU | 4GB VRAM (CUDA compatible) | 8GB+ VRAM |
| Storage | 10GB free space | 50GB+ free space |
| OS | Windows 10, Ubuntu 20.04, macOS 10.15 | Latest versions |
Installation Guide
# Clone the repository
git clone https://github.com/CPScript/TrAIner.git
cd TrAIner
# Install dependencies
pip install torch safetensors tqdm numpy
# For CUDA support (adjust based on your CUDA version)
pip install torch --extra-index-url https://download.pytorch.org/whl/cu118
# Verify installation
python -c "import torch; print(f'PyTorch version: {torch.__version__}, CUDA available: {torch.cuda.is_available()}')"
Configuration
TrAIner can be configured through command-line arguments or a configuration file. Here's a sample configuration file showing key parameters:
# config.json
{
  "model": {
    "hidden_size": 768,
    "num_layers": 12,
    "num_heads": 12,
    "num_kv_heads": 4,
    "feed_forward_dim": 3072,
    "max_seq_length": 2048,
    "dropout": 0.1,
    "use_moe": true,
    "num_experts": 8,
    "use_rotary": true,
    "use_rmsnorm": true
  },
  "training": {
    "data_path": "./data/dialogues.txt",
    "data_format": "txt",
    "output_dir": "./model",
    "batch_size": 4,
    "epochs": 30,
    "learning_rate": 5e-5,
    "weight_decay": 0.01,
    "warmup_steps": 1000,
    "fp16": true
  }
}
Quickstart Guide
Training Your First Model
# Create a sample training data file
mkdir -p data
cat > data/dialogues.txt << EOF
<|user|>What is artificial intelligence?<|assistant|>Artificial intelligence refers to systems or machines that can perform tasks that typically require human intelligence. This includes learning from examples and experience, recognizing objects, understanding and responding to language, making decisions, and solving problems.<|end|>
<|user|>Explain neural networks<|assistant|>Neural networks are computing systems inspired by the biological neural networks in animal brains. They consist of artificial neurons that can learn from and make decisions based on input data. Deep learning, a subset of machine learning, uses multiple layers of these neural networks to progressively extract higher-level features from raw input.<|end|>
EOF
# Train a model with default parameters
python train.py --data_path ./data/dialogues.txt --epochs 3 --batch_size 2
# The model will be saved in the ./model directory
Running Inference
# Start the interactive chat interface
python chat.py --model_path ./model/final
# You can now chat with your trained model
> Tell me about machine learning
> exit # Type 'exit' to quit
Non-Interactive Generation
# Generate a response for a specific input
python chat.py --model_path ./model/final --input "What are transformer models?" --no_interactive
Customizing Model Architecture
# Train with a customized model architecture
python train.py \
--data_path ./data/dialogues.txt \
--hidden_size 512 \
--num_layers 8 \
--num_heads 8 \
--feed_forward_dim 2048 \
--use_moe \
--num_experts 4 \
--batch_size 2 \
--epochs 5
Model Architecture
TrAIner implements a modern transformer-based architecture with several advanced features:
+--------------------------------+
|        Input Embedding         |
+--------------------------------+
                |
                v
+--------------------------------+
|      Positional Encoding       |
+--------------------------------+
                |
                v
+--------------------------------+
|      Transformer Layer 1       |
|  +--------------------------+  |
|  |      Self-Attention      |  |
|  |       (GQA + RoPE)       |  |
|  +--------------------------+  |
|               |                |
|               v                |
|  +--------------------------+  |
|  |    Mixture of Experts    |  |
|  |   Feed Forward Network   |  |
|  +--------------------------+  |
+--------------------------------+
                |
                v
+--------------------------------+
|    Transformer Layer 2...N     |
+--------------------------------+
                |
                v
+--------------------------------+
|      Layer Normalization       |
+--------------------------------+
                |
                v
+--------------------------------+
|      Language Model Head       |
+--------------------------------+
Core Components
Self-Attention Mechanism
TrAIner implements an advanced attention mechanism with support for both traditional multi-head attention and Grouped Query Attention (GQA). GQA reduces computation and memory requirements by sharing key-value heads across query heads.
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.hidden_size = config.hidden_size
        self.num_heads = config.num_heads
        self.num_kv_heads = config.num_kv_heads if config.use_gqa else config.num_heads
        self.head_dim = config.hidden_size // config.num_heads

        # For GQA, there are fewer key/value heads than query heads;
        # each KV head is shared by kv_repeats query heads
        self.kv_repeats = self.num_heads // self.num_kv_heads if config.use_gqa else 1

        # Q, K, V projections (K and V projections are smaller under GQA)
        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
        self.k_proj = nn.Linear(self.hidden_size, self.num_kv_heads * self.head_dim, bias=config.attention_bias)
        self.v_proj = nn.Linear(self.hidden_size, self.num_kv_heads * self.head_dim, bias=config.attention_bias)
        self.out_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
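The snippet below is a hedged sketch of how the grouped key/value heads can be expanded to match the query heads at attention time. TrAIner's actual forward pass (which also applies RoPE, causal masking, and KV caching) lives in modeling.py and may differ in detail.

import torch

# q: (batch, num_heads, seq, head_dim); k, v: (batch, num_kv_heads, seq, head_dim)
def grouped_query_attention(q, k, v, kv_repeats):
    if kv_repeats > 1:
        # Each KV head is shared by kv_repeats query heads
        k = k.repeat_interleave(kv_repeats, dim=1)
        v = v.repeat_interleave(kv_repeats, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v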
Mixture of Experts (MoE)
MoE enables larger, more capable models by conditionally activating only a subset of model parameters for each input token. This provides increased model capacity without a proportional increase in computation.
class MoEFeedForward(nn.Module):
    def __init__(self, config):
        super().__init__()
        # One gated (GEGLU) feed-forward block per expert
        self.experts = nn.ModuleList([
            GEGLU(
                config.hidden_size,
                config.feed_forward_dim,
                bias=True
            ) for _ in range(config.num_experts)
        ])
        # Router that scores each token against every expert
        self.gate = nn.Linear(config.hidden_size, config.num_experts)
        self.dropout = nn.Dropout(config.dropout)
        self.num_experts = config.num_experts
        self.expert_capacity = 2  # Number of tokens per expert
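For intuition, here is a simplified top-2 routing forward pass. It is only an illustration under the assumption of per-token top-k routing; TrAIner's real MoEFeedForward additionally applies dropout, capacity limits, and load balancing.

import torch
import torch.nn.functional as F

def moe_forward(x, gate, experts, top_k=2):
    # x: (num_tokens, hidden_size); gate scores every token against every expert
    router_logits = gate(x)                               # (tokens, num_experts)
    weights, chosen = router_logits.topk(top_k, dim=-1)   # top-k experts per token
    weights = F.softmax(weights, dim=-1)                   # normalized routing weights
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e, expert in enumerate(experts):
            mask = chosen[:, slot] == e                    # tokens routed to expert e in this slot
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out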
Training Procedures
TrAIner provides comprehensive training capabilities with support for various optimizations:
Mixed Precision Training
Mixed precision dramatically reduces memory usage and speeds up training with minimal impact on model quality by using lower-precision formats (FP16 or BF16) for computation while keeping master weights in FP32.
# Enable mixed precision training with FP16
python train.py --data_path ./data/dialogues.txt --fp16
# Use BF16 for better numerical stability on supported hardware
python train.py --data_path ./data/dialogues.txt --optimized_bf16
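The --fp16 flag presumably enables something like the standard torch.cuda.amp pattern. The loop below is a minimal, self-contained sketch of that pattern (toy model and data, requires a CUDA GPU), not TrAIner's actual training loop:

import torch
import torch.nn as nn

model = nn.Linear(16, 16).cuda()                       # toy stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batches = [torch.randn(4, 16, device="cuda") for _ in range(10)]

scaler = torch.cuda.amp.GradScaler()
for batch in batches:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                    # forward pass runs in FP16 where safe
        loss = model(batch).pow(2).mean()              # placeholder loss
    scaler.scale(loss).backward()                      # scale to avoid FP16 gradient underflow
    scaler.step(optimizer)                             # unscale, skip step on inf/NaN, update FP32 weights
    scaler.update()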
Gradient Accumulation
Train with larger effective batch sizes on limited hardware by accumulating gradients across multiple forward/backward passes before updating weights.
# Use gradient accumulation for larger effective batch size
python train.py --data_path ./data/dialogues.txt --batch_size 2 --gradient_accumulation_steps 8
# Effective batch size: 2 * 8 = 16
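Conceptually, gradient accumulation just delays the optimizer step. A minimal sketch of the pattern, with a toy model and data for illustration (not TrAIner's trainer):

import torch
import torch.nn as nn

model = nn.Linear(16, 16)                                 # toy stand-in
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
micro_batches = [torch.randn(2, 16) for _ in range(32)]   # micro-batch size 2
accum_steps = 8                                           # effective batch size: 2 * 8 = 16

optimizer.zero_grad()
for step, batch in enumerate(micro_batches):
    loss = model(batch).pow(2).mean() / accum_steps       # average loss over the accumulation window
    loss.backward()                                       # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()                                  # one weight update per accum_steps micro-batches
        optimizer.zero_grad()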
Inference
TrAIner provides efficient inference capabilities with several optimization techniques:
Speculative Decoding
This technique uses a smaller "draft" model to propose multiple tokens at once, which are then verified by the main model in a single forward pass. This can speed up generation by 2-3x with minimal impact on quality.
# Enable speculative decoding
python chat.py --model_path ./model/final --use_speculative --spec_len 5
# Use a separate draft model
python chat.py --model_path ./model/final --use_speculative --draft_model ./model/smaller
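To make the idea concrete, here is a greedy, simplified sketch of the draft-then-verify loop. It assumes batch size 1 and that both models are causal LMs returning logits of shape (batch, seq, vocab) when called on token IDs; the real implementation (sampling-aware acceptance, KV caching) is more involved.

import torch

@torch.no_grad()
def speculative_generate(main_model, draft_model, input_ids, max_new_tokens=64, spec_len=5):
    tokens = input_ids
    while tokens.shape[1] - input_ids.shape[1] < max_new_tokens:
        # 1) Draft model proposes spec_len tokens greedily
        draft = tokens
        for _ in range(spec_len):
            next_tok = draft_model(draft)[:, -1, :].argmax(-1, keepdim=True)
            draft = torch.cat([draft, next_tok], dim=1)
        proposed = draft[:, tokens.shape[1]:]                        # (1, spec_len)

        # 2) Main model scores the whole draft in one forward pass
        logits = main_model(draft)
        verify = logits[:, tokens.shape[1] - 1:-1, :].argmax(-1)     # main model's own greedy picks

        # 3) Accept the longest agreeing prefix, then take one token from the main model
        matches = (verify == proposed)[0].long()
        n_accept = int(matches.cumprod(0).sum().item())
        extra = verify[:, n_accept:n_accept + 1] if n_accept < spec_len \
            else logits[:, -1:, :].argmax(-1)
        tokens = torch.cat([tokens, proposed[:, :n_accept], extra], dim=1)
    return tokens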
KV Caching
TrAIner automatically caches key-value pairs during generation, avoiding redundant computation and significantly speeding up the inference process.
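A stripped-down, single-head illustration of the idea (not TrAIner's internal cache format): at each generation step, only the newest token is projected, and its key/value are appended to the running cache.

import torch

def attend_with_cache(x_new, cache, q_proj, k_proj, v_proj):
    # x_new: hidden states for the newly generated token only, shape (batch, 1, hidden)
    k_new, v_new = k_proj(x_new), v_proj(x_new)
    if cache is not None:
        k = torch.cat([cache[0], k_new], dim=1)        # reuse keys/values from earlier steps
        v = torch.cat([cache[1], v_new], dim=1)
    else:
        k, v = k_new, v_new
    q = q_proj(x_new)
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
    out = torch.softmax(scores, dim=-1) @ v
    return out, (k, v)                                  # updated cache for the next step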
Quantization
For memory-constrained environments, TrAIner supports 8-bit quantization that can reduce memory requirements by up to 4x with minimal quality degradation.
# Enable 8-bit quantization for inference
python chat.py --model_path ./model/final --quantize
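As a point of reference, PyTorch's built-in dynamic quantization achieves a similar effect for Linear layers. The sketch below uses a toy model and is not necessarily the scheme behind --quantize.

import torch
import torch.nn as nn

# Toy stand-in; any nn.Module containing Linear layers works the same way
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
# Linear weights are now stored as INT8 and dequantized on the fly,
# cutting weight memory roughly 4x versus FP32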
Data Processing
TrAIner supports flexible data formats and processing pipelines:
Supported Data Formats
Text Format
Simple text files with conversation markers:
<|user|>What is machine learning?<|assistant|>Machine learning is a branch of artificial intelligence that focuses on developing systems that can learn from and make decisions based on data. Instead of being explicitly programmed to perform a task, these systems learn from experience.<|end|>
<|user|>How does deep learning differ from machine learning?<|assistant|>Deep learning is a subset of machine learning that uses neural networks with many layers (hence "deep"). While traditional machine learning requires feature extraction by humans, deep learning automatically discovers the representations needed for feature detection or classification from raw data. It's particularly powerful for complex tasks like image and speech recognition.<|end|>
JSONL Format
Structured JSON Lines format for more complex conversations:
{"messages": [{"role": "user", "content": "Explain the concept of transformer models."}, {"role": "assistant", "content": "Transformer models are a type of neural network architecture introduced in the paper 'Attention Is All You Need' by Vaswani et al. They rely primarily on self-attention mechanisms rather than recurrence or convolution, allowing them to process input sequences in parallel rather than sequentially. This parallelization enables more efficient training on large datasets. The architecture typically consists of an encoder and decoder, each made up of attention layers and feed-forward neural networks. Transformers have become the foundation for many state-of-the-art language models like GPT, BERT, and T5."}]}
{"messages": [{"role": "system", "content": "You are a helpful AI assistant."}, {"role": "user", "content": "What are the limitations of current AI systems?"}, {"role": "assistant", "thinking": "I should cover limitations like data quality dependence, lack of common sense, explainability issues, and resource requirements.", "content": "Current AI systems face several important limitations: 1) They're highly dependent on training data quality and can perpetuate biases present in that data. 2) They lack true understanding or common sense reasoning. 3) Many modern AI systems, especially deep learning ones, function as 'black boxes' with limited explainability. 4) They typically require significant computational resources and energy to train. 5) They struggle with transferring knowledge between different tasks (though transfer learning helps). 6) They're vulnerable to adversarial attacks and can be fooled by carefully crafted inputs."}]}
Mixture of Experts
Coming Soon
In-depth documentation about Mixture of Experts architecture implementation in TrAIner
The Mixture of Experts (MoE) module in TrAIner enables training models with significantly increased parameter count while maintaining reasonable computational requirements. MoE works by activating only a subset of the model's parameters for each input token, which enables models to scale to billions of parameters even on consumer hardware.
This section will include detailed explanations of:
- How MoE architecture works in transformer models
- TrAIner's implementation of routing algorithms
- Expert capacity and load balancing techniques
- Performance benchmarks comparing standard vs MoE models
- Step-by-step guide to configuring and training MoE models
- Advanced tuning parameters and best practices
Distributed Training
Coming Soon
Comprehensive guide to distributed training with TrAIner across multiple GPUs and machines
TrAIner supports distributed training across multiple GPUs and even multiple machines, allowing you to scale up your training for faster completion or to accommodate larger models. This is implemented using PyTorch's DistributedDataParallel (DDP) framework.
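Until this section lands, a minimal single-machine DDP setup follows the usual PyTorch pattern (launched with torchrun --nproc_per_node=NUM_GPUS train.py); how TrAIner wires this into train.py may differ:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model):
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(local_rank), device_ids=[local_rank])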
This section will cover:
- Setting up multi-GPU training on a single machine
- Configuring distributed training across multiple machines
- Scaling strategies and performance optimizations
- Handling checkpoints and model saving in distributed environments
- Common issues and troubleshooting tips
- Advanced configurations for different network topologies
Quantization
Coming Soon
Detailed guide to model quantization for improved inference performance
Quantization is a technique that reduces the precision of model weights from 32-bit floating point (FP32) to lower precision formats like 8-bit integers (INT8). This can significantly reduce memory usage and improve inference speed with minimal impact on generation quality.
This section will include:
- Post-training quantization techniques supported in TrAIner
- Quantization-aware training for better results
- Performance benchmarks comparing FP32, FP16, and INT8 models
- Effects of quantization on different model components
- Advanced calibration techniques to minimize accuracy loss
- Hardware-specific optimizations for quantized models
Speculative Decoding
Coming Soon
Advanced technique documentation for faster text generation using speculative methods
Speculative decoding is an advanced technique that significantly speeds up text generation by using a smaller "draft" model to predict multiple tokens at once, which are then verified by the main model. This can provide a 2-3x speedup in generation while maintaining output quality.
This section will cover:
- Theoretical foundation of speculative decoding
- Implementation details in TrAIner
- Creating and training effective draft models
- Performance benchmarks and quality comparisons
- Tuning parameters to optimize the speed/quality tradeoff
- Advanced techniques like tree-based speculation
Text Generation
Coming Soon
Comprehensive guide to generating text with TrAIner models
TrAIner provides flexible text generation capabilities with various options to control the output style, creativity, and generation speed. This section will detail how to use the generation API for different use cases.
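As a preview of the sampling controls listed below, here is a generic next-token sampling step showing how temperature, top-k, and top-p shape the output distribution; it is illustrative only and not TrAIner's exact implementation.

import torch

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9):
    logits = logits / max(temperature, 1e-5)                    # temperature scaling
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[..., -1, None]   # k-th largest logit
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    if top_p < 1.0:                                             # nucleus (top-p) filtering
        sorted_probs, sorted_idx = probs.sort(descending=True)
        keep = sorted_probs.cumsum(dim=-1) - sorted_probs <= top_p
        sorted_probs = sorted_probs * keep
        probs = torch.zeros_like(probs).scatter(-1, sorted_idx, sorted_probs)
        probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)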
Topics to be covered include:
- Interactive chat interface usage
- Programmatic text generation with the API
- Controlling parameters like temperature, top-p, and top-k
- Streaming generation for real-time applications
- Advanced sampling techniques and their effects
- Best practices for prompt engineering
Fine-tuning
Coming Soon
Detailed guide to fine-tuning existing models on your custom datasets
Fine-tuning allows you to take a pre-trained model and adapt it to specific domains or tasks by training on your own datasets. This often achieves better results than training from scratch, especially for specialized applications.
This section will provide detailed information on:
- Preparing your dataset for fine-tuning
- Selecting appropriate hyperparameters
- Techniques to prevent catastrophic forgetting
- Parameter-efficient fine-tuning methods
- Evaluating fine-tuned model performance
- Advanced fine-tuning for specific applications
JSONL Format
Coming Soon
Detailed specification and examples for using JSONL formatted training data
TrAIner supports structured JSONL (JSON Lines) format for training data, which provides more flexibility than plain text formats. This allows for more complex conversation structures, metadata, and special formatting.
This section will include:
- JSONL format specification for TrAIner
- Examples of various conversation patterns
- Supporting system prompts and multi-turn conversations
- Including metadata and auxiliary information
- Tools for converting between formats
- Performance considerations for large datasets
CLI Usage
Coming Soon
Complete reference for command-line interface options and parameters
TrAIner provides a comprehensive command-line interface for training, inference, and model management. This section will serve as a complete reference for all available options.
Topics to be covered include:
- Complete listing of all CLI commands and parameters
- Train command options and syntax
- Chat command options for inference
- Model conversion and export utilities
- Data processing command-line tools
- Automation and scripting examples
Model API
Coming Soon
Comprehensive documentation of the Model API for developers
The Model API provides programmatic access to TrAIner's model architecture, allowing developers to integrate, extend, and customize models for their specific needs.
This section will include:
- ChatModel class API reference
- ChatConfig options and parameter details
- Methods for loading and saving models
- Generation API and parameter explanations
- Forward pass customization options
- Extending the model with custom layers
Tokenizer API
Coming Soon
Detailed documentation of the tokenization system and API
The Tokenizer API handles the conversion between text and token IDs that the model can process. TrAIner includes a custom tokenizer designed to efficiently handle multiple languages and special tokens.
This documentation will cover:
- ChatTokenizer class API reference
- Building and managing custom vocabularies
- Adding special tokens and handling
- Tokenization patterns and regular expressions
- Encoding and decoding methods
- Performance optimization techniques
Training API
Coming Soon
Comprehensive reference for the training and optimization API
The Training API provides programmatic access to TrAIner's training pipeline, allowing developers to customize the training process, integrate custom datasets, and implement advanced training techniques.
This section will include:
- Training loop implementation details
- Optimizer and scheduler configuration
- Gradient accumulation and checkpointing
- Mixed precision training API
- Distributed training setup
- Custom training callbacks and hooks
Dataset API
Coming Soon
Detailed documentation for dataset handling and processing
The Dataset API handles loading, processing, and batching training data for efficient model training. This includes support for various data formats, preprocessing techniques, and data augmentation.
This documentation will cover:
- ChatDataset and JsonlChatDataset class references
- Creating custom dataset implementations
- Efficient data loading and preprocessing
- Data augmentation techniques
- Handling large datasets efficiently
- Implementing custom collation functions
Performance Benchmarks
Training Performance
| Model Size | Hardware | Batch Size | Tokens/Second | Memory Usage |
|---|---|---|---|---|
| TrAIner-Nano (14M) | GTX 1660 (6GB) | 32 | 35,000 | 2.1 GB |
| TrAIner-Micro (42M) | GTX 1660 (6GB) | 16 | 22,000 | 3.8 GB |
| TrAIner-Mini (125M) | RTX 3060 (12GB) | 8 | 12,000 | 5.2 GB |
| TrAIner-Small (350M) | RTX 3080 (10GB) | 4 | 5,800 | 8.4 GB |
| TrAIner-Base (1.3B) | RTX 4090 (24GB) | 2 | 2,200 | 14.6 GB |
Optimization
Coming Soon
Detailed guide to optimizing model performance for both training and inference
This section will provide comprehensive information about performance optimization techniques for both training and inference. It will help you maximize the efficiency of TrAIner on your hardware.
Topics to be covered include:
- Memory optimization techniques
- Computational efficiency improvements
- Hardware-specific optimizations
- Speed/quality tradeoffs and how to balance them
- Advanced profiling and bottleneck identification
- Case studies and benchmark analysis
Hardware Requirements
Coming Soon
Comprehensive hardware specifications for different model sizes and use cases
This section will provide detailed hardware recommendations for different model sizes and use cases, helping you plan your hardware requirements based on your specific goals.
Information will include:
- Detailed GPU recommendations by model size
- RAM and CPU requirements
- Storage considerations for datasets and checkpoints
- Cost-effective hardware configurations
- Cloud computing options and cost analysis
- Future hardware roadmap considerations
Frequently Asked Questions
General Questions
What makes TrAIner different from other AI frameworks?
TrAIner is specifically designed for efficient AI model training on consumer hardware. Unlike many frameworks that require enterprise-grade GPUs, TrAIner implements advanced memory optimization techniques, includes Mixture of Experts architecture, and provides speculative decoding - all optimized to run on standard consumer GPUs.
Can I train models with my own data?
Yes! TrAIner is designed to be used with your own data. You can provide training data in simple text format or structured JSONL format. The system will automatically tokenize and process your data for training.
What hardware do I need?
TrAIner works on a wide range of hardware. For minimal setups, you can run smaller models on a CPU or on a GPU with at least 4GB of VRAM. For optimal performance, a GPU with 8GB+ VRAM (such as an RTX 3060 or better) is recommended. The system automatically adapts to your available hardware.
Troubleshooting
Coming Soon
Comprehensive troubleshooting guide for common issues and their solutions
This section will provide step-by-step guidance for diagnosing and resolving common issues you might encounter when using TrAIner.
Topics to be covered include:
- Common installation issues and solutions
- CUDA compatibility problems
- Memory errors during training
- Slow training or inference performance
- Model quality issues and fine-tuning problems
- Data formatting and preprocessing errors
Community
Coming Soon
Information about joining the TrAIner community and contributing to the project
TrAIner has a growing community of researchers, developers, and enthusiasts. This section will provide information about how to connect with others, contribute to the project, and share your work.
This section will include:
- Discord server and discussion forums
- Contributing guidelines for developers
- Model sharing platform and community models
- Research papers and publications
- Community showcase of TrAIner applications
- Tutorials and community resources
Roadmap
Coming Soon
Development roadmap and future plans for TrAIner
This section will outline the planned development roadmap for TrAIner, including upcoming features, improvements, and research directions.
Future plans include:
- Support for multi-modal models (text + images)
- Advanced LoRA and QLoRA fine-tuning support
- Improved quantization techniques for even better efficiency
- Web-based training and inference interface
- Mobile deployment options
- Pre-trained model collection and repository