TrAIner Beta

Efficiently train and run inference on transformer models using your personal computer

TrAIner is a lightweight yet powerful platform designed to democratize AI model training by enabling efficient transformer model training on consumer-grade hardware. With advanced memory optimization techniques and a focus on performance, TrAIner makes it possible to build, train, and deploy sophisticated language models without requiring enterprise-level infrastructure.

Whether you're a researcher, developer, or AI enthusiast, TrAIner provides the tools you need to experiment with state-of-the-art model architectures while maintaining complete control over your training pipeline and data.

Interactive Example
import torch
from modeling import ChatModel, ChatConfig, ChatTokenizer
from chat import load_model_and_tokenizer, chat  # helper functions, assumed to live in chat.py

# Load model and tokenizer
model_path = "./model/trained_model"
model, tokenizer, device = load_model_and_tokenizer(model_path)

# Generate text
response = chat("Explain the concept of neural networks",
                model, tokenizer, device,
                max_new_tokens=512,
                temperature=0.7)
print(response)

Example output:

Neural networks are computational models inspired by the human brain's structure and function. They consist of interconnected nodes or "neurons" organized in layers. Each connection between neurons has a weight that adjusts as the network learns from data. The basic structure includes an input layer, one or more hidden layers, and an output layer. Data flows through the network, with each neuron applying a transformation function to its inputs before passing the result to the next layer. Neural networks learn by adjusting their weights through a process called backpropagation, minimizing the difference between predicted outputs and actual values. This ability to learn from examples makes them powerful for tasks like image recognition, natural language processing, and complex decision-making problems.

Key Features

Efficient Training on Consumer Hardware

TrAIner is specifically optimized to run on consumer-grade hardware, making AI model development accessible to everyone. Through advanced memory optimization techniques, gradient accumulation, and mixed precision training, you can train sophisticated models on standard GPUs.

Memory Optimization
Mixed Precision
Gradient Accumulation
Checkpointing

Advanced Model Architecture

TrAIner implements state-of-the-art transformer architectures with performance-enhancing features such as Mixture of Experts, Grouped Query Attention, and more. These advanced techniques allow for larger, more capable models that can still run on limited hardware.

Transformer Layers
Mixture of Experts
Grouped Query Attention
RMSNorm
Rotary Embeddings

Fast Inference

TrAIner includes highly optimized inference capabilities with support for advanced techniques like speculative decoding, KV-caching, and quantization. This enables responsive, real-time AI interactions even on modest hardware.

Speculative Decoding
KV Caching
8-bit Quantization
Batched Generation

Flexible Data Handling

Train models on a variety of data formats with minimal preprocessing. TrAIner supports both raw text files and structured JSONL formats, with built-in tokenization and data augmentation capabilities.

Text Format
JSONL Format
Custom Tokenizers
Dialogue Format

Installation and Setup

System Requirements

Component   Minimum                                    Recommended
CPU         4 cores                                    8+ cores
RAM         8GB                                        16GB+
GPU         4GB VRAM (CUDA-compatible)                 8GB+ VRAM
Storage     10GB free space                            50GB+ free space
OS          Windows 10 / Ubuntu 20.04 / macOS 10.15    Latest versions

Installation Guide

# Clone the repository
git clone https://github.com/CPScript/TrAIner.git
cd TrAIner

# Install dependencies
pip install torch safetensors tqdm numpy

# For CUDA support (adjust based on your CUDA version)
pip install torch --extra-index-url https://download.pytorch.org/whl/cu118

# Verify installation
python -c "import torch; print(f'PyTorch version: {torch.__version__}, CUDA available: {torch.cuda.is_available()}')"

Configuration

TrAIner can be configured through command-line arguments or a configuration file. Here's a sample configuration file showing key parameters:

# config.json
{
  "model": {
    "hidden_size": 768,
    "num_layers": 12,
    "num_heads": 12,
    "num_kv_heads": 4,
    "feed_forward_dim": 3072,
    "max_seq_length": 2048,
    "dropout": 0.1,
    "use_moe": true,
    "num_experts": 8,
    "use_rotary": true,
    "use_rmsnorm": true
  },
  "training": {
    "data_path": "./data/dialogues.txt",
    "data_format": "txt",
    "output_dir": "./model",
    "batch_size": 4,
    "epochs": 30,
    "learning_rate": 5e-5,
    "weight_decay": 0.01,
    "warmup_steps": 1000,
    "fp16": true
  }
}
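
The same keys can also be used programmatically. As a minimal sketch (assuming ChatConfig accepts these fields as keyword arguments, which the source does not confirm):

import json
from modeling import ChatConfig

# Build a ChatConfig from the "model" block of config.json
# (assumes ChatConfig takes these fields as keyword arguments)
with open("config.json") as f:
    cfg = json.load(f)
config = ChatConfig(**cfg["model"])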

Quickstart Guide

Note: This quickstart guide assumes you have already installed TrAIner and its dependencies as described in the Installation section.

Training Your First Model

# Create a sample training data file
mkdir -p data
cat > data/dialogues.txt << EOF
<|user|>What is artificial intelligence?<|assistant|>Artificial intelligence refers to systems or machines that can perform tasks that typically require human intelligence. This includes learning from examples and experience, recognizing objects, understanding and responding to language, making decisions, and solving problems.<|end|>
<|user|>Explain neural networks<|assistant|>Neural networks are computing systems inspired by the biological neural networks in animal brains. They consist of artificial neurons that can learn from and make decisions based on input data. Deep learning, a subset of machine learning, uses multiple layers of these neural networks to progressively extract higher-level features from raw input.<|end|>
EOF

# Train a model with default parameters
python train.py --data_path ./data/dialogues.txt --epochs 3 --batch_size 2

# The model will be saved in the ./model directory

Running Inference

# Start the interactive chat interface
python chat.py --model_path ./model/final

# You can now chat with your trained model
> Tell me about machine learning
> exit  # Type 'exit' to quit

Non-Interactive Generation

# Generate a response for a specific input
python chat.py --model_path ./model/final --input "What are transformer models?" --no_interactive

Customizing Model Architecture

# Train with a customized model architecture
python train.py \
  --data_path ./data/dialogues.txt \
  --hidden_size 512 \
  --num_layers 8 \
  --num_heads 8 \
  --feed_forward_dim 2048 \
  --use_moe \
  --num_experts 4 \
  --batch_size 2 \
  --epochs 5

Model Architecture

TrAIner implements a modern transformer-based architecture with several advanced features:

Model Architecture Diagram
+--------------------------------+
|         Input Embedding        |
+--------------------------------+
              |
              v
+--------------------------------+
|     Positional Encoding        |
+--------------------------------+
              |
              v
+--------------------------------+
|  Transformer Layer 1           |
|  +--------------------------+  |
|  |  Self-Attention          |  |
|  |  (GQA + RoPE)            |  |
|  +--------------------------+  |
|              |                 |
|              v                 |
|  +--------------------------+  |
|  |  Mixture of Experts      |  |
|  |  Feed Forward Network    |  |
|  +--------------------------+  |
+--------------------------------+
              |
              v
+--------------------------------+
|  Transformer Layer 2...N       |
+--------------------------------+
              |
              v
+--------------------------------+
|     Layer Normalization        |
+--------------------------------+
              |
              v
+--------------------------------+
|     Language Model Head        |
+--------------------------------+

Core Components

Self-Attention Mechanism

TrAIner implements an advanced attention mechanism with support for both traditional multi-head attention and Grouped Query Attention (GQA). GQA reduces computation and memory requirements by sharing key-value heads across query heads.

import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.hidden_size = config.hidden_size
        self.num_heads = config.num_heads
        self.num_kv_heads = config.num_kv_heads if config.use_gqa else config.num_heads
        self.head_dim = config.hidden_size // config.num_heads
        
        # For GQA, we have different numbers of q and kv heads
        self.kv_repeats = self.num_heads // self.num_kv_heads if config.use_gqa else 1
        
        # Q, K, V projections
        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
        self.k_proj = nn.Linear(self.hidden_size, self.num_kv_heads * self.head_dim, bias=config.attention_bias)
        self.v_proj = nn.Linear(self.hidden_size, self.num_kv_heads * self.head_dim, bias=config.attention_bias)
        self.out_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
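
For intuition, here is a minimal sketch of the KV-head expansion GQA performs at attention time, using the fields defined above (illustrative; the actual forward pass may differ):

import torch

def expand_kv(kv, kv_repeats):
    # kv: (batch, num_kv_heads, seq_len, head_dim)
    # Repeat each shared KV head so it lines up with its group of query heads.
    if kv_repeats == 1:
        return kv  # standard multi-head attention
    return kv.repeat_interleave(kv_repeats, dim=1)  # -> (batch, num_heads, seq_len, head_dim)

With num_heads=12 and num_kv_heads=4 (as in the sample config), each key/value head serves a group of three query heads, cutting the KV projections and KV cache to a third of their multi-head size.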

Mixture of Experts (MoE)

MoE enables larger, more capable models by conditionally activating only a subset of model parameters for each input token. This provides increased model capacity without a proportional increase in computation.

import torch.nn as nn

class MoEFeedForward(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.experts = nn.ModuleList([
            GEGLU(
                config.hidden_size, 
                config.feed_forward_dim, 
                bias=True
            ) for _ in range(config.num_experts)
        ])
        self.gate = nn.Linear(config.hidden_size, config.num_experts)  # router: scores each token for each expert
        self.dropout = nn.Dropout(config.dropout)
        self.num_experts = config.num_experts
        self.expert_capacity = 2  # Number of tokens per expert
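
To make the conditional activation concrete, here is an illustrative top-2 routing forward pass consistent with the fields above (a sketch, not TrAIner's exact routing algorithm; it assumes each expert maps hidden_size to hidden_size):

import torch
import torch.nn.functional as F

def moe_forward(self, x):
    # x: (batch, seq_len, hidden_size)
    weights = F.softmax(self.gate(x), dim=-1)         # (batch, seq, num_experts)
    top_w, top_idx = weights.topk(2, dim=-1)          # each token picks its top-2 experts
    top_w = top_w / top_w.sum(dim=-1, keepdim=True)   # renormalize the selected weights
    out = torch.zeros_like(x)
    for i, expert in enumerate(self.experts):         # only routed tokens reach each expert
        routed = (top_idx == i)                       # (batch, seq, 2)
        token_mask = routed.any(dim=-1)
        if token_mask.any():
            w = (top_w * routed).sum(dim=-1)[token_mask].unsqueeze(-1)
            out[token_mask] += w * expert(x[token_mask])
    return self.dropout(out)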

Training Procedures

TrAIner provides comprehensive training capabilities with support for various optimizations:

Mixed Precision Training

Mixed precision dramatically reduces memory usage and speeds up training with minimal impact on model quality by performing most computation in lower-precision formats (FP16 or BF16) while keeping master weights in FP32.

# Enable mixed precision training with FP16
python train.py --data_path ./data/dialogues.txt --fp16

# Use BF16 for better numerical stability on supported hardware
python train.py --data_path ./data/dialogues.txt --optimized_bf16
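
Under the hood, FP16 training in PyTorch typically combines autocast with loss scaling. A minimal sketch of that pattern (illustrative; model, optimizer, and dataloader are assumed to exist, and TrAIner's loop may differ):

import torch

scaler = torch.cuda.amp.GradScaler()

for batch in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # run ops in FP16 where safe
        loss = compute_loss(model, batch)  # compute_loss is a placeholder
    scaler.scale(loss).backward()          # scale loss so FP16 grads don't underflow
    scaler.step(optimizer)                 # unscales grads; skips step on inf/NaN
    scaler.update()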

Gradient Accumulation

Train with larger effective batch sizes on limited hardware by accumulating gradients across multiple forward/backward passes before updating weights.

# Use gradient accumulation for larger effective batch size
python train.py --data_path ./data/dialogues.txt --batch_size 2 --gradient_accumulation_steps 8
# Effective batch size: 2 * 8 = 16
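
In a training loop, accumulation amounts to deferring the optimizer step. A minimal sketch (model, optimizer, and dataloader assumed; compute_loss is a placeholder):

accumulation_steps = 8  # matches --gradient_accumulation_steps 8

for step, batch in enumerate(dataloader):
    loss = compute_loss(model, batch)
    (loss / accumulation_steps).backward()   # average gradients over micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                     # one update per effective batch of 16
        optimizer.zero_grad()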

Inference

TrAIner provides efficient inference capabilities with several optimization techniques:

Speculative Decoding

This technique uses a smaller "draft" model to propose multiple tokens at once, which the main model then verifies in a single pass. This can speed up generation by 2-3x with minimal impact on quality.

# Enable speculative decoding
python chat.py --model_path ./model/final --use_speculative --spec_len 5

# Use a separate draft model
python chat.py --model_path ./model/final --use_speculative --draft_model ./model/smaller
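
The core draft-then-verify loop looks roughly like this greedy sketch (illustrative and simplified to batch size 1; it assumes the models return raw logits, and TrAIner's implementation will differ in details such as sampling):

import torch

@torch.no_grad()
def speculative_step(main_model, draft_model, input_ids, spec_len=5):
    # Draft: the small model proposes spec_len tokens autoregressively.
    draft_ids = input_ids
    for _ in range(spec_len):
        logits = draft_model(draft_ids)
        next_id = logits[:, -1:].argmax(dim=-1)
        draft_ids = torch.cat([draft_ids, next_id], dim=-1)
    proposed = draft_ids[:, input_ids.shape[1]:]

    # Verify: one main-model pass scores all proposals at once.
    logits = main_model(draft_ids)
    verified = logits[:, input_ids.shape[1] - 1:-1].argmax(dim=-1)

    # Accept the longest prefix where draft and main model agree,
    # plus the main model's token at the first disagreement.
    agree = (proposed == verified).long().cumprod(dim=-1)
    n_accept = int(agree.sum())
    return torch.cat([input_ids, verified[:, :n_accept + 1]], dim=-1)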

KV Caching

TrAIner automatically caches key-value pairs during generation, avoiding redundant computation and significantly speeding up the inference process.
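
In a generation loop this means each step processes only the newest token. A sketch (the past_key_values interface shown here is an assumption, not TrAIner's documented API):

past_kv = None
ids = input_ids
for _ in range(max_new_tokens):
    logits, past_kv = model(ids, past_key_values=past_kv)  # reuse cached K/V
    ids = logits[:, -1:].argmax(dim=-1)                    # feed back only the new token

Without the cache, step t would recompute keys and values for all t previous tokens; with it, per-step work stays roughly constant.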

Quantization

For memory-constrained environments, TrAIner supports 8-bit quantization that can reduce memory requirements by up to 4x with minimal quality degradation.

# Enable 8-bit quantization for inference
python chat.py --model_path ./model/final --quantize
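
For reference, plain PyTorch can apply a comparable 8-bit scheme with dynamic quantization (a sketch of the general technique; TrAIner's --quantize flag may use a different method):

import torch
import torch.nn as nn

# Store Linear weights as INT8 and dequantize on the fly.
# Note: PyTorch dynamic quantization runs on CPU.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)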

Data Processing

TrAIner supports flexible data formats and processing pipelines:

Supported Data Formats

Text Format

Simple text files with conversation markers:

<|user|>What is machine learning?<|assistant|>Machine learning is a branch of artificial intelligence that focuses on developing systems that can learn from and make decisions based on data. Instead of being explicitly programmed to perform a task, these systems learn from experience.<|end|>
<|user|>How does deep learning differ from machine learning?<|assistant|>Deep learning is a subset of machine learning that uses neural networks with many layers (hence "deep"). While traditional machine learning requires feature extraction by humans, deep learning automatically discovers the representations needed for feature detection or classification from raw data. It's particularly powerful for complex tasks like image and speech recognition.<|end|>
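
TrAIner's dataset classes parse these markers internally; for illustration, a minimal parser for this format could look like:

import re

TURN_RE = re.compile(r"<\|user\|>(.*?)<\|assistant\|>(.*?)<\|end\|>", re.DOTALL)

def parse_dialogues(text):
    # Returns one dict per <|user|>...<|assistant|>...<|end|> turn.
    return [{"user": u.strip(), "assistant": a.strip()}
            for u, a in TURN_RE.findall(text)]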

JSONL Format

Structured JSON Lines format for more complex conversations:

{"messages": [{"role": "user", "content": "Explain the concept of transformer models."}, {"role": "assistant", "content": "Transformer models are a type of neural network architecture introduced in the paper 'Attention Is All You Need' by Vaswani et al. They rely primarily on self-attention mechanisms rather than recurrence or convolution, allowing them to process input sequences in parallel rather than sequentially. This parallelization enables more efficient training on large datasets. The architecture typically consists of an encoder and decoder, each made up of attention layers and feed-forward neural networks. Transformers have become the foundation for many state-of-the-art language models like GPT, BERT, and T5."}]}
{"messages": [{"role": "system", "content": "You are a helpful AI assistant."}, {"role": "user", "content": "What are the limitations of current AI systems?"}, {"role": "assistant", "thinking": "I should cover limitations like data quality dependence, lack of common sense, explainability issues, and resource requirements.", "content": "Current AI systems face several important limitations: 1) They're highly dependent on training data quality and can perpetuate biases present in that data. 2) They lack true understanding or common sense reasoning. 3) Many modern AI systems, especially deep learning ones, function as 'black boxes' with limited explainability. 4) They typically require significant computational resources and energy to train. 5) They struggle with transferring knowledge between different tasks (though transfer learning helps). 6) They're vulnerable to adversarial attacks and can be fooled by carefully crafted inputs."}]}
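
JSON Lines files are simply one JSON object per line, so they are easy to inspect programmatically (the file path below is a placeholder):

import json

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

for record in load_jsonl("data/conversations.jsonl"):
    for msg in record["messages"]:
        print(msg["role"], ":", msg["content"][:60])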

Mixture of Experts

Coming Soon

In-depth documentation about Mixture of Experts architecture implementation in TrAIner

The Mixture of Experts (MoE) module in TrAIner enables training models with significantly increased parameter count while maintaining reasonable computational requirements. MoE works by activating only a subset of the model's parameters for each input token, which enables models to scale to billions of parameters even on consumer hardware.

This section will include detailed explanations of:

  • How MoE architecture works in transformer models
  • TrAIner's implementation of routing algorithms
  • Expert capacity and load balancing techniques
  • Performance benchmarks comparing standard vs MoE models
  • Step-by-step guide to configuring and training MoE models
  • Advanced tuning parameters and best practices

Distributed Training

Coming Soon

Comprehensive guide to distributed training with TrAIner across multiple GPUs and machines

TrAIner supports distributed training across multiple GPUs and even multiple machines, allowing you to scale up your training for faster completion or to accommodate larger models. This is implemented using PyTorch's DistributedDataParallel (DDP) framework.
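
Until the full guide lands, the standard single-machine DDP setup in PyTorch looks like this (a generic sketch, not TrAIner-specific; build_model is a placeholder, and the script would be launched with torchrun --nproc_per_node=<num_gpus>):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")             # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)
model = build_model().cuda(local_rank)
model = DDP(model, device_ids=[local_rank]) # gradients sync automatically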

This section will cover:

  • Setting up multi-GPU training on a single machine
  • Configuring distributed training across multiple machines
  • Scaling strategies and performance optimizations
  • Handling checkpoints and model saving in distributed environments
  • Common issues and troubleshooting tips
  • Advanced configurations for different network topologies

Quantization

Coming Soon

Detailed guide to model quantization for improved inference performance

Quantization is a technique that reduces the precision of model weights from 32-bit floating point (FP32) to lower precision formats like 8-bit integers (INT8). This can significantly reduce memory usage and improve inference speed with minimal impact on generation quality.

This section will include:

  • Post-training quantization techniques supported in TrAIner
  • Quantization-aware training for better results
  • Performance benchmarks comparing FP32, FP16, and INT8 models
  • Effects of quantization on different model components
  • Advanced calibration techniques to minimize accuracy loss
  • Hardware-specific optimizations for quantized models

Speculative Decoding

Coming Soon

Advanced technique documentation for faster text generation using speculative methods

Speculative decoding is an advanced technique that significantly speeds up text generation by using a smaller "draft" model to predict multiple tokens at once, which are then verified by the main model. This can provide a 2-3x speedup in generation while maintaining output quality.

This section will cover:

  • Theoretical foundation of speculative decoding
  • Implementation details in TrAIner
  • Creating and training effective draft models
  • Performance benchmarks and quality comparisons
  • Tuning parameters to optimize the speed/quality tradeoff
  • Advanced techniques like tree-based speculation

Text Generation

Coming Soon

Comprehensive guide to generating text with TrAIner models

TrAIner provides flexible text generation capabilities with various options to control the output style, creativity, and generation speed. This section will detail how to use the generation API for different use cases.

Topics to be covered include:

  • Interactive chat interface usage
  • Programmatic text generation with the API
  • Controlling parameters like temperature, top-p, and top-k
  • Streaming generation for real-time applications
  • Advanced sampling techniques and their effects
  • Best practices for prompt engineering

Fine-tuning

Coming Soon

Detailed guide to fine-tuning existing models on your custom datasets

Fine-tuning allows you to take a pre-trained model and adapt it to specific domains or tasks by training on your own datasets. This often achieves better results than training from scratch, especially for specialized applications.

This section will provide detailed information on:

  • Preparing your dataset for fine-tuning
  • Selecting appropriate hyperparameters
  • Techniques to prevent catastrophic forgetting
  • Parameter-efficient fine-tuning methods
  • Evaluating fine-tuned model performance
  • Advanced fine-tuning for specific applications

JSONL Format

Coming Soon

Detailed specification and examples for using JSONL formatted training data

TrAIner supports structured JSONL (JSON Lines) format for training data, which provides more flexibility than plain text formats. This allows for more complex conversation structures, metadata, and special formatting.

This section will include:

  • JSONL format specification for TrAIner
  • Examples of various conversation patterns
  • Supporting system prompts and multi-turn conversations
  • Including metadata and auxiliary information
  • Tools for converting between formats
  • Performance considerations for large datasets

CLI Usage

Coming Soon

Complete reference for command-line interface options and parameters

TrAIner provides a comprehensive command-line interface for training, inference, and model management. This section will serve as a complete reference for all available options.

Topics to be covered include:

  • Complete listing of all CLI commands and parameters
  • Train command options and syntax
  • Chat command options for inference
  • Model conversion and export utilities
  • Data processing command-line tools
  • Automation and scripting examples

Model API

Coming Soon

Comprehensive documentation of the Model API for developers

The Model API provides programmatic access to TrAIner's model architecture, allowing developers to integrate, extend, and customize models for their specific needs.

This section will include:

  • ChatModel class API reference
  • ChatConfig options and parameter details
  • Methods for loading and saving models
  • Generation API and parameter explanations
  • Forward pass customization options
  • Extending the model with custom layers

Tokenizer API

Coming Soon

Detailed documentation of the tokenization system and API

The Tokenizer API handles the conversion between text and token IDs that the model can process. TrAIner includes a custom tokenizer designed to efficiently handle multiple languages and special tokens.

This documentation will cover:

  • ChatTokenizer class API reference
  • Building and managing custom vocabularies
  • Adding special tokens and handling
  • Tokenization patterns and regular expressions
  • Encoding and decoding methods
  • Performance optimization techniques

Training API

Coming Soon

Comprehensive reference for the training and optimization API

The Training API provides programmatic access to TrAIner's training pipeline, allowing developers to customize the training process, integrate custom datasets, and implement advanced training techniques.

This section will include:

  • Training loop implementation details
  • Optimizer and scheduler configuration
  • Gradient accumulation and checkpointing
  • Mixed precision training API
  • Distributed training setup
  • Custom training callbacks and hooks

Dataset API

Coming Soon

Detailed documentation for dataset handling and processing

The Dataset API handles loading, processing, and batching training data for efficient model training. This includes support for various data formats, preprocessing techniques, and data augmentation.

This documentation will cover:

  • ChatDataset and JsonlChatDataset class references
  • Creating custom dataset implementations
  • Efficient data loading and preprocessing
  • Data augmentation techniques
  • Handling large datasets efficiently
  • Implementing custom collation functions

Performance Benchmarks

Training Performance

Model                 Hardware         Batch Size   Tokens/Second   Memory Usage
TrAIner-Nano (14M)    GTX 1660 (6GB)   32           35,000          2.1 GB
TrAIner-Micro (42M)   GTX 1660 (6GB)   16           22,000          3.8 GB
TrAIner-Mini (125M)   RTX 3060 (12GB)  8            12,000          5.2 GB
TrAIner-Small (350M)  RTX 3080 (10GB)  4            5,800           8.4 GB
TrAIner-Base (1.3B)   RTX 4090 (24GB)  2            2,200           14.6 GB

Optimization

Coming Soon

Detailed guide to optimizing model performance for both training and inference

This section will provide comprehensive information about performance optimization techniques for both training and inference. It will help you maximize the efficiency of TrAIner on your hardware.

Topics to be covered include:

  • Memory optimization techniques
  • Computational efficiency improvements
  • Hardware-specific optimizations
  • Speed/quality tradeoffs and how to balance them
  • Advanced profiling and bottleneck identification
  • Case studies and benchmark analysis

Hardware Requirements

Coming Soon

Comprehensive hardware specifications for different model sizes and use cases

This section will provide detailed hardware recommendations for different model sizes and use cases, helping you plan your hardware requirements based on your specific goals.

Information will include:

  • Detailed GPU recommendations by model size
  • RAM and CPU requirements
  • Storage considerations for datasets and checkpoints
  • Cost-effective hardware configurations
  • Cloud computing options and cost analysis
  • Future hardware roadmap considerations

Frequently Asked Questions

General Questions

What makes TrAIner different from other AI frameworks?

TrAIner is specifically designed for efficient AI model training on consumer hardware. Unlike many frameworks that assume enterprise-grade GPUs, TrAIner implements advanced memory optimization techniques, includes a Mixture of Experts architecture, and provides speculative decoding, all optimized to run on standard consumer GPUs.

Can I train models with my own data?

Yes! TrAIner is designed to be used with your own data. You can provide training data in simple text format or structured JSONL format. The system will automatically tokenize and process your data for training.

What hardware do I need?

TrAIner works on a wide range of hardware. For minimal setups, you can run smaller models on CPUs or GPUs with at least 4GB VRAM. For optimal performance, a GPU with 8GB+ VRAM (like RTX 3060 or better) is recommended. The system automatically adapts to your available hardware.

Troubleshooting

Coming Soon

Comprehensive troubleshooting guide for common issues and their solutions

This section will provide step-by-step guidance for diagnosing and resolving common issues you might encounter when using TrAIner.

Topics to be covered include:

  • Common installation issues and solutions
  • CUDA compatibility problems
  • Memory errors during training
  • Slow training or inference performance
  • Model quality issues and fine-tuning problems
  • Data formatting and preprocessing errors

Community

Coming Soon

Information about joining the TrAIner community and contributing to the project

TrAIner has a growing community of researchers, developers, and enthusiasts. This section will provide information about how to connect with others, contribute to the project, and share your work.

This section will include:

  • Discord server and discussion forums
  • Contributing guidelines for developers
  • Model sharing platform and community models
  • Research papers and publications
  • Community showcase of TrAIner applications
  • Tutorials and community resources

Roadmap

Coming Soon

Development roadmap and future plans for TrAIner

This section will outline the planned development roadmap for TrAIner, including upcoming features, improvements, and research directions.

Future plans include:

  • Support for multi-modal models (text + images)
  • Advanced LoRA and QLoRA fine-tuning support
  • Improved quantization techniques for even better efficiency
  • Web-based training and inference interface
  • Mobile deployment options
  • Pre-trained model collection and repository