| COMMIT | 0.40 | Add some appendix E runtimes (#927) | | Repetitive commit lines suggest possible | 2025-12-19 |
| COMMIT | 0.30 | Add Tiny Aya from scratch (#962) | | Standard PR title format, but too generi | 2026-02-19 |
| COMMIT | 0.30 | Readability and code quality improvements (#959) | | Generic but plausible human commit messa | 2026-02-18 |
| COMMIT | 0.30 | Sliding window KV Cache bug fix (#925) | | Structured but human bug report; slight | 2025-12-16 |
| COMMIT | 0.30 | Add Olmo 3 README (#915) | | README update with repetitive edits; min | 2025-11-23 |
| COMMIT | 0.30 | fix: correct role of the beta hyperparameter on the DPO loss | | Technical explanation, formal tone but n | 2025-09-13 |
| COMMIT | 0.20 | User argpars utils to show default args on command line | | Brief technical command line utility des | 2026-03-02 |
| COMMIT | 0.20 | remove redundant assignment (#961) | | Brief technical note, typical human comm | 2026-02-19 |
| COMMIT | 0.20 | Use correct input in layernorm example (#960) | | Bullet points follow human commit habit, | 2026-02-19 |
| COMMIT | 0.20 | Fix flex attention in PyTorch 2.10 (#957) | | Specific technical fix referencing PyTor | 2026-02-09 |
| COMMIT | 0.20 | Fix docstring parameter names in compute_dpo_loss function ( | | Technical docstring fix, jargon present | 2026-01-29 |
| COMMIT | 0.20 | Update unit tests for CI (#952) | | CI update commits often terse; bullet fo | 2026-01-27 |
| COMMIT | 0.20 | Cover Python 3.12 (#933) | | Simple PR title referencing issue; typic | 2025-12-27 |
| COMMIT | 0.20 | Gated DeltaNet updates (#926) | | Concise, technical PR title; human-autho | 2025-12-19 |
| COMMIT | 0.20 | n_heads × d_head -> d_head × d_head in DeltaNet (#903) | | Structured explanation but uses human-li | 2025-11-06 |
| COMMIT | 0.20 | Training on MPS in PyTorch 2.9 (#900) | | Terse, informal update; typical human co | 2025-11-01 |
| COMMIT | 0.20 | simplify uv command (#898) | | Very brief and casual; typical human com | 2025-11-01 |
| COMMIT | 0.20 | Add bonus dependencies to pyproject (#897) | | Minimal, informal updates; typical human | 2025-10-29 |
| COMMIT | 0.20 | Fix ffn link (#892) | | Simple fix with manual attribution; natu | 2025-10-22 |
| COMMIT | 0.20 | Make quote style consistent (#891) | | Concise, direct change description; typi | 2025-10-22 |
| COMMIT | 0.20 | - docs(moe): correct arXiv link for DeepSeekMoE (#890) | | Structured, factual documentation correc | 2025-10-21 |
| COMMIT | 0.20 | Mixture-of-Experts intro (#888) | | Brief, descriptive title; typical human | 2025-10-20 |
| COMMIT | 0.20 | Make it easier to toggle between thinking and instruct varia | | Clear, direct feature improvement; natur | 2025-10-17 |
| COMMIT | 0.20 | Update the compression rate comment in MLA (#883) | | Informal comment update; typical human r | 2025-10-14 |
| COMMIT | 0.20 | Add LoRA scaling (#823) | | Concise, technical PR title; common huma | 2025-09-14 |
| COMMIT | 0.20 | Fix IMDb spelling (#811) | | Short, direct fixes; informal 'IMDb' spe | 2025-09-06 |
| COMMIT | 0.10 | fix: pin 1 unpinned action(s) (#987) | | Template-based automated security fix no | 2026-03-26 |
| COMMIT | 0.10 | fix: added KVcache in `generate_text_basic_stream` (#981) | | Concise technical commit message with sp | 2026-03-21 |
| COMMIT | 0.10 | Minor typo fix (#974) | | Simple typo fix notice, typical human co | 2026-03-07 |
| COMMIT | 0.10 | Bpe whitespace fixes (#975) | | Brief technical reference to BPE whitesp | 2026-03-07 |
| COMMIT | 0.10 | Add more analysis to qwen3.5 image | | Concise, terse enhancement description t | 2026-03-04 |
| COMMIT | 0.10 | Use full HF url | | Very brief technical instruction with ab | 2026-03-03 |
| COMMIT | 0.10 | Qwen3.5 from scratch (#969) | | Terse commit title with minimal, human-l | 2026-03-03 |
| COMMIT | 0.10 | Jupyter scrolling glitch tips (#965) | | Specific issue reference to Jupyter scro | 2026-02-27 |
| COMMIT | 0.10 | image size | | Extremely terse; typical human GitHub sh | 2026-02-19 |
| COMMIT | 0.10 | image size | | Very minimal, template-like entry | 2026-02-19 |
| COMMIT | 0.10 | formatting fix | | Too brief to assess; likely human short | 2026-02-19 |
| COMMIT | 0.10 | yearly update | | Brief, minimal human phrase with no AI h | 2026-01-02 |
| COMMIT | 0.10 | upload saved nb | | Terse, informal human note; likely manua | 2025-12-21 |
| COMMIT | 0.10 | Update memory efficient loading nb | | Direct technical note; common human shor | 2025-12-21 |
| COMMIT | 0.10 | update submodule | | Minimal, human-specific action; no AI ph | 2025-12-20 |
| COMMIT | 0.10 | Remove persistent flag from cache buffers (#916) | | Direct technical title; typical human ch | 2025-11-25 |
| COMMIT | 0.10 | Update README wrt multi-query attention | | Formal but clear technical summary, not | 2025-11-17 |
| COMMIT | 0.10 | Fix MHAEinsum weight dimension bug when d_in != d_out (#857) | | Detailed technical fix with unit tests; | 2025-11-01 |
| COMMIT | 0.10 | Windows compile (#845) | | Minimal, terse commit-like entries show | 2025-09-26 |
| COMMIT | 0.10 | Update package dependencies (#842) | | Single-line PR title is typical GitHub s | 2025-09-22 |
| COMMIT | 0.10 | Improve MoE implementation (#841) | | Brief title only; no text shows AI styli | 2025-09-22 |
| COMMIT | 0.10 | Note about devcontainer root usage (#833) | | Very minimal entry with template placeho | 2025-09-21 |
| COMMIT | 0.10 | Note about RoPE usage (#839) | | Terse bullet points with simple notes; n | 2025-09-20 |
| COMMIT | 0.10 | `Qwen3Tokenizer` fix for Qwen3 Base models and generation mi | | Technical details, reverts, and copy ref | 2025-09-17 |
| COMMIT | 0.10 | fix code comment (#834) | | Fix comment is very brief and informal; | 2025-09-17 |
| COMMIT | 0.10 | More efficient angles computation in RoPE (#830) | | Short technical title only; no extended | 2025-09-16 |
| COMMIT | 0.10 | rename eval method (#832) | | Single-line rename; typical human commit | 2025-09-16 |
| COMMIT | 0.10 | Improve weight tying handling (#826) | | Two minimal bullet points; concise and d | 2025-09-14 |
| COMMIT | 0.10 | main push to sync github ruleset | | Informal, terse, and brief; clearly huma | 2025-09-14 |
| COMMIT | 0.10 | Added Apple Silicon GPU device update (#820) | | Template-driven commit with human 'Co-au | 2025-09-13 |
| COMMIT | 0.10 | remove redundant next_cache (#817) | | Minimal, direct commit message; typical | 2025-09-11 |
| COMMIT | 0.10 | Add defensive context trimming for multiturn (#815) | | Concise commit title and brief update me | 2025-09-10 |
| COMMIT | 0.10 | Improve multiturn stopping condition (#814) | | Short, simple commit messages; typical h | 2025-09-10 |
| COMMIT | 0.10 | Clarify Qwen3 notebook purpose (#812) | | Brief commit updates; template-driven bu | 2025-09-06 |
| COMMIT | 0.10 | Add additional notes on debugging SSL issues (#810) | | Series of repeated 'update' messages; me | 2025-09-06 |
| COMMIT | 0.00 | harded the link checker | | Terse human message with a typo ('harded | 2026-03-07 |
| COMMIT | 0.00 | chore: Update outdated GitHub Actions versions (#951) | | Standard chore commit with minimal human | 2026-01-19 |
| COMMIT | 0.00 | link GRPO notebook (#950) | | Brief, terse human message typical for G | 2026-01-18 |
| COMMIT | 0.00 | Optional weight tying for Qwen3 and Llama3.2 pretraining (#9 | | Technical domain jargon and casual 'typo | 2026-01-14 |
| COMMIT | 0.00 | Fix encoding of multiple preceding spaces in BPE tokenizer. | | Structured human PR with specific techni | 2026-01-10 |
| COMMIT | 0.00 | Chapter 5 with alternative LLMs (Qwen3, Llama 3) (#943) | | Human PR with iterative technical fixes | 2026-01-09 |
| COMMIT | 0.00 | Correct batch_idx in appendix A logging (#942) | | Technical human fix with specific batch_ | 2026-01-08 |
| COMMIT | 0.00 | Fix Olmo3 YaRN RoPE implementation bug (#940) | | Specific technical bug fix with model na | 2026-01-04 |
| COMMIT | 0.00 | Correct 'pix' to 'pixi' in README.md (#935) | | Brief human typo correction commit with | 2026-01-02 |
| COMMIT | 0.00 | Clean up native-uv.md documentation (#938) | | Minimal human-style documentation cleanu | 2026-01-02 |
| COMMIT | 0.00 | Fix GitHub CI timeout issue for link checker (#937) | | Specific CI troubleshooting with iterati | 2026-01-02 |
| COMMIT | 0.00 | Olmo 3 from scratch (#914) | | Contains generic commit titles with mini | 2025-11-23 |
| COMMIT | 0.00 | RoPE decay plot (#910) | | Simple, repetitive commit titles, showin | 2025-11-17 |
| COMMIT | 0.00 | Write-up on how to get the most out of this book (#909) | | Minimal description, typical of a simple | 2025-11-13 |
| COMMIT | 0.00 | fix(GatedDeltaNet): Init param A from log of a uniform distr | | Concise, direct commit-message style wit | 2025-11-09 |
| COMMIT | 0.00 | Use consistent title case | | Very brief, action-oriented phrase, typi | 2025-11-06 |
| COMMIT | 0.00 | Fix empty device issue (#904) | | Direct fix description, common in human | 2025-11-06 |
| COMMIT | 0.00 | Image resizing | | Extremely brief, generic title, likely h | 2025-11-03 |
| COMMIT | 0.00 | Gated DeltaNet write-up (#901) | | Mix of simple titles and mundane commit | 2025-11-03 |
| COMMIT | 0.00 | Use figure numbers in ch05-7 (#881) | | Terse technical commit messages with dom | 2025-10-13 |
| COMMIT | 0.00 | Add alternative attention structure (#880) | | Brief, informal title typical of GitHub | 2025-10-13 |
| COMMIT | 0.00 | sliding window attention (#879) | | Minimal technical title without AI hallm | 2025-10-13 |
| COMMIT | 0.00 | Add other appendices for completeness (#878) | | Template-structured PR with minimal, ter | 2025-10-13 |
| COMMIT | 0.00 | rm plot | | Extremely terse command-like commit mess | 2025-10-12 |
| COMMIT | 0.00 | Multi-Head Latent Attention (#876) | | Template-structured PR with brief techni | 2025-10-12 |
| COMMIT | 0.00 | Use GB instead of GiB consistently (#875) | | Concise technical specification change. | 2025-10-11 |
| COMMIT | 0.00 | Grouped-Query Attention memory (#874) | | Template PR with action-oriented, concis | 2025-10-11 |
| COMMIT | 0.00 | Use inference_device | | Very brief, technical command-like phras | 2025-10-09 |
| COMMIT | 0.00 | Add simpler BPE, and make previous BPE better (#870) | | Template-structured PR with brief techni | 2025-10-09 |
| COMMIT | 0.00 | Qwen3 and evaluation bonus materials (#869) | | Title and branch naming show informal, h | 2025-10-08 |
| COMMIT | 0.00 | Switch from urllib to requests to improve reliability (#867) | | Brief, terse commit messages with minima | 2025-10-07 |
| COMMIT | 0.00 | Add missing comma in imports in README (#865) | | Short, direct title for a minor document | 2025-10-06 |
| COMMIT | 0.00 | Note about output dimensions (#862) | | Succinct title about documentation lacks | 2025-10-01 |
| COMMIT | 0.00 | Update ollama address (#861) | | Title is a simple, straightforward updat | 2025-10-01 |
| COMMIT | 0.00 | some typo fixes (#858) | | Informal 'some typo fixes' and specific | 2025-09-30 |
| COMMIT | 0.00 | Test dependencies with Python 3.13 (#843) | | Title is direct technical work; commit m | 2025-09-27 |
| COMMIT | 0.00 | Update generate script (#847) | | Title and iterative update commits show | 2025-09-27 |
| COMMIT | 0.00 | Numerically stable generate on mps (#849) | | Technical title and brief, specific comm | 2025-09-27 |
| COMMIT | 0.00 | Requirements update (#851) | | Title and commit about 'tricker workers' | 2025-09-27 |
| PR | 0.00 | fix: pin 1 unpinned action(s) | | — | 2026-03-26 |
| PR | 0.00 | fix: `qwen3.5-plus-kv-cache.ipynb` to use KVCache | | — | 2026-03-17 |
| PR | 0.00 | [Invalid] Mistake | | — | 2026-03-08 |
| PR | 0.00 | Minor typo fix | | — | 2026-03-07 |
| PR | 0.00 | Bpe whitespace fixes | | — | 2026-03-07 |
| PR | 0.00 | Qwen3.5 from scratch | | — | 2026-03-03 |
| PR | 0.00 | Jupyter scrolling glitch tips | | — | 2026-02-27 |
| PR | 0.00 | Add Tiny Aya from scratch | | — | 2026-02-19 |
| PR | 0.00 | Remove redundant model assignment | | — | 2026-02-19 |
| PR | 0.00 | Use correct input in layernorm example | | — | 2026-02-19 |
| PR | 0.00 | Readability and code quality improvements | | — | 2026-02-18 |
| PR | 0.00 | Feature/tool calling experiment | | — | 2026-02-15 |
| PR | 0.00 | Fix flex attention in PyTorch 2.10 | | — | 2026-02-09 |
| PR | 0.00 | Download | | — | 2026-02-01 |
| PR | 0.00 | Fix docstring parameter names in compute_dpo_loss function | | — | 2026-01-29 |
| PR | 0.00 | Update unit tests for CI | | — | 2026-01-27 |
| PR | 0.00 | chore: Update outdated GitHub Actions versions | | — | 2026-01-19 |
| PR | 0.00 | Link GRPO notebook | | — | 2026-01-18 |
| PR | 0.00 | Optional weight tying for Qwen3 and Llama3.2 pretraining | | — | 2026-01-14 |
| PR | 0.00 | Suggestion: Add quick start example for training and inferen | | — | 2026-01-11 |
| PR | 0.00 | Fix encoding of multiple preceding spaces in BPE tokenizer. | | — | 2026-01-10 |
| PR | 0.00 | Chapter 5 with alternative LLMs (Qwen3, Llama 3) | | — | 2026-01-09 |
| PR | 0.00 | Correct batch_idx in appendix A logging | | — | 2026-01-05 |
| PR | 0.00 | Fix Olmo3 YaRN RoPE implementation bug | | — | 2026-01-03 |
| PR | 0.00 | Correct 'pix' to 'pixi' in README.md | | — | 2026-01-02 |
| PR | 0.00 | Clean up native-uv.md documentation | | — | 2026-01-02 |
| PR | 0.00 | Fix GitHub CI timeout issue for link checker | | — | 2026-01-02 |
| PR | 0.00 | Cover Python 3.12 | | — | 2025-12-27 |
| PR | 0.00 | Add some appendix E runtimes | | — | 2025-12-19 |
| PR | 0.00 | Added comments in class SimpleTokenizerV1 | | — | 2025-11-27 |
| PR | 0.00 | Gated DeltaNet updates | | — | 2025-12-17 |
| PR | 0.00 | Optimized KV Cache (sliding window) bug fix | | — | 2025-12-15 |
| PR | 0.00 | Sidd | | — | 2025-12-03 |
| PR | 0.00 | GatedDeltaNet code: Initialize A as the log of a uniform dis | | — | 2025-11-08 |
| PR | 0.00 | Remove persistent flag from cache buffers | | — | 2025-11-24 |
| PR | 0.00 | Add Olmo 3 README | | — | 2025-11-23 |
| PR | 0.00 | Olmo 3 from scratch | | — | 2025-11-23 |
| PR | 0.00 | Fix empty device issue | | — | 2025-11-06 |
| PR | 0.00 | RoPE decay plot | | — | 2025-11-17 |
| PR | 0.00 | Write-up on how to get the most out of this book | | — | 2025-11-13 |
| PR | 0.00 | n_heads × d_head -> d_head × d_head in DeltaNet | | — | 2025-11-06 |
| PR | 0.00 | Gated DeltaNet write-up | | — | 2025-11-03 |
| PR | 0.00 | Training on MPS in PyTorch 2.9 | | — | 2025-11-01 |
| PR | 0.00 | Add bonus dependencies to pyproject | | — | 2025-10-29 |
| PR | 0.00 | Fix MHAEinsum weight dimension bug when d_in != d_out (#857) | | — | 2025-10-22 |
| PR | 0.00 | Simplify uv command | | — | 2025-11-01 |
| PR | 0.00 | some typo fixes | | — | 2025-09-30 |
| PR | 0.00 | docs(MoE): minor links fix | | — | 2025-10-20 |
| PR | 0.00 | Fix ffn link | | — | 2025-10-22 |
| PR | 0.00 | Style consistency update | | — | 2025-10-21 |
| PR | 0.00 | Mixture-of-Experts intro | | — | 2025-10-20 |
| PR | 0.00 | Make it easier to toggle between thinking and instruct varia | | — | 2025-10-17 |
| PR | 0.00 | Update the compression rate comment in MLA | | — | 2025-10-14 |
| PR | 0.00 | Use figure numbers in ch05-7 | | — | 2025-10-13 |
| PR | 0.00 | Add alternative attention structure | | — | 2025-10-13 |
| PR | 0.00 | Sliding window attention | | — | 2025-10-13 |
| PR | 0.00 | Add other appendices for completeness | | — | 2025-10-12 |
| PR | 0.00 | Multi-Head Latent Attention | | — | 2025-10-12 |
| PR | 0.00 | Use GB instead of GiB consistently | | — | 2025-10-11 |
| PR | 0.00 | Grouped-Query Attention memory | | — | 2025-10-11 |
| PR | 0.00 | Add simpler BPE, and make previous BPE better | | — | 2025-10-09 |
| PR | 0.00 | Qwen3 and evaluation bonus materials | | — | 2025-10-08 |
| PR | 0.00 | Switch from urllib to requests to improve reliability | | — | 2025-10-07 |
| PR | 0.00 | Update Docker file | | — | 2025-10-06 |
| PR | 0.00 | Add missing comma in imports in README | | — | 2025-10-06 |
| PR | 0.00 | docs: Auto-translate README and Wiki | | — | 2025-10-05 |
| PR | 0.00 | Fix consistency in the definition of `MultiHeadAttentionWrap | | — | 2025-09-30 |
| PR | 0.00 | Note about output dimensions d_out in ch03 | | — | 2025-10-01 |
| PR | 0.00 | Update ollama address | | — | 2025-10-01 |
| PR | 0.00 | Test dependencies with Python 3.13 | | — | 2025-09-23 |
| PR | 0.00 | Update generate script | | — | 2025-09-27 |
| PR | 0.00 | Numerically stable generate on mps | | — | 2025-09-27 |
| PR | 0.00 | Requirements update | | — | 2025-09-27 |
| PR | 0.00 | Requirements update | | — | 2025-09-27 |
| PR | 0.00 | Windows compile | | — | 2025-09-26 |
| PR | 0.00 | `Qwen3Tokenizer` fix for Qwen3 Base models and generation mi | | — | 2025-09-15 |
| PR | 0.00 | Update package dependencies | | — | 2025-09-22 |
| PR | 0.00 | Improve MoE implementation | | — | 2025-09-22 |
| PR | 0.00 | Note about devcontainer root usage | | — | 2025-09-16 |
| PR | 0.00 | Note about RoPE usage | | — | 2025-09-20 |
| PR | 0.00 | fix code comment | | — | 2025-09-17 |
| PR | 0.00 | More efficient angles computation in RoPE | | — | 2025-09-16 |
| PR | 0.00 | Rename eval method | | — | 2025-09-16 |
| PR | 0.00 | Improve weight tying handling | | — | 2025-09-14 |
| PR | 0.00 | Add LoRA scaling | | — | 2025-09-14 |
| PR | 0.00 | Added Apple Silicon GPU device update | | — | 2025-09-13 |
| PR | 0.00 | Added Apple Silicon GPU device | | — | 2025-09-13 |
| PR | 0.00 | fix: correct role of the beta hyperparameter on the DPO loss | | — | 2025-09-13 |
| PR | 0.00 | Remove redundant next_cache | | — | 2025-09-11 |
| PR | 0.00 | Add defensive context trimming for multiturn | | — | 2025-09-10 |
| PR | 0.00 | Improve multiturn stopping condition | | — | 2025-09-09 |
| PR | 0.00 | Update ch02.ipynb | | — | 2024-10-17 |
| PR | 0.00 | Clarify Qwen3 notebook purpose | | — | 2025-09-06 |
| PR | 0.00 | Add additional notes on debugging SSL issues | | — | 2025-09-06 |
| PR | 0.00 | Fix IMDb spelling | | — | 2025-09-06 |
| PR | 0.00 | Update code dependencies | | — | 2025-09-05 |
| PR | 0.00 | Update requirements for Intel Macs | | — | 2025-09-04 |
| PR | 0.00 | Interactive qwen3 chat interface | | — | 2025-09-02 |
| PR | 0.00 | Update requirements for Intel macOS | | — | 2025-09-03 |
| PR | 0.00 | added brief explanations about 2 different ways of RoPE imp | | — | 2025-09-02 |