Exploring Programmatic Anki

While Anki is a powerful tool for spaced repetition, its true potential is unlocked when we treat it as a database rather than just an app. Under the hood, an .apkg file is essentially a ZIP archive containing a SQLite database and media files.

This post explores how to use the database approach, Python, and the anki library to programmatically extract, read, and manipulate Anki decks. The examples are drawn from the code that drives the creation of the Anki decks hosted on language.openml.io.

Anki Database#

Each Anki .apkg file contains a SQLite database. Exploring it with a SQLite client is a great way to understand how Anki data is structured. Since we are interested in both language and data, we find the internal schema quite interesting: it is a classic example of a “long-lived” database schema with some historical quirks (like using epoch milliseconds for primary keys).

An Anki deck package (.apkg) is just a ZIP archive with a custom extension. It typically contains:

  • collection.anki2/collection.anki21/collection.anki21b: The primary SQLite 3 database file.
  • media: A JSON file mapping numeric filenames to their original file names.
  • 0, 1, 2, etc.: Renamed media files (images/audio) corresponding to the keys in the media JSON.
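The layout above can be checked without extracting anything. The following is a minimal sketch using only the standard library; the function name describe_apkg is our own, and it assumes the legacy layout where media is plain JSON. If the media file cannot be parsed as JSON (which we assume indicates a newer package format), the mapping is simply left empty.

```python
import json
import zipfile


def describe_apkg(apkg_path: str) -> dict:
    """Return the archive member names and, when readable, the media mapping."""
    with zipfile.ZipFile(apkg_path) as z:
        names = z.namelist()
        media_map = {}
        if "media" in names:
            raw = z.read("media")
            try:
                # Legacy packages store {"0": "cat.jpg", ...} as plain JSON.
                media_map = json.loads(raw.decode("utf-8"))
            except (UnicodeDecodeError, json.JSONDecodeError):
                # Assumption: a non-JSON media file means a newer package format.
                media_map = {}
    return {"files": sorted(names), "media": media_map}
```

Calling describe_apkg("MyDeck.apkg") gives a quick overview of which collection file the package ships and how many media files it references.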

Installing a SQLite Client#

DB Browser for SQLite is recommended as a GUI option. It is open-source, lightweight, and handles Anki’s specific SQLite format well. We can install it via Homebrew Cask if we are on a Mac:

Terminal window
brew install --cask db-browser-for-sqlite

A terminal option for someone comfortable with the CLI is sqlite3. Some systems, like macOS, already have it pre-installed.

Extracting Database#

  1. Copy the file: Create a copy of your .apkg file to avoid corrupting your actual deck.

  2. Rename: Change the file extension from .apkg to .zip.

    Terminal window
    mv MyDeck.apkg MyDeck.zip
  3. Unzip: Extract the contents.

    Terminal window
    unzip MyDeck.zip -d MyDeck_Extracted
  4. Locate the DB: Inside the new folder, we will see a file named collection.anki2 or collection.anki21. This is the SQLite database.

TIP

A collection.anki21 file is a newer format used by recent Anki versions, though the schema is largely backward compatible. Most .apkg exports still default to the anki2 standard for compatibility.

Starting with Anki 2.1.50, the software introduced a new, more efficient database schema and compression format. However, to prevent older versions of Anki (which expect a specific SQLite schema) from crashing when they try to import these new files, Anki includes a valid but fake collection.anki2 SQLite database, which contains exactly one warning message: "Please update to the latest Anki version, then import the .colpkg/.apkg file again." In this case, we will likely find a file named collection.anki21b alongside collection.anki2, and we must use the former.

The b in collection.anki21b stands for binary/compressed. This is the SQLite database compressed with Zstandard (Zstd). We need to decompress it before SQLite can read it.

  1. Install Zstd (if you don’t have it):

    Terminal window
    brew install zstd
  2. Decompress the file:

    Terminal window
    zstd -d collection.anki21b -o collection.real.anki2
  3. Query the new file:

    Terminal window
    sqlite3 collection.real.anki2

    or point the GUI client at collection.real.anki2.
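Before opening a collection file, it can help to confirm whether it is a plain SQLite database or still Zstandard-compressed. Below is a small sketch that inspects the leading bytes of the file. The function name detect_collection_format is hypothetical; the two magic values are the standard SQLite 3 header string and the Zstandard frame magic number.

```python
SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 file
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"       # Zstandard frame magic number


def detect_collection_format(path: str) -> str:
    """Classify a collection file as 'sqlite', 'zstd', or 'unknown'."""
    with open(path, "rb") as f:
        header = f.read(16)
    if header.startswith(SQLITE_MAGIC):
        return "sqlite"
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"
```

A result of "zstd" suggests the file still needs the decompression step, while "sqlite" means a client can read it directly.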

Querying Database#

Here are a few queries to run immediately to see readable data.

Get raw note content:

SELECT
    id,
    sfld AS sort_field,
    replace(flds, char(31), ' | ') AS fields_pipe_separated
FROM notes
LIMIT 10;
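The same read can be done from Python with the built-in sqlite3 module. This is a minimal sketch, assuming the database has already been extracted (and decompressed if needed); read_notes is a name we made up for illustration. Instead of replacing the separator with a pipe, it splits flds into a list of fields.

```python
import sqlite3

FIELD_SEPARATOR = "\x1f"  # same character as char(31) in the SQL above


def read_notes(db_path: str, limit: int = 10) -> list:
    """Return (note_id, [field, ...]) tuples from the notes table."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, flds FROM notes LIMIT ?", (limit,)
        ).fetchall()
    finally:
        conn.close()
    return [(note_id, flds.split(FIELD_SEPARATOR)) for note_id, flds in rows]
```

This is read-only usage; writing through sqlite3 directly skips the bookkeeping that Anki expects.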

Join cards and notes to see which cards have the longest review intervals:

SELECT
    n.sfld AS Front_Text,
    c.ivl AS Interval_Days,
    c.reps AS Review_Count
FROM cards c
JOIN notes n ON c.nid = n.id
ORDER BY c.ivl DESC
LIMIT 10;

Anki Python#

Although the raw SQLite approach offers fast reads, preserving data integrity while transforming data calls for more abstraction than SQL queries provide, especially when modifying or writing .apkg files. Specifically:

  1. Checksums & Schema: Anki stores a checksum (the csum column) for each note alongside its fields. If we modify the SQLite flds column directly without updating the checksum and modification time (mod column), the Anki desktop client may treat the database as corrupt or refuse to sync it.
  2. Field Parsing: Raw SQLite gives us a string of fields separated by 0x1f. A higher-level library gives us a structured object such as note['Front'], handling the parsing for us.
  3. Media Management: If our programmatic decks generate static images (such as SVGs), we need to register them in the media file and zip them correctly. A higher-level library can handle this via, for example, col.media.add_file.
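To make the first two points concrete, here is a rough sketch of the bookkeeping a direct SQL edit would have to reproduce. It rests on our understanding that csum is the first 8 hex digits of the SHA-1 of the first field (with HTML stripped), read as an integer, and that notes.mod is in epoch seconds. The tag stripping below is a simplification of whatever Anki does internally, and both function names are hypothetical.

```python
import hashlib
import re
import time

FIELD_SEPARATOR = "\x1f"


def field_checksum(first_field: str) -> int:
    """Approximate the csum value: first 8 hex digits of SHA-1 as an integer."""
    # Assumption: plain tag stripping is close enough to Anki's own stripping.
    plain = re.sub(r"<[^>]+>", "", first_field)
    digest = hashlib.sha1(plain.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def build_note_update(fields: list) -> dict:
    """Compute column values that must change together when fields are edited."""
    return {
        "flds": FIELD_SEPARATOR.join(fields),
        "csum": field_checksum(fields[0]),
        "mod": int(time.time()),  # assumed to be epoch seconds
    }
```

Even this sketch leaves out other columns (such as the sort field), which is a good argument for letting the library do the work.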

One such higher-level library is the official anki Python library (available via pip install aqt anki), which is actually the core backend of the Anki Desktop application. The Python bindings are located in pylib/anki within the Anki source repository. It is not a separate “SDK” product, which is why we won’t find a neat “ReadTheDocs” page for it. The de facto documentation is blended into the Writing Anki Add-ons guide. Since add-ons use the exact same internal Python API we are about to discuss, that guide is our best reference.

Extracting the Data (.apkg as ZIP)#

The first step in any Anki automation pipeline is accessing the raw files. An Anki package (.apkg) is just a ZIP file with a custom extension. We can use Python’s standard zipfile module to unzip it into a temporary directory for processing:

import os
import zipfile


def extract_apkg(apkg_path: str, target_dir: str) -> None:
    """
    Extracts the contents of an .apkg (zip) file to a target directory.

    Args:
        apkg_path: Full path to the .apkg file.
        target_dir: Directory where contents should be extracted.
    """
    if not os.path.exists(apkg_path):
        raise FileNotFoundError(f"Anki package not found: {apkg_path}")
    with zipfile.ZipFile(apkg_path, 'r') as z:
        z.extractall(target_dir)

Once extracted, we will typically find a file named collection.anki2 (or collection.anki21 in newer versions). This is the SQLite database containing all our cards, notes, and deck configurations.

TIP

To clean up the extracted files after we are done, it is convenient to create a temporary directory for this specific deck’s extraction:

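import tempfile

# full_path is the path to the .apkg file we want to process
with tempfile.TemporaryDirectory() as temp_dir:
    extract_apkg(full_path, temp_dir)
    # Work with the extracted files here; temp_dir is deleted when the block exits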

Loading the Collection#

To interact with the database safely, we use the official anki library (specifically anki.collection.Collection). Directly modifying the SQLite file is possible but risky, as it bypasses Anki’s internal logic for checksums and scheduling.

The load_collection_from_dir function demonstrates how to initialize the Collection object while handling version differences:

import os
from anki.collection import Collection


def load_collection_from_dir(directory: str) -> Collection:
    """
    Identifies the correct SQLite file in the directory and initializes
    an Anki Collection object. Handles the legacy 'anki2' vs newer 'anki21' distinction.

    Args:
        directory: The directory containing the extracted .apkg contents.
    Returns:
        An initialized anki.collection.Collection object.
    """
    # Newer exports may contain collection.anki21.
    # If it exists, it is the source of truth. Otherwise, fall back to legacy collection.anki2.
    # We prioritize anki21 because anki2 might just be a "decoy" stub in newer exports.
    # A compressed collection.anki21b must be decompressed first and is not handled here.
    possible_dbs = ["collection.anki21", "collection.anki2"]
    db_path = None
    for db_name in possible_dbs:
        candidate = os.path.join(directory, db_name)
        if os.path.exists(candidate):
            db_path = candidate
            break
    if not db_path:
        raise FileNotFoundError(f"No valid Anki database found in {directory}")
    # Initialize the Collection.
    # Note: The anki library may create lock files or temporary journals in this directory.
    return Collection(db_path)

IMPORTANT

Don’t forget to close the collection to free up the resources when it’s no longer used:

collection.close()

Accessing Data (Notes and Fields)#

Once the collection is loaded, we can iterate through notes and access their fields. Here are the key methods:

  • col.find_notes(""): Returns a list of IDs for every note in the collection. We can pass a search query string (such as deck:MyDeck, using the same syntax as the Anki browser) to filter results.

  • col.get_note(nid): Retrieves the actual Note object.

  • note["Field Name"]: Accesses content like a dictionary.

    TIP

    The Field Name can be looked up and verified by opening the .apkg in the Anki desktop app, clicking “Browse”, selecting a card, and looking at the field labels on the editing pane.

  • col.update_note(note): Writes a modified Note object back to the collection.

    Save the modified deck

    After closing the collection, we can zip the directory with the edited contents back into an .apkg file:

    import os
    import zipfile

    def create_deck_export(source_dir: str, output_path: str) -> None:
        """
        Zips the edited contents of a directory back into an .apkg file.

        Args:
            source_dir: The directory containing the edited `.apkg` contents.
            output_path: The path where the new edited `.apkg` will be written.
        """
        print(f"Packaging {output_path}...")
        with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as z:
            for root, _, files in os.walk(source_dir):
                for file in files:
                    file_path = os.path.join(root, file)
                    # Store paths relative to source_dir so files sit at the archive root
                    arcname = os.path.relpath(file_path, source_dir)
                    z.write(file_path, arcname)
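The three collection methods listed above combine naturally into a bulk edit. Here is a minimal sketch, where bulk_replace is a hypothetical helper of our own and col is expected to be an open Collection. It relies only on find_notes, get_note, and update_note, plus dictionary-style field access on the note.

```python
def bulk_replace(col, query: str, field: str, old: str, new: str) -> int:
    """Replace `old` with `new` in one field of every note matching `query`."""
    changed = 0
    for nid in col.find_notes(query):
        note = col.get_note(nid)
        original = note[field]
        updated = original.replace(old, new)
        if updated != original:
            note[field] = updated
            col.update_note(note)  # persist the change through the library
            changed += 1
    return changed
```

For example, bulk_replace(col, "", "Front", "teh", "the") would fix one common typo across the whole collection and return how many notes were touched. Notes that lack the named field raise a KeyError, so the query should be narrowed to a single note type.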

Anki fields often contain HTML (like <div> or <br>). When processing data programmatically, we usually need a normalization step that strips these tags and normalizes case and whitespace:

def normalize_text(text: str) -> str:
    # Remove HTML wrappers often added by Anki's editor
    text = text.replace("<div>", "").replace("</div>", "").replace("<br>", "")
    # Normalize case and whitespace
    text = text.lower().strip()
    return text
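Fields can also carry other tags (such as <b> or <span style="...">) and HTML entities like &nbsp;. A slightly more general variant, offered as a sketch and not as a full HTML parser, strips any tag with a regular expression and decodes entities with the standard library. The name normalize_html_text is our own.

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]+>")


def normalize_html_text(text: str) -> str:
    """Strip HTML tags, decode entities, collapse whitespace, and lowercase."""
    # Replace tags with a space so "a<br>b" does not become "ab"
    text = TAG_PATTERN.sub(" ", text)
    # Turn entities such as &nbsp; or &amp; into their characters
    text = html.unescape(text)
    # split() with no arguments also treats non-breaking spaces as whitespace
    text = " ".join(text.split())
    return text.lower()
```

Unlike the simpler version, this one separates words that were only divided by a tag, which may or may not be desired depending on the deck.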

By wrapping the anki library in Python scripts, we can transform static flashcards into dynamic data sources. This approach allows for advanced use cases like:

  • Data transformation
  • Bulk Updates: Fixing typos or updating formatting across thousands of cards instantly.
  • Data Analysis: Exporting review history to visualization tools.
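As one illustration of the data analysis use case, the review history lives in the revlog table, whose id column is an epoch-millisecond timestamp. The sketch below counts reviews per UTC day using the built-in sqlite3 module. The function name reviews_per_day is hypothetical, and shared decks may not include any review history at all, in which case the result is simply empty.

```python
import sqlite3
from collections import Counter
from datetime import datetime, timezone


def reviews_per_day(db_path: str) -> dict:
    """Count review log entries per UTC calendar day."""
    conn = sqlite3.connect(db_path)
    try:
        ids = [row[0] for row in conn.execute("SELECT id FROM revlog")]
    finally:
        conn.close()
    counts = Counter(
        # revlog.id is in milliseconds since the Unix epoch
        datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()
        for ms in ids
    )
    return dict(sorted(counts.items()))
```

The resulting date-to-count mapping can be written to CSV or passed to a plotting library of choice.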
Exploring Programmatic Anki
https://blogs.openml.io/posts/anki/
Author: OpenML Blogs
Published at: 2025-11-26
License: CC BY-NC-SA 4.0