Exploring Programmatic Anki - OpenML's AI Blogs

While Anki is a powerful tool for spaced repetition, its true potential is unlocked when we treat it as a database rather than just an app. Under the hood, an .apkg file is essentially a ZIP archive containing a SQLite database and media files.

This post explores how to use a database approach, Python, and the anki library to programmatically extract, read, and manipulate Anki decks, drawing on the code that drives the creation of the Anki decks hosted on language.openml.io.

Anki Database

Each Anki .apkg file wraps a SQLite database, and exploring that database with a SQLite client is a great way to understand how Anki structures its data. As people interested in language and data, we find the internal schema quite interesting: it is a classic example of a “long-lived” database schema with some historical quirks (like using epoch milliseconds as primary keys).
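Because these IDs are creation timestamps in epoch milliseconds, they double as creation dates. A minimal sketch (the helper name `id_to_datetime` is our own) that decodes an ID such as `notes.id` into a UTC datetime; treat the result as an approximate creation time rather than an exact one:

```python
from datetime import datetime, timezone


def id_to_datetime(anki_id: int) -> datetime:
    """Convert an epoch-milliseconds ID (e.g. notes.id) to a UTC datetime."""
    # Split into whole seconds and leftover milliseconds using exact integer math
    seconds, millis = divmod(anki_id, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=millis * 1000)


print(id_to_datetime(1700000000000))  # 2023-11-14 22:13:20+00:00
```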

An Anki deck package (.apkg) is just a ZIP archive with a custom extension. It typically contains:

collection.anki2/collection.anki21/collection.anki21b: The primary SQLite 3 database file (the anki21b variant is additionally compressed, as explained below).
media: A JSON file mapping numeric filenames to their original file names.
0, 1, 2, etc.: Renamed media files (images/audio) corresponding to the keys in the media JSON.
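Since the package is an ordinary ZIP archive, Python's standard zipfile module can list these pieces without extracting anything. A minimal sketch (the function name `describe_apkg` is our own) that assumes the legacy JSON media map; packages from newer Anki versions may encode the media file differently, so the sketch reports `None` instead of failing when the file is not JSON:

```python
import json
import zipfile

# Database file names that may appear at the root of a package
DB_NAMES = {"collection.anki2", "collection.anki21", "collection.anki21b"}


def describe_apkg(apkg_path: str) -> dict:
    """List the collection databases and the media map inside an .apkg."""
    with zipfile.ZipFile(apkg_path) as z:
        names = z.namelist()
        databases = sorted(n for n in names if n in DB_NAMES)
        media_map = {}
        if "media" in names:
            try:
                # Legacy packages: {"0": "cat.jpg", "1": "hola.mp3", ...}
                media_map = json.loads(z.read("media").decode("utf-8"))
            except ValueError:
                # Not UTF-8 JSON: most likely a newer package format
                media_map = None
    return {"databases": databases, "media": media_map}
```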

Installing a SQLite Client

DB Browser for SQLite is a recommended GUI option. It is open-source, lightweight, and opens Anki’s SQLite files without any special setup. On macOS, we can install it via Homebrew Cask:

```
brew install --cask db-browser-for-sqlite
```

A terminal option for those comfortable with the command line is sqlite3. Some systems, like macOS, already have it pre-installed.

Extracting Database

Copy and rename: Copy the .apkg file to a new file with a .zip extension, so that the actual deck cannot be corrupted.
```
cp MyDeck.apkg MyDeck.zip
```
Unzip: Extract the contents.
```
unzip MyDeck.zip -d MyDeck_Extracted
```
Locate the DB: Inside the new folder, we will see a file named collection.anki2 or collection.anki21. This is the SQLite database.

TIP
A collection.anki21 file is a newer format used by recent Anki versions, though the schema is largely backward compatible. Most .apkg exports still default to the anki2 standard for compatibility.

Starting with Anki 2.1.50, Anki introduced a new, more efficient database schema and compression format. However, to prevent older versions of Anki (which expect a specific SQLite schema) from crashing when they try to import these new files, Anki includes a valid, but fake, collection.anki2 SQLite database containing exactly one warning message: “Please update to the latest Anki version, then import the .colpkg/.apkg file again.” In this case, we will likely find a file named collection.anki21b alongside collection.anki2, and we must use the former.

The b in collection.anki21b stands for binary/compressed. This is the SQLite database compressed with Zstandard (Zstd). We need to decompress it before SQLite can read it.
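Before reaching for zstd, we can check what kind of file we actually have by looking at its first bytes: every SQLite 3 database begins with the 16-byte header `SQLite format 3\0`, and every Zstandard frame begins with the magic number `28 B5 2F FD`. A small sketch (the function name `sniff_database_file` is our own); note that it cannot tell the decoy collection.anki2 from a real one, since both are valid SQLite, it only tells us whether a file needs decompressing first:

```python
SQLITE_HEADER = b"SQLite format 3\x00"  # first 16 bytes of any SQLite 3 database
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # Zstandard frame magic number (little-endian)


def sniff_database_file(path: str) -> str:
    """Return "sqlite", "zstd", or "unknown" based on the file's leading bytes."""
    with open(path, "rb") as f:
        header = f.read(len(SQLITE_HEADER))
    if header.startswith(SQLITE_HEADER):
        return "sqlite"
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"
```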

Install Zstd (if you don’t have it):
```
brew install zstd
```

Decompress the file:
```
zstd -d collection.anki21b -o collection.real.anki2
```

Query the new file:
```
sqlite3 collection.real.anki2
```
or open collection.real.anki2 in the GUI client.

Querying Database

Here are a few queries to run immediately to see readable data.

Get raw note content:

```sql
SELECT
    id,
    sfld AS sort_field,
    replace(flds, char(31), ' | ') AS fields_pipe_separated
FROM notes
LIMIT 10;
```

Join cards and notes to see each card’s current interval and review count, longest intervals first:

```sql
SELECT
    n.sfld AS Front_Text,
    c.ivl AS Interval_Days,
    c.reps AS Review_Count
FROM cards c
JOIN notes n ON c.nid = n.id
ORDER BY c.ivl DESC
LIMIT 10;
```
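The same queries can be run from Python with the standard-library sqlite3 module, which is handy once the output feeds a script rather than a screen. A minimal sketch (the function name `preview_notes` is our own) that opens the database read-only and splits flds on the 0x1f separator instead of replacing it:

```python
import sqlite3
from pathlib import Path

FIELD_SEPARATOR = "\x1f"  # Anki joins a note's fields with the 0x1f unit separator


def preview_notes(db_path: str, limit: int = 10) -> list[tuple[int, str, list[str]]]:
    """Return (id, sort field, list of field values) for up to `limit` notes."""
    # A file: URI with mode=ro opens the database read-only, so nothing can be modified
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("SELECT id, sfld, flds FROM notes LIMIT ?", (limit,)).fetchall()
    finally:
        conn.close()
    # sfld may come back as an int for numeric text, so normalize it to str
    return [(nid, str(sfld), flds.split(FIELD_SEPARATOR)) for nid, sfld, flds in rows]
```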

Anki Python

Although the raw SQLite approach offers fast reads, data integrity and data transformation call for far more abstraction than plain SQL queries provide, especially when modifying or writing .apkg files. Specifically:

Checksums & Schema: Anki stores a checksum of each note’s first field (the csum column, used for duplicate detection) and tracks changes through the modification time (mod column) and update sequence number (usn column). If we modify the SQLite flds column directly without updating these values, the Anki desktop client may treat the database as corrupt or refuse to sync it.
Field Parsing: Raw SQLite gives us a string of fields separated by 0x1f. A higher-level library would give us a structured object note['Front'], handling the parsing for us.
Media Management: If our programmatic decks generate static images (such as SVGs), we need to register them in the media file and zip them correctly. A higher-level library handles this for us, for example via col.media.add_file.

Such a higher-level library is the official anki Python library (available via pip install anki; the separate aqt package adds the desktop GUI), which is the same library the Anki Desktop application runs on. Its Python code lives in the pylib/anki directory of the Anki source repository. It is not a separate “SDK” product, which is why we won’t find a neat “ReadTheDocs” page for it. The de facto documentation is folded into the Writing Anki Add-ons guide. Since add-ons use the exact same internal Python API we are about to discuss, that guide is our best reference.
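To see what the Field Parsing point saves us from: in a raw notes row, flds is just the field values joined by 0x1f, while the field names live in the note type definition rather than in the row. A minimal sketch of the parsing that note['Front'] hides, assuming we already know the note type’s field names in order (the list ["Front", "Back"] and both function names are hypothetical):

```python
FIELD_SEPARATOR = "\x1f"  # 0x1f (unit separator) sits between field values


def parse_fields(flds: str, field_names: list[str]) -> dict[str, str]:
    """Split a raw notes.flds value into a {field name: value} dict."""
    values = flds.split(FIELD_SEPARATOR)
    if len(values) != len(field_names):
        raise ValueError(f"expected {len(field_names)} fields, got {len(values)}")
    return dict(zip(field_names, values))


def join_fields(fields: dict[str, str], field_names: list[str]) -> str:
    """Inverse of parse_fields: rebuild the raw flds string in field order."""
    return FIELD_SEPARATOR.join(fields[name] for name in field_names)


note = parse_fields("hola\x1fhello", ["Front", "Back"])
print(note["Front"])  # hola
```

Writing the output of join_fields back into flds by hand would still leave csum and mod stale, which is exactly the integrity problem described above; for writes, the anki library remains the safer route.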

Extracting the Data (`.apkg` as ZIP)

The first step in any Anki automation pipeline is accessing the raw files. An Anki package (.apkg) is just a ZIP file with a custom extension. We can use Python’s standard zipfile module to unzip it into a temporary directory for processing:

```python
import os
import zipfile

def extract_apkg(apkg_path: str, target_dir: str) -> None:
    """
    Extracts the contents of an .apkg (zip) file to a target directory.

    Args:
        apkg_path: Full path to the .apkg file.
        target_dir: Directory where contents should be extracted.
    """
    if not os.path.exists(apkg_path):
        raise FileNotFoundError(f"Anki package not found: {apkg_path}")

    with zipfile.ZipFile(apkg_path, 'r') as z:
        z.extractall(target_dir)
```

Once extracted, we will typically find a file named collection.anki2 (or collection.anki21, or a Zstd-compressed collection.anki21b, in newer versions). This is the SQLite database containing all our cards, notes, and deck configurations.

TIP
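To have the extracted files cleaned up automatically once we are done, we can extract into a temporary directory created just for this deck; it is deleted when the `with` block exits:
```python
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    extract_apkg(full_path, temp_dir)  # full_path: path to the .apkg file
    # ... work with the extracted files here; they are removed after this block
```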

Loading the Collection

To interact with the database safely, we use the official anki library (specifically anki.collection.Collection). Directly modifying the SQLite file is possible but risky, as it bypasses Anki’s internal logic for checksums and scheduling.

The load_collection_from_dir function demonstrates how to initialize the Collection object while handling version differences:

```python
import os
from anki.collection import Collection

def load_collection_from_dir(directory: str) -> Collection:
    """
    Identifies the correct SQLite file in the directory and initializes
    an Anki Collection object. Handles the legacy 'anki2' vs newer 'anki21' distinction.

    Args:
        directory: The directory containing the extracted .apkg contents.

    Returns:
        An initialized anki.collection.Collection object.
    """
    # Newer Anki 2.1 exports may contain collection.anki21.
    # If it exists, it is the source of truth. Otherwise, fall back to legacy collection.anki2.
    # We prioritize anki21 because anki2 might just be a "decoy" stub in newer exports.
    # A Zstd-compressed collection.anki21b (Anki 2.1.50+) is not handled here:
    # decompress it first (e.g. with `zstd -d`) into collection.anki21 in this directory.
    possible_dbs = ["collection.anki21", "collection.anki2"]

    db_path = None
    for db_name in possible_dbs:
        candidate = os.path.join(directory, db_name)
        if os.path.exists(candidate):
            db_path = candidate
            break

    if not db_path:
        raise FileNotFoundError(f"No valid Anki database found in {directory}")

    # Initialize the Collection.
    # Note: The anki library may create lock files or temporary journals in this directory.
    return Collection(db_path)
```

IMPORTANT
Don’t forget to close the collection to free up its resources once it is no longer needed:
```python
collection.close()
```

Accessing Data (Notes and Fields)

Once the collection is loaded, we can iterate through notes and access their fields. Here are the key methods:

col.find_notes(""): Returns a list of IDs for every note in the collection. We can pass an Anki search string (like “deck:MyDeck”) instead of the empty string to filter results.
col.get_note(nid): Retrieves the actual Note object.
note["Field Name"]: Accesses content like a dictionary.

TIP
The Field Name can be looked up and verified by opening the .apkg in the Anki desktop app, clicking “Browse”, selecting a card, and looking at the field labels on the editing pane.

col.update_note(note): Saves a modified Note object back to the collection.
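Putting these methods together, a bulk update such as fixing a recurring typo could look like the following sketch. The function `replace_in_field` is hypothetical; it only relies on the find_notes, get_note, item-access, and update_note calls listed above, and `col` is expected to be an open Collection (for example, one returned by load_collection_from_dir):

```python
def replace_in_field(col, query: str, field_name: str, old: str, new: str) -> int:
    """Replace `old` with `new` in one field of every note matching `query`.

    `col` is assumed to be an open anki.collection.Collection.
    Returns the number of notes that were changed.
    """
    changed = 0
    for nid in col.find_notes(query):
        note = col.get_note(nid)
        current = note[field_name]  # raises KeyError if the note type lacks this field
        if old in current:
            note[field_name] = current.replace(old, new)
            # Persist through Anki's own logic instead of editing flds directly
            col.update_note(note)
            changed += 1
    return changed
```

For example, `replace_in_field(col, "deck:Spanish", "Back", "colour", "color")` would fix one spelling across a single deck; the deck and field names are placeholders. Call `col.close()` afterwards so the changes reach the database file before re-zipping.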

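Save the modified deck

To zip the directory with the edited contents back into an .apkg file (after closing the Collection, so that all changes have been written to the database file), we can do:

```python
import os
import zipfile

def create_deck_export(source_dir: str, output_path: str) -> None:
    """
    Zips an extracted (and edited) deck directory back into an .apkg file.

    Args:
        source_dir: The directory containing the edited `.apkg` contents.
        output_path: The path where the new edited `.apkg` will be written.
    """
    print(f"Packaging {output_path}...")
    with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as z:
        for root, _, files in os.walk(source_dir):
            for file in files:
                file_path = os.path.join(root, file)
                # Store paths relative to source_dir so the database and media
                # files sit at the root of the archive, as in the original package
                arcname = os.path.relpath(file_path, source_dir)
                z.write(file_path, arcname)
```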
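Before importing the result into Anki, a quick sanity check on the new archive can catch packaging mistakes early. A sketch using only zipfile (the function name `check_apkg` and the specific checks are our own choices, not requirements documented by Anki): it verifies member CRCs, looks for a collection database at the archive root, and, as a heuristic, flags leftover SQLite journal files that suggest the Collection was still open when the directory was zipped:

```python
import zipfile

DB_NAMES = {"collection.anki2", "collection.anki21", "collection.anki21b"}
JOURNAL_SUFFIXES = ("-wal", "-shm", "-journal")  # SQLite side files


def check_apkg(apkg_path: str) -> list[str]:
    """Return a list of problems found in a packaged deck (empty if none)."""
    problems = []
    with zipfile.ZipFile(apkg_path) as z:
        names = z.namelist()
        # testzip() reads every member and returns the first name with a bad CRC
        bad = z.testzip()
        if bad is not None:
            problems.append(f"corrupt member: {bad}")
        if not DB_NAMES & set(names):
            problems.append("no collection database at the archive root")
        if any(n.endswith(JOURNAL_SUFFIXES) for n in names):
            problems.append("SQLite journal files present; was the Collection closed?")
    return problems
```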

Anki fields often contain HTML (like <div> or <br>). When processing data programmatically, we usually need a normalization step, such as stripping tags and common particles. A minimal version that handles the tags Anki’s editor adds most often:

```python
def normalize_text(text: str) -> str:
    # Replace HTML wrappers often added by Anki's editor with spaces,
    # so that text on separate lines does not merge into one word
    for tag in ("<div>", "</div>", "<br>"):
        text = text.replace(tag, " ")

    # Normalize case, and collapse runs of whitespace into single spaces
    text = " ".join(text.lower().split())

    return text
```
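The replace-based normalize_text only catches those exact tags, so variants such as <br/>, <b>, <span style=...>, or entities like &nbsp; slip through. A more tolerant sketch using only the standard library (normalize_html_field is a hypothetical name, and a regex is a pragmatic shortcut rather than a real HTML parser): block-level tags become spaces, inline tags are removed, and entities are decoded only after the tags are gone:

```python
import html
import re

# Tags that visually break lines: replace with a space so words stay separate
BLOCK_TAG_RE = re.compile(r"</?(?:br|div|p|li)\b[^>]*>", re.IGNORECASE)
# Any remaining tag (<b>, <span ...>, ...): remove without adding a space
ANY_TAG_RE = re.compile(r"<[^>]+>")


def normalize_html_field(text: str) -> str:
    """Strip HTML tags, decode entities, lowercase, and collapse whitespace."""
    text = BLOCK_TAG_RE.sub(" ", text)
    text = ANY_TAG_RE.sub("", text)
    text = html.unescape(text)  # e.g. "&amp;" -> "&", "&nbsp;" -> "\xa0"
    # str.split() with no argument also splits on "\xa0"
    return " ".join(text.split()).lower()


print(normalize_html_field("<div>Hola&nbsp;<b>Mundo</b></div><br>Adiós"))  # hola mundo adiós
```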

By wrapping the anki library in Python scripts, we can transform static flashcards into dynamic data sources. This approach allows for advanced use cases like:

Data transformation
Bulk Updates: Fixing typos or updating formatting across thousands of cards instantly.
Data Analysis: Exporting review history to visualization tools.
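As an example of the data-analysis use case, review history lives in the revlog table, and since this only reads data, going through raw SQLite is fine here. A sketch under a few assumptions: the database is a plain (decompressed) SQLite file, we only need a handful of revlog columns, and the deck was exported with its scheduling information (otherwise revlog may be empty). The function name `export_review_history` is our own:

```python
import csv
import sqlite3


def export_review_history(db_path: str, csv_path: str) -> int:
    """Write selected revlog columns to a CSV file; returns the number of rows."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            # id: review time (epoch ms), cid: card id, ease: answer button,
            # ivl: interval after the review, time: milliseconds spent answering
            "SELECT id, cid, ease, ivl, time FROM revlog ORDER BY id"
        ).fetchall()
    finally:
        conn.close()
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["review_ms", "card_id", "ease", "interval", "time_ms"])
        writer.writerows(rows)
    return len(rows)
```

The resulting CSV can be loaded into a spreadsheet or plotting library; the review_ms column converts to a date the same way note IDs do.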
