While Anki is a powerful tool for spaced repetition, its true potential is unlocked when we treat it as a database
rather than just an app. Under the hood, an .apkg file is essentially a ZIP archive containing a SQLite database and
media files.
This post explores how to use the database approach, Python, and the anki library to programmatically extract, read, and
manipulate Anki decks, using the code that drives the creation of the Anki decks hosted on language.openml.io.
At the core of each Anki .apkg file is a SQLite database. Exploring it with a SQLite client is a
great way to understand how Anki data is structured. Being interested in language and data, we find the internal schema
quite interesting: it is a classic example of a “long-lived” database schema with some historical quirks (like using
epoch milliseconds for primary keys).
An Anki deck package (.apkg) is just a ZIP archive with a custom extension. It typically contains:
collection.anki2/collection.anki21/collection.anki21b: The primary SQLite 3 database file.
media: A JSON file mapping numeric filenames to their original file names.
0, 1, 2, etc.: Renamed media files (images/audio) corresponding to the keys in the media JSON.
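As a quick sanity check of the layout listed above, the archive can be inspected without extracting it. The following is a minimal sketch using only Python's standard library. The function name describe_apkg is hypothetical, and the sketch assumes the media file is plain JSON as described above, falling back to an empty map when it is not.

```python
import json
import zipfile


def describe_apkg(apkg_path: str) -> dict:
    """Return the archive's member names and its media map (JSON format assumed)."""
    with zipfile.ZipFile(apkg_path) as z:
        names = z.namelist()
        media_map = {}
        if "media" in names:
            raw = z.read("media")
            try:
                # Maps numeric member names ("0", "1", ...) to original file names
                media_map = json.loads(raw.decode("utf-8"))
            except ValueError:
                # Not valid UTF-8 JSON (assumption: a different export format);
                # leave the map empty rather than guess at the encoding
                media_map = {}
    return {"members": names, "media": media_map}
```

Calling describe_apkg("MyDeck.apkg") returns a dictionary whose "members" entry shows which collection file variants are present in the package.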
DB Browser for SQLite is recommended as a GUI option. It is open-source, lightweight,
and handles Anki’s specific SQLite format well. We can install it via Homebrew Cask if we are on a Mac:
    brew install --cask db-browser-for-sqlite
A terminal option for someone comfortable with the CLI is sqlite3. Some systems, like macOS, already have it
pre-installed.
Copy the file: Create a copy of your .apkg file to avoid corrupting your actual deck.
Rename: Change the file extension from .apkg to .zip.
    mv MyDeck.apkg MyDeck.zip
Unzip: Extract the contents.
    unzip MyDeck.zip -d MyDeck_Extracted
Locate the DB: Inside the new folder, we will see a file named collection.anki2 or collection.anki21. This is
the SQLite database.
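For readers who prefer a script over a GUI client, the extracted database can also be explored from Python's built-in sqlite3 module. The sketch below lists the tables in the file. The helper name list_tables is hypothetical, and the database is opened read-only so that exploring cannot modify it.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in the SQLite database at db_path."""
    # Build a file: URI so the connection can be opened in read-only mode
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

On an extracted deck, list_tables("MyDeck_Extracted/collection.anki2") typically includes tables such as notes, cards, and revlog.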
TIP
A collection.anki21 file is a newer format used by recent Anki versions, though the schema is largely backward
compatible. Most .apkg exports still default to the anki2 standard for compatibility.
Starting with Anki 2.1.50, the software introduced a new, more efficient database schema and compression format.
However, to prevent older versions of Anki (which expect a specific SQLite schema) from crashing when they try to import
these new files, Anki includes a valid but fake collection.anki2 SQLite database, which contains exactly one
warning message: “Please update to the latest Anki version, then import the .colpkg/.apkg file again.” In this case,
we likely have a file named collection.anki21b alongside collection.anki2, and we must pick the former.
The b in collection.anki21b stands for binary/compressed. This is the SQLite database compressed with Zstandard
(Zstd). We need to decompress it before SQLite can read it.
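Before handing a file to SQLite, it can help to check what it actually is. The sketch below inspects the first bytes of a file: SQLite databases start with the string "SQLite format 3" followed by a NUL byte, and Zstandard frames start with the magic bytes 28 B5 2F FD. The function name detect_collection_format is hypothetical. Decompression itself is not shown, since it needs a Zstandard implementation (for example the third-party zstandard package, which is an assumption about the environment).

```python
SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 file
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"       # Zstandard frame magic number


def detect_collection_format(path: str) -> str:
    """Classify a collection file as 'sqlite', 'zstd', or 'unknown' by its header."""
    with open(path, "rb") as f:
        header = f.read(16)
    if header.startswith(SQLITE_MAGIC):
        return "sqlite"
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"
```

A result of "zstd" for collection.anki21b confirms that the file must be decompressed before any SQLite client can open it.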
Although the SQLite database approach offers raw read speed, data integrity and transformation demand far more
abstraction than SQL queries provide, especially when modifying or writing .apkg files.
Specifically:
Checksums & Schema: Anki calculates checksums for fields and notes. If we modify the SQLite flds column
directly without updating the checksums and modification times (mod column), the Anki desktop client may treat the
database as corrupt or refuse to sync it.
Field Parsing: Raw SQLite gives us a string of fields separated by 0x1f. A higher-level library would give us a
structured object note['Front'], handling the parsing for us.
Media Management: If our programmatic decks generate static images (such as SVGs), we need to register them in
the media file and zip them correctly. A higher-level library handles this via, for example,
col.media.add_file.
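To make the field-parsing point concrete, here is a minimal sketch of what reading fields from raw SQLite involves. It assumes a notes table with id and flds columns and takes an already-open sqlite3 connection. The helper name read_note_fields is hypothetical.

```python
import sqlite3

FIELD_SEPARATOR = "\x1f"  # the 0x1f unit separator between fields in notes.flds


def read_note_fields(conn: sqlite3.Connection) -> dict[int, list[str]]:
    """Map each note id to its list of field values, in stored order."""
    rows = conn.execute("SELECT id, flds FROM notes ORDER BY id")
    return {nid: flds.split(FIELD_SEPARATOR) for nid, flds in rows}
```

The result is a positional list per note, so mapping positions back to field names such as Front is left to us. Doing this by hand on every read, and keeping checksums in sync on every write, is the work a higher-level library takes over.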
Such a higher-level library is the official anki Python library (available via
pip install aqt anki), which is actually the core backend of the Anki Desktop application. The Python bindings are
located in pylib/anki within the Anki source tree. It is not a separate “SDK” product, which is why we won’t find a neat
“ReadTheDocs” page for it. The de facto documentation is blended into the Writing Anki Add-ons Guide.
Since add-ons use the exact same internal Python API we are about to discuss, this is our best reference.
The first step in any Anki automation pipeline is accessing the raw files. An Anki package (.apkg) is just a ZIP file
with a custom extension. We can use Python’s standard zipfile module to unzip it into a temporary directory for
processing:
    import os
    import zipfile


    # Function name assumed for illustration
    def extract_apkg(apkg_path: str, target_dir: str) -> None:
        """
        Extracts the contents of an .apkg (zip) file to a target directory.

        Args:
            apkg_path: Full path to the .apkg file.
            target_dir: Directory where contents should be extracted.
        """
        if not os.path.exists(apkg_path):
            raise FileNotFoundError(f"Anki package not found: {apkg_path}")

        with zipfile.ZipFile(apkg_path, 'r') as z:
            z.extractall(target_dir)
Once extracted, we will typically find a file named collection.anki2 (or collection.anki21 in newer versions). This
is the SQLite database containing all our cards, notes, and deck configurations.
TIP
To clean up the extracted files after we are done, it is convenient to create a temporary directory for this specific
deck’s extraction:
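One way to do this is a small context manager around tempfile.TemporaryDirectory, so the extracted files disappear as soon as processing finishes. This is a sketch using only the standard library, and extracted_apkg is a hypothetical helper name.

```python
import tempfile
import zipfile
from contextlib import contextmanager


@contextmanager
def extracted_apkg(apkg_path: str):
    """Extract an .apkg into a temporary directory and yield that directory's path."""
    with tempfile.TemporaryDirectory(prefix="anki_deck_") as tmp_dir:
        with zipfile.ZipFile(apkg_path, "r") as z:
            z.extractall(tmp_dir)
        # The directory and everything in it is deleted when the caller's
        # with-block exits, even if an exception is raised inside it
        yield tmp_dir
```

A caller would write with extracted_apkg("MyDeck.apkg") as deck_dir: and do all reading or editing inside that block.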
To interact with the database safely, we use the official anki library (specifically anki.collection.Collection).
Directly modifying the SQLite file is possible but risky, as it bypasses Anki’s internal logic for checksums and
scheduling.
The load_collection_from_dir function demonstrates how to initialize the Collection object while handling version
differences:
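The sketch below shows one possible shape for such a function; it is an illustration, not the original implementation. It assumes collection.anki21 should be preferred over collection.anki2, and that a directory containing collection.anki21b must be decompressed before loading. The helper find_collection_path and the candidate ordering are assumptions. The anki import is done lazily so the path logic can be used without the library installed.

```python
import os

# Uncompressed database names, newest first (ordering is an assumption)
CANDIDATES = ("collection.anki21", "collection.anki2")


def find_collection_path(extract_dir: str) -> str:
    """Pick the database file to open inside an extracted .apkg directory."""
    if os.path.exists(os.path.join(extract_dir, "collection.anki21b")):
        # The accompanying collection.anki2 is only a placeholder in this case
        raise RuntimeError(
            "collection.anki21b is Zstandard-compressed; decompress it first"
        )
    for name in CANDIDATES:
        path = os.path.join(extract_dir, name)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(f"No collection database found in {extract_dir}")


def load_collection_from_dir(extract_dir: str):
    """Open the extracted database as an anki Collection object."""
    # Imported lazily: requires the anki package to be installed
    from anki.collection import Collection

    return Collection(find_collection_path(extract_dir))
```

When finished, calling col.close() on the returned object releases the database before the directory is zipped back up.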
Once the collection is loaded, we can iterate through notes and access their fields. Here are the key methods:
col.find_notes(""): Returns a list of IDs for every note in the collection. We can pass a query string (like
“deck”) to filter results.
col.get_note(nid): Retrieves the actual Note object.
note["Field Name"]: Accesses content like a dictionary.
TIP
The Field Name can be looked up and verified by opening the .apkg in the Anki desktop app, clicking “Browse”,
selecting a card, and looking at the field labels on the editing pane.
col.update_note(new_note_html): Writes a modified note back to the collection.
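Put together, the methods above are enough for a simple bulk edit. The following sketch fixes a typo in one field across every note. It relies only on find_notes, get_note, dictionary-style field access, keys, and update_note. The function name replace_in_field and its parameters are hypothetical, and col is assumed to be an already-loaded Collection.

```python
def replace_in_field(col, field_name: str, old: str, new: str) -> int:
    """Replace `old` with `new` in one field of every note; return the count changed."""
    changed = 0
    for nid in col.find_notes(""):      # empty query: every note in the collection
        note = col.get_note(nid)
        if field_name not in note.keys():
            continue                    # this note type has no such field
        original = note[field_name]
        updated = original.replace(old, new)
        if updated != original:
            note[field_name] = updated
            col.update_note(note)       # persist the change through the library
            changed += 1
    return changed
```

For example, replace_in_field(col, "Front", "teh", "the") would return the number of notes whose Front field was corrected.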
Save the modified deck
To zip the directory with the edited contents back into an .apkg file, we can do:
    import os
    import zipfile


    # Function name assumed for illustration
    def package_apkg(source_dir: str, output_path: str) -> None:
        """
        Args:
            source_dir: The directory containing the edited `.apkg` contents.
            output_path: The path where the new edited `.apkg` will be written.
        """
        print(f"Packaging {output_path}...")
        with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as z:
            for root, _, files in os.walk(source_dir):
                for file in files:
                    file_path = os.path.join(root, file)
                    # Store paths relative to source_dir so files sit at the archive root
                    arcname = os.path.relpath(file_path, source_dir)
                    z.write(file_path, arcname)
Anki fields often contain HTML (like <div> or <br>). When processing data programmatically, we usually need a
normalization step that strips tags and common particles:
    def normalize_text(text: str) -> str:
        # Remove HTML wrappers often added by Anki's editor
        text = text.replace("<div>", "").replace("</div>", "").replace("<br>", "")

        # Normalize case and whitespace
        text = text.lower().strip()

        return text
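Fields can also contain other tags, self-closing variants such as <br/>, and HTML entities such as &nbsp;. The sketch below is a more general variation, not the function used above: it treats line-break tags and closing div tags as spaces, strips any remaining tags with a regular expression, decodes entities, and collapses whitespace. The name strip_html is hypothetical, and a regex is only a rough approximation of real HTML parsing.

```python
import html
import re

BREAK_RE = re.compile(r"<br\s*/?>|</div>", re.IGNORECASE)  # tags that act as line breaks
TAG_RE = re.compile(r"<[^>]+>")                            # any other tag


def strip_html(text: str) -> str:
    """Return lowercase plain text with tags removed and whitespace collapsed."""
    text = BREAK_RE.sub(" ", text)   # keep words on separate lines apart
    text = TAG_RE.sub("", text)      # drop remaining markup
    text = html.unescape(text)       # turn entities like &amp; into characters
    # split() with no arguments also splits on non-breaking spaces
    return " ".join(text.split()).lower()
```

Unlike the simple replace chain, this variation inserts a space where a line break used to be, so words from adjacent lines do not run together.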
By wrapping the anki library in Python scripts, we can transform static flashcards into dynamic data sources. This
approach allows for advanced use cases like:
Data transformation
Bulk Updates: Fixing typos or updating formatting across thousands of cards instantly.
Data Analysis: Exporting review history to visualization tools.