Open Core
Built on transparent, peer-reviewable foundations
Soak is built on soaking, an open-source Python library for LLM-assisted qualitative analysis. The core algorithms, pipeline execution, and analytical methods are freely available under the AGPL licence -- you can inspect every line of code, validate our methods, and extend them for your own research.
Why open core matters for research
Academic research demands transparency. When you use a tool to analyse qualitative data, you need to know exactly what it does. Black-box tools create methodological opacity -- reviewers can't assess your methods, and neither can you.
With an open-core approach, the algorithms that code your transcripts, extract themes, and verify quotes are fully documented and available for scrutiny. This isn't just good practice -- it's essential for reproducible research.
The soaking library
The open-source soaking package provides:
- DAG-based pipelines -- Define multi-stage analysis workflows with parallel execution
- Thematic analysis -- Zero-shot coding and theme extraction from transcripts
- Quote verification -- BM25 + embedding-based validation to detect hallucinations
- Classification -- Extract structured data with multi-model agreement metrics
- Comparison tools -- Semantic similarity and optimal transport for comparing analyses
- Ground truth validation -- Precision, recall, F1, and confusion matrices against labelled data
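To make the quote-verification idea concrete, here is a minimal sketch of the lexical (BM25) stage only. It is not the soaking implementation: the function names `tokenize`, `bm25_scores`, and `check_quote` are hypothetical, the tokenizer is deliberately naive, and the embedding-based stage is omitted. It ranks transcript passages against a quote the model claims to have found, then checks whether the quote really appears in the best-matching passage.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Score each passage against the query with Okapi BM25."""
    if not passages:
        raise ValueError("need at least one passage")
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    # Fall back to 1.0 so empty passages do not cause a division by zero.
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: in how many passages does each term occur?
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Smoothed IDF (the "+1 inside the log" variant, never negative).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


def check_quote(quote, passages):
    """Return (best_passage_index, bm25_score, found_verbatim) for a quote."""
    scores = bm25_scores(quote, passages)
    best = max(range(len(passages)), key=scores.__getitem__)
    # Compare normalised token sequences so case and punctuation do not matter.
    # Padding with spaces prevents matches that start or end mid-word.
    quote_tokens = " ".join(tokenize(quote))
    passage_tokens = " ".join(tokenize(passages[best]))
    verbatim = bool(quote_tokens) and f" {quote_tokens} " in f" {passage_tokens} "
    return best, scores[best], verbatim


passages = [
    "I felt isolated after moving to the city.",
    "My manager was supportive throughout the year.",
]
print(check_quote("I felt isolated after moving", passages))
print(check_quote("I felt completely abandoned by everyone", passages))
```

In this sketch, a quote that overlaps lexically with a passage but is not found verbatim is the kind of case a second, embedding-based check would then examine as a possible paraphrase or hallucination.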
Install via pip and bring your own API key. Run analyses locally, on your institution's servers, or anywhere you need.
pip install soaking
View on GitHub · PyPI · Zenodo
No vendor lock-in
Your research data and analytical outputs belong to you. Everything you create in Soak can be exported and processed with the open-source library. If we disappeared tomorrow, your workflows would still work.
Pipeline definitions use the same YAML format as the open-source tool. Analysis outputs export to JSON, CSV, and HTML. Embeddings use standard sentence-transformer models. We've designed the system so you're never trapped.
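As an illustration of why plain export formats matter, the sketch below flattens a JSON export into CSV using only the Python standard library. The schema shown (`themes`, `name`, `quotes`) and the helper `themes_to_csv` are hypothetical and chosen purely for illustration; the real export structure may differ, so adapt the keys to whatever your export contains.

```python
import csv
import io
import json

# Hypothetical export content: the real schema may differ.
EXAMPLE_EXPORT = """
{"themes": [
  {"name": "Isolation", "quotes": ["I felt alone.", "Nobody called."]},
  {"name": "Support", "quotes": ["My manager helped."]}
]}
"""


def themes_to_csv(export_json):
    """Flatten a themes/quotes export into CSV text, one row per quote."""
    data = json.loads(export_json)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["theme", "quote"])
    for theme in data["themes"]:
        for quote in theme["quotes"]:
            writer.writerow([theme["name"], quote])
    return buffer.getvalue()


print(themes_to_csv(EXAMPLE_EXPORT))
```

Because nothing here depends on a proprietary reader, the same approach works in R, a spreadsheet, or any other tool that can parse JSON or CSV.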
What Soak adds
The web application builds on soaking to provide:
- Managed infrastructure -- No Python setup, API keys, or command line required
- Team collaboration -- Share analyses with colleagues, control who sees what
- Visual comparison -- Interactive heatmaps and Sankey diagrams for comparing analyses
- Progress tracking -- Watch pipeline execution in real time
- Cost management -- Monitor LLM usage and set budget limits
- EU data residency -- Documents stay in Europe by default
Think of it as the difference between git and GitHub. The core functionality is open; the convenience layer is what you're paying for.
Citing Soak in your research
If you use Soak (either the soaking library or the web application) in published research, please cite:
Ben Whalley. (2025). benwhalley/soak: Initial public release (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.17293023
Contributing
The soaking library welcomes contributions. Whether you've found a bug, want to add a node type, or have ideas for new comparison metrics -- the code is on GitHub and we'd like to hear from you.
For academic collaborations, methods papers, or validation studies, please get in touch directly.
Licence
The soaking Python library is released under the GNU Affero General Public License v3 (AGPL-3.0). You can use, modify, and distribute the code freely, provided that any derivative works you distribute are released under the same licence.
The AGPL is specifically designed for network services: if you modify soaking and run it as a web service, you must offer the source code of your modified version to the users of that service. This ensures improvements benefit the research community, not just individual vendors.