Developer Guide¶
As an Extralit developer, you are already part of the community, and your contribution is valuable to our development. This guide will help you set up your development environment and start contributing.
Extralit core components
- Mkdocs Documentation: Extralit's documentation is a comprehensive, in-depth guide that helps annotators and project admins explore, understand, and effectively use the core components of the Extralit ecosystem.
- Vue.js Web UI: A web application to visualize, extract, and validate your data, users, and teams. It is built with Vue.js and Nuxt.js and is deployed alongside the Extralit Server within our Extralit Docker image.
- Python SDK: A Python SDK, installable with pip install extralit, to interact with the Extralit Server and the Extralit UI. It provides an API as well as a CLI to manage the data, configuration, and extraction workflows.
- FastAPI Server: The core of Extralit's back end is a Python FastAPI server that manages the document extraction and data annotation lifecycle and serves the Nuxt-built Web UI. It does so by interfacing with the relational database, the text-search and vector databases, the file blob storage, and Redis. It provides a REST API through which the Python SDK and the Extralit UI interact with the data. It also provides a web interface to visualize the data.
- Relational Database: A relational database to store the data of the records, workspaces, and users. PostgreSQL is the preferred option for persistent deployments; sqlite can also be used for certain local development scenarios, such as testing or lightweight, single-user setups.
- File Blob Storage: A file storage system to store the documents and files associated with the records. It can be a local file system or a cloud-based storage solution like Minio or Amazon S3. For local development, we use a local file system or self-hosted Minio; for production deployments, we recommend using S3.
- Text Search Database: An indexed text search database to enable efficient searching and retrieval of data records. We currently support Elasticsearch for this purpose, which provides full-text search and is integrated with the Extralit Server. On initial deployment, Elasticsearch copies and indexes all of the record data from the Relational Database.
- RAG Vector Database: A vector database to store the document content and perform scalable vector similarity searches, supporting RAG use cases for LLM extraction. We currently support Weaviate and plan to add support for Elasticsearch soon to consolidate the dependencies.
Environment setup¶
Extralit offers a comprehensive guide for setting up your development environment. For detailed instructions, please refer to our Development Environment Setup Guide.
This guide covers everything you need to get started, including:
- Installing prerequisites
- Setting up Docker containers
- Configuring your development environment
- Running Extralit components locally
Once you have your environment set up, you can return to this guide to learn more about the specific components you want to contribute to.
The Extralit repository¶
The Extralit repository has a monorepo structure, which means that all the components are located in the same repository: extralit/extralit. This repo is divided into the following folders:
- extralit: The FastAPI server project for extraction
- argilla/docs: The documentation project
- extralit: The argilla SDK project
- argilla-server: The FastAPI server project for annotation
- argilla-frontend: The Vue.js UI project
- examples: Example resources for deployments, scripts, and notebooks
How to contribute?
Before starting to develop, we recommend reading our contribution guide to understand the contribution process and the guidelines to follow. Once you have cloned the Extralit repository and checked out the correct branch, you can start setting up your development environment.
Manual Setup: Alternative Development Options¶
If you prefer not to use Codespaces, you can set up your development environment manually using the following approaches.
Set up the Python environment¶
To work on the Extralit Python SDK, you must install the Extralit package on your system.
Create a virtual environment
We recommend creating a dedicated virtual environment for SDK development to prevent conflicts. For this, you can use the manager of your choice, such as venv, conda, pyenv, or uv.
From the root of the cloned Extralit repository, move to the extralit folder in your terminal.
Next, activate your virtual environment and make the required installations:
# Install the `pdm` package manager
pip install pdm
# Install extralit in editable mode and the development dependencies
pdm install --dev
Linting and formatting¶
To maintain a consistent code format, install the pre-commit hooks so that they run automatically before each commit.
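As a minimal sketch of the hook installation described above: it assumes pre-commit is already available in your environment (for example through the development dependencies) and that the repository root contains a .pre-commit-config.yaml file. The hooks_installed variable is only illustrative.

```shell
# Install the git hooks declared in .pre-commit-config.yaml (assumed to exist at the repo root).
# The guard only prints a hint when a prerequisite is missing.
if command -v pre-commit >/dev/null 2>&1 && [ -f .pre-commit-config.yaml ]; then
  pre-commit install                                   # hooks now run on every git commit
  pre-commit run --all-files || echo "some hooks reported problems"  # optional one-off run
  hooks_installed="yes"
else
  echo "pre-commit or .pre-commit-config.yaml not found; run this from the repository root"
  hooks_installed="no"
fi
```

Running the hooks once across all files is optional, but it surfaces formatting problems before your first commit.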
In addition, run the following scripts to check the code formatting and linting:
Running tests¶
Running tests at the end of every development cycle is indispensable to ensure no breaking changes.
Running linting, formatting, and tests
You can run all the checks at once by using the following command:
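The script names for formatting, linting, and testing are defined by the project itself, so the sketch below only shows how to discover them with pdm instead of assuming particular names. It assumes pdm is installed and that you are in the folder containing pyproject.toml. The placeholder <script-name> and the scripts_listed variable are illustrative.

```shell
# List the scripts the project declares for pdm, then pick the one you need.
if command -v pdm >/dev/null 2>&1 && [ -f pyproject.toml ]; then
  pdm run --list || echo "pdm could not list the scripts"   # shows scripts from pyproject.toml
  scripts_listed="yes"
else
  echo "pdm or pyproject.toml not found; run this from the package folder"
  scripts_listed="no"
fi
# Afterwards, run a listed script with:
#   pdm run <script-name>
```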
Manual Kubernetes Setup¶
If you want to set up a local Kubernetes cluster manually:
- Install required tools: kubectl, Tilt, and kind
- Create a local Kubernetes cluster:
- For local development with image registry:
- Apply storage configurations:
- Create namespace and deploy services:
- Deploy with Tilt:
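The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's exact procedure: the cluster name and namespace (both extralit-dev here) are hypothetical, and the image registry and storage manifests are specific to the repository, so they are left out.

```shell
CLUSTER_NAME="extralit-dev"   # hypothetical name, adjust to your setup
NAMESPACE="extralit-dev"      # hypothetical name, adjust to your setup

# Step 1: collect any required tool that is not on PATH.
missing_tools=""
for tool in kubectl tilt kind; do
  command -v "$tool" >/dev/null 2>&1 || missing_tools="$missing_tools $tool"
done

if [ -n "$missing_tools" ]; then
  echo "Install these tools first:$missing_tools"
else
  # Step 2: create a local cluster with kind.
  kind create cluster --name "$CLUSTER_NAME"
  # Registry and storage configuration are repository-specific and omitted here.
  # Step 4: create the namespace that the services are deployed into.
  kubectl create namespace "$NAMESPACE"
  # Step 5: start Tilt from the directory that contains the Tiltfile.
  tilt up
fi
```

Checking for the tools first keeps the script from creating a half-configured cluster when one of them is missing.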
Set up the databases directly¶
If you prefer to run the databases directly without Kubernetes:
Text search database¶
# Extralit supports Elasticsearch versions >=8.5
docker run -d --name elasticsearch-for-extralit -p 9200:9200 -p 9300:9300 -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.5.3
Relational database¶
docker run -d --name postgres-for-extralit -p 5432:5432 -e POSTGRES_PASSWORD=postgres -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres postgres:14
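Once both containers are running, a quick reachability check can save debugging time later. The sketch below assumes the container name and ports from the commands above, and that curl and docker are available on the host. The es_health and pg_status variables are illustrative.

```shell
# Elasticsearch: query the cluster health endpoint on the published port 9200.
es_health=$(curl -s --max-time 5 "http://localhost:9200/_cluster/health" 2>/dev/null || echo "unreachable")
echo "Elasticsearch: $es_health"

# PostgreSQL: pg_isready ships with the official postgres image.
pg_status=$(docker exec postgres-for-extralit pg_isready -U postgres 2>/dev/null || echo "unreachable")
echo "PostgreSQL: $pg_status"
```

Elasticsearch can take a while to start, so an "unreachable" result right after docker run may only mean the node is still booting.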
Set up the documentation¶
Documentation is essential to provide users with a comprehensive guide about Extralit.
From main or develop?
If you are updating, improving, or fixing the current documentation without a code change, work on the main branch. For new features or bug fixes that require documentation, use the develop branch.
To contribute to the documentation and generate it locally, ensure you installed the development dependencies as shown in the "Set up the Python environment" section, and run the following command to create the development server with mkdocs:
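As a hedged sketch, the standard mkdocs command for a local preview is mkdocs serve. The project may expose its own wrapper script for this, so treat the direct call below as an assumption. It expects mkdocs to be installed and a mkdocs.yml file in the current directory, and the can_serve variable is illustrative.

```shell
# Check the prerequisites before starting the development server.
can_serve="no"
if command -v mkdocs >/dev/null 2>&1 && [ -f mkdocs.yml ]; then
  can_serve="yes"
fi

if [ "$can_serve" = "yes" ]; then
  # Builds the site and serves it with live reload, by default at http://127.0.0.1:8000
  mkdocs serve
else
  echo "mkdocs or mkdocs.yml not found; run this from the documentation folder"
fi
```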
Documentation guidelines¶
As mentioned, we use mkdocs to build the documentation. You can write the documentation in markdown format, and it will automatically be converted to HTML. In addition, you can include elements such as tables, tabs, images, and others, as shown in this guide. We recommend following these guidelines:
- Use clear and concise language: Ensure the documentation is easy to understand for all users by using straightforward language and including meaningful examples. Images are not easy to maintain, so use them only when necessary and place them in the appropriate folder within the docs/assets/images directory.
- Verify code snippets: Double-check that all code snippets are correct and runnable.
- Review spelling and grammar: Check the spelling and grammar of the documentation.
- Update the table of contents: If you add a new page, include it in the relevant index.md or the mkdocs.yml file.
Contribute with a tutorial
You can also contribute a tutorial (.ipynb) to the "Community" section. We recommend aligning the tutorial with the structure of the existing tutorials. For an example, check this tutorial.
Troubleshooting¶
Persistent Volume & Storage Classes¶
When using Kubernetes, persistent volume issues can occur:
- PVs might not be available when services are deployed, especially in kind clusters
- PVCs might bind to incorrect PVs depending on creation order
- For persistent storage issues, check the uncategorized resource in Tilt
- Sometimes clearing /tmp/kind-volumes/ and restarting the cluster is needed
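To see which of the situations above applies, you can inspect the volumes and claims with standard kubectl commands. This sketch assumes kubectl is installed and pointed at your local cluster. The placeholders <claim-name> and <namespace> and the kubectl_available variable are illustrative.

```shell
if command -v kubectl >/dev/null 2>&1; then
  kubectl_available="yes"
  # List every PersistentVolume together with the claim it is bound to, if any.
  kubectl get pv || echo "could not reach the cluster"
  # List claims in all namespaces; a Pending status means no matching PV was bound.
  kubectl get pvc --all-namespaces || echo "could not reach the cluster"
else
  kubectl_available="no"
  echo "kubectl not found on PATH"
fi
# For details on a single claim, including binding events:
#   kubectl describe pvc <claim-name> -n <namespace>
```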
Deployment Issues¶
Common deployment problems:
- elasticsearch: Can fail on restart due to data-shard issues
- main-db Postgres: May fail to remount volumes after redeployment due to password changes
For support, join the Extralit Slack channel.