
Command Line Interface Guide

This guide explains how to use Extralit's command line interface (CLI) to manage your data extraction projects. The CLI provides tools for every step of the extraction workflow, from setting up projects to exporting final data.

Getting Started

Installation

Install Extralit and its CLI using pip:

pip install extralit

For specific features, install the optional extras provided by the extralit-server package:

# For OCR and PDF processing
pip install "extralit-server[ocr,pdf]"

# For LLM-based extraction
pip install "extralit-server[llm]"

Authentication

Before using Extralit, authenticate with your server:

extralit login --api-url http://your-extralit-server --api-key your-api-key

You can verify your authentication status:

extralit whoami

To log out:

extralit logout
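
If you would rather not type the API key on the command line, where your shell may save it to history, you can read both values from the environment variables listed under Environment Setup and pass them to `extralit login`. Below is a minimal POSIX shell sketch. `extralit_login_from_env` is a hypothetical helper and not part of the CLI. It uses only the `--api-url` and `--api-key` flags shown above.

```shell
# Hypothetical helper (not part of the Extralit CLI): log in with the
# documented --api-url/--api-key flags, taking both values from the environment.
extralit_login_from_env() {
  # Stop early with a clear message if either variable is missing or empty.
  if [ -z "${ARGILLA_API_URL:-}" ] || [ -z "${ARGILLA_API_KEY:-}" ]; then
    echo "error: set ARGILLA_API_URL and ARGILLA_API_KEY first" >&2
    return 1
  fi
  extralit login --api-url "$ARGILLA_API_URL" --api-key "$ARGILLA_API_KEY"
}
```

Here is one way to use it. First run `export ARGILLA_API_URL=http://your-extralit-server`. Then enter the key with `read -rs ARGILLA_API_KEY && export ARGILLA_API_KEY` (bash), and finally run `extralit_login_from_env && extralit whoami`. The key is still passed to `extralit login` as an argument, so it can briefly appear in the process list.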

Environment Setup

The CLI uses these environment variables:
- ARGILLA_API_URL: Your Extralit server URL
- ARGILLA_API_KEY: Your API key

Configuration is stored in ~/.extralit/credentials.json.
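
Credentials can come from the environment or from the stored file, so it can help to check which of the two is present in the current shell. The function below, `extralit_config_report`, is hypothetical and not part of the CLI. It only reports what it finds and does not guess which source the CLI prefers when both exist.

```shell
# Hypothetical helper: report which configuration sources are present.
# It does not decide which source the CLI uses when both exist.
extralit_config_report() {
  local creds="$HOME/.extralit/credentials.json" url=unset key=unset file=missing
  [ -n "${ARGILLA_API_URL:-}" ] && url=set
  [ -n "${ARGILLA_API_KEY:-}" ] && key=set
  [ -f "$creds" ] && file=present
  echo "ARGILLA_API_URL: $url"
  echo "ARGILLA_API_KEY: $key"
  echo "credentials file ($creds): $file"
}
```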

Shell Completion

Enable command completion for easier CLI use:

extralit --install-completion

After restarting your shell, press [Tab][Tab] after typing extralit for command suggestions.

Project Setup and Management

Creating a Workspace

Start by creating a workspace to organize your extraction project:

extralit workspaces create --name my-workspace --description "Description of your extraction project"

Managing Team Access

Add team members to collaborate on the project:

# Add a user with annotator role
extralit workspaces --name my-workspace add-user --username user1 --role annotator

# Remove a user if needed
extralit workspaces --name my-workspace delete-user --username user1
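
To give several people the same role, you can wrap the `add-user` command above in a loop. This is a minimal shell sketch. `add_users` is a hypothetical helper, and its command syntax is copied from the example above.

```shell
# Hypothetical helper: add several users to one workspace with the same role.
# Keeps going after a failure and returns non-zero if any user was not added.
add_users() {
  local workspace="$1" role="$2" user failed=0
  shift 2
  for user in "$@"; do
    extralit workspaces --name "$workspace" add-user --username "$user" --role "$role" \
      || { echo "failed to add $user" >&2; failed=1; }
  done
  return "$failed"
}
```

For example: `add_users my-workspace annotator user1 user2 user3`.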

Viewing Available Workspaces

List all workspaces you have access to:

extralit workspaces list

Document Management

Importing Documents

Add scientific papers to your workspace:

extralit documents import --workspace my-workspace \
  --papers path/to/references.csv \
  --metadatas title,authors,year

The references CSV should include:
- reference (required): Paper ID (e.g., author_year_firstword)
- pmid (optional): PubMed ID
- doi (optional): DOI
- file_path (required): Path to PDF file
- title, authors, year (optional): Metadata fields
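
A small example file helps when you prepare your own list. The sketch below writes one made-up row; the reference, file path, and metadata are placeholders. It also defines `check_references`, a hypothetical pre-flight check that confirms the two required columns appear in the header row before you run `extralit documents import`. The check assumes an unquoted header and does not parse the data rows.

```shell
# Example references file. Column names follow the list above; the row is a
# made-up placeholder. Quote any field that contains a comma.
cat > references.csv <<'EOF'
reference,pmid,doi,file_path,title,authors,year
smith_2020_example,,,papers/smith_2020_example.pdf,Example title,"Smith J, Doe A",2020
EOF

# Hypothetical pre-flight check (not part of the CLI): confirm the header row
# contains the two required columns. Assumes the header itself is unquoted.
check_references() {
  local header col
  header="$(head -n 1 "$1" | tr -d '\r')"
  for col in reference file_path; do
    case ",$header," in
      *",$col,"*) ;;
      *) echo "missing required column: $col" >&2; return 1 ;;
    esac
  done
}
```

Usage: `check_references references.csv && extralit documents import --workspace my-workspace --papers references.csv --metadatas title,authors,year`.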

Managing Documents

List documents in your workspace:

extralit documents list --workspace my-workspace

Add individual documents (arguments in square brackets are optional):

extralit documents add --workspace my-workspace \
  [--file paper.pdf] [--url https://...] \
  [--pmid 123456] [--doi 10.1234/...]
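
To add a folder of PDFs, one option is to call `documents add` once per file. This sketch assumes that `--file` alone is enough to identify a document, since the options above are all shown as optional. `add_pdfs` is a hypothetical helper.

```shell
# Hypothetical helper: add every PDF in one directory, one
# `extralit documents add --file` call per file. Keeps going after a
# failure and returns non-zero if any file could not be added.
add_pdfs() {
  local workspace="$1" dir="$2" pdf failed=0
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # no match: the pattern stays literal, so skip it
    extralit documents add --workspace "$workspace" --file "$pdf" \
      || { echo "failed: $pdf" >&2; failed=1; }
  done
  return "$failed"
}
```

For example: `add_pdfs my-workspace ./papers`.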

Delete a document:

extralit documents delete <document_id> --workspace my-workspace --force

Schema Management

Schemas define the structure of data to be extracted from papers. They specify fields, relationships, and validation rules.

Managing Schemas

Upload schemas to your workspace:

extralit schemas upload ./schemas --workspace my-workspace --overwrite

Download existing schemas:

extralit schemas download ./downloaded_schemas \
  --workspace my-workspace \
  [--name specific-schema] \
  [--exclude schema1,schema2]

List available schemas:

extralit schemas list --workspace my-workspace
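
`--overwrite` presumably replaces existing schemas that have the same name, so you may want a copy of the current versions first. The sketch below works under that assumption. `upload_schemas_with_backup` is hypothetical and uses only the `download` and `upload` commands shown above. If the download fails, it skips the upload.

```shell
# Hypothetical helper: keep a timestamped copy of the workspace schemas
# before replacing them with --overwrite.
upload_schemas_with_backup() {
  local workspace="$1" src="$2" backup
  backup="schemas_backup_$(date +%Y%m%d_%H%M%S)"
  extralit schemas download "$backup" --workspace "$workspace" &&
    extralit schemas upload "$src" --workspace "$workspace" --overwrite
}
```

For example: `upload_schemas_with_backup my-workspace ./schemas`.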

Extraction Workflow

1. PDF Preprocessing

Process PDFs to extract text and detect tables:

extralit preprocessing run \
  --workspace my-workspace \
  --references ref1,ref2 \
  --text-ocr model1,model2 \
  --table-ocr model1,model2 \
  --output-dataset preprocessing-results
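
`--references` takes a comma-separated list of reference IDs. If those IDs already live in the `reference` column of your references CSV, a short awk script can build the list. The sketch below uses a hypothetical `refs_from_csv` helper. It splits on plain commas, so it assumes the `reference` column comes before any quoted field that contains a comma (for example, as the first column).

```shell
# Hypothetical helper: print the `reference` column of a references CSV as a
# comma-separated list. Splits on plain commas, so quoted commas are only safe
# in columns after `reference`.
refs_from_csv() {
  awk -F',' '
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    NR == 1 {                                # header: find the reference column
      for (i = 1; i <= NF; i++) if ($i == "reference") col = i
      next
    }
    col && $col != "" { out = (out == "" ? $col : out "," $col) }
    END {
      if (!col) { print "error: no reference column" > "/dev/stderr"; exit 1 }
      print out
    }
  ' "$1"
}
```

You can then pass `--references "$(refs_from_csv path/to/references.csv)"` in place of `ref1,ref2`. The same list works for `extraction run` and `extraction status`.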

2. LLM-Based Extraction

Extract structured data using LLMs:

extralit extraction run \
  --workspace my-workspace \
  --references ref1,ref2 \
  --output-dataset extraction-results

3. Monitor Progress

Check extraction status:

extralit extraction status \
  --workspace my-workspace \
  --references ref1,ref2
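
This guide does not describe the format of the status output, so the simplest way to follow a long run is to repeat the command at an interval. The sketch below uses a hypothetical `watch_status` helper that defaults to 10 checks, 60 seconds apart. On systems with the `watch` utility, `watch -n 60 extralit extraction status --workspace my-workspace --references ref1,ref2` does the same thing interactively.

```shell
# Hypothetical helper: run the documented status command CHECKS times,
# INTERVAL seconds apart. The output is printed as-is, not parsed.
watch_status() {
  local workspace="$1" refs="$2" checks="${3:-10}" interval="${4:-60}" i=1
  while [ "$i" -le "$checks" ]; do
    extralit extraction status --workspace "$workspace" --references "$refs"
    if [ "$i" -lt "$checks" ]; then sleep "$interval"; fi   # no wait after the last check
    i=$((i + 1))
  done
}
```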

4. Export Results

Export extracted data:

extralit extraction export \
  --workspace my-workspace \
  --output extracted_data.csv

Dataset Operations

Managing Datasets

Create a new dataset:

extralit datasets --workspace my-workspace create \
  --name my-dataset \
  --description "Dataset description"

List datasets:

extralit datasets --workspace my-workspace list

Delete a dataset:

extralit datasets --workspace my-workspace delete --name my-dataset

Sharing Datasets

Push to HuggingFace Hub:

extralit datasets --workspace my-workspace push-to-hub \
  --name my-dataset \
  --repo-id username/repo-name
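
`--repo-id` expects an ID of the form `username/repo-name`. A quick check before pushing catches a missing owner or an extra slash. `valid_repo_id` is hypothetical and only roughly approximates Hugging Face naming rules: it allows letters, digits, `.`, `_`, and `-`.

```shell
# Hypothetical pre-check: accept exactly one "/" with a non-empty owner and
# name made of letters, digits, ".", "_" or "-". This only approximates the
# Hub's naming rules.
valid_repo_id() {
  case "$1" in
    */*/* | /* | */ | *[!A-Za-z0-9._/-]*) return 1 ;;
    */*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `valid_repo_id username/repo-name && extralit datasets --workspace my-workspace push-to-hub --name my-dataset --repo-id username/repo-name`.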

File Management

Managing Files

List files:

extralit files list --workspace my-workspace [--path documents/]

Upload files:

extralit files upload data.csv \
  --workspace my-workspace \
  --remote-path data/data.csv

Download files:

extralit files download data/data.csv \
  --workspace my-workspace \
  --output ./local_data.csv

Delete files:

extralit files delete data/data.csv --workspace my-workspace --force
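
`files upload` takes one local file and one `--remote-path`. To mirror a local folder, you can loop over its files and keep each file's relative path under a remote prefix. The sketch below uses a hypothetical `upload_dir` helper. It stops at the first failed upload and does not handle file names that contain newlines.

```shell
# Hypothetical helper: upload every file under a local directory, keeping
# each file's path relative to that directory under a remote prefix.
# Stops at the first failed upload; file names must not contain newlines.
upload_dir() {
  local workspace="$1" dir="${2%/}" prefix="${3%/}"
  find "$dir" -type f | sort | while IFS= read -r file; do
    rel="${file#"$dir"/}"
    extralit files upload "$file" --workspace "$workspace" \
      --remote-path "$prefix/$rel" || exit 1   # exits the loop's subshell only
  done
}
```

For example, `upload_dir my-workspace ./exports data/` uploads `./exports/a.csv` as `data/a.csv`.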

User Management

Managing Users

List users:

extralit users list

Create a new user:

extralit users create \
  --username user1 \
  --password password123 \
  --full-name "User One" \
  --email user1@example.com

Delete a user:

extralit users delete --username user1
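
Passing `--password` directly, as in the create example above, leaves the password in your shell history. The sketch below reads it from standard input with terminal echo turned off instead. `create_user_prompt` is hypothetical. The password is still handed to `extralit users create` as an argument, because that is the syntax shown above.

```shell
# Hypothetical helper: read the new user's password from standard input
# with terminal echo turned off, instead of typing it on the command line.
create_user_prompt() {
  local username="$1" full_name="$2" email="$3" password
  printf 'Password for %s: ' "$username" >&2
  if [ -t 0 ]; then stty -echo; fi     # hide typed characters on a terminal
  IFS= read -r password
  if [ -t 0 ]; then stty echo; fi      # restore echo
  echo >&2
  if [ -z "$password" ]; then
    echo "error: empty password" >&2
    return 1
  fi
  # Syntax copied from the example above; the password is still an argument.
  extralit users create \
    --username "$username" \
    --password "$password" \
    --full-name "$full_name" \
    --email "$email"
}
```

For example: `create_user_prompt user1 "User One" user1@example.com`.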

Training Models

Train extraction models:

# Basic training
extralit training --name my-dataset --framework spacy --model en_core_web_sm

# Advanced options
extralit training \
  --name my-dataset \
  --framework transformers \
  --model bert-base-uncased \
  --workspace my-workspace \
  --train-size 0.8 \
  --seed 42 \
  --device 0 \
  --output-dir models/my-model
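
`--seed` presumably controls randomness such as the split set by `--train-size`. Training the same configuration with a few different seeds is therefore one way to gauge how stable the results are. The sketch below uses a hypothetical `train_seeds` helper that reuses the flags from the advanced example and writes each run to its own `--output-dir`.

```shell
# Hypothetical sweep: the advanced training command above, repeated once per
# seed, with a separate output directory for each run. Stops at the first failure.
train_seeds() {
  local dataset="$1" workspace="$2" seed
  shift 2
  for seed in "$@"; do
    extralit training \
      --name "$dataset" \
      --framework transformers \
      --model bert-base-uncased \
      --workspace "$workspace" \
      --train-size 0.8 \
      --seed "$seed" \
      --output-dir "models/${dataset}-seed${seed}" || return 1
  done
}
```

For example, `train_seeds my-dataset my-workspace 1 2 3` writes to `models/my-dataset-seed1` through `models/my-dataset-seed3`.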

Troubleshooting

Authentication Issues

If you have authentication problems:
1. Verify your API key is correct
2. Check the server URL is accessible
3. Try logging out and back in

Command Failures

If a command fails:
1. Confirm you're authenticated
2. Check that you have the necessary permissions
3. Verify that the resources you reference exist
4. Add the --debug flag for detailed error information
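
The first checks above can be scripted with commands from this guide. The sketch below uses a hypothetical `extralit_diagnose` function. It assumes that a failing `extralit whoami` means you are not logged in and that a failing `extralit info` means the server could not be reached, which may not cover every cause.

```shell
# Hypothetical checklist using only commands from this guide.
# Stops at the first failed check and returns non-zero.
extralit_diagnose() {
  if ! command -v extralit >/dev/null 2>&1; then
    echo "FAIL: extralit not found on PATH (is the package installed?)"
    return 1
  fi
  echo "OK: extralit found at $(command -v extralit)"
  if ! extralit whoami; then
    echo "FAIL: not authenticated; try extralit logout, then extralit login"
    return 1
  fi
  if ! extralit info; then
    echo "FAIL: could not get server information; check the server URL"
    return 1
  fi
  echo "OK: authenticated and server information retrieved"
}
```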

Getting Help

For any command, add --help to see detailed usage information:

extralit <command> --help

For server information:

extralit info