rg.Workspace¶
In Argilla, workspaces organize datasets into groups. For example, you might have one workspace per project or team.
Usage Examples¶
To create a new workspace, instantiate the Workspace object with the client and the name, then call its create method.
To retrieve an existing workspace, use the client.workspaces attribute, for example client.workspaces(name="my-workspace").
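The two steps above can be combined into a small get-or-create helper. This is a minimal sketch, not part of the library: get_or_create_workspace and its workspace_cls parameter are hypothetical names, and the helper assumes that client.workspaces(name=...) returns None for an unknown name and that create() returns the workspace object.

```python
def get_or_create_workspace(client, name, workspace_cls=None):
    """Return the workspace called `name`, creating it when it is missing.

    Assumptions (not confirmed by this page): `client.workspaces(name=...)`
    returns None for an unknown name, and `create()` returns the workspace.
    """
    workspace = client.workspaces(name=name)
    if workspace is not None:
        return workspace
    if workspace_cls is None:
        # Deferred import so the helper can be exercised without a server.
        import argilla as rg

        workspace_cls = rg.Workspace
    return workspace_cls(name=name, client=client).create()


# Typical call, given an already configured client:
# workspace = get_or_create_workspace(client, "my-workspace")
```

The workspace_cls argument exists only so the helper can be tried with a stand-in class; in normal use it is left at its default.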
File Operations¶
List Files¶
List files in a workspace.
# Get a workspace reference
workspace = client.workspaces(name="my-workspace")
# List files with optional parameters
files = workspace.list_files(path="", recursive=True, include_version=True)
# Access the files
for file in files.objects:
    print(f"File: {file.object_name}, Size: {file.size}, Last Modified: {file.last_modified}")
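The listing can also be aggregated rather than printed. The sketch below sums file sizes per extension; size_by_extension is a hypothetical helper, and it assumes only what the example above shows: the response exposes an objects list whose items carry object_name and size (treated as 0 when missing).

```python
from collections import defaultdict
from pathlib import PurePosixPath


def size_by_extension(files):
    """Sum file sizes per lowercase extension from a list_files() response."""
    totals = defaultdict(int)
    for file in files.objects:
        # Object names are treated as POSIX-style paths, e.g. "papers/a.pdf".
        suffix = PurePosixPath(file.object_name).suffix.lower() or "(none)"
        totals[suffix] += file.size or 0
    return dict(totals)


# Typical call:
# totals = size_by_extension(workspace.list_files(path="", recursive=True))
```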
Get File¶
Get a file from a workspace.
workspace = client.workspaces(name="my-workspace")
file_response = workspace.get_file(path="path/to/file.txt", version_id=None)
# Access the file content
content = file_response.content
# Access the file metadata
metadata = file_response.metadata
print(f"Content Type: {metadata.content_type}, ETag: {metadata.etag}")
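A common follow-up is writing the retrieved content to disk. The following is a sketch under an assumption: this page does not state whether content is bytes or str, so the hypothetical save_file_response helper accepts either and encodes text as UTF-8.

```python
from pathlib import Path


def save_file_response(file_response, destination):
    """Write the content of a get_file() response to a local path."""
    content = file_response.content
    if isinstance(content, str):
        # Assumption: text content can be stored as UTF-8.
        content = content.encode("utf-8")
    destination = Path(destination)
    # Create missing parent directories before writing.
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(content)
    return destination


# Typical call:
# save_file_response(workspace.get_file(path="path/to/file.txt"), "downloads/file.txt")
```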
Upload File¶
Upload a file to a workspace.
workspace = client.workspaces(name="my-workspace")
file_metadata = workspace.put_file(path="path/to/store/file.txt", file_path="/local/path/to/file.txt")
print(f"File uploaded: {file_metadata.object_name}")
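To upload many files, put_file can be called in a loop. The sketch below walks a local directory and mirrors its layout under a remote prefix; upload_directory and remote_prefix are hypothetical names, and the only library call used is put_file(path=..., file_path=...) as shown above.

```python
from pathlib import Path


def upload_directory(workspace, local_dir, remote_prefix=""):
    """Upload every file under local_dir, keeping relative paths."""
    local_dir = Path(local_dir)
    prefix = remote_prefix.strip("/")
    uploaded = []
    for local_file in sorted(p for p in local_dir.rglob("*") if p.is_file()):
        # Use forward slashes for the stored path on every platform.
        relative = local_file.relative_to(local_dir).as_posix()
        remote_path = f"{prefix}/{relative}" if prefix else relative
        uploaded.append(workspace.put_file(path=remote_path, file_path=local_file))
    return uploaded


# Typical call:
# upload_directory(workspace, "/local/path/to/papers", remote_prefix="papers")
```

The function returns whatever put_file returns for each file, so the caller can inspect the metadata of every upload.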
Delete File¶
Delete a file from a workspace.
workspace = client.workspaces(name="my-workspace")
workspace.delete_file(path="path/to/file.txt", version_id=None)
Document Operations¶
Add Document¶
Add a document to a workspace.
workspace = client.workspaces(name="my-workspace")
# Add a document with a URL
document_id = workspace.add_document(url="https://example.com/document.pdf")
# Add a document with a PMID
document_id = workspace.add_document(pmid="PMC12345")
# Add a document with a DOI
document_id = workspace.add_document(doi="10.1234/example")
# Add a document with a file
document_id = workspace.add_document(file_path="/local/path/to/document.pdf")
Get Documents¶
Get documents from a workspace.
workspace = client.workspaces(name="my-workspace")
documents = workspace.get_documents()
# Access the documents
for doc in documents:
    print(f"Document ID: {doc.id}, URL: {doc.url}, PMID: {doc.pmid}, DOI: {doc.doi}")
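get_documents and add_document can be combined to avoid registering the same paper twice. This is a sketch only: add_document_if_new is a hypothetical helper, and this page does not say whether the server already rejects duplicates. DOIs are compared case-insensitively.

```python
def add_document_if_new(workspace, doi):
    """Return the ID of the document with this DOI, adding it if absent."""
    wanted = doi.lower()
    for doc in workspace.get_documents():
        # Documents added by URL or file may have no DOI at all.
        if doc.doi and doc.doi.lower() == wanted:
            return doc.id
    return workspace.add_document(doi=doi)


# Typical call:
# document_id = add_document_if_new(workspace, "10.1234/example")
```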
Schema Operations¶
Get Schemas¶
Get schemas from a workspace.
workspace = client.workspaces(name="my-workspace")
schemas = workspace.list_schemas(prefix="schemas/", exclude=None)
# Access the schemas
for schema in schemas.schemas:
    print(f"Schema: {schema.name}")
    print(f"Columns: {list(schema.columns.keys())}")
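The returned structure can be summarized in a dictionary, which is handy when comparing schemas. The helpers below are hypothetical and rely only on the attributes used in the example above (schemas, name and columns).

```python
def schema_columns(schema_structure):
    """Map each schema name to the list of its column names."""
    return {
        schema.name: list(schema.columns.keys())
        for schema in schema_structure.schemas
    }


def shared_columns(schema_structure):
    """Return the column names that appear in every schema."""
    column_sets = [set(schema.columns.keys()) for schema in schema_structure.schemas]
    return set.intersection(*column_sets) if column_sets else set()
```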
Add Schema¶
Add a schema to a workspace.
import pandera as pa
# Create a schema
schema = pa.DataFrameSchema(
    name="my_schema",
    columns={
        "text": pa.Column(pa.String),
        "label": pa.Column(pa.String),
        "score": pa.Column(pa.Float, nullable=True),
    },
)
# Add the schema to the workspace
workspace = client.workspaces(name="my-workspace")
workspace.add_schema(schema, prefix="schemas/")
Update Schemas¶
Update schemas in a workspace.
import pandera as pa
from extralit.extraction.models import SchemaStructure
# Create schemas
schema1 = pa.DataFrameSchema(
    name="schema1",
    columns={
        "text": pa.Column(pa.String),
        "label": pa.Column(pa.String),
    },
)
schema2 = pa.DataFrameSchema(
    name="schema2",
    columns={
        "text": pa.Column(pa.String),
        "score": pa.Column(pa.Float),
    },
)
# Create a schema structure
schemas = SchemaStructure(schemas=[schema1, schema2])
# Update schemas in the workspace
workspace = client.workspaces(name="my-workspace")
result = workspace.update_schemas(schemas, check_existing=True, prefix="schemas/")
print(f"Updated {len(result.objects)} schemas")
Error Handling¶
When an API call fails, the method raises an exception with a descriptive error message. Wrap calls in try/except wherever a failure should be handled rather than propagated.
try:
    workspace = client.workspaces(name="non-existent-workspace")
    files = workspace.list_files("")
except Exception as e:
    print(f"Error: {str(e)}")
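The same pattern can be wrapped in a helper that logs the failure and returns an empty result. This is a sketch under assumptions: list_file_names is a hypothetical name, the lookup is assumed to possibly return None for an unknown workspace, and Exception is caught broadly because this page does not name a narrower exception type.

```python
import logging

logger = logging.getLogger(__name__)


def list_file_names(client, workspace_name, path=""):
    """Return the object names under `path`, or [] when the call fails."""
    workspace = client.workspaces(name=workspace_name)
    if workspace is None:
        # Assumption: an unknown workspace yields None instead of raising.
        logger.warning("Workspace %r was not found", workspace_name)
        return []
    try:
        files = workspace.list_files(path)
    except Exception as exc:  # no narrower exception type is documented here
        logger.error("Listing %r in %r failed: %s", path, workspace_name, exc)
        return []
    return [file.object_name for file in files.objects]
```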
Workspace¶
Bases: Resource
Class for interacting with Argilla workspaces. Workspaces are used to organize datasets in the Argilla server.
Attributes:
Name | Type | Description |
---|---|---|
name | str | The name of the workspace. |
id | UUID | The ID of the workspace. This is a unique identifier for the workspace in the server. |
datasets | List[Dataset] | A list of all datasets in the workspace. |
users | WorkspaceUsers | A list of all users in the workspace. |
Source code in src/argilla/workspaces/_resource.py
datasets property¶
List all datasets in the workspace
Returns:
Type | Description |
---|---|
List[Dataset] | A list of all datasets in the workspace |
users property¶
List all users in the workspace
Returns:
Name | Type | Description |
---|---|---|
WorkspaceUsers | WorkspaceUsers | A list of all users in the workspace |
__init__(name=None, id=None, client=None)¶
Initializes a Workspace object with a client and a name or id
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the workspace | None |
id | UUID | The id of the workspace. If provided before a .create, the workspace will be created with this ID | None |
client | Argilla | The client used to interact with Argilla | None |
Returns:
Name | Type | Description |
---|---|---|
Workspace | None | The initialized workspace object |
add_user(user)¶
Adds a user to the workspace. Once added, the user has access to the datasets in the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
user | Union[User, str] | The user to add to the workspace. Can be a User object or a username. | required |
Returns:
Name | Type | Description |
---|---|---|
User | User | The user that was added to the workspace |
remove_user(user)¶
Removes a user from the workspace. Once removed, the user no longer has access to the datasets in the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
user | Union[User, str] | The user to remove from the workspace. Can be a User object or a username. | required |
Returns:
Name | Type | Description |
---|---|---|
User | User | The user that was removed from the workspace. |
list_files(path, recursive=True, include_version=True)¶
List files in the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path to list files from. | required |
recursive | bool | Whether to list files recursively. | True |
include_version | bool | Whether to include version information. | True |
Returns:
Type | Description |
---|---|
ListObjectsResponse | A list of files. |
get_file(path, version_id=None)¶
Get a file from the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path of the file. | required |
version_id | Optional[str] | The version ID of the file. | None |
Returns:
Type | Description |
---|---|
FileObjectResponse | The file content and metadata. |
put_file(path, file_path)¶
Upload a file to the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path to store the file. | required |
file_path | Union[str, Path] | The local path of the file to upload. | required |
Returns:
Type | Description |
---|---|
ObjectMetadata | The metadata of the uploaded file. |
delete_file(path, version_id=None)¶
Delete a file from the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path of the file to delete. | required |
version_id | Optional[str] | The version ID of the file. | None |
add_document(file_path=None, url=None, pmid=None, doi=None)¶
Add a document to the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | Optional[str] | The local path of the file to upload. | None |
url | Optional[str] | The URL of the document. | None |
pmid | Optional[str] | The PMID of the document. | None |
doi | Optional[str] | The DOI of the document. | None |
Returns:
Type | Description |
---|---|
UUID | The ID of the added document. |
get_documents()¶
Get documents from the workspace.
Returns:
Type | Description |
---|---|
List[Document] | A list of documents. |
list_schemas(prefix=DEFAULT_SCHEMA_S3_PATH, exclude=None)¶
Get schemas from the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix | str | The prefix to filter schemas. | DEFAULT_SCHEMA_S3_PATH |
exclude | Optional[List[str]] | List of schema names to exclude. | None |
Returns:
Type | Description |
---|---|
SchemaStructure | A SchemaStructure containing the schemas. |
add_schema(schema, prefix=DEFAULT_SCHEMA_S3_PATH)¶
Add a schema to the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | Any | The schema to add. | required |
prefix | str | The prefix to store the schema. | DEFAULT_SCHEMA_S3_PATH |
update_schemas(schemas, check_existing=True, prefix=DEFAULT_SCHEMA_S3_PATH)¶
Update schemas in the workspace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schemas | Any | The schemas to update. | required |
check_existing | bool | Whether to check if the schema already exists. | True |
prefix | str | The prefix to store the schemas. | DEFAULT_SCHEMA_S3_PATH |
Returns:
Type | Description |
---|---|
ListObjectsResponse | A list of updated schema files. |