API Reference
REST
This page lists all available endpoints. Each endpoint can be expanded for a more detailed view.
Pipelines
GET
/pipelines/id
Retrieve a pipeline by its id.
Parameters
Query
id
The pipeline's id.
Query
statistics
Whether to include statistics for the pipeline. Default is false.
Query
components
Whether to include components. Default is true.
Example Request
const response = await fetch('/pipelines/65b3db5c8c997c4ce3c4efb3', {
  method: 'GET',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Pipeline
404 - Not found
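The optional `statistics` and `components` flags are passed as query parameters. Below is a minimal Python sketch of building such a URL. It assumes a placeholder base URL and that booleans are written as lowercase `true`/`false` (as in the `template=false` example on this page). The helper name `pipeline_url` is illustrative and not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org"  # assumption: replace with your DUUI API host


def pipeline_url(pipeline_id: str, statistics: bool = False, components: bool = True) -> str:
    """Build the URL for GET /pipelines/id including the optional flags."""
    query = urlencode({
        # Assumption: the API expects lowercase 'true' / 'false'.
        "statistics": str(statistics).lower(),
        "components": str(components).lower(),
    })
    return f"{BASE_URL}/pipelines/{pipeline_id}?{query}"


url = pipeline_url("65b3db5c8c997c4ce3c4efb3", statistics=True)
```

The defaults of the helper mirror the documented defaults (statistics off, components on), so calling it with only an id requests the same data as the bare endpoint.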
GET
/pipelines
Retrieve multiple pipelines. Accepts limit, skip, sort, order and search query parameters.
Parameters
Query
limit
The maximum number of pipelines to return.
Query
skip
The number of pipelines to skip before the limit is applied.
Query
sort
The field to sort by. Can be name, description, created_at, modified_at, status and times_used.
Query
order
The order to sort by. 1 is ascending and -1 is descending.
Query
statistics
Whether to include statistics for the pipeline. Default is false.
Query
components
Whether to include components. Default is true.
Example Request
const response = await fetch('/pipelines?limit=10&sort=times_used&order=-1', {
  method: 'GET',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Pipelines
400 - Missing parameter
404 - Not found
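Together, `limit` and `skip` allow page-wise retrieval. A minimal Python sketch that translates a zero-based page number into these parameters; the function name and its defaults are hypothetical and chosen for illustration only.

```python
from urllib.parse import urlencode


def list_pipelines_query(page: int, page_size: int = 10,
                         sort: str = "name", descending: bool = False) -> str:
    """Translate a zero-based page number into limit/skip/sort/order parameters."""
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    return "/pipelines?" + urlencode({
        "limit": page_size,
        "skip": page * page_size,          # entries on earlier pages are skipped
        "sort": sort,
        "order": -1 if descending else 1,  # 1 is ascending, -1 is descending
    })


# Third page of ten pipelines, most used first.
query = list_pipelines_query(2, 10, sort="times_used", descending=True)
```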
POST
/pipelines
Create a new pipeline. The body must at least contain the fields listed in parameters.
Parameters
Body
name
The name of the pipeline.
Body
description
The description of the pipeline.
Body
tags
An array of tags to categorize the pipeline.
Body
components
An array of components that must follow the structure described in the 'components' section.
Example Request
const response = await fetch('/pipelines?template=false', {
  method: 'POST',
  body: JSON.stringify({
    name: 'Pipeline Name',
    description: 'A simple pipeline to count annotations.',
    components: [{
      name: 'Counter',
      driver: 'DUUIUIMADriver',
      target: 'org.texttechnologylab.DockerUnifiedUIMAInterface.tools.CountAnnotations',
      options: {
        scale: 2
      }
    }]
  }),
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Inserted
400 - Missing field
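The same request can be prepared from Python with the standard library. The sketch below only builds the request object and does not send it. The base URL, the helper name `create_pipeline_request`, and the `Content-Type` header are assumptions, not something this page specifies.

```python
import json
from urllib.request import Request

BASE_URL = "https://example.org"  # assumption: replace with your DUUI API host
API_KEY = "API KEY HERE"


def create_pipeline_request(name: str, description: str,
                            tags: list, components: list) -> Request:
    """Prepare (but do not send) a POST /pipelines request."""
    body = json.dumps({
        "name": name,
        "description": description,
        "tags": tags,
        "components": components,
    }).encode("utf-8")
    return Request(
        f"{BASE_URL}/pipelines",
        data=body,
        method="POST",
        headers={
            "Authorization": API_KEY,
            # Assumption: the server accepts an explicit JSON content type.
            "Content-Type": "application/json",
        },
    )


request = create_pipeline_request(
    "Pipeline Name",
    "A simple pipeline to count annotations.",
    ["Example"],
    [{
        "name": "Counter",
        "driver": "DUUIUIMADriver",
        "target": "org.texttechnologylab.DockerUnifiedUIMAInterface.tools.CountAnnotations",
        "options": {"scale": 2},
    }],
)
```

Passing the prepared object to `urllib.request.urlopen(request)` would send it.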
POST
/pipelines/id/start
Instantiate a pipeline and keep it idle until it receives process requests.
Parameters
Query
id
The pipeline's id.
Example Request
const response = await fetch('/pipelines/65b3db5c8c997c4ce3c4efb3/start', {
  method: 'POST',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Instantiated
500 - Not instantiated
PUT
/pipelines/id/stop
Shut down an instantiated pipeline. This also cancels running processes.
Parameters
Query
id
The pipeline's id.
Example Request
const response = await fetch('/pipelines/65b3db5c8c997c4ce3c4efb3/stop', {
  method: 'PUT',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Shut down
404 - Not found
500 - Not shut down
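Starting a pipeline uses POST while stopping it uses PUT, which is easy to mix up. A small Python sketch that keeps both calls in one lookup table; `LIFECYCLE` and `lifecycle_call` are hypothetical helper names, not part of the API.

```python
# HTTP method and path template for each lifecycle action, as listed above.
LIFECYCLE = {
    "start": ("POST", "/pipelines/{id}/start"),
    "stop": ("PUT", "/pipelines/{id}/stop"),
}


def lifecycle_call(action: str, pipeline_id: str) -> tuple:
    """Return the (method, path) pair for starting or stopping a pipeline."""
    method, template = LIFECYCLE[action]
    return method, template.format(id=pipeline_id)


method, path = lifecycle_call("stop", "65b3db5c8c997c4ce3c4efb3")
```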
PUT
/pipelines/id
Update a pipeline given its id. The body should be a JSON string defining updates.
Parameters
Query
id
The pipeline's id.
Example Request
const response = await fetch('/pipelines/65b3db5c8c997c4ce3c4efb3', {
  method: 'PUT',
  body: JSON.stringify({
    name: 'New Name'
  }),
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
201 - Updated
400 - Invalid field
404 - Not found
DELETE
/pipelines/id
Delete a pipeline given its id.
Parameters
Query
id
The pipeline's id.
Example Request
const response = await fetch('/pipelines/65b3db5c8c997c4ce3c4efb3', {
  method: 'DELETE',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Deleted
404 - Not found
500 - Not deleted
Components
GET
/components/id
Retrieve a component by its id.
Parameters
Query
id
The component's id.
Example Request
const response = await fetch('/components/65b3db5c8c997c4ce3c4efb3', {
  method: 'GET',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Component
404 - Not found
GET
/components
Retrieve multiple components.
Parameters
Query
limit
The maximum number of components to return.
Query
skip
The number of components to skip before the limit is applied.
Query
sort
The field to sort by. Can be name, description, created_at, modified_at, status, driver and target.
Query
order
The order to sort by. 1 is ascending and -1 is descending.
Example Request
const response = await fetch('/components?limit=5&sort=name&order=1', {
  method: 'GET',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Components
400 - Missing parameter
404 - Not found
POST
/components
Create a new component from the fields in the body.
Parameters
Body
pipeline_id
The id of the pipeline to add the component to.
Body
name
The name of the component.
Body
description
The description of the component.
Body
tags
An array of tags to categorize the component.
Body
driver
The driver of the component.
Body
target
The target (class path, Docker image name or URL) of the component.
Body
options
An object containing settings for the component. These settings are optional and the default values can be seen in the example request.
Body
parameters
An object containing extra parameters for the component.
Example Request
const response = await fetch('/components', {
  method: 'POST',
  body: JSON.stringify({
    pipeline_id: '65b3db5c8c997c4ce3c4efb3',
    name: 'Component Name',
    description: 'A simple component for tokenization.',
    driver: 'DUUIUIMADriver',
    target: 'de.tudarmstadt.ukp.dkpro.core.tokit.BreakIteratorSegmenter',
    options: {
      scale: 1,
      use_GPU: true,
      docker_image_fetching: true,
      host: null,
      registry_auth: {
        username: null,
        password: null
      },
      keep_alive: false,
      ignore_200_error: true,
      constraint: [],
      labels: []
    }
  }),
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Inserted
400 - Missing field
PUT
/components/id
Update a component given its id. The body should be a JSON string defining updates.
Parameters
Query
id
The component's id.
Example Request
const response = await fetch('/components/65b3db5c8c997c4ce3c4efb3', {
  method: 'PUT',
  body: JSON.stringify({
    name: 'New Name'
  }),
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
201 - Updated
400 - Invalid field
404 - Not found
DELETE
/components/id
Delete a component given its id.
Parameters
Query
id
The component's id.
Example Request
const response = await fetch('/components/65b3db5c8c997c4ce3c4efb3', {
  method: 'DELETE',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Deleted
404 - Not found
500 - Not deleted
Processes
GET
/processes/id
Retrieve a process by its id.
Parameters
Query
id
The process's id.
Example Request
const response = await fetch('/processes/65b3dba48c997c4ce3c4f09e', {
  method: 'GET',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Process
404 - Not found
GET
/processes
Retrieve multiple processes. Requires pipeline_id as a query parameter. Accepts limit, skip, sort, order, status, input and output query parameters.
Parameters
Query
limit
The maximum number of processes to return.
Query
skip
The number of processes to skip before the limit is applied.
Query
sort
The field to sort by. Can be input.provider, output.provider, started_at, duration, count, progress and status.
Query
order
The order to sort by. 1 is ascending and -1 is descending.
Query
status
A set of status names separated by ';' to filter by.
Query
input
A set of input providers separated by ';' to filter by. Accepts (Dropbox, Minio, Text, File, None).
Query
output
A set of output providers separated by ';' to filter by. Accepts (Dropbox, Minio, None).
Example Request
const response = await fetch('/processes?pipeline_id=65b3db5c8c997c4ce3c4efb3&limit=10&sort=input.provider&order=-1&status=Completed;Failed', {
  method: 'GET',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Processes
400 - Missing parameter pipeline_id
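The `status`, `input` and `output` filters each take several values joined by ';'. A minimal Python sketch of assembling such a query string while enforcing the required `pipeline_id`. The function name `processes_query` is hypothetical. Leaving ';' unescaped follows the example request above and is otherwise an assumption about what the server accepts.

```python
from urllib.parse import urlencode


def processes_query(pipeline_id: str, statuses=(), inputs=(), outputs=(), **paging) -> str:
    """Build the query string for GET /processes with ';'-separated filters."""
    if not pipeline_id:
        raise ValueError("pipeline_id is required")
    # paging may carry limit, skip, sort and order.
    params = {"pipeline_id": pipeline_id, **paging}
    for key, values in (("status", statuses), ("input", inputs), ("output", outputs)):
        if values:
            params[key] = ";".join(values)
    # safe=";" keeps the separator readable instead of encoding it as %3B.
    return "/processes?" + urlencode(params, safe=";")


query = processes_query(
    "65b3db5c8c997c4ce3c4efb3",
    statuses=["Completed", "Failed"],
    limit=10,
)
```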
GET
/processes/id/documents
Retrieve one or multiple documents belonging to a process. Accepts limit, skip, sort, order, status, and search parameters.
Parameters
Query
limit
The maximum number of documents to return.
Query
skip
The number of documents to skip before the limit is applied.
Query
sort
The field to sort by. Can be name, progress, status, size and duration.
Query
order
The order to sort by. 1 is ascending and -1 is descending.
Query
status
A set of status names separated by ';' to filter by.
Query
search
A string of text to filter by. The text is compared to the path of the document.
Example Request
const response = await fetch('/processes/65b3dba48c997c4ce3c4f09e/documents?status=Completed;Failed&search=example.txt&sort=size&limit=3', {
  method: 'GET',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Documents
404 - Not found
GET
/processes/id/events
Retrieve events for a process given its id.
Parameters
Query
id
The process's id.
Example Request
const response = await fetch('/processes/65b3dba48c997c4ce3c4f09e/events', {
  method: 'GET',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Events
404 - Not found
POST
/processes
Create a new process with the settings provided in the body.
Parameters
Body
pipeline_id
The id of the pipeline to execute with this process. This is required.
Body
input
Sets the source location of documents to process. Dropbox and Minio require a path and file_extension to be specified.
Body
output
Sets the output location of documents. Dropbox and Minio require a path and file_extension to be specified.
Body
settings
Process specific settings that influence its behavior.
Example Request
const response = await fetch('/processes', {
  method: 'POST',
  body: JSON.stringify({
    pipeline_id: '65b3db5c8c997c4ce3c4efb3',
    input: {
      provider: 'Minio',
      path: '/input-bucket',
      file_extension: '.txt'
    },
    output: {
      provider: 'Dropbox',
      path: '/output-bucket/path/to/folder',
      file_extension: '.xmi'
    },
    settings: {
      minimum_size: 5000, // Bytes
      recursive: true, // Find files recursively in the input bucket.
      check_target: true, // Check the output location for existing files that won't be processed.
      sort_by_size: false, // Sort files by size in ascending order.
      overwrite: false, // Overwrite existing files on conflict.
      ignore_errors: true, // Skip to the next document instead of failing the entire pipeline in case of error.
      worker_count: 4 // The number of threads to use for processing. Limited by the server.
    }
  }),
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Started
400 - IOException
404 - Pipeline not found
429 - Not enough resources
500 - Failed
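Because Dropbox and Minio locations need both a path and a file_extension, checking the body before sending it can avoid a failed request. A minimal Python sketch of such a client-side check; `CLOUD_PROVIDERS` and `missing_fields` are hypothetical names, and the server's own validation may differ.

```python
# Providers that, according to the parameter descriptions above,
# need both a path and a file_extension.
CLOUD_PROVIDERS = {"Dropbox", "Minio"}


def missing_fields(location: dict) -> list:
    """Return the required keys missing from an input or output location."""
    if location.get("provider") not in CLOUD_PROVIDERS:
        return []
    return [key for key in ("path", "file_extension") if not location.get(key)]


input_location = {"provider": "Minio", "path": "/input-bucket"}
problems = missing_fields(input_location)
if problems:
    print(f"Input is missing: {', '.join(problems)}")
```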
PUT
/processes/id
Stop a process and request a cancellation of an active pipeline.
Parameters
Query
id
The process's id.
Example Request
const response = await fetch('/processes/65b3dba48c997c4ce3c4f09e', {
  method: 'PUT',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Shut down
404 - Not found
500 - Not shut down
DELETE
/processes/id
Delete a process given its id.
Parameters
Query
id
The process's id.
Example Request
const response = await fetch('/processes/65b3dba48c997c4ce3c4f09e', {
  method: 'DELETE',
  headers: {
    Authorization: 'API KEY HERE'
  }
})
Responses
All responses are returned as a JSON String.
200 - Deleted
404 - Not found
500 - Not deleted
Java
Documentation for DUUI using Java can be found on GitHub.
Python
DUUI can be used from Python by sending requests to the API. The example below creates a pipeline and then runs it.
Create a pipeline
import requests
API_KEY = "YOUR API KEY"
from duui.client import DUUIClient
from duui.config import API_KEY
CLIENT = DUUIClient(API_KEY)
my_pipeline = CLIENT.pipelines.create(
    name="My Pipeline",
    components=[
        {
            "name": "Tokenizer",
            "tags": ["Token", "Sentence"],
            "description": """Split the document into Tokens and Sentences
            using the DKPro BreakIteratorSegmenter AnalysisEngine.""",
            "driver": "DUUIUIMADriver",
            "target": "de.tudarmstadt.ukp.dkpro.core.tokit.BreakIteratorSegmenter",
        },
        {
            "name": "GerVADER",
            "description": """GerVADER is a German adaptation of the sentiment
            classification tool VADER. Classify sentences into positive,
            negative or neutral statements.""",
            "tags": ["Sentiment", "German"],
            "driver": "DUUIDockerDriver",
            "target": "docker.texttechnologylab.org/gervader_duui:latest",
            "options": {"scale": 2, "use_GPU": True},
        },
    ],
    description="""This pipeline has been created using the API with Python.
    It splits the document text into Tokens and Sentences
    and then analyzes the Sentiment of these Sentences.""",
    tags=["Python", "Sentence", "Sentiment"],
)
Start a process
Before a process is started, the pipeline is instantiated so that it can be reused. After a process has completed, the pipeline does not shut down but remains active for further requests.
pipeline_id = my_pipeline.get("oid")
# Instantiate the pipeline so it can be used multiple times
# without the need to restart Docker components
CLIENT.pipelines.instantiate(pipeline_id)
# Start a process that finds .txt files with a minimum size of 500 bytes
# recursively, starting from the /input directory in Dropbox.
my_process = CLIENT.processes.start(
    pipeline_id,
    input={
        "provider": "Dropbox",
        "path": "/input",
        "file_extension": ".txt",
    },
    output={
        "provider": "Dropbox",
        "path": "/output/python",
        "file_extension": ".txt",
    },
    recursive=True,
    sort_by_size=True,
    minimum_size=500,
    worker_count=3,
)
Monitor the process
...
import schedule
import sys
import time
process_id = my_process.get("oid")
def update() -> None:
    # Fetch the current process state and the documents that have failed so far.
    process = CLIENT.processes.findOne(process_id)
    documents = CLIENT.processes.documents(
        process_id, status_filter=["Failed"], include_count=True
    )
    total = len(process["document_names"])
    print(
        f"""Progress: {round(process['progress'] / total * 100)}%
{documents['count']} Documents have failed.
""",
        end="",
    )
    # Print a final summary and exit once the process has reached a terminal status.
    if process["status"] in ["Completed", "Failed", "Cancelled"]:
        print(
            f"""Progress: {round(process['progress'] / total * 100)}%
{documents['count']} Documents have failed.""",
        )
        print(f"Process finished with status {process['status']}.")
        sys.exit(0)
# Poll the process every 5 seconds.
schedule.every(5).seconds.do(update)
while True:
    schedule.run_pending()
    time.sleep(1)
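The monitoring loop divides the progress by the number of documents, which would raise a ZeroDivisionError if the process has not found any documents yet. A minimal sketch of a guarded percentage calculation, assuming (as the code above implies) that `progress` counts processed documents; the helper name `progress_percent` is hypothetical.

```python
def progress_percent(progress: int, total: int) -> int:
    """Return the progress as a whole percentage, or 0 when there are no documents."""
    if total <= 0:
        return 0
    return round(progress / total * 100)


print(f"Progress: {progress_percent(3, 4)}%")  # prints "Progress: 75%"
```

Inside `update`, the helper could replace both inline `round(...)` expressions.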