Advanced Pipelines (beta) ⚠️
Get a better understanding of how to use Wordcab's powerful /pipeline endpoint to complete multiple tasks at once.
Intro to Pipelines
Pipelines are a new way to run multiple tasks - such as transcription, classification, summarization, and so on - with a single endpoint, and get back a complete response with a single webhook.
The /pipeline endpoint accepts a pipeline object and an optional webhook. The pipeline object should contain one or more pipelines: transcribe, classify, translate, and/or summarize.
Note that the beta /pipeline endpoint currently supports only the transcribe and classify pipelines.
Wordcab will internally handle any parallelization or potential conflicts. For example, some classify tasks depend on a transcript, and some even depend on other classify tasks. Wordcab will take care of all this logic for you.
Concepts
There are several key concepts associated with pipelines:
- pipeline: The top-level pipeline object in both the request and response will contain your chosen pipelines.
- Pipelines: Pipelines in the pipeline object can be one or more of the following: transcribe, classify, translate, and/or summarize.
- Task: Each pipeline can contain multiple tasks. For example, the classify pipeline can include one or more of the following tasks: classify_audio, sentiment_analysis, or clarity_scores.
The pipeline Object
The pipeline object in the body of a /pipeline POST request can contain one or more of the pipelines below.
transcribe
The transcribe pipeline can use nearly any /transcribe parameter as a key/value, with a few critical differences:
- You can only use audio URLs as input, not files. You'll need to define the url key and set url_type to audio_url.
- You can use a webhook within the transcribe object, but it's not recommended. Instead, using a webhook outside the pipeline object will return all your pipeline tasks in a neatly structured JSON.
- We partner with Svix to provide users with an advanced webhook portal for our non-pipeline endpoints. However, since a pipeline response is more than likely to exceed Svix's size limit, we send a simple POST request to the webhook you define. The disadvantage here is that you do not get Svix's sophisticated retry mechanism or dashboard.
- tags (parameter) and metadata (header) are not supported.
transcribe Pipeline Example
Below you can find an example of a transcribe pipeline.
{
"pipeline": {
"transcribe": {
"display_name": "transcribe_pipeline_test",
"url": "https://your-presigned-audio-url.com/",
"url_type": "audio_url",
"diarization": "true",
"redact_text": true,
"redact_audio": true,
"sensitive_data_types": "name,organization"
}
},
"webhook": "https://your-site.com/your-webhook-endpoint"
}
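A minimal Python sketch of how a body like this could be turned into a POST request. The endpoint URL and the Bearer authorization header are placeholders and assumptions, since this page does not state them, and build_pipeline_request is a hypothetical helper name. The function only constructs the request; it does not send it.

```python
import json
import urllib.request

# Assumption: placeholder endpoint URL; replace with the real /pipeline URL.
PIPELINE_URL = "https://your-wordcab-api-host/pipeline"


def build_pipeline_request(api_key, audio_url, webhook):
    """Build (but do not send) a POST request for a transcribe-only pipeline."""
    body = {
        "pipeline": {
            "transcribe": {
                "display_name": "transcribe_pipeline_test",
                "url": audio_url,
                "url_type": "audio_url",  # only audio URLs are accepted, not files
            }
        },
        "webhook": webhook,  # defined outside the pipeline object
    }
    return urllib.request.Request(
        PIPELINE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_pipeline_request(
    "YOUR_API_KEY",
    "https://your-presigned-audio-url.com/",
    "https://your-site.com/your-webhook-endpoint",
)
# To send it: urllib.request.urlopen(request)
```

Serializing with json.dumps avoids hand-written JSON mistakes such as trailing commas.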
classify
The classify pipeline can use nearly any /classify parameter as a key/value, with a few critical differences:
- If you define a classification task that requires a transcript_id from a completed transcript, the classify pipeline will only run after the transcribe pipeline is finished. If you didn't define a transcribe pipeline as shown above, you will receive an error.
- We partner with Svix to provide users with an advanced webhook portal for our non-pipeline endpoints. However, since a pipeline response is more than likely to exceed Svix's size limit, we send a simple POST request to the webhook you define. The disadvantage here is that you do not get Svix's sophisticated retry mechanism or dashboard.
- tags (parameter) and metadata (header) are not supported.
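The dependency rule in the list above can be checked on the client before a request is sent. The following is a minimal Python sketch: the set of transcript-dependent tasks is an assumption inferred from the response examples on this page, not an official list, and missing_transcribe_dependency is a hypothetical helper name.

```python
# Assumption: inferred from the response examples on this page, not an official list.
TRANSCRIPT_DEPENDENT_TASKS = {"sentiment_analysis", "clarity_scores"}


def missing_transcribe_dependency(body):
    """Return classify tasks that need a transcript when no transcribe pipeline is defined."""
    pipeline = body.get("pipeline", {})
    if "transcribe" in pipeline:
        return []  # the dependency is satisfied by the transcribe pipeline
    tasks = pipeline.get("classify", {}).get("tasks", "")
    requested = {task.strip() for task in tasks.split(",") if task.strip()}
    return sorted(requested & TRANSCRIPT_DEPENDENT_TASKS)
```

An empty list means the request should not trigger a dependency error; a non-empty list names the tasks that would.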
classify Pipeline Example
Below you can find an example of a classify pipeline with all available classification tasks defined in tasks.
{
"pipeline": {
"classify": {
"display_name": "classify_pipeline_test",
"url": "https://your-presigned-audio-url.com/",
"url_type": "audio_url",
"tasks": "classify_audio,sentiment_analysis,clarity_scores"
}
},
"webhook": "https://your-site.com/your-webhook-endpoint"
}
Stacking Pipelines
To stack pipelines, add additional pipelines to the pipeline object. Wordcab will handle any parallelization and potential conflicts.
{
"pipeline": {
"transcribe": {
"display_name": "transcribe_pipeline_test",
"url": "https://your-presigned-audio-url.com/",
"url_type": "audio_url",
"diarization": "true",
"redact_text": true,
"redact_audio": true,
"sensitive_data_types": "name,organization"
},
"classify": {
"display_name": "classify_pipeline_test",
"url": "https://your-presigned-audio-url.com/",
"url_type": "audio_url",
"tasks": "classify_audio,sentiment_analysis,clarity_scores"
}
},
"webhook": "https://your-site.com/your-webhook-endpoint"
}
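As a rough illustration, stacking can be expressed as merging per-pipeline option dictionaries into one body. This Python sketch assumes only what this page states (the beta endpoint supports transcribe and classify); stack_pipelines is a hypothetical helper, not part of any Wordcab SDK.

```python
import json


def stack_pipelines(webhook, **pipelines):
    """Combine per-pipeline option dicts into one /pipeline request body."""
    # The beta endpoint is documented as supporting only these two pipelines.
    supported = {"transcribe", "classify"}
    unknown = set(pipelines) - supported
    if unknown:
        raise ValueError(f"Unsupported pipelines: {sorted(unknown)}")
    if not pipelines:
        raise ValueError("Define at least one pipeline")
    return {"pipeline": dict(pipelines), "webhook": webhook}


shared = {"url": "https://your-presigned-audio-url.com/", "url_type": "audio_url"}
body = stack_pipelines(
    "https://your-site.com/your-webhook-endpoint",
    transcribe={"display_name": "transcribe_pipeline_test", **shared},
    classify={
        "display_name": "classify_pipeline_test",
        **shared,
        "tasks": "classify_audio,sentiment_analysis,clarity_scores",
    },
)
print(json.dumps(body, indent=2))
```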
/pipeline Response Object
Running different pipelines is similar to running individual endpoints like /transcribe or /translate in sequence or parallel. The difference is that some pipeline tasks may have dependencies on others.
Some pipelines will run immediately and return what you'd expect from an individual endpoint. For example, the transcribe pipeline, if successful, will immediately return a job_name and transcript_id you can use to check on progress.
For tasks that have other dependencies, such as sentiment_analysis and clarity_scores, the response will either contain a message explaining that these will be run automatically after the dependencies are met, or there will be a detailed error message.
Both cases are shown below.
Success Response Example
{
"transcribe": {
"job_name": "job_abc123",
"transcript_id": "audio_url_transcript_abc123"
},
"classify": {
"sentiment_analysis": {
"details": "The 'sentiment_analysis' classification task depends on the 'transcribe' pipeline. The 'classify' pipeline will run after the 'transcribe' pipeline is complete."
},
"clarity_scores": {
"details": "The 'clarity_scores' classification task depends on the 'transcribe' pipeline. The 'classify' pipeline will run after the 'transcribe' pipeline is complete."
}
},
"pipeline_jobs": [
"job_abc123"
],
"pipeline_id": "pipe_abc123",
"webhook": "https://your-site.com/your-webhook-endpoint",
"status": "ProcessingPipeline"
}
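One way to read a success response like this is to separate the jobs that have already started from the tasks that are waiting on a dependency. The Python sketch below assumes the field names shown in the example above; summarize_submission is a hypothetical helper name.

```python
def summarize_submission(response):
    """Split an immediate /pipeline response into started jobs and deferred tasks."""
    # Deferred classify tasks carry a "details" message instead of a job name.
    deferred = {
        task: info["details"]
        for task, info in response.get("classify", {}).items()
        if isinstance(info, dict) and "details" in info
    }
    return {
        "pipeline_id": response.get("pipeline_id"),
        "jobs": response.get("pipeline_jobs", []),
        "deferred_tasks": sorted(deferred),
        "status": response.get("status"),
    }


example = {
    "transcribe": {"job_name": "job_abc123", "transcript_id": "audio_url_transcript_abc123"},
    "classify": {
        "sentiment_analysis": {"details": "Runs after the 'transcribe' pipeline is complete."},
        "clarity_scores": {"details": "Runs after the 'transcribe' pipeline is complete."},
    },
    "pipeline_jobs": ["job_abc123"],
    "pipeline_id": "pipe_abc123",
    "status": "ProcessingPipeline",
}
summary = summarize_submission(example)
```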
Error Response Example
{
"error": "All pipeline tasks failed. Please check your request body.",
"classify": {
"sentiment_analysis": {
"error": "The 'sentiment_analysis' classification task depends on the 'transcribe' pipeline. Please include the 'transcribe' pipeline in your 'pipeline' object."
},
"clarity_scores": {
"error": "The 'clarity_scores' classification task depends on the 'transcribe' pipeline. Please include the 'transcribe' pipeline in your 'pipeline' object."
}
}
}
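Error messages in a response like this sit at two levels: a top-level error and per-task errors inside each pipeline. A small Python sketch for flattening them into one list, assuming the structure of the example above (collect_errors is a hypothetical helper name):

```python
def collect_errors(response):
    """Return (location, message) pairs for every error in a /pipeline response."""
    errors = []
    if "error" in response:
        errors.append(("pipeline", response["error"]))
    for pipeline_name, tasks in response.items():
        if not isinstance(tasks, dict):
            continue  # skip scalar fields such as the top-level error string
        for task_name, info in tasks.items():
            if isinstance(info, dict) and "error" in info:
                errors.append((f"{pipeline_name}.{task_name}", info["error"]))
    return errors


example = {
    "error": "All pipeline tasks failed. Please check your request body.",
    "classify": {
        "sentiment_analysis": {"error": "Depends on the 'transcribe' pipeline."},
        "clarity_scores": {"error": "Depends on the 'transcribe' pipeline."},
    },
}
locations = [location for location, _ in collect_errors(example)]
```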
Adding a Webhook
Webhooks are currently the only way to receive your results: although webhook is officially an optional parameter, you need one while /pipeline is in beta.
Add a webhook to your body, outside of the pipeline object. You will receive a POST request with the results of each pipeline.
{
"pipeline": {
"transcribe": {...},
"classify": {...}
},
"webhook": "https://your-site.com/your-webhook-endpoint"
}
Webhook Response Example
The webhook response will have a similar high-level structure to your request body. There's a pipeline object, with the name of each pipeline that you ran. Each pipeline contains a complete response, which varies according to the pipelines and tasks that were configured.
{
"pipeline": {
"transcribe": {
"status": "TranscriptComplete",
"job_name": "job_abc123",
"transcript": [],
"speaker_map": {
"A": "SPEAKER A"
},
"transcript_id": "audio_url_transcript_abc123",
"redacted_audio_url": "https://wordcab-presigned-url.s3.amazonaws.com/abc123"
},
"classify": {
"job_name": "job_abc456",
"classification_id": "classification_abc456",
"classification": {
"audio_classification": [],
"sentiment_analysis": {...},
"clarity_scores": {...},
"status": "ClassificationComplete"
}
},
"pipeline_id": "pipe_abc789",
"time_created": "2024-07-11T21:40:34.970516+00:00",
"time_completed": "2024-07-11T21:40:50.964913+00:00",
"status": "PipelineComplete"
}
}
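When handling this payload on your server, a first step is usually to check the status of each pipeline. The Python sketch below assumes the layout of the example above, where transcribe reports status directly and classify reports it inside its classification object; pipeline_statuses is a hypothetical helper name.

```python
def pipeline_statuses(payload):
    """Map each pipeline name in a webhook payload to its reported status."""
    statuses = {}
    for name, result in payload.get("pipeline", {}).items():
        if not isinstance(result, dict):
            continue  # skip scalar fields such as IDs or timestamps
        status = result.get("status")
        if status is None:
            # In the example, classify reports its status inside "classification".
            nested = result.get("classification")
            if isinstance(nested, dict):
                status = nested.get("status")
        statuses[name] = status
    return statuses


example = {
    "pipeline": {
        "transcribe": {"status": "TranscriptComplete", "job_name": "job_abc123"},
        "classify": {
            "job_name": "job_abc456",
            "classification": {"status": "ClassificationComplete"},
        },
    }
}
statuses = pipeline_statuses(example)
```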
Webhook Headers
It's highly recommended that you verify the incoming pipeline webhook. We help you do this with custom webhook headers. In your /pipeline POST request, you can add headers that begin with X-Wordcab-Webhook-.
As with metadata, anything after this prefix is lowercased, and all dashes are made into underscores. These are then passed to the headers of the final webhook POST request. For example:
X-Wordcab-Webhook-Authorize=pass123
X-Wordcab-Webhook-Verify-Webhook=verify123
# After adding these headers to your /pipeline request,
# you will see the following headers in your webhook response, on your server:
authorize=pass123
verify_webhook=verify123
You should plan for anything after X-Wordcab-Webhook- in your /pipeline headers to be lowercased, and all dashes replaced with underscores. The X-Wordcab-Webhook- prefix will also be removed when sending the final webhook POST request.
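The renaming rule can be sketched in Python so that your server knows which header names to expect. Both function names are hypothetical, and the shared-secret comparison is one possible verification approach under the assumption that you sent a secret value in a custom header; it is not a scheme prescribed by this page.

```python
import hmac

PREFIX = "x-wordcab-webhook-"


def expected_webhook_headers(request_headers):
    """Predict the webhook header names from the custom /pipeline request headers."""
    forwarded = {}
    for name, value in request_headers.items():
        if name.lower().startswith(PREFIX):
            suffix = name[len(PREFIX):]  # drop the X-Wordcab-Webhook- prefix
            forwarded[suffix.lower().replace("-", "_")] = value
    return forwarded


def is_verified(webhook_headers, name, secret):
    """Constant-time check that a forwarded header carries the expected secret."""
    received = webhook_headers.get(name, "")
    return hmac.compare_digest(received.encode(), secret.encode())


sent = {
    "X-Wordcab-Webhook-Authorize": "pass123",
    "X-Wordcab-Webhook-Verify-Webhook": "verify123",
    "Content-Type": "application/json",
}
received = expected_webhook_headers(sent)
```

Using hmac.compare_digest rather than == avoids leaking information about the secret through comparison timing.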