Advanced Pipelines (beta) ⚠️

Get a better understanding of how to use Wordcab's powerful /pipeline endpoint to complete multiple tasks at once.

Intro to Pipelines

Pipelines are a new way to run multiple tasks, such as transcription, classification, and summarization, through a single endpoint and get back a complete response in a single webhook.

The /pipeline endpoint accepts a pipeline object and an optional webhook. The pipeline object should contain one or more pipelines: transcribe, classify, translate, and/or summarize.

🚧

Note that the beta /pipeline endpoint currently supports only the transcribe and classify pipelines.

Wordcab will internally handle any parallelization or potential conflicts. For example, some classify tasks depend on a transcript, and some even depend on other classify tasks. Wordcab will take care of all this logic for you.

Concepts

There are several key concepts associated with pipelines:

pipeline: The top-level pipeline object in both the request and response will contain your chosen pipelines.

Pipelines: Pipelines in the pipeline object can be one or more of the following: transcribe, classify, translate, and/or summarize.

Task: Each pipeline can contain multiple tasks. For example, the classify pipeline can include one or more of the following tasks: classify_audio, sentiment_analysis or clarity_scores.

The pipeline Object

The pipeline object in the body of a /pipeline POST request can contain one or more of the below pipelines.

transcribe

The transcribe pipeline can use nearly any /transcribe parameter as a key/value, with a few critical differences:

  • You can only use audio URLs as input, not files. You'll need to define the url key and set url_type to audio_url.
  • You can use a webhook within the transcribe object, but it's not recommended. Instead, using a webhook outside the pipeline object will return all your pipeline tasks in one neatly structured JSON response.
  • We partner with Svix to provide users with an advanced webhook portal for our non-pipeline endpoints. However, since a pipeline response is more than likely to exceed Svix's size limit, we send a simple POST request to the webhook you define. The disadvantage here is that you do not get Svix's sophisticated retry mechanism or dashboard.
  • tags (parameter) and metadata (header) are not supported.
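
The constraints above can be checked on the client before a request is sent. The following is a minimal sketch in Python: the function name check_transcribe_config and the wording of its messages are hypothetical, and the checks only mirror the bullets in this section, not Wordcab's own server-side validation.

```python
def check_transcribe_config(config):
    """Return a list of problems found in a transcribe pipeline config.

    Hypothetical helper: it only encodes the constraints listed in this
    section and is not part of any Wordcab SDK.
    """
    problems = []
    if not config.get("url"):
        problems.append("'url' is required; file uploads are not accepted")
    if config.get("url_type") != "audio_url":
        problems.append("'url_type' must be 'audio_url'")
    if "tags" in config:
        problems.append("'tags' is not supported in pipelines")
    if "webhook" in config:
        problems.append("prefer a top-level 'webhook' outside the pipeline object")
    return problems


config = {
    "display_name": "transcribe_pipeline_test",
    "url": "https://your-presigned-audio-url.com/",
    "url_type": "audio_url",
}
for problem in check_transcribe_config(config):
    print(problem)
```

An empty list means the config passes these local checks; the API may still reject it for other reasons.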

transcribe Pipeline Example

Below you can find an example of a transcribe pipeline.

{
  "pipeline": {
    "transcribe": {
      "display_name": "transcribe_pipeline_test", 
      "url": "https://your-presigned-audio-url.com/", 
      "url_type": "audio_url", 
      "diarization": "true", 
      "redact_text": true, 
      "redact_audio": true,
      "sensitive_data_types": "name,organization"
    }
  },
  "webhook": "https://your-site.com/your-webhook-endpoint"
}
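
As a rough sketch, a request body like the one above could be assembled and sent from Python using only the standard library. The base URL, the Bearer authorization scheme, and the helper name build_pipeline_request are assumptions made for illustration; substitute the values from your own Wordcab account.

```python
import json
import urllib.request

# Hypothetical placeholder; replace with the real API base URL.
BASE_URL = "https://api.example.com/v1"


def build_pipeline_request(pipeline, webhook, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /pipeline endpoint."""
    body = {"pipeline": pipeline, "webhook": webhook}
    return urllib.request.Request(
        url=f"{base_url}/pipeline",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check your account's documentation.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_pipeline_request(
    pipeline={
        "transcribe": {
            "display_name": "transcribe_pipeline_test",
            "url": "https://your-presigned-audio-url.com/",
            "url_type": "audio_url",
        }
    },
    webhook="https://your-site.com/your-webhook-endpoint",
    api_key="YOUR_API_KEY",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the body easy to inspect or log before any network call is made.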

classify

The classify pipeline can use nearly any /classify parameter as a key/value, with a few critical differences:

  • If you define a classification task that requires a transcript_id from a completed transcript, the classify pipeline will only run after the transcribe pipeline is finished. If you didn't define a transcribe pipeline as shown above, you will receive an error.
  • We partner with Svix to provide users with an advanced webhook portal for our non-pipeline endpoints. However, since a pipeline response is more than likely to exceed Svix's size limit, we send a simple POST request to the webhook you define. The disadvantage here is that you do not get Svix's sophisticated retry mechanism or dashboard.
  • tags (parameter) and metadata (header) are not supported.
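
The dependency rule in the first bullet can be caught locally before sending a request. This sketch assumes that sentiment_analysis and clarity_scores are the tasks that need a transcript, which is inferred from the response examples on this page and is not an authoritative list; the helper name is hypothetical.

```python
# Inferred from the response examples on this page; treat as an assumption.
TRANSCRIPT_DEPENDENT_TASKS = {"sentiment_analysis", "clarity_scores"}


def tasks_missing_transcribe(pipeline):
    """Return classify tasks that need a transcribe pipeline that is absent."""
    classify = pipeline.get("classify")
    if classify is None or "transcribe" in pipeline:
        return []
    # Tasks are given as a comma-separated string, as in the examples here.
    tasks = [t.strip() for t in classify.get("tasks", "").split(",") if t.strip()]
    return [t for t in tasks if t in TRANSCRIPT_DEPENDENT_TASKS]


pipeline = {
    "classify": {
        "url": "https://your-presigned-audio-url.com/",
        "url_type": "audio_url",
        "tasks": "classify_audio,sentiment_analysis,clarity_scores",
    }
}
missing = tasks_missing_transcribe(pipeline)
if missing:
    print("Add a 'transcribe' pipeline for:", ", ".join(missing))
```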

classify Pipeline Example

Below you can find an example of a classify pipeline with all available classification tasks defined in tasks.

{
  "pipeline": {
    "classify": {
      "display_name": "classify_pipeline_test", 
      "url": "https://your-presigned-audio-url.com/", 
      "url_type": "audio_url",
      "tasks": "classify_audio,sentiment_analysis,clarity_scores"
    }
  },
  "webhook": "https://your-site.com/your-webhook-endpoint"
}

Stacking Pipelines

To stack pipelines, add each one as a key in the pipeline object. Wordcab will handle any parallelization and potential conflicts. The example below combines the transcribe and classify pipelines shown earlier.

{
  "pipeline": {
    "transcribe": {
      "display_name": "transcribe_pipeline_test", 
      "url": "https://your-presigned-audio-url.com/", 
      "url_type": "audio_url", 
      "diarization": "true", 
      "redact_text": true, 
      "redact_audio": true,
      "sensitive_data_types": "name,organization"
    },
    "classify": {
      "display_name": "classify_pipeline_test", 
      "url": "https://your-presigned-audio-url.com/", 
      "url_type": "audio_url",
      "tasks": "classify_audio,sentiment_analysis,clarity_scores"
    }
  },
  "webhook": "https://your-site.com/your-webhook-endpoint"
}

/pipeline Response Object

Running different pipelines is similar to running individual endpoints like /transcribe or /translate in sequence or parallel. The difference is that some pipeline tasks may have dependencies on others.

Some pipelines will run immediately and return what you'd expect from an individual endpoint. For example, the transcribe pipeline, if successful, will immediately return a job_name and transcript_id you can use to check on progress.

For tasks that have other dependencies, such as sentiment_analysis and clarity_scores, the response will either contain a message explaining that these will be run automatically after the dependencies are met, or there will be a detailed error message.

Both examples are shown below.

Success Response Example

{
  "transcribe": {
    "job_name": "job_abc123",
    "transcript_id": "audio_url_transcript_abc123"
  },
  "classify": {
    "sentiment_analysis": {
      "details": "The 'sentiment_analysis' classification task depends on the 'transcribe' pipeline. The 'classify' pipeline will run after the 'transcribe' pipeline is complete."
    },
    "clarity_scores": {
      "details": "The 'clarity_scores' classification task depends on the 'transcribe' pipeline. The 'classify' pipeline will run after the 'transcribe' pipeline is complete."
    }
  },
  "pipeline_jobs": [
    "job_abc123"
  ],
  "pipeline_id": "pipe_abc123",
  "webhook": "https://your-site.com/your-webhook-endpoint",
  "status": "ProcessingPipeline"
}

Error Response Example

{
  "error": "All pipeline tasks failed. Please check your request body.",
  "classify": {
    "sentiment_analysis": {
      "error": "The 'sentiment_analysis' classification task depends on the 'transcribe' pipeline. Please include the 'transcribe' pipeline in your 'pipeline' object."
    },
    "clarity_scores": {
      "error": "The 'clarity_scores' classification task depends on the 'transcribe' pipeline. Please include the 'transcribe' pipeline in your 'pipeline' object."
    }
  }
}
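
One way to handle the two response shapes above in client code is sketched below. It assumes only the keys visible in these examples (error, details, pipeline_id, and per-task objects under classify); the function name and the shape of its return value are hypothetical.

```python
def summarize_pipeline_response(response):
    """Split an immediate /pipeline response into deferred tasks and errors."""
    deferred, errors = {}, {}
    for task, info in response.get("classify", {}).items():
        if not isinstance(info, dict):
            continue
        if "error" in info:
            errors[task] = info["error"]
        elif "details" in info:
            # The task was accepted and will run once its dependencies finish.
            deferred[task] = info["details"]
    return {
        "ok": "error" not in response and not errors,
        "pipeline_id": response.get("pipeline_id"),
        "deferred": deferred,
        "errors": errors,
    }


response = {
    "error": "All pipeline tasks failed. Please check your request body.",
    "classify": {
        "sentiment_analysis": {"error": "depends on the 'transcribe' pipeline"},
    },
}
summary = summarize_pipeline_response(response)
if not summary["ok"]:
    for task, message in summary["errors"].items():
        print(f"{task}: {message}")
```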

Adding a Webhook

🚧

Note that while the webhook parameter is officially optional, a webhook is currently the only way to receive your results while /pipeline is in beta.

Add a webhook key to your request body, outside of the pipeline object. You will receive a POST request with the results of each pipeline.

{
  "pipeline": {
    "transcribe": {...},
    "classify": {...}
  },
  "webhook": "https://your-site.com/your-webhook-endpoint"
}

Webhook Response Example

The webhook response has a similar high-level structure to your request body: a pipeline object keyed by the name of each pipeline that you ran. Each pipeline contains a complete response, which varies according to the pipelines and tasks that were configured.

{
  "pipeline": {
    "transcribe": {
      "status": "TranscriptComplete",
      "job_name": "job_abc123",
      "transcript": [],
      "speaker_map": {
        "A": "SPEAKER A"
      },
      "transcript_id": "audio_url_transcript_abc123",
      "redacted_audio_url": "https://wordcab-presigned-url.s3.amazonaws.com/abc123"
    },
    "classify": {
      "job_name": "job_abc456",
      "classification_id": "classification_abc456",
      "classification": {
        "audio_classification": [],
        "sentiment_analysis": {...},
        "clarity_scores": {...}
      },
      "status": "ClassificationComplete"
    }
  },
  "pipeline_id": "pipe_abc789",
  "time_created": "2024-07-11T21:40:34.970516+00:00",
  "time_completed": "2024-07-11T21:40:50.964913+00:00",
  "status": "PipelineComplete"
}
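
On the receiving side, the fields of interest can be pulled out of the decoded webhook body with a small helper. This is a sketch that assumes the payload matches the example above; the function name and the selection of fields are hypothetical, and missing keys simply come back as None.

```python
import json


def extract_results(payload):
    """Pull identifiers and statuses out of a decoded pipeline webhook body."""
    pipeline = payload.get("pipeline", {})
    return {
        "pipeline_id": payload.get("pipeline_id"),
        "complete": payload.get("status") == "PipelineComplete",
        # Status reported by each pipeline, keyed by pipeline name.
        "pipeline_statuses": {
            name: result.get("status") for name, result in pipeline.items()
        },
        "transcript_id": pipeline.get("transcribe", {}).get("transcript_id"),
        "classification_id": pipeline.get("classify", {}).get("classification_id"),
    }


raw_body = json.dumps({
    "pipeline": {
        "transcribe": {
            "status": "TranscriptComplete",
            "transcript_id": "audio_url_transcript_abc123",
        }
    },
    "pipeline_id": "pipe_abc789",
    "status": "PipelineComplete",
})
results = extract_results(json.loads(raw_body))
print(results["pipeline_id"], results["complete"])
```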

Webhook Headers

It's highly recommended that you verify the incoming pipeline webhook. We help you do this with custom webhook headers. In your /pipeline POST request, you can add headers that begin with X-Wordcab-Webhook-.

As with metadata, anything after this prefix is lowercased, and all dashes are made into underscores. These are then passed to the headers of the final webhook POST request. For example:

X-Wordcab-Webhook-Authorize=pass123
X-Wordcab-Webhook-Verify-Webhook=verify123

# After adding these headers to your /pipeline request,
# your server will see the following headers on the incoming webhook POST request:

authorize=pass123
verify_webhook=verify123

🚧

You should plan for anything after X-Wordcab-Webhook- in your /pipeline headers to be lowercased, and all dashes replaced with underscores. X-Wordcab-Webhook- will also be removed when sending the final webhook POST request.
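
The renaming rule described above, together with a simple check on the receiving server, can be sketched as follows. The function names are hypothetical, and comparing shared secrets with hmac.compare_digest is one reasonable verification approach, not something this page prescribes.

```python
import hmac

PREFIX = "x-wordcab-webhook-"


def forwarded_header_name(request_header):
    """Map a /pipeline request header to the name expected on the webhook."""
    lowered = request_header.lower()
    if not lowered.startswith(PREFIX):
        raise ValueError(f"{request_header!r} does not start with X-Wordcab-Webhook-")
    # Drop the prefix, then turn dashes into underscores.
    return lowered[len(PREFIX):].replace("-", "_")


def verify_webhook(received_headers, expected):
    """Return True if every expected header is present with the right value."""
    received = {name.lower(): value for name, value in received_headers.items()}
    return all(
        # Constant-time comparison avoids leaking how much of a secret matched.
        hmac.compare_digest(received.get(name, "").encode(), secret.encode())
        for name, secret in expected.items()
    )


sent = {
    "X-Wordcab-Webhook-Authorize": "pass123",
    "X-Wordcab-Webhook-Verify-Webhook": "verify123",
}
expected = {forwarded_header_name(name): value for name, value in sent.items()}
print(expected)
```

On your server, pass the incoming request's headers and the expected mapping to verify_webhook, and reject the request if it returns False.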