Accepted Sources

To initiate a Summarization job or an Extraction job, you must specify the source of your input data.

The data source can be one of several types, described in the following table:

🚧
Some sources require additional parameters, which are listed in the Additional parameters column of the table below.

Source	Description	Additional parameters
`generic`	Generic-format .txt and JSON transcripts
`audio`	Audio file in flac, m4a, mp3, mpga, ogg, or wav format (must be under 100 MB)
`wordcab_transcript`	Wordcab transcript	`transcript_id`
`signed_url`	AWS or GCP signed URLs pointing to an audio file	`signed_url`
`assembly_ai`	Use the JSON array from AssemblyAI's `utterances` object as input
`deepgram`	Use the JSON array from Deepgram's `utterances` object as input
`rev_ai`	Use the JSON array from Rev.ai's `monologues` object as input
`vtt`	Either a .vtt file or raw vtt text in the request body
`otter`	Otter.ai transcript
`fireflies`	Fireflies.ai transcript
`sonix`	Sonix.ai transcript
`descript`	Descript transcript
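
The table can be turned into a small pre-flight check that catches a missing parameter before a job is submitted. The following is a minimal sketch: the mapping is transcribed from the table above, while `REQUIRED_PARAMS` and `missing_params` are hypothetical names introduced for illustration and are not part of the Wordcab API.

```python
# Required additional parameters per source, transcribed from the table above.
REQUIRED_PARAMS = {
    "generic": [],
    "audio": [],
    "wordcab_transcript": ["transcript_id"],
    "signed_url": ["signed_url"],
    "assembly_ai": [],
    "deepgram": [],
    "rev_ai": [],
    "vtt": [],
    "otter": [],
    "fireflies": [],
    "sonix": [],
    "descript": [],
}


def missing_params(source, params):
    """Return the required parameter names that are absent or empty in `params`.

    Hypothetical helper: it only mirrors the table and does not call any API.
    """
    if source not in REQUIRED_PARAMS:
        raise ValueError(f"Unknown source: {source!r}")
    return [name for name in REQUIRED_PARAMS[source] if not params.get(name)]
```

An empty list means the source has everything the table asks for; any returned name should be supplied before the job is started.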

Generic

The generic source is the most basic source type. It accepts either a .txt file or a .json file.

Text file

Each line of the .txt file represents a single utterance with the following format:

[<start_time> --> <end_time>] <speaker>: <utterance>

Example:

[00:01:23 --> 00:01:25] Joe: This is an example of a generic transcript.
[00:01:26 --> 00:01:27] Jill: Oh wow!

📘
Timestamps are optional and can be placed anywhere on the line.
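
To make the line format concrete, here is a minimal sketch of how such a line could be split into its parts. It illustrates the format only and is not how Wordcab parses the file; `parse_line` is a hypothetical helper, and the timestamp pattern assumes digits separated by colons or dots.

```python
import re

# Matches a bracketed timestamp pair such as "[00:01:23 --> 00:01:25]".
TIMESTAMP = re.compile(r"\[\s*([\d:.]+)\s*-->\s*([\d:.]+)\s*\]")


def parse_line(line):
    """Split one transcript line into start, end, speaker, and utterance."""
    start = end = None
    match = TIMESTAMP.search(line)
    if match:
        start, end = match.groups()
        # Remove the timestamp wherever it sits, so its colons
        # are not mistaken for the speaker separator.
        line = line[:match.start()] + line[match.end():]
    speaker, sep, utterance = line.partition(":")
    if not sep:
        raise ValueError(f"No 'speaker:' prefix found in line: {line!r}")
    return {
        "start": start,
        "end": end,
        "speaker": speaker.strip(),
        "utterance": utterance.strip(),
    }
```

Because the timestamp is stripped before the speaker is split off, the same function handles lines with a leading timestamp, a trailing timestamp, or none at all.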

JSON file

The .json file must contain a valid JSON object with a transcript key whose value is a list of strings. Each element of the list should be a single speaker utterance.

Example:

{
  "transcript": [
    "Joe: This is an example of a generic transcript.",
    "Jill: Oh wow!"
  ]
}
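
If the utterances already exist in memory, the file can be generated rather than written by hand, which avoids bracket and quoting mistakes. This is a minimal sketch using only the standard library; `build_generic_json` is a hypothetical helper, and the "Speaker: text" string layout follows the example above.

```python
import json


def build_generic_json(utterances):
    """Serialize (speaker, text) pairs into the generic JSON layout."""
    payload = {
        "transcript": [f"{speaker}: {text}" for speaker, text in utterances]
    }
    return json.dumps(payload, indent=2)


document = build_generic_json([
    ("Joe", "This is an example of a generic transcript."),
    ("Jill", "Oh wow!"),
])
```

The resulting string can be written to a .json file as-is.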

Audio

The audio source is used to specify an audio file as input data. As mentioned in the table above, the supported audio file formats are .flac, .m4a, .mp3, .mpga, .ogg, and .wav.
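
A quick local check can reject an unsupported file before it is uploaded. The sketch below is an illustration under stated assumptions: `check_audio` is a hypothetical helper, the extension list comes from the table above, and the size limit is interpreted conservatively as 100,000,000 bytes, which the source does not spell out.

```python
from pathlib import Path

# Extensions listed in the table above.
SUPPORTED_EXTENSIONS = {".flac", ".m4a", ".mp3", ".mpga", ".ogg", ".wav"}

# Assumption: the size limit is read as 100,000,000 bytes.
MAX_BYTES = 100 * 1000 * 1000


def check_audio(filename, size_bytes):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes >= MAX_BYTES:
        problems.append("file is not under the size limit")
    return problems
```

For a file on disk, the size can be obtained with `Path(filename).stat().st_size`.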

Wordcab Transcript

A Wordcab Transcript is defined in the API concepts section (see Transcripts).

If you have a Wordcab Transcript, you can use it as input data for a Summarization job or an Extraction job by adding the transcript_id parameter to the request.

Signed URL

The signed_url source is used to specify a signed URL as input data. Signed URLs are helpful when you want to use a file stored in a private bucket or server. The file must be accessible via a signed URL from AWS (see AWS Signed URLs) or Google Cloud Storage (see GCP Signed URLs).

AssemblyAI

AssemblyAI is a transcription service. To use its output as a source, provide the AssemblyAI utterances array (see Core Transcription).

Deepgram

Deepgram is a transcription service. To use its output as a source, provide the Deepgram utterances array (see Utterances).

Rev.ai

Rev.ai is a transcription service. To use its output as a source, provide the Rev.ai monologues array (see Get Started).
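
For these three providers, the input is an array taken out of the provider's JSON response. The sketch below shows one way to pull that array out. The key paths are assumptions based on each provider's usual response layout (top-level `utterances` for AssemblyAI, `results.utterances` for Deepgram, top-level `monologues` for Rev.ai) and should be checked against the provider documentation; `extract_segments` is a hypothetical helper.

```python
def extract_segments(source, response):
    """Return the array that the given source type expects as input.

    `response` is the provider's JSON response, already decoded into a dict.
    """
    if source == "assembly_ai":
        # Assumption: utterances sit at the top level of the transcript object.
        return response["utterances"]
    if source == "deepgram":
        # Assumption: utterances are nested under the "results" object.
        return response["results"]["utterances"]
    if source == "rev_ai":
        # Assumption: monologues sit at the top level of the transcript object.
        return response["monologues"]
    raise ValueError(f"Not a provider source: {source!r}")
```

A `KeyError` from this function usually means the provider was not asked to return utterances or speaker segments in the first place.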

VTT

The vtt source is used to specify a .vtt file or a raw .vtt string as input data.

The .vtt file must be a valid WebVTT file. You can use this WebVTT validator before initiating a job.
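
As a cheap first check before using a full validator, the text can be tested for the `WEBVTT` signature that every WebVTT file begins with. This is a minimal sketch, not a complete validation: `looks_like_webvtt` is a hypothetical helper that inspects only the first line and ignores cue syntax.

```python
def looks_like_webvtt(text):
    """Return True if the text begins with the WebVTT file signature."""
    # A byte order mark may precede the signature.
    text = text.lstrip("\ufeff")
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    # The signature is "WEBVTT", optionally followed by a space or tab
    # and free-form header text.
    return first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t"))
```

The same function works for both a file's contents and a raw string passed in the request body.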

Otter.ai

Otter.ai provides a transcription service that can be used as a source.

Learn more about properly formatting Otter.ai transcripts in our blog post: Otter.ai Transcripts in Wordcab.

Fireflies.ai

Fireflies.ai provides a transcription service that can be used as a source.

Learn more about properly formatting Fireflies.ai transcripts in our blog post: Fireflies.ai Transcripts in Wordcab.

Sonix.ai

Sonix.ai provides a transcription service that can be used as a source.

Learn more about properly formatting Sonix.ai transcripts in our blog post: Sonix.ai Transcripts in Wordcab.

Descript

Descript provides a transcription service that can be used as a source.

Learn more about properly formatting Descript transcripts in our blog post: Descript Transcripts in Wordcab.
