> For the complete documentation index, see [llms.txt](https://myshell-wiki.gitbook.io/proconfig-tutorial/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://myshell-wiki.gitbook.io/proconfig-tutorial/api-reference/widgets/6-whisper-large-v3.md).

# Whisper large-v3

## Try it in the Widget Center

Open the [widget page](https://app.myshell.ai/robot-workshop/widget/1781991624332992512) to try this widget and copy its Pro Config template.

## Usage

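This widget transcribes the audio at a given URL with Whisper large-v3. It returns the full transcription and, when `return_timestamps` is true, a list of timestamped chunks that can also carry speaker identification.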

**Input Parameters**

<table><thead><tr><th>Name</th><th>Type</th><th>Description</th><th>Default</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>voice_url</td><td><code>string</code></td><td>The URL of the audio to transcribe.</td><td><a href="https://cdn.myshell.ai/audio/chat/embed_obj/172/20240312/92d43981ff534b7fa6418dc68fbf0ca1.mp3">default_url</a></td><td>true</td></tr><tr><td>chunk_length_s</td><td><code>integer</code></td><td>The length, in seconds, of each audio chunk sent to Whisper.</td><td>30</td><td>false</td></tr><tr><td>batch_size</td><td><code>integer</code></td><td>The number of audio chunks processed in parallel to speed up transcription.</td><td>24</td><td>false</td></tr><tr><td>return_timestamps</td><td><code>boolean</code></td><td>Whether to return timestamp information with the transcription.</td><td>True</td><td>false</td></tr><tr><td>diarize</td><td><code>boolean</code></td><td>Whether to diarize the audio and return speaker identification for each sentence. If true, <code>return_timestamps</code> must also be true.</td><td>False</td><td>false</td></tr><tr><td>language</td><td><code>string</code></td><td>Whisper detects the language automatically, but detection can sometimes fail. Use this parameter to specify the transcription language explicitly. Give the language name in lowercase, such as 'english' or 'french'. Defaults to 'english'.</td><td>english</td><td>false</td></tr></tbody></table>
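
The constraints in the table above (the defaults, the lowercase `language` value, and `diarize` requiring `return_timestamps`) can be checked before the widget is called. The following is a minimal sketch in Python. The helper name `build_whisper_inputs` is hypothetical and not part of Pro Config, and how the resulting dictionary is passed to the widget is not covered here.

```python
from typing import Any

# Defaults copied from the Input Parameters table above.
DEFAULTS: dict[str, Any] = {
    "chunk_length_s": 30,
    "batch_size": 24,
    "return_timestamps": True,
    "diarize": False,
    "language": "english",
}


def build_whisper_inputs(voice_url: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults and validate them."""
    if not voice_url:
        raise ValueError("voice_url is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = {"voice_url": voice_url, **DEFAULTS, **overrides}
    # The table asks for lowercase language names such as 'english' or 'french'.
    params["language"] = str(params["language"]).lower()
    # diarize is documented as requiring return_timestamps to be true.
    if params["diarize"] and not params["return_timestamps"]:
        raise ValueError("diarize=True requires return_timestamps=True")
    return params
```

For example, `build_whisper_inputs("https://example.com/audio.mp3", diarize=True)` keeps every other documented default, while passing `diarize=True` together with `return_timestamps=False` raises a `ValueError`. The `example.com` URL is a placeholder.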

**Output Parameters**

| Name   | Type     | Description                                                                                                                         | File Type |
| ------ | -------- | ----------------------------------------------------------------------------------------------------------------------------------- | --------- |
| text   | `string` | The transcription of the given audio                                                                                                |           |
| chunks | `array`  | Returned if `return_timestamps` is true. Chunks of varying length, each containing fields such as text, timestamp, and speaker.     |           |

**Output Example**

The success example was produced from this [input audio](https://cdn.myshell.ai/audio/chat/embed_obj/40295/20240423/57aea65228cc4f63895a35a262f8133f.mp3).

{% tabs %}
{% tab title="success" %}
{% code fullWidth="false" %}

```json
{
  "chunks": [
    {
      "speaker": null,
      "text": " On the show today, cuts both ways.",
      "timestamp": [
        0,
        4.8
      ]
    },
    {
      "speaker": null,
      "text": " And so far, we have talked about humans.",
      "timestamp": [
        5.6,
        7.94
      ]
    },
    {
      "speaker": null,
      "text": " Now let's talk about machines and the strengths and weaknesses of artificial intelligence.",
      "timestamp": [
        8.48,
        15.22
      ]
    },
    {
      "speaker": null,
      "text": " Because there are so many things that make bots like ChatGPT amazing.",
      "timestamp": [
        15.32,
        21.06
      ]
    },
    {
      "speaker": null,
      "text": " It's able to pass the bar exam or college exams as long as plenty of those exam examples",
      "timestamp": [
        21.46,
        27.9
      ]
    },
    {
      "speaker": null,
      "text": " were in the training data. It can also write poetry, compose music, summarize vast amounts",
      "timestamp": [
        27.9,
        33.92
      ]
    },
    {
      "speaker": null,
      "text": " of data in a fluid, human-like way. Some people are genuinely wowed by that, like, oh my goodness,",
      "timestamp": [
        33.92,
        40.68
      ]
    },
    {
      "speaker": null,
      "text": " Chachapiti is so creative. And making powerful AI so easy to use has resurfaced a long-running debate.",
      "timestamp": [
        41,
        48.62
      ]
    },
    {
      "speaker": null,
      "text": " Whether AI will save us or steal all our jobs and kill us.",
      "timestamp": [
        49.74,
        55.02
      ]
    },
    {
      "speaker": null,
      "text": " I'd say either is a possibility.",
      "timestamp": [
        55.58,
        58.48
      ]
    },
    {
      "speaker": null,
      "text": " This is Yejin Choi.",
      "timestamp": [
        58.9,
        60.42
      ]
    },
    {
      "speaker": null,
      "text": " And we don't know what's going to happen for sure.",
      "timestamp": [
        60.42,
        64.08
      ]
    },
    {
      "speaker": null,
      "text": " Choi. And we don't know what's going to happen for sure. She's an AI expert, a MacArthur Genius Award winner, and a professor at the University of Washington. And that means a lot is up to us",
      "timestamp": [
        67.04,
        74.68
      ]
    },
    {
      "speaker": null,
      "text": " to shape the future. So Yejin, you gave a TED Talk recently about AI and a big conundrum as you see it. AI today is unbelievably intelligent",
      "timestamp": [
        74.68,
        87.98
      ]
    },
    {
      "speaker": null,
      "text": " and then shockingly stupid. You used an example of drying laundry to demonstrate",
      "timestamp": [
        87.98,
        93.72
      ]
    },
    {
      "speaker": null,
      "text": " how literal it can be. So suppose I left five clothes to dry out in the sun and it took them",
      "timestamp": [
        93.72,
        101.34
      ]
    },
    {
      "speaker": null,
      "text": " five hours to dry completely. How long would it take to dry 30 clothes?",
      "timestamp": [
        101.34,
        106.66
      ]
    },
    {
      "speaker": null,
      "text": " Yeah, it doesn't matter how many pieces of clothes. It takes the same five hours.",
      "timestamp": [
        107.28,
        111.64
      ]
    },
    {
      "speaker": null,
      "text": " But...",
      "timestamp": [
        112.94,
        113.3
      ]
    },
    {
      "speaker": null,
      "text": " GPT-4, the newest, greatest AI system, says 30 hours. Not good.",
      "timestamp": [
        113.3,
        118.18
      ]
    },
    {
      "speaker": null,
      "text": " ChatGPT supposedly learns from its mistakes. But I tried asking it something similar,",
      "timestamp": [
        119.62,
        124.86
      ]
    },
    {
      "speaker": null,
      "text": " with a couple linguistic variations, and things didn't go any better.",
      "timestamp": [
        124.86,
        129.08
      ]
    },
    {
      "speaker": null,
      "text": " If it takes seven hours to dry five pieces of clothes in the sun, how long will it take to dry ten clothes?",
      "timestamp": [
        129.98,
        138.18
      ]
    },
    {
      "speaker": null,
      "text": " The drying time in your question seems to be directly proportional to the number of clothes.",
      "timestamp": [
        138.52,
        142.86
      ]
    },
    {
      "speaker": null,
      "text": " So, if you have 10 pieces of clothes,",
      "timestamp": [
        143.16,
        145.32
      ]
    },
    {
      "speaker": null,
      "text": " it will take 14 hours to dry all of them, assuming the conditions remain the same.",
      "timestamp": [
        145.52,
        149.7
      ]
    },
    {
      "speaker": null,
      "text": " Okay, let's ask again. Here's a new one. If it takes one shirt three hours and one pair of pants",
      "timestamp": [
        150.12,
        157.1
      ]
    },
    {
      "speaker": null,
      "text": " four hours to dry in the sun, how long would it take two shirts to dry? In this scenario,",
      "timestamp": [
        157.1,
        164.32
      ]
    },
    {
      "speaker": null,
      "text": " given that it takes three hours to dry one shirt, it would take twice that time to dry two shirts.",
      "timestamp": [
        164.64,
        169.28
      ]
    },
    {
      "speaker": null,
      "text": " So it would take six hours to dry two shirts.",
      "timestamp": [
        169.5,
        171.94
      ]
    },
    {
      "speaker": null,
      "text": " Oh, poor chat GPT.",
      "timestamp": [
        172.44,
        173.94
      ]
    },
    {
      "speaker": null,
      "text": " You are not going to have a lot of clothes to wear.",
      "timestamp": [
        174.04,
        175.82
      ]
    },
    {
      "speaker": null,
      "text": " You and I, once we understood a concept, then no matter how we rephrase the question, no matter how we ask the question",
      "timestamp": [
        176.96,
        186.38
      ]
    },
    {
      "speaker": null,
      "text": " differently, to us, it's the same question. So we can answer them correctly. It's strange why",
      "timestamp": [
        186.38,
        193.96
      ]
    },
    {
      "speaker": null,
      "text": " such an impressive AI that can even pass the bar exam struggles with little variations of the same",
      "timestamp": [
        193.96,
        201.88
      ]
    },
    {
      "speaker": null,
      "text": " question that requires just common sense. But it's not surprising if you know how AI is trained.",
      "timestamp": [
        201.88,
        208.74
      ]
    },
    {
      "speaker": null,
      "text": " It's trained to predict which word will come next.",
      "timestamp": [
        209.8,
        213.16
      ]
    },
    {
      "speaker": null,
      "text": " It's just reading a lot of data and try to learn the patterns behind the data.",
      "timestamp": [
        213.68,
        218.44
      ]
    },
    {
      "speaker": null,
      "text": " So it's not trained to do critical reasoning.",
      "timestamp": [
        218.58,
        221.4
      ]
    },
    {
      "speaker": null,
      "text": " And having common sense means applying reasoning to all sorts of scenarios,",
      "timestamp": [
        221.4,
        227.12
      ]
    },
    {
      "speaker": null,
      "text": " which computers can't do, at least not like humans. So common sense is what's strikingly",
      "timestamp": [
        227.48,
        235.02
      ]
    },
    {
      "speaker": null,
      "text": " easy for humans, but surprisingly hard for machines. It's everyday knowledge that you and",
      "timestamp": [
        235.02,
        241.6
      ]
    },
    {
      "speaker": null,
      "text": " I have about different objects and events that we interact",
      "timestamp": [
        241.6,
        245.5
      ]
    },
    {
      "speaker": null,
      "text": " with in life. And it's been a longstanding challenge in AI field. Yeah, I like drawing",
      "timestamp": [
        245.5,
        253.98
      ]
    },
    {
      "speaker": null,
      "text": " inspirations from humans because when children grow up, it's not the case that we just feed them",
      "timestamp": [
        253.98,
        260.32
      ]
    },
    {
      "speaker": null,
      "text": " with internet data and then let them figure out on their own. Actually, the outcome of that",
      "timestamp": [
        260.32,
        265.92
      ]
    },
    {
      "speaker": null,
      "text": " would be pretty horrible. Yes. And so what do we do to prevent it is to tell them in a more",
      "timestamp": [
        265.92,
        273.44
      ]
    },
    {
      "speaker": null,
      "text": " declarative form what's right and what's wrong. You mean like don't hit somebody? Yeah. For",
      "timestamp": [
        273.44,
        280.2
      ]
    },
    {
      "speaker": null,
      "text": " example, we tell them that it's not right to kill people or, you know, it's not polite to yell at people, even if they get angry.",
      "timestamp": [
        280.2,
        289.48
      ]
    },
    {
      "speaker": null,
      "text": " We teach them a lot of these things from early on in their lives.",
      "timestamp": [
        289.68,
        294.1
      ]
    },
    {
      "speaker": null,
      "text": " So if most AI models are learning from the vast amount of information that's available online,",
      "timestamp": [
        294.94,
        300
      ]
    }
  ],
  "text": " On the show today, cuts both ways. And so far, we have talked about humans. Now let's talk about machines and the strengths and weaknesses of artificial intelligence. Because there are so many things that make bots like ChatGPT amazing. It's able to pass the bar exam or college exams as long as plenty of those exam examples were in the training data. It can also write poetry, compose music, summarize vast amounts of data in a fluid, human-like way. Some people are genuinely wowed by that, like, oh my goodness, Chachapiti is so creative. And making powerful AI so easy to use has resurfaced a long-running debate. Whether AI will save us or steal all our jobs and kill us. I'd say either is a possibility. This is Yejin Choi. And we don't know what's going to happen for sure. Choi. And we don't know what's going to happen for sure. She's an AI expert, a MacArthur Genius Award winner, and a professor at the University of Washington. And that means a lot is up to us to shape the future. So Yejin, you gave a TED Talk recently about AI and a big conundrum as you see it. AI today is unbelievably intelligent and then shockingly stupid. You used an example of drying laundry to demonstrate how literal it can be. So suppose I left five clothes to dry out in the sun and it took them five hours to dry completely. How long would it take to dry 30 clothes? Yeah, it doesn't matter how many pieces of clothes. It takes the same five hours. But... GPT-4, the newest, greatest AI system, says 30 hours. Not good. ChatGPT supposedly learns from its mistakes. But I tried asking it something similar, with a couple linguistic variations, and things didn't go any better. If it takes seven hours to dry five pieces of clothes in the sun, how long will it take to dry ten clothes? The drying time in your question seems to be directly proportional to the number of clothes. So, if you have 10 pieces of clothes, it will take 14 hours to dry all of them, assuming the conditions remain the same. Okay, let's ask again. Here's a new one. If it takes one shirt three hours and one pair of pants four hours to dry in the sun, how long would it take two shirts to dry? In this scenario, given that it takes three hours to dry one shirt, it would take twice that time to dry two shirts. So it would take six hours to dry two shirts. Oh, poor chat GPT. You are not going to have a lot of clothes to wear. You and I, once we understood a concept, then no matter how we rephrase the question, no matter how we ask the question differently, to us, it's the same question. So we can answer them correctly. It's strange why such an impressive AI that can even pass the bar exam struggles with little variations of the same question that requires just common sense. But it's not surprising if you know how AI is trained. It's trained to predict which word will come next. It's just reading a lot of data and try to learn the patterns behind the data. So it's not trained to do critical reasoning. And having common sense means applying reasoning to all sorts of scenarios, which computers can't do, at least not like humans. So common sense is what's strikingly easy for humans, but surprisingly hard for machines. It's everyday knowledge that you and I have about different objects and events that we interact with in life. And it's been a longstanding challenge in AI field. Yeah, I like drawing inspirations from humans because when children grow up, it's not the case that we just feed them with internet data and then let them figure out on their own. Actually, the outcome of that would be pretty horrible. Yes. And so what do we do to prevent it is to tell them in a more declarative form what's right and what's wrong. You mean like don't hit somebody? Yeah. For example, we tell them that it's not right to kill people or, you know, it's not polite to yell at people, even if they get angry. We teach them a lot of these things from early on in their lives. So if most AI models are learning from the vast amount of information that's available online,"
}
```

{% endcode %}
{% endtab %}

{% tab title="fail" %}
{% code fullWidth="false" %}

```json
{
   "results": "<the example results of this widget>"
}
```

{% endcode %}
{% endtab %}
{% endtabs %}
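
Once a result shaped like the success example is available as a Python dictionary, the `chunks` array can be turned into a readable, timestamped transcript. The sketch below assumes each chunk carries `text`, a two-element `timestamp` list in seconds, and an optional `speaker`, as in the example. The function names are hypothetical, and the format of `speaker` when `diarize` is true is not documented on this page, so it is printed as-is.

```python
from typing import Any


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.ss, e.g. 65.5 -> '01:05.50'."""
    # Work in whole centiseconds so rounding never produces ':60.00'.
    total_cs = round(float(seconds) * 100)
    minutes, cs = divmod(total_cs, 6000)
    return f"{minutes:02d}:{cs / 100:05.2f}"


def chunks_to_lines(result: dict[str, Any]) -> list[str]:
    """Hypothetical helper: one '[start - end] text' line per chunk."""
    lines = []
    # 'chunks' is only returned when return_timestamps is true.
    for chunk in result.get("chunks") or []:
        start, end = chunk["timestamp"]
        speaker = chunk.get("speaker")
        prefix = f"{speaker}: " if speaker else ""
        text = chunk["text"].strip()
        lines.append(
            f"[{format_timestamp(start)} - {format_timestamp(end)}] {prefix}{text}"
        )
    return lines
```

Applied to the first chunk of the success example, this yields `[00:00.00 - 00:04.80] On the show today, cuts both ways.`.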