Converts audio to text by applying powerful neural network models.
Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.

Request parameters:

Name | Data Type | Description
---|---|---
name | string | The name of the operation resource.

Response body: `Operation`.
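The polling pattern described above can be sketched in plain Python. The service endpoint, the URL shape, and the injectable `fetch` callable are illustrative assumptions (a real client would perform an authenticated HTTP GET); only the `done` semantics come from this reference:

```python
# Sketch of polling GetOperation until the Operation reports done=true.
import time

ENDPOINT = "https://speech.googleapis.com/v1"  # assumed service endpoint

def operation_url(op_id: str) -> str:
    """Build a GetOperation URL from an operation id (assumed URL shape)."""
    return f"{ENDPOINT}/operations/{op_id}"

def poll_operation(fetch, op_id: str, interval_s: float = 0.0, max_polls: int = 10) -> dict:
    """Call fetch(url) until the returned Operation dict has done=true.

    `fetch` stands in for an authenticated HTTP GET returning parsed JSON.
    """
    url = operation_url(op_id)
    for _ in range(max_polls):
        op = fetch(url)
        if op.get("done"):
            return op
        time.sleep(interval_s)
    raise TimeoutError(f"operation {op_id} not done after {max_polls} polls")

# Usage with a stub fetcher that completes on the third poll:
calls = {"n": 0}
def stub_fetch(url):
    calls["n"] += 1
    return {"name": "12345", "done": calls["n"] >= 3}

op = poll_operation(stub_fetch, "12345")
```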
Lists operations that match the specified filter in the request. If the server doesn't support this method, it returns `UNIMPLEMENTED`.

NOTE: the `name` binding allows API services to override the binding to use different resource name schemes, such as `users/*/operations`. To override the binding, API services can add a binding such as `"/v1/{name=users/*}/operations"` to their service configuration. For backwards compatibility, the default name includes the operations collection id; however, overriding users must ensure the name binding is the parent resource, without the operations collection id.

Request parameters:

Name | Data Type | Description
---|---|---
pageToken | string | The standard list page token.
pageSize | integer | The standard list page size.
name | string | The name of the operation's parent resource.
filter | string | The standard list filter.

Response body: `ListOperationsResponse`.
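The `pageToken`/`nextPageToken` pair implies the usual list-pagination loop, which can be sketched as follows. The `list_page` callable is a hypothetical stand-in for an authenticated request to the list endpoint:

```python
# Sketch of paging through ListOperations using pageToken/nextPageToken.
def list_all_operations(list_page) -> list:
    """Accumulate operations across pages.

    `list_page(page_token)` must return a ListOperationsResponse-shaped
    dict; a missing or empty nextPageToken ends the loop.
    """
    ops, token = [], None
    while True:
        resp = list_page(token)
        ops.extend(resp.get("operations", []))
        token = resp.get("nextPageToken")
        if not token:
            return ops

# Usage with a stub serving two pages:
pages = {
    None: {"operations": [{"name": "1"}, {"name": "2"}], "nextPageToken": "t2"},
    "t2": {"operations": [{"name": "3"}]},
}
all_ops = list_all_operations(lambda tok: pages[tok])
```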
Performs asynchronous speech recognition: receive results via the google.longrunning.Operations interface. Returns either an `Operation.error` or an `Operation.response` which contains a `LongRunningRecognizeResponse` message. For more information on asynchronous speech recognition, see the how-to.

Request body: `LongRunningRecognizeRequest`.

Response body: `Operation`.
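A minimal sketch of the JSON body sent to the asynchronous method, built from the `LongRunningRecognizeRequest` fields documented below. The bucket and object name in the `uri` are hypothetical placeholders:

```python
# Sketch of a LongRunningRecognizeRequest JSON body.
import json

request = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "enableWordTimeOffsets": True,
    },
    "audio": {
        # For long audio, reference a Cloud Storage object rather than
        # inlining base64 content. Placeholder URI, not a real bucket.
        "uri": "gs://my-bucket/recording.wav",
    },
}
body = json.dumps(request)
```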
Performs synchronous speech recognition: receive results after all audio has been sent and processed.

Request body: `RecognizeRequest`.

Response body: `RecognizeResponse`.
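For the synchronous method, short audio is typically inlined. Since `content` is a bytes field, the JSON representation carries it base64-encoded; a sketch (the raw bytes are a placeholder, not real LINEAR16 audio):

```python
# Sketch of a synchronous Recognize request with inline audio content.
import base64
import json

raw_audio = b"\x00\x01\x02\x03"  # placeholder bytes, not real audio

request = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
    },
    # JSON requests carry bytes fields base64-encoded.
    "audio": {"content": base64.b64encode(raw_audio).decode("ascii")},
}
body = json.dumps(request)
```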
The response message for Operations.ListOperations.

Name | Data Type | Description
---|---|---
operations | array [Operation] | A list of operations that matches the specified filter in the request.
nextPageToken | string | The standard List next-page token.
Describes the progress of a long-running `LongRunningRecognize` call. It is included in the `metadata` field of the `Operation` returned by the `GetOperation` call of the `google::longrunning::Operations` service.

Name | Data Type | Description
---|---|---
startTime | string | Time when the request was received.
progressPercent | integer | Approximate percentage of audio processed thus far. Guaranteed to be 100 when the audio is fully processed and the results are available.
lastUpdateTime | string | Time of the most recent processing update.
The top-level message sent by the client for the `LongRunningRecognize` method.

Name | Data Type | Description
---|---|---
config | RecognitionConfig | Required. Provides information to the recognizer that specifies how to process the request.
audio | RecognitionAudio | Required. The audio data to be recognized.
The only message returned to the client by the `LongRunningRecognize` method. It contains the result as zero or more sequential `SpeechRecognitionResult` messages. It is included in the `result.response` field of the `Operation` returned by the `GetOperation` call of the `google::longrunning::Operations` service.

Name | Data Type | Description
---|---|---
results | array [SpeechRecognitionResult] | Sequential list of transcription results corresponding to sequential portions of audio.
This resource represents a long-running operation that is the result of a network API call.

Name | Data Type | Description
---|---|---
response | object | The normal response of the operation in case of success. If the original method returns no data on success, such as `Delete`, the response is `google.protobuf.Empty`. If the original method is standard `Get`/`Create`/`Update`, the response should be the resource. For other methods, the response should have the type `XxxResponse`, where `Xxx` is the original method name.
name | string | The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the `name` should be a resource name ending with `operations/{unique_id}`.
metadata | object | Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.
error | Status | The error result of the operation in case of failure or cancellation.
done | boolean | If the value is `false`, it means the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
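The `done`/`error`/`response` contract above (once `done` is true, exactly one of `error` or `response` is set) can be sketched as a small unpacking helper:

```python
# Sketch of unpacking an Operation resource per the documented contract.
def unpack_operation(op: dict):
    """Return ('pending', None), ('error', status) or ('ok', response)."""
    if not op.get("done"):
        return ("pending", None)
    if "error" in op:
        return ("error", op["error"])
    return ("ok", op.get("response"))

# Usage: a completed operation carrying a response payload.
state, payload = unpack_operation(
    {"name": "123", "done": True, "response": {"results": []}}
)
```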
Contains audio data in the encoding specified in the `RecognitionConfig`. Either `content` or `uri` must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits.

Name | Data Type | Description
---|---|---
uri | string | URI that points to a file that contains audio data bytes as specified in `RecognitionConfig`. The file must not be compressed (for example, gzip). Currently, only Google Cloud Storage URIs are supported, which must be specified in the format `gs://bucket_name/object_name`.
content | byte | The audio data bytes encoded as specified in `RecognitionConfig`. Note: as with all bytes fields, proto buffers use a pure binary representation, whereas JSON representations use base64.
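The "exactly one of `content` or `uri`" rule can be checked client-side before sending, which avoids a round trip that would end in INVALID_ARGUMENT; a sketch:

```python
# Sketch of the client-side check mirroring the RecognitionAudio rule:
# supplying both `content` and `uri`, or neither, is rejected.
def validate_recognition_audio(audio: dict) -> bool:
    """True iff exactly one of `content` / `uri` is set."""
    return ("content" in audio) != ("uri" in audio)

# Usage:
ok = validate_recognition_audio({"uri": "gs://bucket/a.flac"})
```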
Provides information to the recognizer that specifies how to process the request.

Name | Data Type | Description
---|---|---
useEnhanced | boolean | Set to true to use an enhanced model for speech recognition. If `use_enhanced` is set to true and the `model` field is not set, then an appropriate enhanced model is chosen if an enhanced model exists for the audio. If `use_enhanced` is true and an enhanced version of the specified model does not exist, then the speech is recognized using the standard version of the specified model.
speechContexts | array [SpeechContext] | Array of SpeechContext. A means to provide context to assist the speech recognition. For more information, see speech adaptation.
sampleRateHertz | integer | Sample rate in Hertz of the audio data sent in all `RecognitionAudio` messages. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling). This field is optional for FLAC and WAV audio files and required for all other audio formats.
profanityFilter | boolean | If set to `true`, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". If set to `false` or omitted, profanities won't be filtered out.
model | string | Which model to select for the given request. Select the model best suited to your domain to get best results. If a model is not explicitly specified, a model is auto-selected based on the parameters in the RecognitionConfig.
metadata | RecognitionMetadata | Metadata regarding this request.
maxAlternatives | integer | Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of `SpeechRecognitionAlternative` messages within each `SpeechRecognitionResult`. The server may return fewer than `max_alternatives`. Valid values are 0-30. A value of 0 or 1 will return a maximum of one. If omitted, will return a maximum of one.
languageCode | string | Required. The language of the supplied audio as a BCP-47 language tag. Example: "en-US". See Language Support for a list of the currently supported language codes.
encoding | string (allowed values: ENCODING_UNSPECIFIED, LINEAR16, FLAC, MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE) | Encoding of audio data sent in all `RecognitionAudio` messages. This field is optional for FLAC and WAV audio files and required for all other audio formats.
enableWordTimeOffsets | boolean | If `true`, the top result includes a list of words with the start and end time offsets (timestamps) for those words. If `false`, no word-level time offset information is returned. The default is `false`.
enableSeparateRecognitionPerChannel | boolean | This needs to be set to `true` explicitly, with `audio_channel_count` > 1, to get each channel recognized separately. The recognition result will contain a `channel_tag` field to state which channel that result belongs to. If this is not true, only the first channel is recognized.
enableAutomaticPunctuation | boolean | If 'true', adds punctuation to recognition result hypotheses. This feature is only available in select languages. Setting this for requests in other languages has no effect at all. The default 'false' value does not add punctuation to result hypotheses. Note: This is currently offered as an experimental service, complimentary to all users. In the future this may be exclusively available as a premium feature.
diarizationConfig | SpeakerDiarizationConfig | Config to enable speaker diarization and set additional parameters to make diarization better suited for your application. Note: When this is enabled, all the words from the beginning of the audio are sent for the top alternative in every consecutive STREAMING response; this is done to improve speaker tags, as the models learn to identify the speakers in the conversation over time. For non-streaming requests, the diarization results are provided only in the top alternative of the FINAL SpeechRecognitionResult.
audioChannelCount | integer | The number of channels in the input audio data. ONLY set this for MULTI-CHANNEL recognition. Valid values for LINEAR16 and FLAC are `1`-`8`; valid values for OGG_OPUS are `1`-`254`; the only valid value for MULAW, AMR, AMR_WB and SPEEX_WITH_HEADER_BYTE is `1`. If `0` or omitted, defaults to one channel (mono). Note: only the first channel is recognized by default; to perform independent recognition on each channel, set `enable_separate_recognition_per_channel` to `true`.
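The interaction between `audioChannelCount` and `enableSeparateRecognitionPerChannel` is worth making concrete; a sketch of the documented behavior (a helper name of my own, not part of the API):

```python
# Sketch: per the fields above, separate per-channel recognition requires
# BOTH enableSeparateRecognitionPerChannel=true AND audioChannelCount > 1;
# otherwise only the first channel is recognized.
config = {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US",
    "audioChannelCount": 2,
    "enableSeparateRecognitionPerChannel": True,
}

def channels_recognized(cfg: dict) -> int:
    """How many channels the service would recognize for this config."""
    if cfg.get("enableSeparateRecognitionPerChannel") and cfg.get("audioChannelCount", 1) > 1:
        return cfg["audioChannelCount"]
    return 1  # default: first channel only
```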
Description of audio data to be recognized.

Name | Data Type | Description
---|---|---
recordingDeviceType | string (allowed values: RECORDING_DEVICE_TYPE_UNSPECIFIED, SMARTPHONE, PC, PHONE_LINE, VEHICLE, OTHER_OUTDOOR_DEVICE, OTHER_INDOOR_DEVICE) | The type of device the speech was recorded with.
recordingDeviceName | string | The device used to make the recording. Examples: 'Nexus 5X', 'Polycom SoundStation IP 6000', 'POTS', 'VoIP', 'Cardioid Microphone'.
originalMimeType | string | Mime type of the original audio file. For example `audio/m4a`, `audio/x-alaw-basic`, `audio/mp3`, `audio/3gpp`.
originalMediaType | string (allowed values: ORIGINAL_MEDIA_TYPE_UNSPECIFIED, AUDIO, VIDEO) | The original media the speech was recorded on.
obfuscatedId | string | Obfuscated (privacy-protected) ID of the user, to identify the number of unique users using the service.
microphoneDistance | string (allowed values: MICROPHONE_DISTANCE_UNSPECIFIED, NEARFIELD, MIDFIELD, FARFIELD) | The audio type that most closely describes the audio being recognized.
interactionType | string (allowed values: INTERACTION_TYPE_UNSPECIFIED, DISCUSSION, PRESENTATION, PHONE_CALL, VOICEMAIL, PROFESSIONALLY_PRODUCED, VOICE_SEARCH, VOICE_COMMAND, DICTATION) | The use case most closely describing the audio content to be recognized.
industryNaicsCodeOfAudio | integer | The industry vertical to which this speech recognition request most closely applies. This is most indicative of the topics contained in the audio. Use the 6-digit NAICS code to identify the industry vertical; see https://www.naics.com/search/.
audioTopic | string | Description of the content, e.g. "Recordings of federal supreme court hearings from 2012".
The top-level message sent by the client for the `Recognize` method.

Name | Data Type | Description
---|---|---
config | RecognitionConfig | Required. Provides information to the recognizer that specifies how to process the request.
audio | RecognitionAudio | Required. The audio data to be recognized.
The only message returned to the client by the `Recognize` method. It contains the result as zero or more sequential `SpeechRecognitionResult` messages.

Name | Data Type | Description
---|---|---
results | array [SpeechRecognitionResult] | Sequential list of transcription results corresponding to sequential portions of audio.
Config to enable speaker diarization.

Name | Data Type | Description
---|---|---
speakerTag | integer | Output only. Unused.
minSpeakerCount | integer | Minimum number of speakers in the conversation. This range gives you more flexibility by allowing the system to automatically determine the correct number of speakers. If not set, the default value is 2.
maxSpeakerCount | integer | Maximum number of speakers in the conversation. This range gives you more flexibility by allowing the system to automatically determine the correct number of speakers. If not set, the default value is 6.
enableSpeakerDiarization | boolean | If 'true', enables speaker detection for each recognized word in the top alternative of the recognition result, using a speaker_tag provided in the WordInfo.
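A sketch of wiring `SpeakerDiarizationConfig` into a `RecognitionConfig`, spelling out the documented defaults as explicit values:

```python
# Sketch: enabling diarization; minSpeakerCount/maxSpeakerCount here
# simply restate the documented defaults (2 and 6).
diarization_config = {
    "enableSpeakerDiarization": True,
    "minSpeakerCount": 2,
    "maxSpeakerCount": 6,
}

config = {
    "languageCode": "en-US",
    "diarizationConfig": diarization_config,
}
```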
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

Name | Data Type | Description
---|---|---
phrases | array [string] | A list of strings containing word and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits. List items can also be set to classes for groups of words that represent common concepts that occur in natural language. For example, rather than providing phrase hints for every month of the year, using the $MONTH class improves the likelihood of correctly transcribing audio that includes months.
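A sketch of attaching phrase hints through `speechContexts`; the hint phrases are illustrative, and `$MONTH` shows a class token as described above:

```python
# Sketch: speechContexts carrying phrase hints, including a class token.
speech_contexts = [
    {"phrases": ["weather forecast", "$MONTH"]},
]

config = {
    "languageCode": "en-US",
    "speechContexts": speech_contexts,
}
```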
Alternative hypotheses (a.k.a. n-best list).

Name | Data Type | Description
---|---|---
words | array [WordInfo] | A list of word-specific information for each recognized word. Note: When `enable_speaker_diarization` is true, you will see all the words from the beginning of the audio.
transcript | string | Transcript text representing the words that the user spoke.
confidence | float | The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is set only for the top alternative of a non-streaming result, or of a streaming result where `is_final=true`. This field is not guaranteed to be accurate, and users should not rely on it to always be provided. The default of 0.0 is a sentinel value indicating that `confidence` was not set.
A speech recognition result corresponding to a portion of the audio.

Name | Data Type | Description
---|---|---
channelTag | integer | For multi-channel audio, this is the channel number corresponding to the recognized result for the audio from that channel. For audio_channel_count = N, its output values can range from '1' to 'N'.
alternatives | array [SpeechRecognitionAlternative] | May contain one or more recognition hypotheses (up to the maximum specified in `max_alternatives`). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.
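Since alternatives are ordered with the most probable first, a simple transcript can be assembled by concatenating the top alternative of each sequential result; a sketch:

```python
# Sketch of reading a RecognizeResponse-shaped dict: join the first
# (most probable) alternative of each sequential result.
def full_transcript(response: dict) -> str:
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0]["transcript"])
    return " ".join(parts)

# Usage with an illustrative response:
response = {
    "results": [
        {"alternatives": [{"transcript": "hello", "confidence": 0.92}]},
        {"alternatives": [{"transcript": "world", "confidence": 0.88}]},
    ]
}
text = full_transcript(response)
```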
The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by gRPC. Each `Status` message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the API Design Guide.

Name | Data Type | Description
---|---|---
message | string | A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
details | array [object] | A list of messages that carry the error details. There is a common set of message types for APIs to use.
code | integer | The status code, which should be an enum value of google.rpc.Code.
Word-specific information for recognized words.

Name | Data Type | Description
---|---|---
word | string | The word corresponding to this set of information.
startTime | string | Time offset relative to the beginning of the audio, corresponding to the start of the spoken word. This field is only set if `enable_word_time_offsets=true`, and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.
speakerTag | integer | A distinct integer value is assigned for every speaker within the audio. This field specifies which one of those speakers was detected to have spoken this word. Value ranges from '1' to diarization_speaker_count. speaker_tag is set if enable_speaker_diarization = 'true', and only in the top alternative.
endTime | string | Time offset relative to the beginning of the audio, corresponding to the end of the spoken word. This field is only set if `enable_word_time_offsets=true`, and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.
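In the JSON representation, the `startTime`/`endTime` offsets arrive as protobuf Duration strings such as `"1.500s"`, so a small parser plus a grouping helper covers the common uses of `WordInfo`; a sketch (the helper names are my own):

```python
# Sketch of working with WordInfo entries: parse "1.500s"-style Duration
# strings and group recognized words by speakerTag.
def duration_to_seconds(d: str) -> float:
    """Parse a JSON-encoded protobuf Duration such as '1.500s'."""
    return float(d.rstrip("s"))

def words_by_speaker(words: list) -> dict:
    """Map speakerTag -> list of words, in order of appearance."""
    grouped = {}
    for w in words:
        grouped.setdefault(w.get("speakerTag", 0), []).append(w["word"])
    return grouped

# Usage with illustrative WordInfo dicts:
words = [
    {"word": "hi", "startTime": "0s", "endTime": "0.300s", "speakerTag": 1},
    {"word": "there", "startTime": "0.300s", "endTime": "0.700s", "speakerTag": 2},
]
grouped = words_by_speaker(words)
```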
OAuth 2.0 authentication, authorizationCode flow:
Authorization URL: https://accounts.google.com/o/oauth2/auth
Token URL: https://accounts.google.com/o/oauth2/token
Scope: https://www.googleapis.com/auth/cloud-platform (View and manage your data across Google Cloud Platform services)

OAuth 2.0 authentication, implicit flow:
Authorization URL: https://accounts.google.com/o/oauth2/auth
Scope: https://www.googleapis.com/auth/cloud-platform (View and manage your data across Google Cloud Platform services)

Name | Google
---|---
External URL | https://google.com
OAS (OpenAPI Specification) | v3.0.0