Method: projects.agent.sessions.detectIntent

Processes a natural language query and returns structured, actionable data as a result. This method is not idempotent, because it may cause contexts and session entity types to be updated, which in turn might affect results of future queries.

HTTP request

POST https://dialogflow.googleapis.com/v2/{session=projects/*/agent/sessions/*}:detectIntent

The URL uses Google API HTTP annotation syntax.

Path parameters

Parameters
session

string

Required. The name of the session this query is sent to. Format: projects/<Project ID>/agent/sessions/<Session ID>. It's up to the API caller to choose an appropriate session ID. It can be a random number or some type of user identifier (preferably hashed). The length of the session ID must not exceed 36 bytes.

Authorization requires the following Google IAM permission on the specified resource session:

  • dialogflow.sessions.detectIntent
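
For example, with a hypothetical project ID of my-project and a caller-chosen session ID of 123456789, the path template above expands to:

POST https://dialogflow.googleapis.com/v2/projects/my-project/agent/sessions/123456789:detectIntent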

Request body

The request body contains data with the following structure:

JSON representation
{
  "queryParams": {
    object(QueryParameters)
  },
  "queryInput": {
    object(QueryInput)
  },
  "inputAudio": string
}
Fields
queryParams

object(QueryParameters)

Optional. The parameters of this query.

queryInput

object(QueryInput)

Required. The input specification. It can be set to:

  1. an audio config which instructs the speech recognizer how to process the speech audio,

  2. a conversational query in the form of text, or

  3. an event that specifies which intent to trigger.

inputAudio

string (bytes format)

Optional. The natural language speech audio to be processed. This field should be populated if and only if queryInput is set to an input audio config. A single request can contain up to 1 minute of speech audio data.

A base64-encoded string.
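
As an illustration, a minimal request body for a text query might look like the following; the query text and language code are placeholder values:

{
  "queryInput": {
    "text": {
      "text": "book a table for two tonight",
      "languageCode": "en-US"
    }
  }
}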

Response body

If successful, the response body contains data with the following structure:

The message returned from the sessions.detectIntent method.

JSON representation
{
  "responseId": string,
  "queryResult": {
    object(QueryResult)
  },
  "webhookStatus": {
    object(Status)
  }
}
Fields
responseId

string

The unique identifier of the response. It can be used to locate a response in the training example set or for reporting issues.

queryResult

object(QueryResult)

The results of the conversational query or event processing.

webhookStatus

object(Status)

Specifies the status of the webhook request. webhookStatus is never populated in webhook requests.
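
As a rough sketch, a successful response has the shape below. The responseId value is a placeholder, and the queryResult fields shown (queryText, intent, intentDetectionConfidence, fulfillmentText, languageCode) are assumed from the separately documented QueryResult message; consult that reference for the full field list:

{
  "responseId": "8a2b5e6d-0000-0000-0000-000000000000",
  "queryResult": {
    "queryText": "book a table for two tonight",
    "intent": {
      "displayName": "restaurant.booking"
    },
    "intentDetectionConfidence": 0.92,
    "fulfillmentText": "For how many people?",
    "languageCode": "en-US"
  }
}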

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the OAuth 2.0 Overview.
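
Putting the pieces together, a complete call has roughly the following shape. The project ID, session ID, and access token are placeholders; the token must carry the scope listed above:

POST https://dialogflow.googleapis.com/v2/projects/my-project/agent/sessions/123456789:detectIntent
Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

{
  "queryInput": {
    "text": {
      "text": "book a table for two tonight",
      "languageCode": "en-US"
    }
  }
}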

QueryParameters

Represents the parameters of the conversational query.

JSON representation
{
  "timeZone": string,
  "geoLocation": {
    object(LatLng)
  },
  "contexts": [
    {
      object(Context)
    }
  ],
  "resetContexts": boolean,
  "sessionEntityTypes": [
    {
      object(SessionEntityType)
    }
  ],
  "payload": {
    object
  }
}
Fields
timeZone

string

Optional. The time zone of this conversational query from the time zone database, e.g., America/New_York, Europe/Paris. If not provided, the time zone specified in agent settings is used.

geoLocation

object(LatLng)

Optional. The geo location of this conversational query.

contexts[]

object(Context)

Optional. The collection of contexts to be activated before this query is executed.

resetContexts

boolean

Optional. Specifies whether to delete all contexts in the current session before the new ones are activated.

sessionEntityTypes[]

object(SessionEntityType)

Optional. The collection of session entity types to replace or extend developer entity types with, for this query only. The entity synonyms apply to all languages.

payload

object (Struct format)

Optional. This field can be used to pass custom data into the webhook associated with the agent. Arbitrary JSON objects are supported.
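
As a sketch, a queryParams object that overrides the time zone, clears existing contexts, and forwards custom data to the webhook could look like this; the payload key and all values are illustrative:

{
  "timeZone": "America/New_York",
  "resetContexts": true,
  "payload": {
    "source": "my-kiosk-frontend"
  }
}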

QueryInput

Represents the query input. It can contain one of the following:

  1. An audio config which instructs the speech recognizer how to process the speech audio.

  2. A conversational query in the form of text.

  3. An event that specifies which intent to trigger.

JSON representation
{

  // Union field input can be only one of the following:
  "audioConfig": {
    object(InputAudioConfig)
  },
  "text": {
    object(TextInput)
  },
  "event": {
    object(EventInput)
  }
  // End of list of possible types for union field input.
}
Fields
Union field input. Required. The input specification. input can be only one of the following:
audioConfig

object(InputAudioConfig)

Instructs the speech recognizer how to process the speech audio.

text

object(TextInput)

The natural language text to be processed.

event

object(EventInput)

The event to be processed.
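
Because input is a union field, exactly one member is set per query. For example, a text query would be expressed as follows (the query text is a placeholder); an audioConfig example appears under InputAudioConfig below:

{
  "text": {
    "text": "what is the weather tomorrow",
    "languageCode": "en-US"
  }
}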

InputAudioConfig

Instructs the speech recognizer how to process the audio content.

JSON representation
{
  "audioEncoding": enum(AudioEncoding),
  "sampleRateHertz": number,
  "languageCode": string,
  "phraseHints": [
    string
  ]
}
Fields
audioEncoding

enum(AudioEncoding)

Required. Audio encoding of the audio content to process.

sampleRateHertz

number

Required. Sample rate (in Hertz) of the audio content sent in the query. Refer to Cloud Speech API documentation for more details.

languageCode

string

Required. The language of the supplied audio. Dialogflow does not do translations. See Language Support for a list of the currently supported language codes. Note that queries in the same session do not necessarily need to specify the same language.

phraseHints[]

string

Optional. The collection of phrase hints which are used to boost accuracy of speech recognition. Refer to Cloud Speech API documentation for more details.
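
For instance, a config for 16 kHz linear PCM audio with a couple of phrase hints could look like this; the hint strings are placeholders, and the audio bytes themselves are sent base64-encoded in the request's inputAudio field, not here:

{
  "audioEncoding": "AUDIO_ENCODING_LINEAR_16",
  "sampleRateHertz": 16000,
  "languageCode": "en-US",
  "phraseHints": [
    "Dialogflow",
    "detect intent"
  ]
}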

AudioEncoding

Audio encoding of the audio content sent in the conversational query request. Refer to the Cloud Speech API documentation for more details.

Enums
AUDIO_ENCODING_UNSPECIFIED Not specified.
AUDIO_ENCODING_LINEAR_16 Uncompressed 16-bit signed little-endian samples (Linear PCM).
AUDIO_ENCODING_FLAC FLAC (Free Lossless Audio Codec) is the recommended encoding because it is lossless (therefore recognition is not compromised) and requires only about half the bandwidth of LINEAR16. FLAC stream encoding supports 16-bit and 24-bit samples; however, not all fields in STREAMINFO are supported.
AUDIO_ENCODING_MULAW 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
AUDIO_ENCODING_AMR Adaptive Multi-Rate Narrowband codec. sampleRateHertz must be 8000.
AUDIO_ENCODING_AMR_WB Adaptive Multi-Rate Wideband codec. sampleRateHertz must be 16000.
AUDIO_ENCODING_OGG_OPUS Opus encoded audio frames in Ogg container (OggOpus). sampleRateHertz must be 16000.
AUDIO_ENCODING_SPEEX_WITH_HEADER_BYTE Although the use of lossy encodings is not recommended, if a very low bitrate encoding is required, OGG_OPUS is highly preferred over Speex encoding. The Speex encoding supported by Dialogflow API has a header byte in each block, as in MIME type audio/x-speex-with-header-byte. It is a variant of the RTP Speex encoding defined in RFC 5574. The stream is a sequence of blocks, one block per RTP packet. Each block starts with a byte containing the length of the block, in bytes, followed by one or more frames of Speex data, padded to an integral number of bytes (octets) as specified in RFC 5574. In other words, each RTP header is replaced with a single byte containing the block length. Only Speex wideband is supported. sampleRateHertz must be 16000.

TextInput

Represents the natural language text to be processed.

JSON representation
{
  "text": string,
  "languageCode": string
}
Fields
text

string

Required. The UTF-8 encoded natural language text to be processed. Text length must not exceed 256 bytes.

languageCode

string

Required. The language of this conversational query. See Language Support for a list of the currently supported language codes. Note that queries in the same session do not necessarily need to specify the same language.
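
A complete TextInput is therefore just these two fields, with the text kept under the 256-byte limit noted above; both values here are placeholders:

{
  "text": "set an alarm for 7 am",
  "languageCode": "en-US"
}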