Method: projects.agent.sessions.detectIntent

Processes a natural language query and returns structured, actionable data as a result. This method is not idempotent, because it may cause contexts and session entity types to be updated, which in turn might affect results of future queries.

HTTP request

POST https://dialogflow.googleapis.com/v2beta1/{session=projects/*/agent/sessions/*}:detectIntent

The URL uses Google API HTTP annotation syntax.

Path parameters

Parameters
session

string

Required. The name of the session this query is sent to. Format: projects/<Project ID>/agent/sessions/<Session ID>. It's up to the API caller to choose an appropriate session ID. It can be a random number or some type of user identifier (preferably hashed). The length of the session ID must not exceed 36 bytes.

Authorization requires the following Google IAM permission on the specified resource session:

  • dialogflow.sessions.detectIntent
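The session ID format leaves room for interpretation, so here is a minimal sketch of building a session resource name in Python. The project ID, hashing scheme, and helper name are assumptions for illustration, not part of the API:

import hashlib
import uuid

PROJECT_ID = "my-project"  # hypothetical project ID

def session_path(user_id=None):
    """Build a session resource name; the session ID must not exceed 36 bytes."""
    if user_id is None:
        session_id = str(uuid.uuid4())  # 36 characters, exactly at the limit
    else:
        # Hash the user identifier so no raw PII appears in the session name.
        session_id = hashlib.sha1(user_id.encode("utf-8")).hexdigest()[:36]
    return "projects/{}/agent/sessions/{}".format(PROJECT_ID, session_id)

print(session_path("alice@example.com"))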

Request body

The request body contains data with the following structure:

JSON representation
{
  "queryParams": {
    object(QueryParameters)
  },
  "queryInput": {
    object(QueryInput)
  },
  "inputAudio": string,
}
Fields
queryParams

object(QueryParameters)

Optional. The parameters of this query.

queryInput

object(QueryInput)

Required. The input specification. It can be set to:

  1. an audio config which instructs the speech recognizer how to process the speech audio,

  2. a conversational query in the form of text, or

  3. an event that specifies which intent to trigger.

inputAudio

string (bytes format)

Optional. The natural language speech audio to be processed. This field should be populated if and only if queryInput is set to an input audio config. A single request can contain up to 1 minute of speech audio data.

A base64-encoded string.
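Putting the pieces together, a minimal sketch of a text-based detectIntent request over REST might look like the following. The access token, project ID, session ID, and query text are placeholders; a real token can be obtained, for example, with gcloud auth application-default print-access-token:

import requests

# Placeholder values for illustration only.
ACCESS_TOKEN = "ya29.EXAMPLE_TOKEN"
SESSION = "projects/my-project/agent/sessions/123456"

body = {
    "queryInput": {
        "text": {
            "text": "book a table for two tomorrow at 7pm",
            "languageCode": "en-US",
        }
    }
}

resp = requests.post(
    "https://dialogflow.googleapis.com/v2beta1/{}:detectIntent".format(SESSION),
    headers={"Authorization": "Bearer {}".format(ACCESS_TOKEN)},
    json=body,  # requests serializes the dict and sets the Content-Type header
)
resp.raise_for_status()
print(resp.json()["queryResult"]["fulfillmentText"])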

Response body

If successful, the response body contains data with the following structure:

The message returned from the sessions.detectIntent method.

JSON representation
{
  "responseId": string,
  "queryResult": {
    object(QueryResult)
  },
  "webhookStatus": {
    object(Status)
  }
}
Fields
responseId

string

The unique identifier of the response. It can be used to locate a response in the training example set or for reporting issues.

queryResult

object(QueryResult)

The results of the conversational query or event processing.

webhookStatus

object(Status)

Specifies the status of the webhook request. webhookStatus is never populated in webhook requests.
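As a sketch of consuming this structure, the snippet below inspects the parsed JSON from the request example above; the field handling is an illustrative assumption:

# `resp` is the response object from the request example above.
data = resp.json()

print("response id:", data["responseId"])

result = data["queryResult"]
print("matched intent:", result.get("intent", {}).get("displayName"))

# webhookStatus is present only when a webhook was involved in fulfillment.
status = data.get("webhookStatus")
if status is not None and status.get("code", 0) != 0:
    # A google.rpc.Status with a non-zero code indicates a webhook error.
    print("webhook failed:", status.get("message"))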

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Auth Guide.

QueryParameters

Represents the parameters of the conversational query.

JSON representation
{
  "timeZone": string,
  "geoLocation": {
    object(LatLng)
  },
  "contexts": [
    {
      object(Context)
    }
  ],
  "resetContexts": boolean,
  "sessionEntityTypes": [
    {
      object(SessionEntityType)
    }
  ],
  "payload": {
    object
  }
}
Fields
timeZone

string

Optional. The time zone of this conversational query from the time zone database, e.g., America/New_York, Europe/Paris. If not provided, the time zone specified in agent settings is used.

geoLocation

object(LatLng)

Optional. The geo location of this conversational query.

contexts[]

object(Context)

Optional. The collection of contexts to be activated before this query is executed.

resetContexts

boolean

Optional. Specifies whether to delete all contexts in the current session before the new ones are activated.

sessionEntityTypes[]

object(SessionEntityType)

Optional. The collection of session entity types to replace or extend developer entities with for this query only. The entity synonyms apply to all languages.

payload

object (Struct format)

Optional. This field can be used to pass custom data into the webhook associated with the agent. Arbitrary JSON objects are supported.
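A hypothetical queryParams payload combining several of these fields is sketched below. The context name, the lifespanCount field (part of the Context message, which is not documented in this section), and the payload keys are illustrative assumptions:

# Illustrative queryParams; names and values are made up.
query_params = {
    "timeZone": "America/New_York",
    "geoLocation": {"latitude": 40.7128, "longitude": -74.0060},
    "contexts": [
        {
            # Context names are scoped to the session.
            "name": "projects/my-project/agent/sessions/123456/contexts/booking",
            "lifespanCount": 5,
        }
    ],
    "resetContexts": False,
    # Arbitrary JSON passed through to the webhook.
    "payload": {"source": "mobile-app", "appVersion": "2.1.0"},
}

body = {
    "queryParams": query_params,
    "queryInput": {"text": {"text": "reserve it", "languageCode": "en-US"}},
}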

LatLng

An object representing a latitude/longitude pair. This is expressed as a pair of doubles representing degrees latitude and degrees longitude. Unless specified otherwise, this must conform to the WGS84 standard. Values must be within normalized ranges.

JSON representation
{
  "latitude": number,
  "longitude": number,
}
Fields
latitude

number

The latitude in degrees. It must be in the range [-90.0, +90.0].

longitude

number

The longitude in degrees. It must be in the range [-180.0, +180.0].

QueryInput

Represents the query input. It can contain one of the following:

  1. An audio config which instructs the speech recognizer how to process the speech audio.

  2. A conversational query in the form of text.

  3. An event that specifies which intent to trigger.

JSON representation
{

  // Union field input can be only one of the following:
  "audioConfig": {
    object(InputAudioConfig)
  },
  "text": {
    object(TextInput)
  },
  "event": {
    object(EventInput)
  },
  // End of list of possible types for union field input.
}
Fields
Union field input. Required. The input specification. input can be only one of the following:
audioConfig

object(InputAudioConfig)

Instructs the speech recognizer how to process the speech audio.

text

object(TextInput)

The natural language text to be processed.

event

object(EventInput)

The event to be processed.
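Since input is a union field, exactly one member may be set per request. A sketch of the three mutually exclusive shapes, with illustrative values:

# Exactly one of these may appear as "queryInput" in a single request.
audio_query = {
    "audioConfig": {
        "audioEncoding": "AUDIO_ENCODING_LINEAR_16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
    }
}

text_query = {"text": {"text": "what's the weather?", "languageCode": "en-US"}}

event_query = {"event": {"name": "welcome_event", "languageCode": "en-US"}}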

InputAudioConfig

Instructs the speech recognizer how to process the audio content.

JSON representation
{
  "audioEncoding": enum(AudioEncoding),
  "sampleRateHertz": number,
  "languageCode": string,
  "phraseHints": [
    string
  ]
}
Fields
audioEncoding

enum(AudioEncoding)

Required. Audio encoding of the audio content to process.

sampleRateHertz

number

Required. Sample rate (in Hertz) of the audio content sent in the query. Refer to Cloud Speech API documentation for more details.

languageCode

string

Required. The language of the supplied audio. Dialogflow does not do translations. See Language Support for a list of the currently supported language codes. Note that queries in the same session do not necessarily need to specify the same language.

phraseHints[]

string

Optional. The collection of phrase hints which are used to boost accuracy of speech recognition. Refer to Cloud Speech API documentation for more details.
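For an audio query, this config pairs with the request-level inputAudio field described earlier. A minimal sketch, assuming a raw 16-bit linear PCM recording at 16 kHz on disk (the file name and hints are made up):

import base64

# Hypothetical raw LINEAR16 recording sampled at 16 kHz.
with open("query.raw", "rb") as f:
    audio_bytes = f.read()

body = {
    "queryInput": {
        "audioConfig": {
            "audioEncoding": "AUDIO_ENCODING_LINEAR_16",
            "sampleRateHertz": 16000,
            "languageCode": "en-US",
            # Bias recognition toward expected vocabulary.
            "phraseHints": ["book a table", "reservation"],
        }
    },
    # inputAudio travels as a base64-encoded string in the JSON body.
    "inputAudio": base64.b64encode(audio_bytes).decode("ascii"),
}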

AudioEncoding

Audio encoding of the audio content sent in the conversational query request. Refer to the Cloud Speech API documentation for more details.

Enums
AUDIO_ENCODING_UNSPECIFIED Not specified.
AUDIO_ENCODING_LINEAR_16 Uncompressed 16-bit signed little-endian samples (Linear PCM).
AUDIO_ENCODING_FLAC FLAC (Free Lossless Audio Codec) is the recommended encoding because it is lossless (therefore recognition is not compromised) and requires only about half the bandwidth of LINEAR16. FLAC stream encoding supports 16-bit and 24-bit samples, however, not all fields in STREAMINFO are supported.
AUDIO_ENCODING_MULAW 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
AUDIO_ENCODING_AMR Adaptive Multi-Rate Narrowband codec. sampleRateHertz must be 8000.
AUDIO_ENCODING_AMR_WB Adaptive Multi-Rate Wideband codec. sampleRateHertz must be 16000.
AUDIO_ENCODING_OGG_OPUS Opus encoded audio frames in Ogg container (OggOpus). sampleRateHertz must be 16000.
AUDIO_ENCODING_SPEEX_WITH_HEADER_BYTE Although the use of lossy encodings is not recommended, if a very low bitrate encoding is required, OGG_OPUS is highly preferred over Speex encoding. The Speex encoding supported by Dialogflow API has a header byte in each block, as in MIME type audio/x-speex-with-header-byte. It is a variant of the RTP Speex encoding defined in RFC 5574. The stream is a sequence of blocks, one block per RTP packet. Each block starts with a byte containing the length of the block, in bytes, followed by one or more frames of Speex data, padded to an integral number of bytes (octets) as specified in RFC 5574. In other words, each RTP header is replaced with a single byte containing the block length. Only Speex wideband is supported. sampleRateHertz must be 16000.

TextInput

Represents the natural language text to be processed.

JSON representation
{
  "text": string,
  "languageCode": string,
}
Fields
text

string

Required. The UTF-8 encoded natural language text to be processed. Text length must not exceed 256 bytes.

languageCode

string

Required. The language of this conversational query. See Language Support for a list of the currently supported language codes. Note that queries in the same session do not necessarily need to specify the same language.
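Note that the 256-byte limit applies to the UTF-8 encoding, not the character count, so multi-byte characters consume more of the budget. A small hypothetical check:

def validate_query_text(text):
    """Raise if the UTF-8 encoding of `text` exceeds the 256-byte limit."""
    size = len(text.encode("utf-8"))
    if size > 256:
        raise ValueError("text is {} bytes UTF-8 encoded; the limit is 256".format(size))

validate_query_text("héllo")  # 5 characters but 6 bytes, since é encodes as 2 bytes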

EventInput

Events allow for matching intents by event name instead of the natural language input. For instance, input <event: { name: "welcome_event", parameters: { name: "Sam" } }> can trigger a personalized welcome response. The parameter name may be used by the agent in the response: "Hello #welcome_event.name! What can I do for you today?".

JSON representation
{
  "name": string,
  "parameters": {
    object
  },
  "languageCode": string,
}
Fields
name

string

Required. The unique identifier of the event.

parameters

object (Struct format)

Optional. The collection of parameters associated with the event.

languageCode

string

Required. The language of this query. See Language Support for a list of the currently supported language codes. Note that queries in the same session do not necessarily need to specify the same language.
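The welcome_event example above translates into a queryInput like the following sketch; the parameter value is illustrative:

# Triggers the intent bound to "welcome_event" instead of matching text.
body = {
    "queryInput": {
        "event": {
            "name": "welcome_event",
            "parameters": {"name": "Sam"},
            "languageCode": "en-US",
        }
    }
}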

QueryResult

Represents the result of a conversational query or event processing.

JSON representation
{
  "queryText": string,
  "languageCode": string,
  "speechRecognitionConfidence": number,
  "action": string,
  "parameters": {
    object
  },
  "allRequiredParamsPresent": boolean,
  "fulfillmentText": string,
  "fulfillmentMessages": [
    {
      object(Message)
    }
  ],
  "webhookSource": string,
  "webhookPayload": {
    object
  },
  "outputContexts": [
    {
      object(Context)
    }
  ],
  "intent": {
    object(Intent)
  },
  "intentDetectionConfidence": number,
  "diagnosticInfo": {
    object
  }
}
Fields
queryText

string

The original conversational query text:

  • If natural language text was provided as input, queryText contains a copy of the input.

  • If natural language speech audio was provided as input, queryText contains the speech recognition result. If the speech recognizer produced multiple alternatives, a particular one is picked.

  • If an event was provided as input, queryText is not set.

languageCode

string

The language that was triggered during intent detection. See Language Support for a list of the currently supported language codes.

speechRecognitionConfidence

number

The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. The default of 0.0 is a sentinel value indicating that confidence was not set. This field is populated if natural speech audio was provided as input.

action

string

The action name from the matched intent.

parameters

object (Struct format)

The collection of extracted parameters.

allRequiredParamsPresent

boolean

This field is set to:

  • false if the matched intent has required parameters and not all of the required parameter values have been collected.

  • true if all required parameter values have been collected, or if the matched intent doesn't contain any required parameters.

fulfillmentText

string

The text to be pronounced to the user or shown on the screen.

fulfillmentMessages[]

object(Message)

The collection of rich messages to present to the user.

webhookSource

string

If the query was fulfilled by a webhook call, this field is set to the value of the source field returned in the webhook response.

webhookPayload

object (Struct format)

If the query was fulfilled by a webhook call, this field is set to the value of the payload field returned in the webhook response.

outputContexts[]

object(Context)

The collection of output contexts. If applicable, outputContexts.parameters contains entries with name <parameter name>.original containing the original parameter values before the query.

intent

object(Intent)

The intent that matched the conversational query. Only some fields are filled in this message, including but not limited to name, displayName, and webhookState.

intentDetectionConfidence

number

The intent detection confidence. Values range from 0.0 (completely uncertain) to 1.0 (completely certain).

diagnosticInfo

object (Struct format)

The free-form diagnostic info. For example, this field could contain webhook call latency.
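To close the loop, here is a sketch of handling a parsed QueryResult. The field names follow the JSON representation above; the confidence threshold and action name are illustrative assumptions:

def handle_query_result(result):
    """Illustrative handling of a parsed QueryResult dict."""
    if not result.get("allRequiredParamsPresent", False):
        # Slot filling is still in progress; relay the agent's prompt.
        return result.get("fulfillmentText", "")

    if result.get("intentDetectionConfidence", 0.0) < 0.5:  # threshold is arbitrary
        return "Sorry, I didn't catch that."

    action = result.get("action", "")
    params = result.get("parameters", {})
    if action == "book.table":  # hypothetical action name
        return "Booking a table for {} guests.".format(params.get("guests"))

    return result.get("fulfillmentText", "")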