The ML.GENERATE_TEXT function
This document describes the ML.GENERATE_TEXT function, which lets you perform
generative natural language tasks by using text from BigQuery
standard tables, or
unstructured data from BigQuery
object tables.
The function works by sending requests to a BigQuery ML remote model that represents a Vertex AI model, and then returning that model's response. The following types of remote models are supported:
- Remote models over pre-trained Vertex AI models.
- Remote models over Anthropic Claude models (Preview).
- Remote models over supported open models.
Several of the ML.GENERATE_TEXT function's arguments provide the
parameters that shape the Vertex AI model's response.
You can use the ML.GENERATE_TEXT function to perform tasks such as
classification, sentiment analysis, image captioning, and transcription. For
more information on the types of tasks the Vertex AI
models can perform, see the following topics:
Prompt design can strongly affect the responses returned by the Vertex AI model. For more information, see Design multimodal prompts or Design text prompts.
Input
The input you can provide to ML.GENERATE_TEXT varies depending on the
Vertex AI model that you reference from your remote model.
Input for Gemini 1.5 and 2.0 models
When you use the Gemini 1.5 or 2.0 models, you can analyze content from an object table using prompt data that you provide as a function argument, or you can generate text by providing prompt data in a query or from a column in a standard table. If you are using content from an object table, it must meet the following requirements:
- Content must be in one of the supported formats that are described in the Gemini API model mimeType parameter.
- The supported maximum video length is 2 minutes. If the video is longer than 2 minutes, ML.GENERATE_TEXT only returns results for the first 2 minutes.
Input for a gemini-1.0-pro-vision model
When you use the gemini-1.0-pro-vision model, you can analyze visual
content from an object table using
prompt data that you provide as a function argument. The visual
content must meet the following requirements:
- Content must be in one of the supported image or video formats that are described in the Gemini API model mimeType parameter.
- Each piece of content must be no greater than 20 MB.
- The supported maximum video length is 2 minutes. If the video is longer than 2 minutes, ML.GENERATE_TEXT only returns results for the first 2 minutes.
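The size and length requirements above could be checked before a file is added to the object table. The following Python sketch is only an illustration: the function name check_visual_content is hypothetical, and reading "20 MB" as 20 * 1024 * 1024 bytes is an assumption, because the text doesn't say which definition of megabyte applies.

```python
# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_CONTENT_BYTES = 20 * 1024 * 1024
MAX_VIDEO_SECONDS = 2 * 60


def check_visual_content(size_bytes, video_seconds=None):
    """Return a list of notes about how the content relates to the stated limits.

    video_seconds is None for images and a duration in seconds for videos.
    """
    notes = []
    if size_bytes > MAX_CONTENT_BYTES:
        notes.append("content is larger than 20 MB")
    if video_seconds is not None and video_seconds > MAX_VIDEO_SECONDS:
        # Longer videos are still accepted, but only the first 2 minutes are used.
        notes.append("only the first 2 minutes of the video are analyzed")
    return notes
```

An empty list means that the content is within both limits.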
Input for other types of models
For the following types of models, you can generate text by providing prompt data in a query or from a column in a standard table:
- Vertex AI PaLM API models
- gemini-1.0-pro models
- Anthropic Claude models
- Supported open models
Syntax
ML.GENERATE_TEXT syntax differs depending on the Vertex AI
model that your remote model references. Choose the option appropriate for your
use case.
gemini-2.0-flash
Analyze text data from a standard table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    temperature AS temperature
    [, max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, ground_with_google_search AS ground_with_google_search]
    [, safety_settings AS safety_settings])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value that controls the degree of randomness in token selection. The temperature value must be greater than 0.0 and less than or equal to 1.0. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
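As a rough illustration of the standard-table syntax, the following Python sketch assembles a query string that aliases an existing column to prompt and passes a few of the STRUCT options. The helper name build_generate_text_query and every project, dataset, model, table, and column name used with it are hypothetical. The sketch only formats text and doesn't call BigQuery.

```python
def build_generate_text_query(model, source_table, text_column,
                              temperature, max_output_tokens=128):
    """Return a GoogleSQL string that calls ML.GENERATE_TEXT on a standard table.

    All identifiers are placeholders supplied by the caller.
    """
    # The temperature range for this model is described as (0.0, 1.0].
    if not 0.0 < temperature <= 1.0:
        raise ValueError("temperature must be greater than 0.0 and at most 1.0")
    return (
        "SELECT *\n"
        "FROM ML.GENERATE_TEXT(\n"
        f"  MODEL `{model}`,\n"
        # Alias the text column so that a column named prompt is available.
        f"  (SELECT {text_column} AS prompt FROM `{source_table}`),\n"
        "  STRUCT(\n"
        f"    {float(temperature)} AS temperature,\n"
        f"    {int(max_output_tokens)} AS max_output_tokens,\n"
        "    TRUE AS flatten_json_output))"
    )


print(build_generate_text_query(
    "myproject.mydataset.gemini_model",  # hypothetical remote model
    "myproject.mydataset.reviews",       # hypothetical standard table
    "review_text",                       # hypothetical text column
    temperature=0.2,
))
```

The subquery form is used here because the hypothetical table has no column named prompt; a table that already has one could be passed with the TABLE form instead.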
Analyze unstructured data from an object table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  TABLE `project_id.dataset.table`,
  STRUCT(
    prompt AS prompt,
    temperature AS temperature
    [, max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, safety_settings AS safety_settings])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input.
  The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.
- prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value that controls the degree of randomness in token selection. The temperature value must be greater than 0.0 and less than or equal to 1.0. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
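For the object-table form, the prompt is passed inside the STRUCT rather than read from a column. The following Python sketch formats such a call and applies a rough guard for the 16,000-token prompt limit, using the approximation of four characters per token given in the argument description. The helper name build_object_table_query and all model and table names are hypothetical, and the character-based estimate is an assumption rather than the model's actual tokenizer.

```python
PROMPT_TOKEN_LIMIT = 16000
APPROX_CHARS_PER_TOKEN = 4  # rule of thumb from the argument description


def build_object_table_query(model, object_table, prompt, temperature):
    """Return a GoogleSQL string that calls ML.GENERATE_TEXT on an object table."""
    estimated_tokens = len(prompt) / APPROX_CHARS_PER_TOKEN
    if estimated_tokens >= PROMPT_TOKEN_LIMIT:
        raise ValueError("prompt is estimated to reach the 16,000-token limit")
    # Escape backslashes, single quotes, and newlines for a single-quoted literal.
    escaped = (prompt.replace("\\", "\\\\")
                     .replace("'", "\\'")
                     .replace("\n", "\\n"))
    return (
        "SELECT *\n"
        "FROM ML.GENERATE_TEXT(\n"
        f"  MODEL `{model}`,\n"
        f"  TABLE `{object_table}`,\n"
        "  STRUCT(\n"
        f"    '{escaped}' AS prompt,\n"
        f"    {float(temperature)} AS temperature))"
    )
```

For example, build_object_table_query("myproject.mydataset.gemini_model", "myproject.mydataset.product_images", "Describe what's in this image", 0.4) yields a call whose prompt literal has the apostrophe escaped.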
gemini-1.5-flash
Analyze text data from a standard table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, ground_with_google_search AS ground_with_google_search]
    [, safety_settings AS safety_settings])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
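The top_p description in the arguments above includes a worked example with tokens A, B, and C. The following Python sketch reproduces that example by keeping tokens, most probable first, until their cumulative probability reaches top_p. It is an illustration of the described cutoff only, not the Vertex AI implementation, and the function name top_p_candidates is hypothetical.

```python
def top_p_candidates(probabilities, top_p):
    """Return the tokens kept by a top_p cutoff, most probable first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, probability in ranked:
        kept.append(token)
        cumulative += probability
        # A small tolerance guards against floating-point rounding in the sum.
        if cumulative >= top_p - 1e-9:
            break
    return kept


# Values from the example in the top_p description.
print(top_p_candidates({"A": 0.3, "B": 0.2, "C": 0.1}, 0.5))  # ['A', 'B']
```

The next token would then be drawn from the kept candidates according to the temperature value; that sampling step is left out of this sketch.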
Analyze unstructured data from an object table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  TABLE `project_id.dataset.table`,
  STRUCT(
    prompt AS prompt
    [, max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, safety_settings AS safety_settings])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input.
  The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.
- prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
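To give a feel for what the temperature argument does, the following Python sketch rescales a set of token scores with a softmax, a common textbook way to apply temperature. This is an assumption made for illustration: the text doesn't state how the Vertex AI model applies temperature internally, and the function name apply_temperature and the example scores are hypothetical.

```python
import math


def apply_temperature(scores, temperature):
    """Turn raw token scores into probabilities shaped by temperature."""
    if temperature == 0:
        # Deterministic case: the highest-scoring token is always selected.
        best = max(scores, key=scores.get)
        return {token: (1.0 if token == best else 0.0) for token in scores}
    scaled = {token: score / temperature for token, score in scores.items()}
    peak = max(scaled.values())  # subtract the maximum for numerical stability
    weights = {token: math.exp(value - peak) for token, value in scaled.items()}
    total = sum(weights.values())
    return {token: weight / total for token, weight in weights.items()}


scores = {"A": 2.0, "B": 1.0, "C": 0.5}  # hypothetical token scores
low = apply_temperature(scores, 0.2)   # mass concentrates on A
high = apply_temperature(scores, 1.0)  # mass is spread more evenly
```

In this sketch a lower temperature concentrates probability on the top-scoring token, which matches the more deterministic behavior described for low temperature values, while a value of 0 always picks that token.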
gemini-1.5-pro
Analyze text data from a standard table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, ground_with_google_search AS ground_with_google_search]
    [, safety_settings AS safety_settings])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
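The safety_settings rules in the arguments above (each category at most once, and BLOCK_MEDIUM_AND_ABOVE for any category left unset) can be checked before a query is sent. The following Python sketch is one possible way to do that. The function names safety_settings_sql and effective_thresholds are hypothetical, and the sketch only formats and inspects values; it doesn't call BigQuery.

```python
SUPPORTED_CATEGORIES = (
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
)
SUPPORTED_THRESHOLDS = (
    "BLOCK_NONE",
    "BLOCK_LOW_AND_ABOVE",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_ONLY_HIGH",
    "HARM_BLOCK_THRESHOLD_UNSPECIFIED",
)
DEFAULT_THRESHOLD = "BLOCK_MEDIUM_AND_ABOVE"


def safety_settings_sql(settings):
    """Render (category, threshold) pairs as a GoogleSQL array of structs."""
    seen = set()
    for category, threshold in settings:
        if category not in SUPPORTED_CATEGORIES:
            raise ValueError(f"unsupported category: {category}")
        if threshold not in SUPPORTED_THRESHOLDS:
            raise ValueError(f"unsupported threshold: {threshold}")
        if category in seen:
            raise ValueError(f"category specified more than once: {category}")
        seen.add(category)
    structs = ", ".join(
        f"STRUCT('{category}' AS category, '{threshold}' AS threshold)"
        for category, threshold in settings
    )
    return f"[{structs}] AS safety_settings"


def effective_thresholds(settings):
    """Return the threshold per category, with the default for unset ones."""
    effective = {category: DEFAULT_THRESHOLD for category in SUPPORTED_CATEGORIES}
    effective.update(dict(settings))
    return effective
```

Passing both a BLOCK_MEDIUM_AND_ABOVE and a BLOCK_ONLY_HIGH entry for HARM_CATEGORY_DANGEROUS_CONTENT raises a ValueError here, which mirrors the restriction in the argument description.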
Analyze unstructured data from an object table
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  TABLE `project_id.dataset.table`,
  STRUCT(
    prompt AS prompt
    [, max_output_tokens AS max_output_tokens]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, safety_settings AS safety_settings])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input.
  The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.
- prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
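The max_output_tokens description gives a valid range of [1,8192] along with two rules of thumb: roughly four characters per token, and 60-80 words per hundred tokens. The following Python sketch applies those figures to estimate how long a response could be for a given setting. The function name describe_max_output_tokens is hypothetical, and the results are rough estimates rather than guarantees.

```python
def describe_max_output_tokens(max_output_tokens=128):
    """Estimate response length for a max_output_tokens value in [1, 8192]."""
    if not 1 <= max_output_tokens <= 8192:
        raise ValueError("max_output_tokens must be in the range [1, 8192]")
    return {
        # About four characters per token.
        "approx_characters": max_output_tokens * 4,
        # About 60-80 words per hundred tokens, rounded down.
        "approx_words_low": max_output_tokens * 60 // 100,
        "approx_words_high": max_output_tokens * 80 // 100,
    }


print(describe_max_output_tokens())
# {'approx_characters': 512, 'approx_words_low': 76, 'approx_words_high': 102}
```

With the default of 128 tokens, a response tops out at roughly 76 to 102 words under these approximations, so summaries or longer descriptions would likely need a higher value.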
gemini-pro
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, ground_with_google_search AS ground_with_google_search]
    [, safety_settings AS safety_settings])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.
  A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.
  For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
safety_settings: anARRAY<STRUCT<STRING AS category, STRING AS threshold>>value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violate these settings. You can only specify each category once. For example, you can't specify bothSTRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold)andSTRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, theBLOCK_MEDIUM_AND_ABOVEsafety setting is used.Supported categories are as follows:
HARM_CATEGORY_HATE_SPEECHHARM_CATEGORY_DANGEROUS_CONTENTHARM_CATEGORY_HARASSMENTHARM_CATEGORY_SEXUALLY_EXPLICIT
Supported thresholds are as follows:
BLOCK_NONE(Restricted)BLOCK_LOW_AND_ABOVEBLOCK_MEDIUM_AND_ABOVE(Default)BLOCK_ONLY_HIGHHARM_BLOCK_THRESHOLD_UNSPECIFIED
For more information, refer to the definition of safety category and blocking threshold.
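The way top_k and top_p narrow the candidate tokens can be worked through in plain GoogleSQL. The following sketch doesn't call a model. It only replays the A, B, C example from the top_p description over a hypothetical three-token vocabulary, and it assumes that a token stays in the candidate set as long as the tokens ranked above it account for less than the top_p probability mass. All CTE and column names are illustrative.

```sql
-- Illustration only: this query does not call ML.GENERATE_TEXT.
WITH params AS (
  SELECT 3 AS top_k, NUMERIC '0.5' AS top_p
),
candidate_tokens AS (
  -- Hypothetical next-token probabilities from the top_p example.
  SELECT 'A' AS token, NUMERIC '0.3' AS probability UNION ALL
  SELECT 'B', NUMERIC '0.2' UNION ALL
  SELECT 'C', NUMERIC '0.1'
),
ranked AS (
  SELECT
    token,
    probability,
    ROW_NUMBER() OVER (ORDER BY probability DESC) AS probability_rank,
    -- Total probability of the tokens that rank above this one.
    IFNULL(
      SUM(probability) OVER (
        ORDER BY probability DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
      0) AS mass_before
  FROM candidate_tokens
)
SELECT token, probability
FROM ranked
CROSS JOIN params
WHERE probability_rank <= top_k  -- top_k filter: keep the k most probable tokens
  AND mass_before < top_p        -- top_p filter: stop once the mass is reached
ORDER BY probability DESC;
```

With these values the query keeps A and B and drops C, which matches the example in the top_p description. The final choice between A and B would then depend on the temperature value, which this sketch doesn't model.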
Details
The model and input table must be in the same region.
gemini-pro-vision
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  TABLE `project_id.dataset.table`,
  STRUCT(
    prompt AS prompt
    [, max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
    [, safety_settings AS safety_settings])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input.
  The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.
- prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. This value must be in the range [1,2048]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 2048.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 32.
  A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.
  For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.4.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
Claude
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, flatten_json_output AS flatten_json_output])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.
  A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.
  For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
Details
The model and input table must be in the same region.
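As a rough illustration of the arguments above, the following query sets an output limit and both sampling arguments for a Claude model. The model name mydataset.claude_model and the table mydataset.prompt_table with a question column are placeholders borrowed from the examples in this document, and the argument values are arbitrary.

```sql
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.claude_model`,
  -- Alias the text column as prompt, because the function reads the prompt column.
  (SELECT question AS prompt FROM `mydataset.prompt_table`),
  STRUCT(
    256 AS max_output_tokens,       -- must be in the range [1,4096]
    10 AS top_k,                    -- must be in the range [1,40]
    0.8 AS top_p,                   -- must be in the range [0.0,1.0]
    TRUE AS flatten_json_output));  -- return the text in ml_generate_text_llm_result
```

The Claude syntax shown above doesn't list temperature, stop_sequences, or safety_settings, so this sketch leaves them out.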
Open models
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. If you don't specify a value, the model determines an appropriate value.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.
  A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.
  For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. If you don't specify a value, the model determines an appropriate value.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
Details
The model and input table must be in the same region.
text-bison
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,1024]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.
  A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.
  For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
text-bison-32
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.
  A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.
  For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
text-unicorn
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- project_id: your project ID.
- dataset: the BigQuery dataset that contains the model.
- model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.
- query_statement: the GoogleSQL query that generates the prompt data.
- max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,1024]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
- top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.
  A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.
  For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.
- top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.
- temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
Output
ML.GENERATE_TEXT returns the input table plus the following columns:
Gemini API models
- ml_generate_text_result: the JSON response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element. The safety attributes are in the safety_ratings element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
- ml_generate_text_grounding_result: a STRING value that contains a list of the grounding sources that the model used to gather additional information. This column is returned when both flatten_json_output and ground_with_google_search are TRUE.
Claude models
- ml_generate_text_result: the JSON response from the projects.locations.endpoints.rawPredict call to the model. The generated text is in the content element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
Open models
- ml_generate_text_result: the JSON response from the projects.locations.endpoints.predict call to the model. The generated text is in the predictions element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
PaLM API models
- ml_generate_text_result: the JSON response from the projects.locations.endpoints.predict call to the model. The generated text is in the content element. The safety attributes are in the safetyAttributes element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
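When flatten_json_output is FALSE, the generated text has to be pulled out of the ml_generate_text_result JSON column. The following sketch does that for a Gemini model and also returns the status column so that failed rows are visible. The JSON path is an assumption based on the usual shape of a generateContent response (candidates, then content, then parts, then text); inspect a returned row to confirm it for your model. The model and table names are placeholders taken from the examples in this document.

```sql
SELECT
  prompt,
  -- Assumed path to the first candidate's text in the generateContent response.
  JSON_VALUE(
    ml_generate_text_result,
    '$.candidates[0].content.parts[0].text') AS generated_text,
  -- Empty when the call succeeded for this row.
  ml_generate_text_status
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.flash15_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(FALSE AS flatten_json_output));
```

Setting flatten_json_output to TRUE avoids the JSON path entirely, because the text is then returned in the ml_generate_text_llm_result column.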
Examples
Text analysis
Example 1
This example shows a request to a Claude model that provides a single prompt.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.claude_model`,
  (SELECT 'What is the purpose of dreams?' AS prompt));
Example 2
This example shows a request to a gemini-1.0-pro model with the following characteristics:
- Provides prompt data from a table column that's named prompt.
- Flattens the JSON response into separate columns.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.pro1_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(TRUE AS flatten_json_output));
Example 3
This example shows a request to a gemini-1.5-pro model that provides prompt data from a table
column named question that is aliased as prompt.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.pro15_model`,
  (SELECT question AS prompt FROM `mydataset.prompt_table`));
Example 4
This example shows a request to a gemini-1.5-flash model that concatenates strings and a table column
to provide the prompt data.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.flash15_model`,
  (
    SELECT
      CONCAT(
        'Classify the sentiment of the following text as positive or negative. Text: ',
        input_column,
        ' Sentiment:') AS prompt
    FROM `mydataset.input_table`
  ));
Example 5
This example shows a request to a gemini-2.0-flash-exp model
that removes the strings Golf and football from
model responses.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.flash2_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(
    0.15 AS temperature,
    TRUE AS flatten_json_output,
    ['Golf', 'football'] AS stop_sequences));
Example 6
This example shows a request to a gemini-1.5-flash model with the
following characteristics:
- Provides prompt data from a table column that's named prompt.
- Flattens the JSON response into separate columns.
- Retrieves and returns public web data for response grounding.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.flash15_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(
    TRUE AS flatten_json_output,
    TRUE AS ground_with_google_search));
Example 7
This example shows a request to a gemini-1.5-flash model with the
following characteristics:
- Provides prompt data from a table column that's named prompt.
- Returns a shorter generated text response.
- Filters out unsafe responses by using safety settings.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.flash15_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(
    75 AS max_output_tokens,
    [
      STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category, 'BLOCK_LOW_AND_ABOVE' AS threshold),
      STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold)
    ] AS safety_settings));
Visual content analysis
This example analyzes visual content from an object table that's named
dogs and identifies the breed of the dog shown in each item. The content
returned is filtered by the specified safety settings:
SELECT uri, ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.dog_identifier_model`,
  TABLE `mydataset.dogs`,
  STRUCT(
    'What is the breed of the dog?' AS prompt,
    0.01 AS temperature,
    TRUE AS flatten_json_output,
    [
      STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category, 'BLOCK_LOW_AND_ABOVE' AS threshold),
      STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold)
    ] AS safety_settings));
Audio content analysis
This example translates and transcribes audio content from an object table
that's named feedback:
SELECT uri, ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.audio_model`,
  TABLE `mydataset.feedback`,
  STRUCT(
    'What is the content of this audio clip, translated into Spanish?' AS prompt,
    0.01 AS temperature,
    TRUE AS flatten_json_output));
PDF content analysis
This example classifies PDF content from an object table
that's named documents:
SELECT uri, ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.classify_model`,
  TABLE `mydataset.documents`,
  STRUCT(
    'Classify this document using the following categories: legal, tax-related, real estate' AS prompt,
    0.2 AS temperature,
    TRUE AS flatten_json_output));
Locations
ML.GENERATE_TEXT must run in the same
region or multi-region as the remote model that the
function references.
With the exception of Gemini 2.0 models, you can create remote
models over built-in
Vertex AI models in all of the
regions
that support Generative AI APIs, and also in the US and EU multi-regions.
For Gemini 2.0 models, you can create remote models in the
us-central1 region and the US multi-region.
You can create remote models over Claude models in all of the supported regions for Claude models.
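As a rough illustration of the Gemini 2.0 location constraint, the following sketch creates a dataset in the US multi-region and then a remote model inside it. The dataset name, the model name `gemini20_model`, the connection `us.my_connection`, and the endpoint value are all assumptions chosen for illustration; substitute the resources and model version that you actually use:

```sql
-- Hypothetical names throughout. A model is stored in its dataset, so the
-- dataset's location decides where the remote model lives.
CREATE SCHEMA IF NOT EXISTS `mydataset`
  OPTIONS (location = 'US');

-- The connection is assumed to already exist in the same US multi-region.
CREATE OR REPLACE MODEL `mydataset.gemini20_model`
  REMOTE WITH CONNECTION `us.my_connection`
  OPTIONS (ENDPOINT = 'gemini-2.0-flash-001');
```

Queries that call ML.GENERATE_TEXT against a model created this way would then also need to run in the US multi-region.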
Quotas
See Vertex AI and Cloud AI service functions quotas and limits.
Known issues
This section contains information about known issues.
Resource exhausted errors
Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:
A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
This issue occurs because BigQuery query jobs finish successfully
even if the function fails for some of the rows. The function fails when the
volume of API calls to the remote endpoint exceeds the quota limits for that
service. This issue occurs most often when you are running multiple parallel
batch queries. BigQuery retries these calls, but if the retries
fail, the resource exhausted error message is returned.
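One way to see how many rows were affected is to inspect the ml_generate_text_status column that the function returns. The sketch below assumes that the output of an earlier ML.GENERATE_TEXT query was saved to a table; the table name `mydataset.generated_text_results` is hypothetical:

```sql
-- Assumes the output of an earlier ML.GENERATE_TEXT query was saved to the
-- hypothetical table mydataset.generated_text_results.
SELECT
  ml_generate_text_status,
  COUNT(*) AS row_count
FROM `mydataset.generated_text_results`
-- The status column is empty for rows that were processed successfully.
WHERE ml_generate_text_status LIKE '%A retryable error occurred%'
GROUP BY ml_generate_text_status;
```

If this query returns no rows, no row in the saved results carries the retryable error message.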
To iterate through inference calls until all rows are successfully processed,
you can use the
BigQuery remote inference SQL scripts
or the
BigQuery remote inference pipeline Dataform package.
To try the BigQuery ML remote inference SQL script, see
Handle quota errors by calling ML.GENERATE_TEXT iteratively.
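The general idea of iterating until every row succeeds can be sketched with a short GoogleSQL script. This is not the referenced SQL script or Dataform package, only a minimal illustration under stated assumptions: earlier results are stored in a hypothetical table `mydataset.generated_text_results`, the `prompt` column uniquely identifies each row, and the cap of five attempts is arbitrary:

```sql
-- Hypothetical table and retry cap; the prompt column is assumed to be unique.
DECLARE attempts INT64 DEFAULT 0;
DECLARE failed_rows INT64 DEFAULT (
  SELECT COUNT(*)
  FROM `mydataset.generated_text_results`
  WHERE ml_generate_text_status != '');

WHILE failed_rows > 0 AND attempts < 5 DO
  -- Re-run inference only for the rows whose last attempt failed.
  CREATE OR REPLACE TEMP TABLE retried AS
  SELECT prompt, ml_generate_text_llm_result, ml_generate_text_status
  FROM
    ML.GENERATE_TEXT(
      MODEL `mydataset.flash15_model`,
      (
        SELECT prompt
        FROM `mydataset.generated_text_results`
        WHERE ml_generate_text_status != ''
      ),
      STRUCT(TRUE AS flatten_json_output));

  -- Copy back only the rows that succeeded on this attempt.
  MERGE `mydataset.generated_text_results` AS t
  USING retried AS r
  ON t.prompt = r.prompt
  WHEN MATCHED AND r.ml_generate_text_status = '' THEN
    UPDATE SET
      ml_generate_text_llm_result = r.ml_generate_text_llm_result,
      ml_generate_text_status = r.ml_generate_text_status;

  -- Recount the rows that still carry an error status.
  SET failed_rows = (
    SELECT COUNT(*)
    FROM `mydataset.generated_text_results`
    WHERE ml_generate_text_status != '');
  SET attempts = attempts + 1;
END WHILE;
```

Capping the number of attempts keeps the script from looping indefinitely when the quota stays exhausted; any rows that still fail after the last attempt keep their error status and can be retried later.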
Gemini 2.0 temperature option
If you create a remote model based on a Gemini 2.0 model, or
update an existing remote model to use Gemini 2.0, then you must
set the temperature option to a FLOAT64 value greater than 0. Failing
to do so causes queries against the remote model to return errors.
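As a hedged illustration, and assuming that the temperature option here refers to the temperature field of the function's settings STRUCT, a query against a Gemini 2.0 remote model might look like the following. The model name `mydataset.gemini20_model` is hypothetical, and the value 0.2 is arbitrary:

```sql
-- mydataset.gemini20_model is a hypothetical remote model over a Gemini 2.0 model.
SELECT ml_generate_text_llm_result
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.gemini20_model`,
    TABLE `mydataset.prompt_table`,
    STRUCT(
      0.2 AS temperature,  -- any FLOAT64 value greater than 0
      TRUE AS flatten_json_output));
```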
What's next
- Try a tutorial on generating text using a public dataset.
- Get step-by-step instructions on how to generate text using your own data.
- Get step-by-step instructions on how to tune an LLM and use it to generate text.
- For more information about using Vertex AI models to generate text and embeddings, see Generative AI overview.
- For more information about using Cloud AI APIs to perform AI tasks, see AI application overview.

