Gandiva Expression Compiler#

TreeExprBuilder Class#

class TreeExprBuilder#

Tree Builder for a nested expression.

Public Static Functions

static NodePtr MakeLiteral(bool value)#

create a node on a literal.

static NodePtr MakeNull(DataTypePtr data_type)#

create a node on a null literal.

returns null if data_type is null or if it’s not a supported datatype.

static NodePtr MakeField(FieldPtr field)#

create a node on arrow field.

returns null if input is null.

static NodePtr MakeFunction(const std::string &name, const NodeVector &params, DataTypePtr return_type)#

create a node with a function.

returns null if return_type is null

static NodePtr MakeIf(NodePtr condition, NodePtr then_node, NodePtr else_node, DataTypePtr result_type)#

create a node with an if-else expression.

returns null if any of the inputs is null.

static NodePtr MakeAnd(const NodeVector &children)#

create a node with a boolean AND expression.

static NodePtr MakeOr(const NodeVector &children)#

create a node with a boolean OR expression.

static ExpressionPtr MakeExpression(NodePtr root_node, FieldPtr result_field)#

create an expression with the specified root_node, and the result written to result_field.

returns null if the result_field is null.

static ExpressionPtr MakeExpression(const std::string &function, const FieldVector &in_fields, FieldPtr out_field)#

convenience function for simple function expressions.

returns null if the out_field is null.

static ConditionPtr MakeCondition(NodePtr root_node)#

create a condition with the specified root_node

static ConditionPtr MakeCondition(const std::string &function, const FieldVector &in_fields)#

convenience function for simple function conditions.

static NodePtr MakeInExpressionInt32(NodePtr node, const std::unordered_set<int32_t> &constants)#

creates an in expression

static NodePtr MakeInExpressionFloat(NodePtr node, const std::unordered_set<float> &constants)#

creates an in expression for float

static NodePtr MakeInExpressionDouble(NodePtr node, const std::unordered_set<double> &constants)#

creates an in expression for double

static NodePtr MakeInExpressionDate32(NodePtr node, const std::unordered_set<int32_t> &constants)#

Date as s/millis since epoch.

static NodePtr MakeInExpressionDate64(NodePtr node, const std::unordered_set<int64_t> &constants)#

Date as millis/us/ns since epoch.

static NodePtr MakeInExpressionTime32(NodePtr node, const std::unordered_set<int32_t> &constants)#

Time as s/millis of day.

static NodePtr MakeInExpressionTime64(NodePtr node, const std::unordered_set<int64_t> &constants)#

Time as millis/us/ns of day.

static NodePtr MakeInExpressionTimeStamp(NodePtr node, const std::unordered_set<int64_t> &constants)#

Timestamp as millis since epoch.

class Node#

Represents a node in the expression tree.

Validity and value are in a joined state.

Subclassed by gandiva::BooleanNode, gandiva::FieldNode, gandiva::FunctionNode, gandiva::IfNode, gandiva::InExpressionNode< Type >, gandiva::InExpressionNode< gandiva::DecimalScalar128 >, gandiva::LiteralNode

Public Functions

virtual Status Accept(NodeVisitor &visitor) const = 0#

Derived classes should simply invoke the Visit api of the visitor.

class Expression#

An expression tree with a root node, and a result field.

Subclassed by gandiva::Condition

class Condition : public gandiva::Expression#

A condition expression.

Function registry#

class FunctionSignature#

Signature for a function : includes the base name, input param types and output types.

Public Functions

std::size_t Hash() const#

calculated based on name, datatype id of parameters and datatype id of return type.

std::vector<std::shared_ptr<FunctionSignature>> gandiva::GetRegisteredFunctionSignatures()#

Get the list of all function signatures.

Configuration#

class Configuration#

runtime config for gandiva

It contains elements to customize gandiva execution at run time.

class ConfigurationBuilder#

configuration builder for gandiva

Provides a default configuration and convenience methods to override specific values and build a custom instance

Projector#

class Projector#

projection using expressions.

A projector is built for a specific schema and vector of expressions. Once the projector is built, it can be used to evaluate many row batches.

Public Functions

Status Evaluate(const arrow::RecordBatch &batch, arrow::MemoryPool *pool, arrow::ArrayVector *output) const#

Evaluate the specified record batch, and return the allocated and populated output arrays.

The output arrays will be allocated from the memory pool β€˜pool’, and added to the vector β€˜output’.

Parameters:
  • batch – [in] the record batch. schema should be the same as the one in β€˜Make’

  • pool – [in] memory pool used to allocate output arrays (if required).

  • output – [out] the vector of allocated/populated arrays.

Status Evaluate(const arrow::RecordBatch &batch, const ArrayDataVector &output) const#

Evaluate the specified record batch, and populate the output arrays.

The output arrays of sufficient capacity must be allocated by the caller.

Parameters:
  • batch – [in] the record batch. schema should be the same as the one in β€˜Make’

  • output – [inout] vector of arrays, the arrays are allocated by the caller and populated by Evaluate.

Status Evaluate(const arrow::RecordBatch &batch, const SelectionVector *selection_vector, arrow::MemoryPool *pool, arrow::ArrayVector *output) const#

Evaluate the specified record batch, and return the allocated and populated output arrays.

The output arrays will be allocated from the memory pool β€˜pool’, and added to the vector β€˜output’.

Parameters:
  • batch – [in] the record batch. schema should be the same as the one in β€˜Make’

  • selection_vector – [in] selection vector which has filtered row positions.

  • pool – [in] memory pool used to allocate output arrays (if required).

  • output – [out] the vector of allocated/populated arrays.

Status Evaluate(const arrow::RecordBatch &batch, const SelectionVector *selection_vector, const ArrayDataVector &output) const#

Evaluate the specified record batch, and populate the output arrays at the filtered positions.

The output arrays of sufficient capacity must be allocated by the caller.

Parameters:
  • batch – [in] the record batch. schema should be the same as the one in β€˜Make’

  • selection_vector – [in] selection vector which has the filtered row positions

  • output – [inout] vector of arrays, the arrays are allocated by the caller and populated by Evaluate.

Public Static Functions

static Status Make(SchemaPtr schema, const ExpressionVector &exprs, std::shared_ptr<Projector> *projector)#

Build a default projector for the given schema to evaluate the vector of expressions.

Parameters:
  • schema – [in] schema for the record batches, and the expressions.

  • exprs – [in] vector of expressions.

  • projector – [out] the returned projector object

static Status Make(SchemaPtr schema, const ExpressionVector &exprs, std::shared_ptr<Configuration> configuration, std::shared_ptr<Projector> *projector)#

Build a projector for the given schema to evaluate the vector of expressions.

Customize the projector with runtime configuration.

Parameters:
  • schema – [in] schema for the record batches, and the expressions.

  • exprs – [in] vector of expressions.

  • configuration – [in] run time configuration.

  • projector – [out] the returned projector object

static Status Make(SchemaPtr schema, const ExpressionVector &exprs, SelectionVector::Mode selection_vector_mode, std::shared_ptr<Configuration> configuration, std::shared_ptr<Projector> *projector)#

Build a projector for the given schema to evaluate the vector of expressions.

Customize the projector with runtime configuration.

Parameters:
  • schema – [in] schema for the record batches, and the expressions.

  • exprs – [in] vector of expressions.

  • selection_vector_mode – [in] mode of selection vector

  • configuration – [in] run time configuration.

  • projector – [out] the returned projector object

Filter#

class Filter#

filter records based on a condition.

A filter is built for a specific schema and condition. Once the filter is built, it can be used to evaluate many row batches.

Public Functions

Status Evaluate(const arrow::RecordBatch &batch, std::shared_ptr<SelectionVector> out_selection)#

Evaluate the specified record batch, and populate output selection vector.

Parameters:
  • batch – [in] the record batch. schema should be the same as the one in β€˜Make’

  • out_selection – [inout] the selection array with indices of rows that match the condition.

Public Static Functions

static inline Status Make(SchemaPtr schema, ConditionPtr condition, std::shared_ptr<Filter> *filter)#

Build a filter for the given schema and condition, with the default configuration.

Parameters:
  • schema – [in] schema for the record batches, and the condition.

  • condition – [in] filter condition.

  • filter – [out] the returned filter object

static Status Make(SchemaPtr schema, ConditionPtr condition, std::shared_ptr<Configuration> config, std::shared_ptr<Filter> *filter)#

Build a filter for the given schema and condition.

Customize the filter with runtime configuration.

Parameters:
  • schema – [in] schema for the record batches, and the condition.

  • condition – [in] filter conditions.

  • config – [in] run time configuration.

  • filter – [out] the returned filter object

class SelectionVector#

Selection Vector : vector of indices in a row-batch for a selection, backed by an arrow-array.

Subclassed by gandiva::SelectionVectorImpl< C_TYPE, A_TYPE, mode >

Public Functions

virtual uint64_t GetIndex(int64_t index) const = 0#

Get the value at a given index.

virtual void SetIndex(int64_t index, uint64_t value) = 0#

Set the value at a given index.

virtual int64_t GetMaxSlots() const = 0#

The maximum slots (capacity) of the selection vector.

virtual int64_t GetNumSlots() const = 0#

The number of slots (size) of the selection vector.

virtual void SetNumSlots(int64_t num_slots) = 0#

Set the number of slots in the selection vector.

virtual ArrayPtr ToArray() const = 0#

Convert to arrow-array.

virtual arrow::Buffer &GetBuffer() const = 0#

Get the underlying arrow buffer.

virtual Mode GetMode() const = 0#

Mode of SelectionVector.

Status PopulateFromBitMap(const uint8_t *bitmap, int64_t bitmap_size, int64_t max_bitmap_index)#

populate selection vector for all the set bits in the bitmap.

Parameters:
  • bitmap – [in] the bitmap

  • bitmap_size – [in] size of the bitmap in bytes

  • max_bitmap_index – [in] max valid index in bitmap (can be lesser than capacity in the bitmap, due to alignment/padding).

Public Static Functions

static Status MakeInt16(int64_t max_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)#

make selection vector with int16 type records.

Parameters:
  • max_slots – [in] max number of slots

  • buffer – [in] buffer sized to accommodate max_slots

  • selection_vector – [out] selection vector backed by β€˜buffer’

static Status MakeInt16(int64_t max_slots, arrow::MemoryPool *pool, std::shared_ptr<SelectionVector> *selection_vector)#
Parameters:
  • max_slots – [in] max number of slots

  • pool – [in] memory pool to allocate buffer

  • selection_vector – [out] selection vector backed by a buffer allocated from the pool.

static Status MakeImmutableInt16(int64_t num_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)#

creates a selection vector with pre populated buffer.

Parameters:
  • num_slots – [in] size of the selection vector

  • buffer – [in] pre-populated buffer

  • selection_vector – [out] selection vector backed by β€˜buffer’

static Status MakeInt32(int64_t max_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)#

make selection vector with int32 type records.

Parameters:
  • max_slots – [in] max number of slots

  • buffer – [in] buffer sized to accommodate max_slots

  • selection_vector – [out] selection vector backed by β€˜buffer’

static Status MakeInt32(int64_t max_slots, arrow::MemoryPool *pool, std::shared_ptr<SelectionVector> *selection_vector)#

make selection vector with int32 type records.

Parameters:
  • max_slots – [in] max number of slots

  • pool – [in] memory pool to allocate buffer

  • selection_vector – [out] selection vector backed by a buffer allocated from the pool.

static Status MakeImmutableInt32(int64_t num_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)#

creates a selection vector with pre populated buffer.

Parameters:
  • num_slots – [in] size of the selection vector

  • buffer – [in] pre-populated buffer

  • selection_vector – [out] selection vector backed by β€˜buffer’

static Status MakeInt64(int64_t max_slots, std::shared_ptr<arrow::Buffer> buffer, std::shared_ptr<SelectionVector> *selection_vector)#

make selection vector with int64 type records.

Parameters:
  • max_slots – [in] max number of slots

  • buffer – [in] buffer sized to accommodate max_slots

  • selection_vector – [out] selection vector backed by β€˜buffer’

static Status MakeInt64(int64_t max_slots, arrow::MemoryPool *pool, std::shared_ptr<SelectionVector> *selection_vector)#

make selection vector with int64 type records.

Parameters:
  • max_slots – [in] max number of slots

  • pool – [in] memory pool to allocate buffer

  • selection_vector – [out] selection vector backed by a buffer allocated from the pool.