IO Descriptors#

EVA supports three IO descriptor types: NumpyArray, PyTorchTensor, and PandasDataframe. Every input and output of a user-defined function (UDF) must be declared as one of these types.

NumpyArray#

Used when the inputs or outputs of the UDF are NumPy arrays.

Parameters#

name (str): name of the numpy array.

is_nullable (bool): boolean value indicating if the numpy array can be NULL.

type (NdArrayType): data type of all the elements in the numpy array. The available types are listed in the NdArrayType class in eva/catalog/catalog_type.py.

dimensions (Tuple[int]): shape of the numpy array.

from eva.catalog.catalog_type import NdArrayType
from eva.udfs.decorators.io_descriptors.data_types import NumpyArray

NumpyArray(
    name="input_arr",
    is_nullable=False,
    type=NdArrayType.INT32,
    dimensions=(2, 2),
)
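To make the four parameters concrete, here is a minimal pure-Python sketch of what such a descriptor declares. `ArraySpec`, `nested_shape`, and `conforms` are hypothetical stand-ins written for illustration only; they are not part of EVA and do not reflect how EVA validates data internally. Nested lists stand in for arrays so the sketch runs without NumPy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical stand-in mirroring the NumpyArray parameters (not EVA code)."""
    name: str
    is_nullable: bool
    type: str  # stand-in for an NdArrayType member, e.g. "INT32"
    dimensions: Tuple[int, ...]


def nested_shape(value) -> Tuple[int, ...]:
    """Return the shape of a rectangular nested list."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def conforms(spec: ArraySpec, value: Optional[list]) -> bool:
    """Check nullability and shape; element types are not checked in this sketch."""
    if value is None:
        return spec.is_nullable
    return nested_shape(value) == spec.dimensions


spec = ArraySpec(name="input_arr", is_nullable=False, type="INT32", dimensions=(2, 2))
print(conforms(spec, [[1, 2], [3, 4]]))  # True: shape (2, 2) matches
print(conforms(spec, [1, 2, 3]))         # False: shape (3,) does not match
print(conforms(spec, None))              # False: NULL is rejected when is_nullable=False
```

The point of the sketch is only that name, nullability, element type, and shape together describe what the UDF expects to receive or return.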

PyTorchTensor#

Used when the inputs or outputs of the UDF are PyTorch tensors.

name (str): name of the pytorch tensor.

is_nullable (bool): boolean value indicating if the pytorch tensor can be NULL.

type (NdArrayType): data type of the elements in the pytorch tensor. The available types are listed in the NdArrayType class in eva/catalog/catalog_type.py.

dimensions (Tuple[int]): shape of the pytorch tensor.

from eva.catalog.catalog_type import NdArrayType
from eva.udfs.decorators.io_descriptors.data_types import PyTorchTensor

PyTorchTensor(
    name="input_arr",
    is_nullable=False,
    type=NdArrayType.INT32,
    dimensions=(2, 2),
)

PandasDataframe#

columns (List[str]): list of strings that represent the expected column names in the pandas dataframe that is returned from the UDF.

column_types (List[NdArrayType]): expected data type of each column in the pandas dataframe returned from the UDF, given in the same order as columns. NdArrayType is imported from eva.catalog.catalog_type.

column_shapes (List[Tuple]): list of tuples that represent the expected shapes of the columns in the pandas dataframe returned from the UDF, one tuple per column.

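from eva.catalog.catalog_type import NdArrayType
from eva.udfs.decorators.io_descriptors.data_types import PandasDataframe

PandasDataframe(
    columns=["labels", "bboxes", "scores"],
    column_types=[
        NdArrayType.STR,
        NdArrayType.FLOAT32,
        NdArrayType.FLOAT32,
    ],
    column_shapes=[(None,), (None,), (None,)],
)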
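The example above uses (None,) for every column shape. The sketch below shows one plausible reading, assuming that None marks a dimension whose length is not fixed in advance; that interpretation is an assumption, not something this page states. `shape_matches` and `missing_columns` are hypothetical helpers written for illustration and are not EVA functions.

```python
from typing import List, Optional, Sequence, Tuple

Shape = Tuple[Optional[int], ...]


def shape_matches(expected: Shape, actual: Tuple[int, ...]) -> bool:
    """Compare shapes, treating None in `expected` as 'any length' (assumed meaning)."""
    if len(expected) != len(actual):
        return False
    return all(e is None or e == a for e, a in zip(expected, actual))


def missing_columns(expected: Sequence[str], returned: Sequence[str]) -> List[str]:
    """Return the declared column names that the returned frame lacks."""
    return [name for name in expected if name not in returned]


print(shape_matches((None,), (5,)))       # True: any 1-D length is accepted
print(shape_matches((None, 4), (7, 4)))   # True: first dimension is free, second must be 4
print(shape_matches((None,), (2, 2)))     # False: number of dimensions differs
print(missing_columns(["labels", "bboxes", "scores"], ["labels", "scores"]))  # ['bboxes']
```

Under this reading, a detector UDF that returns a different number of labels, boxes, and scores per frame would still satisfy the declared shapes.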