Getting Started

Docker Images

The easiest way to get started locally is to use one of the docker images we publish on DockerHub:

  • latest - includes CUDA and libcudnn bindings to support GPU- and CPU-accelerated inference.
  • latest-cpu - a minimal image, lacking the CUDA + libcudnn libraries, that allows for CPU-only inference.
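For example, to fetch the CPU-only image ahead of time (the image name below mirrors the one used in the run command that follows; check DockerHub for the exact repository and tag):

docker pull anansi:embeddings-latest-cpu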

Once you have a docker image, you can then simply run it using docker-compose or docker:

docker run -p 0.0.0.0:50051:50051 \
  -p 0.0.0.0:50052:50052 \
  -v $(pwd)/.cache:/app/.cache \
  anansi:embeddings-latest-cpu

With the above setup, you can then issue gRPC requests to 50051 or send HTTP/1 requests to 50052.
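If you prefer docker-compose, a minimal sketch of an equivalent service definition is shown below; the file and service names are arbitrary, and the image, ports, and volume mirror the docker run command above:

# docker-compose.yaml - minimal sketch mirroring the docker run command above
services:
  embedds:
    image: anansi:embeddings-latest-cpu
    ports:
      - "50051:50051"   # gRPC
      - "50052:50052"   # HTTP
    volumes:
      - ./.cache:/app/.cache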

Configuration

Environment Variables

embedds can be configured using environment variables. All supported environment variables are prefixed with EMBEDDS_ and are outlined below, along with their effects:

| Environment Variable | Usage |
| --- | --- |
| EMBEDDS_GRPC_PORT | Port on which to listen for and serve gRPC requests [default=50051] |
| EMBEDDS_HTTP_PORT | Port on which to listen for and serve HTTP requests [default=50052] |
| EMBEDDS_CONFIG_FILE | Filepath that stores the runtime configuration for models - more on this file below [default=/app/config.yaml] |
| EMBEDDS_CACHE_FOLDER | Folder in which to store the cached model files - these are typically on the order of 100s of MBs and can grow to GBs if you bin-pack different types of models [default=/app/.cache] |
| EMBEDDS_ALLOW_ADMIN | Whether or not to honor Initialize(..) requests, which load a model into the current ONNX runtime. We recommend that models be loaded once on startup; however, this convenience is included for experimentation [default=false] |
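For example, to override the defaults when running the published image (the image name mirrors the docker run example above; the port values here are arbitrary):

docker run -p 0.0.0.0:9090:9090 \
  -p 0.0.0.0:9091:9091 \
  -e EMBEDDS_GRPC_PORT=9090 \
  -e EMBEDDS_HTTP_PORT=9091 \
  -e EMBEDDS_ALLOW_ADMIN=true \
  -v $(pwd)/.cache:/app/.cache \
  anansi:embeddings-latest-cpu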

Config Files

The `EMBEDDS_CONFIG_FILE` variable points to an accessible filepath that stores the list of models to instantiate on startup of the process. If any of these models are missing, embedds will attempt to download them and store them in the folder pointed to by EMBEDDS_CACHE_FOLDER. An example configuration is outlined below:

models:
  # class must match one of the available models, defined at:
  # https://github.com/infrawhispers/anansi/blob/main/embeddings/proto/api.proto
  - name: VIT_L_14_336_OPENAI
    class: ModelClass_CLIP
    # [optional] set to zero or leave empty for parallelism to be determined
    num_threads: 4
    # [optional] enable | disable parallel execution of the onnx graph, which may improve
    # performance at the cost of memory usage.
    parallel_execution: true
  - name: INSTRUCTOR_LARGE
    class: ModelClass_INSTRUCTOR
  - name: INSTRUCTOR_LARGE
    class: ModelClass_INSTRUCTOR

This configuration would create ONE VIT_L_14_336_OPENAI and TWO INSTRUCTOR_LARGE models. This is useful for running multiple embedding models on a single GPU. The list of devices and available models can be found here. By default, embedds will instantiate one instance of INSTRUCTOR_LARGE.
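As a sketch, you can mount a custom configuration into the container at the default EMBEDDS_CONFIG_FILE location (the paths below are assumptions based on the defaults listed above):

docker run -p 0.0.0.0:50051:50051 \
  -p 0.0.0.0:50052:50052 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -v $(pwd)/.cache:/app/.cache \
  anansi:embeddings-latest-cpu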

Issuing Requests

With the server running, you can now issue requests against it. Take a look at the swagger-api docs for the HTTP methods available to you.
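As an illustration only - the endpoint path and payload below are hypothetical, and the actual routes and request shape are defined in the swagger-api docs - an HTTP/1 request against the HTTP port might look like:

# /encode and the JSON body are placeholders; consult the swagger-api docs for the real route and schema
curl -X POST http://localhost:50052/encode \
  -H "Content-Type: application/json" \
  -d '{"text": ["hello world"]}'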


You can also pull the proto definition from source to build your gRPC client. Native clients are on the roadmap, and we are also open to pull requests 🤩.
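For quick experimentation without a generated client, a tool like grpcurl can issue requests directly against the gRPC port; the proto path, payload, and the service/method name below are placeholders - use the names actually defined in api.proto:

# api.Api/Encode is a placeholder service/method; check api.proto for the real names
grpcurl -plaintext \
  -proto proto/api.proto \
  -d '{"text": ["hello world"]}' \
  localhost:50051 api.Api/Encode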