Getting Started
Docker Images
The easiest way to get started locally is to use one of the docker images we publish on DockerHub:
- `latest` - includes CUDA and libcudnn bindings to support both GPU- and CPU-accelerated inference.
- `latest-cpu` - a minimal image without the CUDA + libcudnn dylibs, supporting CPU-only inference.
Once you have picked an image, you can run it using docker-compose or docker:
```bash
docker run -p 0.0.0.0:50051:50051 \
    -p 0.0.0.0:50052:50052 \
    -v $(pwd)/.cache:/app/.cache \
    anansi:embeddings-latest-cpu
```
With the above setup, you can issue gRPC requests to port 50051 or HTTP/1 requests to port 50052.
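If you prefer docker-compose, a minimal, roughly equivalent service definition is sketched below; the image name, ports, and cache mount simply mirror the docker command above, so adjust them to your setup:

```yaml
version: "3.8"
services:
  embedds:
    image: anansi:embeddings-latest-cpu
    ports:
      - "50051:50051" # gRPC
      - "50052:50052" # HTTP
    volumes:
      - ./.cache:/app/.cache # persist downloaded model files between restarts
```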
Configuration
Environment Variables
embedds can be configured using environment variables. All environment variables are prefixed with `EMBEDDS_`; the supported variables and their effects are outlined below:
| Environment Variable | Usage |
| --- | --- |
|  | Port on which to listen for and serve gRPC requests [default=50051] |
|  | Port on which to listen for and serve HTTP requests [default=50052] |
| `EMBEDDS_CONFIG_FILE` | Filepath of the runtime configuration for models; more on this file is available below [default=/app/config.yaml] |
| `EMBEDDS_CACHE_FOLDER` | Folder in which to store the cached model files; these are typically on the order of 100s of MBs and can grow to GBs if you bin-pack different types of models [default=/app/.cache] |
|  | Whether or not to honor |
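For example, the config and cache locations can be overridden with `-e` flags when starting the container; the sketch below simply restates the documented defaults for illustration:

```bash
# Override where embedds looks for its model config and where it caches model files.
# The values below are the documented defaults, shown here only for illustration.
docker run -p 0.0.0.0:50051:50051 \
    -p 0.0.0.0:50052:50052 \
    -e EMBEDDS_CONFIG_FILE=/app/config.yaml \
    -e EMBEDDS_CACHE_FOLDER=/app/.cache \
    -v $(pwd)/.cache:/app/.cache \
    anansi:embeddings-latest-cpu
```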
Config Files
The `EMBEDDS_CONFIG_FILE` points to an accessible filepath that stores the list of models to instantiate on startup of the process. If these models are missing, embedds will attempt to download them and store them in the folder pointed to by `EMBEDDS_CACHE_FOLDER`. An example configuration is outlined below:
```yaml
models:
  # class must match one of the available models, defined at:
  # https://github.com/infrawhispers/anansi/blob/main/embeddings/proto/api.proto
  - name: VIT_L_14_336_OPENAI
    class: ModelClass_CLIP
    # [optional] set to zero or leave empty for parallelism to be determined automatically
    num_threads: 4
    # [optional] enable/disable parallel execution of the onnx graph, which may improve
    # performance at the cost of memory usage.
    parallel_execution: true
  - name: INSTRUCTOR_LARGE
    class: ModelClass_INSTRUCTOR
  - name: INSTRUCTOR_LARGE
    class: ModelClass_INSTRUCTOR
```
This configuration would create ONE `VIT_L_14_336_OPENAI` model and TWO `INSTRUCTOR_LARGE` models, which is useful for running multiple embedding models on a single GPU. The list of devices and available models can be found in the api.proto linked in the example above. By default, embedds will instantiate one instance of `INSTRUCTOR_LARGE`.
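To run with your own model list, mount a config file at the path `EMBEDDS_CONFIG_FILE` points to (default `/app/config.yaml`). A sketch, assuming the YAML above is saved as `config.yaml` in the current directory:

```bash
# Mount a local config.yaml over the default location so embedds
# instantiates the models listed in it on startup.
docker run -p 0.0.0.0:50051:50051 \
    -p 0.0.0.0:50052:50052 \
    -v $(pwd)/config.yaml:/app/config.yaml \
    -v $(pwd)/.cache:/app/.cache \
    anansi:embeddings-latest-cpu
```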
Issuing Requests
With a running server, you can now issue requests against it. Take a look at the swagger-api docs for the HTTP methods available to you, or pull the proto definition from source to build your own gRPC client. Native clients are on the roadmap; we are also open to pull requests 🤩.
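As a rough illustration, an HTTP request to the server might look like the sketch below; the route and JSON fields used here are assumptions for illustration only, so check the swagger-api docs and api.proto for the actual paths and request schema:

```bash
# NOTE: the /encode route and the JSON body below are hypothetical placeholders;
# consult the swagger-api docs / api.proto for the real request shape.
curl -X POST http://localhost:50052/encode \
    -H "Content-Type: application/json" \
    -d '{"model": "INSTRUCTOR_LARGE", "text": ["embed this sentence, please"]}'
```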