Optimizing IVF-PQ Performance with RAPIDS cuVS: Key Tuning Techniques

Tony Kim
Jul 18, 2024 19:39

Learn how to optimize the IVF-PQ algorithm for vector search performance using RAPIDS cuVS, with practical tips on tuning hyper-parameters and improving recall.

In the first part of the series, an overview of the IVF-PQ algorithm was presented, explaining its foundation on the IVF-Flat algorithm and the use of Product Quantization (PQ) to compress the index and support larger datasets. In part two, the focus shifts to the practical aspects of tuning IVF-PQ performance, which is crucial for achieving optimal results, especially when dealing with billion-scale datasets.

Tuning Parameters for Index Building

IVF-PQ shares some parameters with IVF-Flat, such as coarse-level indexing and search hyper-parameters. However, IVF-PQ introduces additional parameters that control compression. One of the critical parameters is n_lists, which determines the number of partitions (inverted lists) into which the input dataset is clustered. The performance is influenced by the number of lists probed and their sizes. Experiments suggest that n_lists in the range of 10K to 50K yield good performance across recall levels, though this can vary depending on the dataset.

Another crucial parameter is pq_dim, which controls compression. Starting with one fourth the number of features in the dataset and increasing in steps is a good technique for tuning this parameter. Figure 2 in the original blog post illustrates significant drops in QPS, which can be attributed to factors such as increased compute work and shared memory requirements per CUDA block.

The pq_bits parameter, ranging from 4 to 8, controls the number of bits used in each individual PQ code, affecting the codebook size and recall. Reducing pq_bits can improve search speed by fitting the look-up table (LUT) in shared memory, although this comes at the cost of recall.

Additional Parameters

The codebook_kind parameter determines how the codebooks for the second-level quantizer are constructed, either for each subspace or for each cluster. The choice between these options can impact training time, GPU shared memory utilization, and recall. Parameters such as kmeans_n_iters and kmeans_trainset_fraction are also important, though they rarely need adjustment.

Tuning Parameters for Search

The n_probes parameter, discussed in the previous blog post on IVF-Flat, is essential for search accuracy and throughput. IVF-PQ provides additional parameters like internal_distance_dtype and lut_dtype, which control the representation of distance or similarity during the search and the datatype used to store the LUT, respectively. Adjusting these parameters can significantly impact performance, especially for datasets with large dimensionality.

Improving Recall with Refinement

When tuning parameters is not enough to achieve the desired recall, refinement offers a promising alternative. This separate operation, performed after the ANN search, recomputes exact distances for selected candidates and reranks them. The refinement operation can significantly improve recall, as demonstrated in Figure 4 of the original blog post, though it requires access to the source dataset.

Summary

The series on accelerating vector search with inverted-file indexes covers two cuVS algorithms: IVF-Flat and IVF-PQ. IVF-PQ extends IVF-Flat with PQ compression, enabling faster searches and the ability to handle billion-scale datasets with limited GPU memory. By fine-tuning parameters for index building and search, data practitioners can achieve the best results efficiently. The RAPIDS cuVS library offers a range of vector search algorithms to cater to various use cases, from exact searches to low-accuracy-high-QPS ANN methods.

For practical tuning of IVF-PQ parameters, refer to the IVF-PQ notebook on GitHub. For more details on the provided APIs, see the cuVS documentation.

Image source: Shutterstock

Share it on social networks