Should you use Rust in LLM based tools for performance?


We often get asked why we're building text (code) processing software in Rust and not in Python,
since our performance would be bottlenecked by the LLM inference anyway. In this article I'll do some
exploration of the performance of our Rust based indexing and querying pipeline, and how it compares to
Python's Langchain. In this particular scenario our tool is significantly faster, which we didn't
really expect, so I dove in and found out why.

Our motivation for using Rust to implement Swiftide is manifold: we want to build a fast and
efficient toolchain, and we want it to be dependable and both easy to write and maintain. Rust
ticks all these boxes: the ecosystem is strong and constantly growing, and the tooling is excellent.

Realistically, although Rust boasts performance benefits like zero-cost abstractions and fearless
concurrency, that doesn't mean it's going to make your project run ten times faster. As fast as
the Rust code could be, when you're dealing with large language models you'll still be waiting on
those GPUs crunching away. When choosing Rust, the performance of the language itself can't be the
only motivation, and it wasn't for us. However, we'd be embarrassed if we'd built our Swiftide library
and it turned out to be slower than similar Python projects, so we set out to establish a baseline
benchmark.

In this benchmark we try to stress the framework and processing code while still keeping the workload realistic.
The task is simple: process a dataset of text, create embeddings for it,
and then insert the embeddings into a vector database. We'll be using Qdrant for
the vector database, and FastEmbed for the embeddings. Initially we'll be focusing on a
small dataset, the Rust Book, for which the embeddings take around 3 seconds on an NVIDIA A6000 GPU.
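
For a concrete picture of the workload, the Langchain side of such a pipeline fits in a handful of
lines. This is a minimal sketch assuming the langchain_community integrations for FastEmbed and
Qdrant; the dataset path, Qdrant URL, and collection name are placeholders:

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_community.vectorstores import Qdrant

# Load the dataset from disk. The choice of loader matters more than you
# might expect, as we'll see below.
documents = DirectoryLoader("./rust-book").load()

# FastEmbed computes the embeddings locally via the ONNX runtime.
embeddings = FastEmbedEmbeddings()

# Embed all documents and insert them into a Qdrant collection.
Qdrant.from_documents(
    documents,
    embeddings,
    url="http://localhost:6333",
    collection_name="rust-book",
)
```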

In the flamegraph below you can see that just under 90% of the time is spent inside the ONNX runtime
computing those embeddings.

If in Rust we're spending 90% of the time in the ONNX runtime, then why is Langchain spending around
3 times that amount performing roughly the same work? To answer this I ran the Langchain code through
the cProfile module.
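Profiling it that way needs no special setup. Here is a minimal sketch of what that looks like,
assuming the benchmark exposes a main() entry point (a hypothetical name):

```python
import cProfile
import pstats

def main() -> None:
    ...  # run the Langchain indexing pipeline here

if __name__ == "__main__":
    # Write the stats to a file that snakeviz can visualize with:
    #   snakeviz langchain.prof
    cProfile.run("main()", "langchain.prof")

    # Or print the ten most expensive calls directly.
    pstats.Stats("langchain.prof").sort_stats("cumulative").print_stats(10)
```

Opening the resulting profile in snakeviz and clicking around a bit reveals the big culprit: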

The Langchain pipeline is spending a solid 10 seconds in the Markdown and HTML partitioning step. In
the Swiftide version we were only using a Markdown parser when chunking is enabled, so this is an
unfair comparison. The situation is quickly rectified by switching to the simple TextLoader in Langchain.
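
On the Langchain side that fix is a one-line change. A sketch of the before and after, assuming the
stock langchain_community loaders and a placeholder file name:

```python
from langchain_community.document_loaders import TextLoader, UnstructuredMarkdownLoader

# Before: every document passes through unstructured's Markdown/HTML
# partitioning, which is pure CPU work.
documents = UnstructuredMarkdownLoader("rust-book.md").load()

# After: the raw text is loaded as-is and no partitioning happens at all.
documents = TextLoader("rust-book.md").load()
```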

I could have left this run out of the article, as it was just a mistake on my part in setting up the benchmark,
but I think it's important to reflect on how easy it was for this benchmark to suddenly become bound
by the CPU by invoking an inefficient (and in this case unnecessary) preprocessing step.

When we now re-run the benchmark with the TextLoader we see a slightly more reasonable performance difference:

To put a bit more pressure on the frameworks, I switched to the medium sized benchmark (Rotten Tomatoes)
for this comparison, which is why it is taking a little longer. The ONNX FastEmbed step is now taking
around 20 seconds.

Langchain is a lot closer to Swiftide, but there is still a significant difference. A quick look
at the snakeviz profile reveals that there is now more of a death by a thousand cuts situation going on:

Keep in mind our goal is not really to compare Swiftide and Langchain directly here, just to establish a baseline performance metric for
Swiftide. The point I think we should take away from this is that the GPU processing step
is not necessarily the most expensive one, and even if it is, there might still be significant time
spent elsewhere in your pipeline.

Regardless of the particulars of what Langchain, or any of the libraries you're using, is doing,
it's expected that Rust could enable you to do it faster. Whether it's just through simple and safe parallelism,
or through fast string processing libraries, Rust has the tools to come closer to the upper limit
of what is possible on your hardware. Whether that's beneficial to you of course depends on your specific
needs.


To get started with Swiftide, head over to swiftide.rs or check us out on GitHub.

Discuss on HN
Discuss on Reddit

  1. If you’d appreciate to take part around with the benchlabels yourself, you can discover the code in this github repository
  2. Benchlabels for this blog were carry outed on a NVIDIA A6000 GPU, rented from Hyperstack (not aided).
    The results on the Github Repo were carry outed on an Apple M1 Max.
