iptv techs


Improving Accessibility using Vision Models



One of the projects I worked on recently was migrating a large set of math courses from one platform to another. Along the way we realized some of our math courses had not been updated in quite some time, and some schools were still using these courses to teach.

What was immediately apparent was the use of images to represent equations like this:

This is not great… the font is a bit on the smaller side and the font itself is not very legible, in my non-font-expert opinion. Making matters worse, there is no alt text provided that could explain the equation. I asked the question: could an LLM help here?

Putting the equation into ChatGPT produced a great answer:

“Provide the alt text for this equation”

I wanted to make sure this wasn’t just a fluke, especially since I had thousands of images to process. So I took a few hundred of them, annotated each with the correct LaTeX answer, and compared the results using GPT-4o and Gemini.

For context, I used a directory of 300 images and a SQLite database holding the LaTeX answers. I then ran a Python script that processed each image through three models: GPT-4o, Gemini 1.5 Pro, and Gemini 1.5 Flash.
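A minimal sketch of how such a comparison could be wired up. The table names (`answers`, `results`), their columns, and the `transcribe` callables are assumptions for illustration: each callable stands in for whatever client code sends an image to a model and returns LaTeX. Scoring by exact match after stripping whitespace is also an assumption, not necessarily how the original script judged correctness.

```python
import sqlite3
from typing import Callable, Dict


def normalize(latex: str) -> str:
    """Drop all whitespace so 'y = 2x' and 'y=2x' compare equal."""
    return "".join(latex.split())


def evaluate(conn: sqlite3.Connection, models: Dict[str, Callable[[str], str]]) -> None:
    """Run every model over every annotated image and store the outcome.

    Assumes a hypothetical table answers(image TEXT, latex TEXT) holding the
    hand-annotated LaTeX. Each value in `models` is a placeholder callable
    that takes an image path and returns that model's LaTeX.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(image TEXT, model TEXT, predicted TEXT, correct INTEGER)"
    )
    rows = conn.execute("SELECT image, latex FROM answers").fetchall()
    for image, expected in rows:
        for name, transcribe in models.items():
            predicted = transcribe(image)
            correct = int(normalize(predicted) == normalize(expected))
            conn.execute(
                "INSERT INTO results VALUES (?, ?, ?, ?)",
                (image, name, predicted, correct),
            )
    conn.commit()
```

Keeping the model call behind a plain callable means the same loop can be rerun against any number of models, and the `results` table can be queried afterwards without calling the models again.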

This is a graph of error rate compared to the length of the equation. Because of the data set, there are many more small equations than large ones. I knew the error rate would go up with length, but it is interesting to see all three models struggle around the 30-character mark.
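As a rough illustration of how an error rate per length range could be computed, here is a small sketch. It assumes length means the character count of the annotated LaTeX string and groups equations into buckets of 10 characters; both choices are assumptions, as is the `(latex, correct)` record shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def error_rate_by_length(
    records: Iterable[Tuple[str, bool]], bucket_size: int = 10
) -> Dict[int, float]:
    """Return {bucket_start: error_rate} for (expected_latex, correct) pairs.

    A bucket_start of 20 with bucket_size 10 covers lengths 20 through 29.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for latex, correct in records:
        bucket = (len(latex) // bucket_size) * bucket_size
        totals[bucket] += 1
        if not correct:
            errors[bucket] += 1
    return {bucket: errors[bucket] / totals[bucket] for bucket in sorted(totals)}
```

Because short equations dominate the data set, the rates for the longer buckets rest on far fewer samples and are worth reading with that in mind.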

The most interesting thing to me is the performance of gemini-1.5-flash, which does better on everything but the largest images, yet costs a fraction of the price??? In fact, it doesn’t even have any errors on our most common equations. I ran this three times and it was the same every time.

Error Percent by Model

Now that we have this data, we can pivot to look at where 1.5-flash got the answer right but gpt-4o got it wrong.
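One way to pull out those disagreements, sketched against a hypothetical `results(image, model, predicted, correct)` table in SQLite. The schema and column names are assumptions for illustration; only the model names come from the comparison above.

```python
import sqlite3

# Self-join the results table: one side is the flash row, the other the gpt-4o
# row for the same image. Keep pairs where flash was right and gpt-4o was wrong.
QUERY = """
SELECT flash.image, gpt.predicted AS gpt_output, flash.predicted AS flash_output
FROM results AS flash
JOIN results AS gpt ON gpt.image = flash.image
WHERE flash.model = 'gemini-1.5-flash' AND flash.correct = 1
  AND gpt.model = 'gpt-4o' AND gpt.correct = 0
ORDER BY flash.image
"""


def flash_right_gpt_wrong(conn: sqlite3.Connection) -> list:
    """Return (image, gpt_output, flash_output) rows for the disagreements."""
    return conn.execute(QUERY).fetchall()
```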

Overwhelmingly there are two main errors:

  1. Where gpt-4o confused a minus symbol with an equals sign. This happens a lot where the character “y” is present; it seems to bias towards y=mx+b

  2. Where gpt-4o just gets the characters wrong, e.g. mistaking a “Z” for a “2”
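The two categories above could be told apart mechanically with a simple character-by-character comparison. This is only a sketch: the function name and category labels are made up for illustration, and it treats any mismatch that changes the string length as "other" rather than trying to align the two strings.

```python
def classify_error(expected: str, predicted: str) -> str:
    """Label a mismatch between the annotated and the predicted LaTeX."""
    if expected == predicted:
        return "correct"
    if len(expected) != len(predicted):
        # Insertions or deletions are out of scope for this simple check.
        return "other"
    # Collect the positions where the two strings disagree.
    diffs = [(e, p) for e, p in zip(expected, predicted) if e != p]
    if all(pair == ("-", "=") for pair in diffs):
        return "minus_read_as_equals"
    return "character_substitution"
```

For example, `classify_error("y-2x", "y=2x")` falls in the first category and `classify_error("Z+1", "2+1")` in the second.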

Given this result, we used gemini-1.5-flash to convert our math equations into LaTeX, and MathJax to make them much more legible for students. Since we knew anything longer than 20 characters would tend to have more issues, we flagged those for manual review. Only 7% of questions had equations longer than this limit.
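A small sketch of the flagging and rendering steps, under stated assumptions: length is taken as the character count of the LaTeX string, the helper names are hypothetical, and the output uses MathJax's default inline delimiters `\(` and `\)`. HTML-escaping the LaTeX is an added precaution so that characters such as `<` survive inside a page.

```python
import html

REVIEW_THRESHOLD = 20  # characters; anything longer goes to a human


def needs_manual_review(latex: str, threshold: int = REVIEW_THRESHOLD) -> bool:
    """True when the equation is longer than the review threshold."""
    return len(latex) > threshold


def to_mathjax_inline(latex: str) -> str:
    """Wrap LaTeX in MathJax's default inline delimiters, HTML-escaped."""
    return r"\(" + html.escape(latex, quote=False) + r"\)"
```

With this, `needs_manual_review("y=mx+b")` is false, so the equation would be published directly as `to_mathjax_inline("y=mx+b")`, while a longer expression would be queued for review first.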
