AI Accelerators and GPUs
We have Vicuna running on our CPU, but it is slow and the answers seem to be of lower quality. Our current GPU does not have enough memory to run Vicuna, so we have a sudden interest in hardware for running LLMs.
I have ordered an NVIDIA RTX A6000 GPU, which has 48 GB of memory. One disadvantage of living on a tropical island is that it takes a while for boats to bring stuff from Amazon.
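To get a rough feel for why GPU memory is the limiting factor, here is a minimal back-of-the-envelope sketch. It assumes Vicuna at the 7B and 13B parameter sizes and counts only the memory needed to hold the weights at a given precision; activations, the KV cache, and framework overhead come on top, so treat the numbers as a lower bound. The function name and the list of precisions are my own choices for illustration.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: bytes = parameters * bits_per_parameter / 8, nothing else counted.

GPU_MEMORY_GB = 48  # memory on the RTX A6000


def weight_memory_gb(n_params_billion: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB (1 GB = 10^9 bytes)."""
    return n_params_billion * bits_per_param / 8


if __name__ == "__main__":
    for size in (7, 13):            # assumed model sizes, in billions of parameters
        for bits in (16, 8, 4):     # fp16, and 8-bit / 4-bit quantization
            need = weight_memory_gb(size, bits)
            verdict = "fits" if need < GPU_MEMORY_GB else "does not fit"
            print(f"{size}B at {bits}-bit: ~{need:.1f} GB of weights, "
                  f"{verdict} in {GPU_MEMORY_GB} GB")
```

By this estimate a 13B model in 16-bit precision needs about 26 GB for weights alone, which is more than many consumer GPUs have but leaves headroom on a 48 GB card.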
This is a fair amount of money, and it seems that someday there will be lower-cost AI accelerators built just for running models. It is not clear they are a good answer today. If anyone knows of an AI accelerator that would work for Vicuna, please let me know; I would like to buy one and try it out.
AI accelerators are designed to do far more calculation with less hardware and power than GPUs. For example, the Hailo-8™ M.2 AI Acceleration Module sounds amazing. The Falcon-H8 uses several of these. But it is not clear whether it is possible to get an LLM like Vicuna to run on it.
The GroqCard™ Accelerator also sounds amazing. It seems Groq has 7B LLaMA working on at least one of their accelerators. If LLaMA is working, Vicuna should work as well, since it is the same model architecture with different weights.
Cerebras also sounds amazing, but it is for data centers and not end users.
The Graphcore BOW-200 also looks amazing. Again, it is probably for data centers and not end users.
The Habana Gaudi2 seems to work with LLMs. It is claimed to be twice as fast as the NVIDIA A100 80GB.
It seems clear that in the future AI accelerators will be the way to go. Again, if anyone knows of an AI accelerator that can run an LLM today with leading-edge performance, I would love to try it out (assuming it costs less than the GPU I just ordered).