Amazingly easy to run LLMs with Ollama

There is a new tool called Ollama that makes it really easy to try out different LLMs on your local machine. Here are several YouTube videos about it:

- "Ollama - Local Models on your machine" by Sam Witteveen
- "Running Mistral AI on your machine with Ollama" by Learn Data with Mark
- "Ollama: The Easiest Way to RUN LLMs Locally" by Prompt Engineering
- "Ollama on Linux: Easily Install Any LLM on Your Server" by Ian Wootten

It is so easy to run models with Ollama! To install Ollama on our Windows machine we opened a Linux (WSL) terminal and entered:

curl | sh

Then to run a model we can do things like:

ollama run llama2:70b

At you can see what models are available. After you click on a model you can click "tags" to see all the different versions. Then you can click to copy the command to run that model. I look for the largest one that fits in my 48 GB GPU.

At there are different model
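As a rough sketch of that "largest model that fits" check (assumptions: Ollama's default tags are roughly 4-bit quantized, so about half a byte per parameter, plus some runtime overhead; the numbers here are only estimates, not exact measurements):

```python
# Rough VRAM estimate for quantized LLM weights. A sketch only: real usage
# also depends on context length, KV cache, and runtime overhead.

def approx_vram_gb(params_billions, bits_per_weight=4, overhead_frac=0.2):
    """Approximate GPU memory needed to hold the weights."""
    weight_gb = params_billions * bits_per_weight / 8  # e.g. 1 GB per billion params at 8-bit
    return weight_gb * (1 + overhead_frac)

def largest_fit(models, vram_gb, bits_per_weight=4):
    """Pick the largest model (in billions of parameters) that fits in vram_gb."""
    fitting = [m for m in models if approx_vram_gb(m, bits_per_weight) <= vram_gb]
    return max(fitting) if fitting else None

# Llama 2 comes in 7B, 13B, and 70B sizes.
print(largest_fit([7, 13, 70], vram_gb=48))  # 70B at 4-bit is ~42 GB, so it fits in 48 GB
```

By this estimate the 70B model just fits in a 48 GB GPU, while a 12 GB card tops out around the 13B size, which matches what we see in practice.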


A new LLM called Orca is getting near GPT-4 quality with only a 13B-parameter model. It is based on Meta/Facebook's 13B LLaMA model. They plan to release the diffs in weights and so make the model open source. Once this is done we plan to try it out. It seems like the new BestBot, and it is really amazing that it is so small. It might be around 1% of the size of GPT-4 (we don't really know the size of GPT-4, so that is just a guess).

Open Source LLM Gaining

  The following is claimed to have been leaked from a Google internal memo. It seems interesting.    

Chatbot Arena and Leaderboard

The people at have Chatbot Arena, where you can give a question, see answers from two different bots, and rate them. Only afterward are you told which bots they were. Using these human-rated competitions between random bots, they can give ratings to the different bots. As of May 3rd, 2023 this is what their Leaderboard looked like: I do think that Vicuna is the best of these, so I think the results are right. And I like the method; I hope they keep doing this. It will make it easier to tell quickly which is the best open source bot. Also interesting is their open source leaderboard.
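The usual way to turn such blind pairwise votes into ratings is an Elo-style system. A minimal sketch, assuming a fixed K-factor and a hypothetical toy battle log (the actual leaderboard computation may differ in details such as K-factor and bootstrapping):

```python
# Minimal Elo update from pairwise bot battles. A sketch; the real
# leaderboard computation may use different constants and averaging.
from collections import defaultdict

def expected_score(r_a, r_b):
    """Probability that the first bot wins, under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings, winner, loser, k=32):
    """Shift both ratings toward the observed outcome of one battle."""
    delta = k * (1 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta

# Hypothetical battle log: (winner, loser) pairs from blind human votes.
battles = [("vicuna", "alpaca"), ("vicuna", "koala"), ("koala", "alpaca"),
           ("vicuna", "alpaca"), ("koala", "vicuna")]

ratings = defaultdict(lambda: 1000.0)  # everyone starts at 1000
for winner, loser in battles:
    update_elo(ratings, winner, loser)

print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Note the updates are zero-sum (the winner gains exactly what the loser drops), so a bot's rating only climbs by beating others, especially higher-rated ones.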

FreedomGPT Install

This was the easiest install so far. There was some simple sign-up. We went to and downloaded the Windows installer. We ran that and it downloaded about 4 GB and installed everything. The special selling point of this chatbot is that it has minimal filtering; our testing confirms this. It is reasonably fast on our machine.

ChatGLM6b Install

The ChatGLM-6B model has impressive results and can run on our current 12 GB GPU, so we decided to download and install it while waiting for our new 48 GB GPU to come on the boat to Anguilla. Note it was trained on both Chinese and English and is still impressive in English. To download the model from we made an insecure script that you can download and then use. It is insecure because it will download and run code, so don't use it on a computer with any secrets or important stuff. To use our script to download maybe 12 GB with all the weights etc. do:

python3 THUDM/chatglm-6b

Then we used the English README file from GitHub as instructions, and from the right directory (to line up with the huggingface weights) did:

git clone

Our GPU can only handle the INT8 version of this model, so we had to change one line, from:

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trus
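The back-of-the-envelope reason INT8 is needed here: ChatGLM-6B has roughly 6.2 billion parameters, so 16-bit weights alone already exceed a 12 GB card, while 8-bit halves that and leaves headroom. A sketch of the arithmetic (weights only, ignoring activations and the KV cache):

```python
# Why a 12 GB GPU needs the INT8 version of ChatGLM-6B: rough weight
# memory = parameter count * bytes per parameter (weights only; activations,
# KV cache, and CUDA context add more on top).

PARAMS = 6.2e9  # ChatGLM-6B has roughly 6.2 billion parameters

def weights_gb(bits):
    """Memory for the weights alone, in decimal GB, at the given precision."""
    return PARAMS * bits / 8 / 1e9

print(f"fp16: {weights_gb(16):.1f} GB")  # ~12.4 GB: already over a 12 GB card
print(f"int8: {weights_gb(8):.1f} GB")   # ~6.2 GB: leaves room for everything else
```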

AI Accelerators and GPUs

We have Vicuna running on our CPU but it is slow and the answers seem of lower quality. Our current GPU does not have enough memory to run Vicuna. So we have a sudden interest in hardware to run LLMs. I have ordered an NVIDIA RTX A6000 GPU, which has 48 GB. One disadvantage of living on a tropical island is that it takes a while for boats to bring stuff from Amazon. This is a fair amount of money, and it seems that at least some day there will be AI accelerators just for running models that will be lower cost. It is not clear they are a good answer today. If anyone knows of an AI accelerator that would work for Vicuna, please let me know; I would like to buy one and try it out. AI accelerators can do far more calculation with less hardware and power than GPUs. For example, the Hailo-8™ M.2 AI Acceleration Module sounds amazing. The Falcon-H8 uses several of these. But it is not clear if getting an LLM like Vicuna to run on this is possible. The GroqCard™ Accelerator al