The ChatGLM-6B model posts impressive results and can run on our current 12 GB GPU, so we decided to download and install it while waiting for our new 48 GB GPU to arrive by boat in Anguilla. Note that it was trained on both Chinese and English, and it is still impressive in English.
To download the model from huggingface.co we made an insecure script, hugginface.download.py, that you can download and then use. It is insecure because it will download and run code, so don't use it on a computer that holds secrets or anything important. To have our script download roughly 12 GB of weights and supporting files, run:
python3 hugginface.download.py THUDM/chatglm-6b
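For reference, here is a minimal sketch of what such a downloader can look like, using Hugging Face's official huggingface_hub client library. The helper names are our own and this is not the actual script, just the general shape:

```python
import sys

def hf_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct download URL for a single file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

def download_repo(repo_id: str) -> str:
    """Fetch every file in the repo (about 12 GB for chatglm-6b) and return the local path."""
    # huggingface_hub is Hugging Face's official client (pip install huggingface_hub);
    # imported inside the function so the sketch can be read without it installed
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id)

if __name__ == "__main__" and len(sys.argv) > 1:
    print(download_repo(sys.argv[1]))
```

Note this only fetches files; it is the later from_pretrained call with trust_remote_code=True that actually runs code from the repo, which is where the security caveat comes from.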
Then we used the English README file from GitHub as instructions, and from the right directory (so the checkout lines up with the Hugging Face weights) ran:
git clone https://github.com/THUDM/ChatGLM-6B
Our GPU can only handle the INT8 version of this model, so we had to change one line of cli_demo.py from:
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
to:
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().quantize(8).cuda()
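The reason for the INT8 change is simple arithmetic: at FP16 every parameter takes 2 bytes, so the weights of a roughly 6-billion-parameter model nearly fill a 12 GB card before any activations, KV cache, or CUDA overhead. A back-of-the-envelope check:

```python
# Back-of-the-envelope VRAM estimate for the weights of a ~6B-parameter model.
params = 6e9                     # roughly 6 billion parameters
fp16_gib = params * 2 / 2**30    # FP16: 2 bytes per parameter
int8_gib = params * 1 / 2**30    # INT8: 1 byte per parameter
print(f"FP16 weights: {fp16_gib:.1f} GiB, INT8 weights: {int8_gib:.1f} GiB")
# → FP16 weights: 11.2 GiB, INT8 weights: 5.6 GiB
```

At INT8 the weights leave several gigabytes of headroom on a 12 GB GPU, which is why quantize(8) makes the model fit.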
Then we needed to install NVIDIA CUDA, so we went to NVIDIA's CUDA downloads page at developer.nvidia.com/cuda-downloads.
We chose "Linux / x86_64 / WSL-Ubuntu" as the OS, since we run Ubuntu under Microsoft's WSL on Windows.
Then we picked "deb (network)" as the installer type and followed the commands it showed. The download and install took more than half an hour.
Then we could run it with:
python3 cli_demo.py
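If you would rather drive the model from your own code than from the bundled demo, the core of it is only a few lines. This is a sketch based on the model card's documented chat API, not a tested program; transformers and a CUDA build of PyTorch are assumed installed, and the imports are kept inside the function so the file can be read without them:

```python
def load_chatglm(model_path: str = "THUDM/chatglm-6b"):
    """Load ChatGLM-6B quantized to INT8, as in our cli_demo.py edit above."""
    # transformers and a CUDA-enabled torch are assumed installed;
    # trust_remote_code=True runs the model's own Python code from the repo
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
    model = model.half().quantize(8).cuda().eval()
    return tokenizer, model

# Usage sketch (needs the GPU and the downloaded weights):
#   tokenizer, model = load_chatglm()
#   response, history = model.chat(tokenizer, "Hello", history=[])
#   print(response)
```

model.chat returns the reply together with the updated conversation history, which you pass back in on the next turn to keep context.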
It takes a minute to start up, but then it runs fast and does well for a 6B model running at only INT8.
It does make up a lot of stuff, though.