-
static-tesseract
A docker recipe to build a statically linked version of Tesseract, the open-source OCR software. Perfect for deployment on AWS Lambda or other cloud functions!
https://replicate.com/pharmapsychotic/clip-interrogator

using:

  cfg.apply_low_vram_defaults()
  interrogate_fast()

I tried lighter models like vit32/laion400 and others (model list: https://github.com/mlfoundations/open_clip), but all of them are very, very slow to load or use. I'm desperately looking for something more modest and light.
None of the models out in the open really have GPT-3-ish capabilities, not even llama-65b.
That aside, https://github.com/ggerganov/llama.cpp is the best option given the constraints you describe. However, you will still need a considerable amount of system RAM for the larger models - 20 GB for llama-30b, 40 GB for llama-65b. Swap is an option here, but performance with it is abysmal.
These things are huge, and I don't think there is any way around that. You can play tricks with quantization, and larger models seem to be more tolerant of it, but even if, as some claim, 65b could be quantized to 2-bit while retaining decent performance, that would still need 20 GB of RAM (CPU or GPU) to load - never mind the additional requirements for actual inference.
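The back-of-the-envelope math behind those numbers is just parameter count times bits per weight (the helper name here is mine, not from any library):

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed just to hold the weights, in GB (10^9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# llama-65b at 2-bit: the weights alone are ~16 GB; with inference
# overhead (KV cache, activations) ~20 GB total is plausible.
print(weights_gb(65, 2))   # 16.25
# llama-30b at 4-bit quantization: ~15 GB of weights.
print(weights_gb(30, 4))   # 15.0
# llama-65b at fp16, for comparison: ~130 GB of weights.
print(weights_gb(65, 16))  # 130.0
```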
The breakthrough here is more likely to come from new hardware that is highly optimized for these LLMs - basically just matmul at various bit widths all the way down to 2-bit, plus as much fast memory as can be packed into it at a given price point.
Oh sure! So the OCR is a statically linked build of tesseract based on [1], plus pytesseract [2], which is a super thin wrapper but easier than writing it yourself. Then I stole/modified the prompt from [3] to get the bot to write Python programs that do date calculations. Then I used [4] to parse the output in case the LLM didn't use the format I asked for. I run the Python code it generates in a rootless container that uses the Lambda RIE [5] because I was too lazy to make my own thing.
[1] https://github.com/wingedrhino/static-tesseract
[2] https://pypi.org/project/pytesseract/
[3] https://python.langchain.com/en/latest/modules/agents/tools/...
[4] https://pypi.org/project/dateparser/
[5] https://docs.aws.amazon.com/lambda/latest/dg/images-test.htm...
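The flow above can be sketched roughly like this - note that ocr_image and ask_llm are stand-in stubs of my own (the real version calls pytesseract.image_to_string and an LLM), and the fallback parse uses stdlib strptime where the real code uses dateparser, which handles free-form dates:

```python
from datetime import datetime, date

def ocr_image(path: str) -> str:
    # Hypothetical stub; the real pipeline runs pytesseract against the image.
    return "Invoice dated March 5, 2023"

def ask_llm(prompt: str) -> str:
    # Hypothetical stub; the real pipeline sends the prompt to an LLM.
    return "2023-03-05"

def parse_llm_date(text: str) -> date:
    """Try the strict format the prompt asks for; fall back to a looser parse.
    The real fallback is dateparser.parse(); a second strptime format is a
    stdlib-only stand-in for this sketch."""
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        return datetime.strptime(text.strip(), "%B %d, %Y").date()

text = ocr_image("receipt.png")
answer = ask_llm(f"Extract the date from: {text}\nAnswer as YYYY-MM-DD.")
print(parse_llm_date(answer))  # 2023-03-05
```

The generated Python would then be executed inside the rootless Lambda-RIE container rather than in the host process.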