intel-extension-for-tensorflow
bitsandbytes
intel-extension-for-tensorflow | bitsandbytes | |
---|---|---|
9 | 61 | |
303 | 5,482 | |
-0.3% | - | |
9.6 | 9.4 | |
4 days ago | 4 days ago | |
C++ | Python | |
GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
intel-extension-for-tensorflow
-
Watch out AMD: Intel Arc A580 could be the next great affordable GPU
Intel already has a working GPGPU stack, using oneAPI/SYCL.
They also have arguably pretty good OpenCL support, as well as downstream support for PyTorch and Tensorflow using their custom extensions https://github.com/intel/intel-extension-for-tensorflow and https://github.com/intel/intel-extension-for-pytorch which are actively developed and just recently brought up-to-date with upstream releases.
-
How do you allocate more than 4GB of memory for OpenCL in A770 16GB?
I tried Intel® Extension for PyTorch* v1.13.10+xpu and intel-extension-for-tensorflow
-
I'm really happy with the card although the Ti version offers much better performance
Yeah I recently stubbled on it when I was looking into buying a 16gb a770 and wondering what was possible now. GitHub Intel extension for tensorflow
-
Does anyone uses Intel Arc A770 GPU for machine learning? [D]
Intel publish extensions for PyTorch and Tensorflow. I’ve been working with PyTorch so I just needed to follow these instructions to get everything set up.
- Intel Extension for TensorFlow
- Intel Extension for TensorFlow Released
-
SD on intel arc?
Actually I was just on GitHub trying to submit issues related to me testing Intel's PyTorch and Tensorflow extensions when I saw this; it seems that someone has already ported SD over to the tensorflow framework and so you can probably start using intel's extension for tensorflow with it immediately; and according to this article you can use Intel's extension within WSL under windows as well. But unfortunately given how the guy whose issue I linked to has been facing pretty serious performance issues of inferencing taking many minutes longer than it should when using an A770 to do SD-related inferencing, you might be better off waiting for intel's extension for tensorflow versions 1.2 and greater or something like that, so that when it's your turn to use it, Intel has already ironed out most of the major bugs within the software :)
bitsandbytes
-
French AI startup Mistral secures €2B valuation
No. Without the inference code, the best we can have are guesses on its implementation, so the benchmark figures we can get could be quite wrong. It does seem better than Llama2-70B in my tests, which rely on the work done by Dmytro Dzhulgakov[0] and DiscoResearch[1].
But the point of releasing on bittorrent is to see the effervescence in hobbyist research and early attempts at MoE quantization, which are already ongoing[2]. They are benefitting from the community.
[0]: https://github.com/dzhulgakov/llama-mistral
[1]: https://huggingface.co/DiscoResearch/mixtral-7b-8expert
[2]: https://github.com/TimDettmers/bitsandbytes/tree/sparse_moe
-
Lora training with Kohya issue
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
- FLaNK Stack Weekly for 30 Oct 2023
-
A comprehensive guide to running Llama 2 locally
While on the surface, a 192GB Mac Studio seems like a great deal (it's not much more than a 48GB A6000!), there are several reasons why this might not be a good idea:
* I assume most people have never used llama.cpp Metal w/ large models. It will drop to CPU speeds whenever the context window is full: https://github.com/ggerganov/llama.cpp/issues/1730#issuecomm... - while sure this might be fixed in the future, it's been an issue since Metal support was added, and is a significant problem if you are actually trying to actually use it for inferencing. With 192GB of memory, you could probably run larger models w/o quantization, but I've never seen anyone post benchmarks of their experiences. Note that at that point, the limited memory bandwidth will be a big factor.
* If you are planning on using Apple Silicon for ML/training, I'd also be wary. There are multi-year long open bugs in PyTorch[1], and most major LLM libs like deepspeed, bitsandbytes, etc don't have Apple Silicon support[2][3].
You can see similar patterns w/ Stable Diffusion support [4][5] - support lagging by months, lots of problems and poor performance with inference, much less fine tuning. You can apply this to basically any ML application you want (srt, tts, video, etc)
Macs are fine to poke around with, but if you actually plan to do more than run a small LLM and say "neat", especially for a business, recommending a Mac for anyone getting started w/ ML workloads is a bad take. (In general, for anyone getting started, unless you're just burning budget, renting cloud GPU is going to be the best cost/perf, although on-prem/local obviously has other advantages.)
[1] https://github.com/pytorch/pytorch/issues?q=is%3Aissue+is%3A...
[2] https://github.com/microsoft/DeepSpeed/issues/1580
[3] https://github.com/TimDettmers/bitsandbytes/issues/485
[4] https://github.com/AUTOMATIC1111/stable-diffusion-webui/disc...
[5] https://forums.macrumors.com/threads/ai-generated-art-stable...
-
Bit inference 4.2x faster than 16 bit
Release notes: https://github.com/TimDettmers/bitsandbytes/releases/tag/0.4...
-
Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0']
Welcome to bitsandbytes. For bug reports, please run python -m bitsandbytes and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues ================================================================================ bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32 CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths... ERROR: /usr/bin/python3: undefined symbol: cudaRuntimeGetVersion CUDA SETUP: libcudart.so path is None CUDA SETUP: Is seems that your cuda installation is not in your path. See https://github.com/TimDettmers/bitsandbytes/issues/85 for more information. CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: Highest compute capability among GPUs detected: 7.5 CUDA SETUP: Detected CUDA version 00 CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so... /usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable. warn("The installed version of bitsandbytes was compiled without GPU support. " /usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/lib64-nvidia did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths... warn(msg) /usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')} warn(msg) /usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('//172.28.0.1'), PosixPath('8013')} warn(msg) /usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-1b6gsytv7z9le --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true'), PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https')} warn(msg) /usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')} warn(msg) /usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//ipykernel.pylab.backend_inline')} warn(msg) /usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
-
Having trouble using the multimodal tools.
RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues
- [TextGen WebUI] Service terminated error? (Screenshots in post)
- Considering getting a Jetson AGX Orin.. anyone have experience with it?
-
How to disable the `bitsandbytes` intro message:
===================================BUG REPORT=================================== Welcome to bitsandbytes. For bug reports, please run python -m bitsandbytes and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues ================================================================================ bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda121.so CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths... CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 8.9 CUDA SETUP: Detected CUDA version 121 CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda121.so...
What are some alternatives?
stable-diffusion-tensorflow - Stable Diffusion in TensorFlow / Keras
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ
intel-extension-for-pytorch - A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
accelerate - 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
FluidX3D - The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
OpenCL-Wrapper - OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.
Dreambooth-Stable-Diffusion-cpu - Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
compute-runtime - Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
llama.cpp - LLM inference in C/C++
diffusers - 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM