Brave Leo now uses Mixtral 8x7B as default

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama.cpp

    LLM inference in C/C++

    Nice graphs here: https://github.com/ggerganov/llama.cpp/pull/1684

    So, for example, the 2-bit version of the 30B model is much worse than the original 30B, but still better than the 13B model.

    Also, there are lots of extra details, e.g., not all of the weights are 2-bit, and even the 2-bit weights cost more than 2 bits each overall, because groups of quantised weights share scale factors that are stored separately.
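
    The overhead from shared scale factors can be estimated with simple arithmetic. The sketch below is only an illustration: the function name and the block layouts are assumptions chosen for the example, not a description of llama.cpp's actual file formats.

```python
def effective_bits_per_weight(weight_bits: int, block_size: int,
                              scale_bits: int, offset_bits: int = 0) -> float:
    """Average storage cost per weight when each block of weights
    shares one scale factor (and, optionally, one offset)."""
    overhead_bits = scale_bits + offset_bits
    return weight_bits + overhead_bits / block_size


# Hypothetical layout: 2-bit codes, 16 weights per block, one 16-bit scale.
print(effective_bits_per_weight(2, 16, 16))  # 3.0

# Hypothetical layout: 4-bit codes, 32 weights per block, one 16-bit scale.
print(effective_bits_per_weight(4, 32, 16))  # 4.5
```

    Under these assumed layouts, a nominally "2-bit" scheme ends up costing 3 bits per weight once the per-block scale is counted.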

  • Cgml

    GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.

    Here’s an example of a custom 4 bits/weight codec for ML weights:

    https://github.com/Const-me/Cgml/blob/master/Readme.md#bcml1...

    llama.cpp does it slightly differently, but AFAIK its quantized data formats are conceptually similar to my codec.
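
    For readers unfamiliar with the general idea, here is a minimal sketch of a generic 4 bits/weight block quantizer. It is not the Cgml codec and not a llama.cpp format; the function names, the per-block (scale, minimum) pair, and the nibble packing order are all assumptions made for illustration.

```python
from typing import List, Tuple

MAX_CODE = 15  # largest value representable in 4 bits


def quantize_block(values: List[float]) -> Tuple[float, float, bytes]:
    """Quantize one block to 4-bit codes plus a shared (scale, minimum) pair."""
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 when every value in the block is identical.
    scale = (hi - lo) / MAX_CODE or 1.0
    codes = [min(MAX_CODE, max(0, round((v - lo) / scale))) for v in values]
    if len(codes) % 2:
        codes.append(0)  # pad so the codes pack into whole bytes
    # Two codes per byte: the first in the low nibble, the second in the high nibble.
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return scale, lo, packed


def dequantize_block(scale: float, lo: float, packed: bytes, count: int) -> List[float]:
    """Recover approximate values from the packed codes of one block."""
    codes: List[int] = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return [lo + scale * c for c in codes[:count]]


weights = [i / 10 - 1.5 for i in range(32)]
scale, lo, packed = quantize_block(weights)
restored = dequantize_block(scale, lo, packed, len(weights))
print(len(packed))  # 16 bytes of codes for 32 weights
```

    Each restored value differs from the original by at most half of the block's scale, which is why smaller blocks (and therefore more scale factors) trade storage for accuracy.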

  • uBlock

    uBlock Origin - An efficient blocker for Chromium and Firefox. Fast and lean.

    > It allows for 30,000 dynamic rules

    That is not what we mean by dynamic filters. From https://developer.chrome.com/blog/improvements-to-content-fi...

    > However, to support more frequent updates and user-defined rules, extensions can add rules dynamically too, without their developers having to upload a new version of the extension to the Chrome Web Store.

    What Chrome is describing is the ability to specify rules at runtime. What critics of Manifest V3 are concerned about is not the ability to add rules dynamically (although that can be an issue); it is the ability to write dynamic rules, i.e., rules that analyze and rewrite requests in the style of the webRequestBlocking permission.

    It's a little deceptive to claim that the concerns here are outdated and to point to vague terminology that sounds like it's correcting the problem, but on actual inspection turns out to be entirely separate functionality from what the GP was talking about.

    > Giving this ability to extensions can slow down the browser for the user. These ads can still be blocked through other means.

    This is the debate; most of the adblocking community disagrees with this assertion. uBO maintains a list of some common features that are already impossible to support in Chrome ( https://github.com/gorhill/uBlock/wiki/uBlock-Origin-works-b... ) and has written about features that cannot be supported via Chrome's current V3 API ( https://github.com/uBlockOrigin/uBOL-home/wiki/Frequently-as... ). Of particular note are filtering of large media elements (I use this a lot on mobile Firefox; it's great for reducing page size) and top-level filtering of domains/fonts.

  • uBOL-home

    uBO Lite home (MV3)

  • willow-inference-server

    Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS

    I think this perspective comes from a lack of historical and hands-on experience overall.

    Nvidia more broadly has very impressive support for its GPUs. Support lifecycles for its Jetson hardware, however, have been significantly worse over time. I encourage you to look at what those lifecycles have looked like; the most "egregious" example is the Jetson Nano, for which support was dropped within, from what I recall, a couple of years.

    Another consideration: Jetson is optimized for power efficiency and form factor, and on a per-dollar basis its CUDA performance is terrible. The power efficiency and form factor come at significant cost. See this discussion from one of my projects[0]. I evaluated WIS on an Orin that I have, and from what I can recall it was significantly slower than a GTX 1070, which is... unimpressive.

    In the end, what do I care what people use? I'm offering the perspective and experience of someone who has actually used the Jetson line for many years and frequently struggled with all of these issues and more.

    [0] - https://github.com/toverainc/willow-inference-server/discuss...

  • Not so sure about that. Check out https://github.com/ray-project/llmperf-leaderboard

    And try Mixtral on chat.groq.com
