cody VS TensorRT

Compare cody vs TensorRT and see how they differ.

cody

AI that knows your entire codebase (by Sourcegraph)

TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT. (by NVIDIA)
                cody                 TensorRT
Mentions        22                   22
Stars           1,942                9,110
Growth          10.0%                1.8%
Activity        9.9                  5.0
Last commit     about 16 hours ago   7 days ago
Language        TypeScript           C++
License         Apache License 2.0   Apache License 2.0
Mentions - the total number of mentions we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

cody

Posts with mentions or reviews of cody. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-07.
  • Ask HN: Cheapest way to use LLM coding assistance?
    1 project | news.ycombinator.com | 13 Apr 2024
    Check out the Cody extension https://github.com/sourcegraph/cody, available for various editors like VS Code.
  • The lifecycle of a code AI completion
    6 projects | news.ycombinator.com | 7 Apr 2024
    I don't think it is. There is a test file which includes C#, Kotlin, etc among supported languages, which aren't included in the file you linked: https://github.com/sourcegraph/cody/blob/main/vscode/src/com...

    But this test didn't seem to include TypeScript so it's obviously not comprehensive. I'm not convinced this information is actually in one place.

  • Ollama is now available on Windows in preview
    7 projects | news.ycombinator.com | 17 Feb 2024
    Cody (https://github.com/sourcegraph/cody) supports using Ollama for autocomplete in VS Code. See the release notes at https://sourcegraph.com/blog/cody-vscode-1.1.0-release for instructions. And soon it'll support Ollama for chat/refactoring as well (https://twitter.com/sqs/status/1750045006382162346/video/1).

    Disclaimer: I work on Cody.

  • My 2024 AI Predictions
    3 projects | news.ycombinator.com | 8 Jan 2024
    Have you tried Cody (https://cody.dev)? Cody has a deep understanding of your codebase and generally does much better at code gen than just one-shotting GPT4 without context.

    (disclaimer: I work at Sourcegraph)

  • 🚀 7 AI Tools to Improve your productivity: A Deep Dive 🪄✨
    5 projects | dev.to | 3 Jan 2024
    3️⃣ Cody AI 🤖
  • An ex-Googler's guide to dev tools
    2 projects | news.ycombinator.com | 28 Nov 2023
    Author of the post here—as another commenter mentioned, this is indeed a bit dated now, someone should probably write an updated post!

    There's been a ton of evolution in dev tools in the past 3 years with some old workhorses retiring (RIP Phabricator) and new ones (like Graphite, which is awesome) emerging... and of course AI-AI-AI. LLMs have created some great new tools for the developer inner loop—that's probably the most glaring omission here. If I were to include that category today, it would mention tools like ChatGPT, GH Copilot, Cursor, and our own Sourcegraph Cody (https://cody.dev). I'm told that Google has internal AI dev tools now that generate more code than humans.

    Excited to see what changes the next 3 years bring—the pace of innovation is only accelerating!

  • LocalPilot: Open-source GitHub Copilot on your MacBook
    6 projects | news.ycombinator.com | 19 Oct 2023
    I'm sorry to hear that. We have made a lot of improvements to Cody recently. We had a big release on Oct 4 that significantly decreased latency while improving completion quality. You can read all about it here: https://about.sourcegraph.com/blog/feature-release-october-2...

    We love feedback and ideas as well, and like I said are constantly iterating on the UI to improve it. I'm actually wrapping up a blog post on how to better leverage Cody w/ VS Studio, that'll be out either later today or sometime tomorrow. As far as feedback though: https://github.com/sourcegraph/cody/discussions/new?category... would be the place to share ideas :)

  • Show HN: Ollama for Linux – Run LLMs on Linux with GPU Acceleration
    14 projects | news.ycombinator.com | 26 Sep 2023
    Ollama is awesome. I am part of a team building a code AI application[1], and we want to give devs the option to run it locally instead of only supporting external LLMs from Anthropic, OpenAI, etc. Those big remote LLMs are incredibly powerful and probably the right choice for most devs, but it's good for devs to have a local option as well—for security, privacy, cost, latency, simplicity, freedom, etc.

    As an app dev, we have 2 choices:

    (1) Build our own support for LLMs, GPU/CPU execution, model downloading, inference optimizations, etc.

    (2) Just tell users "run Ollama" and have our app hit the Ollama API on localhost (or shell out to `ollama`).

    Obviously choice 2 is much, much simpler. There are some things in the middle, like less polished wrappers around llama.cpp, but Ollama is the only thing that 100% of people I've told about have been able to install without any problems.
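
    (For illustration, a minimal sketch of choice (2): the app simply POSTs to the Ollama HTTP API on localhost. The model name and prompt below are placeholders, and it assumes Ollama is already installed and serving on its default port, 11434.)

```python
# Minimal sketch of approach (2): call a locally running Ollama server
# over HTTP instead of bundling an LLM runtime into the app.
# Assumes Ollama is serving on its default port (11434); the model name
# is just an example.
import json
import urllib.request

def ollama_generate(prompt: str, model: str = "llama2") -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ollama_generate("Write a haiku about local LLMs."))
```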

    That's huge because it's finally possible to build real apps that use local LLMs—and still reach a big userbase. Your userbase is now (pretty much) "anyone who can download and run a desktop app and who has a relatively modern laptop", which is a big population.

    I'm really excited to see what people build on Ollama.

    (And Ollama will simplify deploying server-side LLM apps as well, but right now from participating in the community, it seems most people are only thinking of it for local apps. I expect that to change when people realize that they can ship a self-contained server app that runs on a cheap AWS/GCP instance and uses an Ollama-executed LLM for various features.)

    [1] Shameless plug for the WIP PR where I'm implementing Ollama support in Cody, our code AI app: https://github.com/sourcegraph/cody/pull/905.

  • Cody – The AI that knows your entire codebase
    14 projects | news.ycombinator.com | 26 Aug 2023
    Awesome. The repository is at https://github.com/sourcegraph/cody for anyone who hasn't seen it yet.
  • Code AI with Codebase Context
    1 project | news.ycombinator.com | 20 Jul 2023

TensorRT

Posts with mentions or reviews of TensorRT. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-26.
  • AMD MI300X 30% higher performance than Nvidia H100, even with optimized stack
    1 project | news.ycombinator.com | 17 Dec 2023
    > It's not rocket science to implement matrix multiplication in any GPU.

    You're right, it's harder. Saying this as someone who's done more work on the former than the latter. (I have, with a team, built a rocket engine. And not your school or backyard project size, but nozzle bigger than your face kind. I've also written CUDA kernels and boy is there a big learning curve to the latter that you gotta fundamentally rethink how you view a problem. It's unquestionable why CUDA devs are paid so much. Really it's only questionable why they aren't paid more)

    I know it is easy to think this problem is easy, it really looks that way. But there's an incredible amount of optimization that goes into all of this and that's what's really hard. You aren't going to get away with just N for loops for a tensor rank N. You got to chop the data up, be intelligent about it, manage memory, how you load memory, handle many data types, take into consideration different results for different FMA operations, and a whole lot more. There's a whole lot of non-obvious things that result in high optimization (maybe obvious __after__ the fact, but that's not truthfully "obvious"). The thing is, the space is so well researched and implemented that you can't get away with naive implementations, you have to be on the bleeding edge.
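
    (To make the "N for loops" point concrete, here is a rough Python sketch comparing a naive triple-loop matrix multiply against the optimized BLAS routine NumPy dispatches to. The gap is orders of magnitude even on CPU; an optimized GPU kernel layers tiling, shared memory, and scheduling on top of that.)

```python
# Rough illustration: the naive "just write the loops" matmul vs. the
# heavily optimized library routine. Sizes are kept small on purpose;
# the gap still spans orders of magnitude.
import time
import numpy as np

def naive_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            out[i, j] = s
    return out

a = np.random.rand(200, 200)
b = np.random.rand(200, 200)

t0 = time.perf_counter()
c1 = naive_matmul(a, b)
t1 = time.perf_counter()
c2 = a @ b  # dispatches to tiled, vectorized, cache-aware BLAS code
t2 = time.perf_counter()

print(f"naive: {t1 - t0:.3f}s  optimized: {t2 - t1:.6f}s")
print("results match:", np.allclose(c1, c2))
```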

    Then you have to do that and make it reasonably usable for the programmer too, abstracting away all of that. Cuda also has a huge head start and momentum is not a force to be reckoned with (pun intended).

    Look at TensorRT[0]. The software isn't even complete and it still isn't going to cover all neural networks on all GPUs. I've had stuff work on a V100 and H100 but not an A100, then later get fixed. They even have the "Apple Advantage" in that they have control of the hardware. I'm not certain AMD will have the same advantage. We talk a lot about the difficulties of being first mover, but I think we can also recognize that momentum is an advantage of being first mover. And it isn't one to scoff at.

    [0] https://github.com/NVIDIA/TensorRT

  • Getting SDXL-turbo running with tensorRT
    1 project | /r/StableDiffusion | 6 Dec 2023
    (python demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom"). https://github.com/NVIDIA/TensorRT/tree/release/8.6/demo/Diffusion
  • Show HN: Ollama for Linux – Run LLMs on Linux with GPU Acceleration
    14 projects | news.ycombinator.com | 26 Sep 2023
    - https://github.com/NVIDIA/TensorRT

    TVM and other compiler-based approaches seem to perform really well and make supporting different backends easy. A good friend who's been in this space for a while told me llama.cpp is sort of a "hand crafted" version of what these compilers could output, which I think speaks to the craftsmanship Georgi and the ggml team have put into llama.cpp, but also the opportunity to "compile" versions of llama.cpp for other model architectures or platforms.

  • Nvidia Introduces TensorRT-LLM for Accelerating LLM Inference on H100/A100 GPUs
    3 projects | news.ycombinator.com | 8 Sep 2023
    https://github.com/NVIDIA/TensorRT/issues/982

    Maybe? Looks like tensorRT does work, but I couldn't find much.

  • Train Your AI Model Once and Deploy on Any Cloud
    3 projects | news.ycombinator.com | 8 Jul 2023
    highly optimized transformer-based encoder and decoder component, supported on pytorch, tensorflow and triton

    TensorRT, a custom ML framework/inference runtime from NVIDIA (https://developer.nvidia.com/tensorrt), but you have to port your models

  • A1111 just added support for TensorRT for webui as an extension!
    5 projects | /r/StableDiffusion | 27 May 2023
  • WIP - TensorRT accelerated stable diffusion img2img from mobile camera over webrtc + whisper speech to text. Interdimensional cable is here! Code: https://github.com/venetanji/videosd
    3 projects | /r/StableDiffusion | 21 Feb 2023
    It uses the nvidia demo code from: https://github.com/NVIDIA/TensorRT/tree/main/demo/Diffusion
  • [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl
    7 projects | /r/MachineLearning | 8 Feb 2023
    The traditional way to deploy a model is to export it to ONNX, then to TensorRT plan format. Each step requires its own tooling, its own mental model, and may raise some issues. The most annoying thing is that you need Microsoft or Nvidia support to get the best performance, and sometimes model support takes time. For instance, T5, a model released in 2019, is not yet correctly supported on TensorRT; in particular, K/V cache is missing (soon it will be according to TensorRT maintainers, but I wrote the very same thing almost 1 year ago and then 4 months ago so… I don't know).
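
    (For readers unfamiliar with that pipeline, a rough sketch of the ONNX-to-plan step using TensorRT's Python API, assuming TensorRT 8.x-era calls; "model.onnx" is a placeholder, and real deployments usually also need dynamic shapes, precision flags, and sometimes plugins.)

```python
# Rough sketch of "export to ONNX, then build a TensorRT plan".
# Assumes a TensorRT 8.x-style Python API and an existing model.onnx;
# real models (e.g. T5 with a K/V cache) need much more than this.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
plan = builder.build_serialized_network(network, config)  # the "TensorRT plan"

with open("model.plan", "wb") as f:
    f.write(plan)
```
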
  • Speeding up T5
    2 projects | /r/LanguageTechnology | 22 Jan 2023
    I've tried to speed it up with TensorRT and followed this example: https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb - it does give a considerable speedup for batch-size=1, but it does not work with bigger batch sizes, which makes it useless, as I can simply increase the batch size of the HuggingFace model.
  • demoDiffusion on TensorRT - supports 3090, 4090, and A100
    1 project | /r/StableDiffusion | 10 Dec 2022

What are some alternatives?

When comparing cody and TensorRT you can also consider the following projects:

ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.

DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

zoekt - Fast trigram based code search

FasterTransformer - Transformer related optimization, including BERT, GPT

lsp-cody - A Client to Connect to the Cody LSP Gateway

onnx-tensorrt - ONNX-TensorRT: TensorRT backend for ONNX

koboldcpp - A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

vllm - A high-throughput and memory-efficient inference and serving engine for LLMs

llm-ls - LSP server leveraging LLMs for code completion (and more?)

openvino - OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

localpilot

stable-diffusion-webui - Stable Diffusion web UI