GPT-4 Turbo with Vision is a step backwards for coding

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • SurveyJS - Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • codespin

    CodeSpin.AI Code Generation Tools

  • Shameless plug. I have a VS Code extension that's very nearly ready.

    Codespin CLI tools (ready to use): https://github.com/codespin-ai/codespin

    VS Code extension for the CLI tool (soon): https://www.youtube.com/watch?v=2TJqosFmkao

    I'll do a Show HN in a week or two.

  • askai

    Command Line Interface for OpenAi ChatGPT (by yudax42)

  • Maybe I am bit dim, but how one can choose GPT-4 Turbo? Is this available from https://chat.openai.com/ ?

  • SurveyJS

    Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.

    SurveyJS logo
  • refactor-benchmark

    Aider's refactoring benchmark exercises based on popular python repos

  • FWIW, I agree with you that each model has its own personality and that models may do better or worse on different kinds of coding tasks. Aider leans into both of these concepts.

    The GPT-4 Turbo models have a lazy coding personality, and I spent months of effort figuring out how to both measure and reduce that laziness. This resulted in aider supporting "unified diffs" as a code editing format to reduce such laziness by 3X [0] and the aider refactoring benchmark as a way to quantify these benefits [1].

    The benchmark results I just shared about GPT-4 Turbo with Vision cover both smaller, toy coding problems [2] as well as larger edits to larger source files [3]. The new model slightly underperforms on smaller coding tasks, and significantly underperforms on the larger edits where laziness is often a culprit.

    [0] https://aider.chat/2023/12/21/unified-diffs.html

    [1] https://github.com/paul-gauthier/refactor-benchmark

    [2] https://aider.chat/2024/04/09/gpt-4-turbo.html#code-editing-...

    [3] https://aider.chat/2024/04/09/gpt-4-turbo.html#lazy-coding

  • openai-python

    The official Python library for the OpenAI API

  • The ongoing model anchoring/grounding issue likely affects all GPT-4 checkpoints/variants, but is most prominent with the latest "gpt-4-turbo-2024-04-09" variant due to its most recent cutoff date, might imply deeper issues with the current model architecture, or at least how it's been updated:

    https://github.com/openai/openai-python/issues/1310

    See also the original thread on OpenAI's developer forums (linked on the GitHub issue) with confirmations from others.

    A test code snippet is included in the GitHub issue to A/B test the problem yourself with your own questions if need be.

  • big-AGI

    Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts