The unexpected return of JavaScript for Automation

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • nut.js

    Native UI testing / controlling with node

  • One really cool little JS library I've been using for a bunch of desktop automation tasks lately is nut.js, along with the lower-level libnut library it's implemented on top of:

    https://github.com/nut-tree/nut.js

    https://github.com/nut-tree/libnut

    It provides a means to send user input (mouse movement/clicks and key presses) and to read and react to changes in visual state (through screenshots), and it works across Windows, Linux and macOS. It automates at a much lower level of abstraction than the approaches mentioned in the article, which script against programmatic APIs.

    What I really like about this lower-level approach is that you don't need anyone's permission to automate anything, since there's no programmatic API that the system owner has to provide for you and can therefore limit or take away when it becomes inconvenient.

    Any task that can be accomplished through looking at stuff on the screen, clicking the mouse and pressing keys on a keyboard (i.e. what a real person would do to accomplish the same task) can be automated, and it's actually surprisingly easy and effective to do this with nut.js. What really helps is that OpenCV has become ridiculously good and ridiculously fast at matching/identifying objects from a screenshot, with latencies usually in the low double digits of milliseconds, so latency-based flakiness isn't nearly as much of an issue as I remember it being in the old days. I've also played around with OCR using Tesseract but haven't had as much success with it in terms of performance: I remember seeing latencies of several seconds even for recognizing a single word from a tiny pre-cropped screenshot containing only the word itself.

    The main tradeoff of this approach compared to automation through APIs is that, because it works by simulating real user input, it's not very amenable to running in the background while a user is actively interacting with the same machine, so a separate machine or VM is often needed. That's an acceptable tradeoff for some use cases but a complete deal breaker for others, so YMMV. I just wanted to bring this cool little tool to people's attention.
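The screenshot-matching idea described in the comment above can be illustrated with a small, dependency-free sketch. This is not how nut.js or OpenCV actually do it: it is a brute-force template match using a sum of absolute differences, and the `GrayImage` type, the `findTemplate` function, and the tiny sample image are all hypothetical names and data invented for this example.

```typescript
// Grayscale image stored row by row; each pixel is 0..255.
// (Hypothetical type for illustration, not part of nut.js or OpenCV.)
interface GrayImage {
  width: number;
  height: number;
  pixels: number[];
}

interface Match {
  x: number;
  y: number;
  score: number; // sum of absolute differences; 0 means identical
}

// Slide the template over every position in the screenshot and keep the
// position with the smallest sum of absolute pixel differences.
function findTemplate(screenshot: GrayImage, template: GrayImage): Match | null {
  if (template.width > screenshot.width || template.height > screenshot.height) {
    return null; // the template cannot fit anywhere
  }
  let best: Match | null = null;
  for (let y = 0; y <= screenshot.height - template.height; y++) {
    for (let x = 0; x <= screenshot.width - template.width; x++) {
      let score = 0;
      for (let ty = 0; ty < template.height; ty++) {
        for (let tx = 0; tx < template.width; tx++) {
          const s = screenshot.pixels[(y + ty) * screenshot.width + (x + tx)];
          const t = template.pixels[ty * template.width + tx];
          score += Math.abs(s - t);
        }
      }
      if (best === null || score < best.score) {
        best = { x, y, score };
      }
    }
  }
  return best;
}

// Made-up 4x3 "screenshot" with a bright 2x2 "button" at column 2, row 1.
const screenshot: GrayImage = {
  width: 4,
  height: 3,
  pixels: [
    0, 0, 0, 0,
    0, 0, 255, 255,
    0, 0, 255, 255,
  ],
};
const button: GrayImage = { width: 2, height: 2, pixels: [255, 255, 255, 255] };

const hit = findTemplate(screenshot, button);
console.log(hit); // { x: 2, y: 1, score: 0 }
```

A real automation script would take the matched coordinates and move the mouse there before clicking. Production matchers use far faster and more robust techniques than this exhaustive scan, which is part of why the latencies mentioned above are achievable.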

  • libnut

    Discontinued. A Node-API addon for desktop automation

  • node-csv

    Full featured CSV parser with simple api and tested against large datasets.

  • I still use it a lot, particularly in unit tests and configuration. Take the tests of the CSV package, for example: https://github.com/adaltas/node-csv/blob/master/packages/csv... Once you get used to its syntax, it is easier to read than plain JS. For Nikita, TypeScript would be appealing for code completion. In terms of type checking, it would be redundant, since all arguments are already checked at runtime with JSON Schema.
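To make the point about runtime checking concrete, here is a minimal sketch of validating arguments against a schema at runtime. It is not Nikita's implementation and not a real JSON Schema validator: `SimpleSchema`, `validate`, and `copySchema` are hypothetical names, and the schema format is a heavily simplified stand-in.

```typescript
// Hypothetical, heavily simplified schema shape (not real JSON Schema).
type FieldType = "string" | "number" | "boolean";

interface SimpleSchema {
  required: string[];
  properties: Record<string, FieldType>;
}

// Return a list of human-readable problems; an empty list means the
// arguments satisfy the schema.
function validate(schema: SimpleSchema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const name of schema.required) {
    if (!(name in args)) {
      errors.push(`missing required argument "${name}"`);
    }
  }
  for (const [name, expected] of Object.entries(schema.properties)) {
    if (name in args && typeof args[name] !== expected) {
      errors.push(`argument "${name}" should be ${expected}, got ${typeof args[name]}`);
    }
  }
  return errors;
}

// Made-up schema for an imaginary "copy" action.
const copySchema: SimpleSchema = {
  required: ["source", "target"],
  properties: { source: "string", target: "string", overwrite: "boolean" },
};

const ok = validate(copySchema, { source: "a.txt", target: "b.txt" });
const bad = validate(copySchema, { source: "a.txt", overwrite: "yes" });

console.log(ok);  // []
console.log(bad); // two errors: missing "target", wrong type for "overwrite"
```

Because checks like these already run on every call, static types would mostly repeat the same guarantees, which is presumably the "double usage" the commenter has in mind. Editor code completion would be the main remaining benefit.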

  • node-jxa

    Use your favorite node.js modules (and JS editor) for your Javascript OSX automation scripts

  • Ah, and by the way: inevitably, with a decent and widespread language, you want to use third-party libraries, especially seeing as there are thousands of them by now. Well, apparently you can load .scpt libraries (presumably written in either AppleScript or JavaScript), but you can't just load a JS module from a file. There's a project that wraps JXA and uses Browserify to cram additional libraries down JXA's throat by merging them with the main script: https://github.com/johnelm/node-jxa

  • node-ffi-napi

    A foreign function interface (FFI) for Node.js, N-API style

  • I actually came from AutoHotKey as well, specifically for the cross-platform support!

    I've found the dev experience with nut.js to be worlds ahead of AutoHotKey, too. You get to use a real programming language with proper modules, data structures, first-class functions and asynchrony, and you have access to a vast ecosystem of third-party libraries and tooling.

    Some Windows-specific APIs are easier to work with in AHK thanks to its collection of built-in functions tailored for automation. Still, everything AHK can do is possible with nut.js, with a bit more work involved, since you're just writing a Node.js script and have access to libraries like https://github.com/node-ffi-napi/node-ffi-napi that can call native system libraries.
