Data Science with JavaScript: What we've learned so far

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • hal9ai

    Discontinued Hal9 — Data apps powered by code and LLMs [Moved to: https://github.com/hal9ai/hal9]

  • Author here; we can save some clicks by reposting the content as a comment:

    Hi there! Since the beginning of this year, we've been exploring how far we can take Data Science with JavaScript. As part of this journey, we started hal9.ai [1], an integrated environment to help us be more productive when analyzing data with JavaScript.

    We want to ask for your feedback, but more importantly, we want to use this post to share what we've learned so far:

    1. Visualizations: JavaScript is great at interactive data visualization; this is probably obvious but worth mentioning nonetheless. D3.js [2] is still a great visualization library; however, it is very low level, kind of like TensorFlow [3] rather than Keras [4]. We actually set out to build our own charting library that combined the flexibility of D3 with the ease of use of libraries like Plotly [5], only to find out later that Plot.js [6] had launched as an amazing library built on top of D3. So we ended up integrating Plot.js as our recommended charting library.

    2. Transformations: We found that JavaScript combined with D3.js has a pretty decent set of data import and transformation functions; however, it comes nowhere near Pandas [7] or dplyr [8]. After shopping around, we found Tidy.js [9], loved it, and adopted it. The combination of Tidy.js, D3.js, and Plot.js is absolutely amazing for visualization and data wrangling with small datasets, say 10K-100K rows. We were very happy with this for a while; however, once we moved from visualization into data analysis, we found 100K rows quite restrictive, and processing also became slow with 1K-10K columns. So we switched gears and started using Arquero.js [10], a columnar JS library that enabled us to process over 1M rows in the browser, a decent size for real-world data analysis.

    3. Modeling: We are still exploring this space, so our findings are not final, but here is what we've found so far. TensorFlow.js [11] is absolutely amazing; it provides a native port of TensorFlow to JavaScript with support for CPU, WebGL [12], WebAssembly [13], and NodeJS [14] backends. We were able to write an LSTM [15] for time series prediction, so far so good. However, we started hitting issues when we moved to simpler models, like linear regression. We tried Regression.js [16] but found that it falls short, so we wrote our own script to handle single-variable regressions using TF.js and gradient descent. It actually worked quite well, but it exposed a flaw in this approach: there is a lot of work to be done to bring many models to the web. We also found that Arquero.js does not play nicely with TF.js, since Arquero.js does not support tensors; so we went on to explore Danfo.js [17], which integrates well with TF.js, but we found it hard to scale transformations to over 1M rows and ran into other rough edges. Since then, we have started exploring Pyodide [18], perhaps contributing to Danfo.js, or perhaps moving more compute to the server with NodeJS, but that is a story for another time.
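
    To make the row-versus-column point from item 2 concrete, here is a minimal sketch in plain JavaScript with no libraries. It is not how Tidy.js or Arquero.js are implemented internally; it only contrasts an array-of-objects layout with a one-array-per-column layout (typed arrays here) for a simple group-by-sum. The helper names (makeRows, toColumns, sumByGroupRows, sumByGroupColumns) are hypothetical, chosen for illustration.

```javascript
// Row-oriented layout: one object per record (hypothetical toy data).
function makeRows(n) {
  const rows = new Array(n);
  for (let i = 0; i < n; i++) {
    rows[i] = { group: i % 3, value: i * 0.5 };
  }
  return rows;
}

// Column-oriented layout: one contiguous typed array per column.
function toColumns(rows) {
  const n = rows.length;
  const group = new Int32Array(n);
  const value = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    group[i] = rows[i].group;
    value[i] = rows[i].value;
  }
  return { group, value };
}

// Group-by-sum over rows: walks one heap object per record.
function sumByGroupRows(rows) {
  const sums = new Map();
  for (const r of rows) {
    sums.set(r.group, (sums.get(r.group) ?? 0) + r.value);
  }
  return sums;
}

// Group-by-sum over columns: reads only the two arrays it needs, index by index.
function sumByGroupColumns(cols) {
  const { group, value } = cols;
  const sums = new Map();
  for (let i = 0; i < group.length; i++) {
    sums.set(group[i], (sums.get(group[i]) ?? 0) + value[i]);
  }
  return sums;
}

// Usage: both layouts should produce the same per-group sums.
const rows = makeRows(6);
console.log(sumByGroupRows(rows));
console.log(sumByGroupColumns(toColumns(rows)));
```

    Both versions compute the same result; the columnar one avoids allocating an object per record and scans only the columns involved, which is the general reason columnar layouts tend to hold up better as row counts grow into the millions.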
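
    The single-variable regression mentioned in item 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' script: it swaps TF.js for plain JavaScript arrays, minimizes mean squared error with batch gradient descent, and the learning rate and iteration count are arbitrary defaults that happen to suit the toy data. A closed-form least-squares fit is included as a reference answer. Function names are hypothetical.

```javascript
// Fit y ≈ slope * x + intercept by batch gradient descent on mean squared error.
// learningRate and iterations are assumed defaults, not tuned values.
function fitLinearGD(xs, ys, { learningRate = 0.01, iterations = 5000 } = {}) {
  const n = xs.length;
  let slope = 0;
  let intercept = 0;
  for (let iter = 0; iter < iterations; iter++) {
    let gradSlope = 0;
    let gradIntercept = 0;
    for (let i = 0; i < n; i++) {
      const err = slope * xs[i] + intercept - ys[i];
      gradSlope += err * xs[i];
      gradIntercept += err;
    }
    // MSE = (1/n) Σ err²; its gradient is (2/n) Σ err·x for slope and (2/n) Σ err for intercept.
    slope -= learningRate * (2 / n) * gradSlope;
    intercept -= learningRate * (2 / n) * gradIntercept;
  }
  return { slope, intercept };
}

// Closed-form ordinary least squares: slope = cov(x, y) / var(x).
function fitLinearOLS(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  for (let i = 0; i < n; i++) {
    cov += (xs[i] - meanX) * (ys[i] - meanY);
    varX += (xs[i] - meanX) ** 2;
  }
  const slope = cov / varX;
  return { slope, intercept: meanY - slope * meanX };
}

// Usage on data generated exactly from y = 2x + 1.
const xs = [0, 1, 2, 3, 4];
const ys = xs.map((x) => 2 * x + 1);
console.log(fitLinearGD(xs, ys));
console.log(fitLinearOLS(xs, ys));
```

    Both fits should land on a slope near 2 and an intercept near 1. For a one-variable linear model the closed form is cheaper and exact; the gradient-descent loop matters because the same pattern extends to models with no closed-form solution, which is where a library with automatic differentiation such as TF.js earns its keep.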

    So, net-net, we are still super excited about exploring Data Science, Data Engineering, Visualization, and Artificial Intelligence with JavaScript, but realistically, it is going to take a few years for this ecosystem to mature.

    In the meantime, we think Data Science with JavaScript shines with smaller datasets and interactive visualizations, which is where we believe Hal9 can help you be productive. That said, we believe motivated JavaScript users can unblock themselves by adding new functionality and contributing libraries back to NPM or components to our open source project; please reach out in Hal9's GitHub repo [19] if you want to lend a hand!

    Alright, so call to action? Please head to hal9.ai and give it a shot! We would love to hear where you think this could be useful, what features we are missing, and any feedback you may have.

    To keep in touch, please subscribe to our weekly newsletter at news.hal9.ai [20], contact us at [email protected], or follow us on Twitter as @hal9.ai.

    Thanks for reading along!

    [1] https://hal9.ai

  • regression-js

    Curve Fitting in JavaScript.

  • examples

    TensorFlow examples (by tensorflow)

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

  • Keras

    Deep Learning for humans

  • dplyr

    dplyr: A grammar of data manipulation
