rust
spaCy
| | rust | spaCy |
|---|---|---|
| Mentions | 2,184 | 87 |
| Stars | 77,440 | 25,158 |
| Growth | 1.7% | 1.1% |
| Activity | 10.0 | 9.7 |
| Latest commit | about 13 hours ago | 4 days ago |
| Language | Rust | Python |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
rust
- I Use C When I Believe in Memory Safety
- Why the slow compile times?
I wanted to comment on this a little. The Rust team has gone out of its way to keep Rust's grammar easy to parse! A good example is the turbofish, ::<>, which is used with generics. This PR shows an example of why it's necessary, which led to the creation of the infamous Bastion of the Turbofish.
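The ambiguity the turbofish resolves can be shown in a few lines. This is a minimal sketch with a made-up generic function (`identity`), not the code from the linked PR:

```rust
// Hypothetical identifiers illustrating the ambiguity that the
// turbofish (`::<>`) resolves.
fn identity<T>(x: T) -> T { x }

fn main() {
    let (a, b, c, d) = (1, 2, 3, 4);
    // Without `::<>`, the token sequence `a < b, c > (d)` already
    // parses as something else entirely: a tuple of two comparisons.
    let comparisons = (a < b, c > (d));
    assert_eq!(comparisons, (true, false));
    // The turbofish tells the parser these are type arguments,
    // not less-than / greater-than operators:
    assert_eq!(identity::<i32>(5), 5);
}
```

Because `(a < b, c > (d))` is already valid, meaningful syntax, generic arguments in expression position need the extra `::` to stay unambiguous.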
- I wrote a library to expand byte string literals for pattern matching
The pattern matching is cool! Also, it looks like there's a concat_bytes!() in nightly (issue here) if concatenation is all you need (when it stabilizes or if you're on nightly).
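For what byte-string pattern matching looks like on stable Rust, here is a small sketch; `method_of` is a made-up helper, not part of the linked library:

```rust
// Slice patterns over byte strings work on stable Rust; only
// `concat_bytes!` is still nightly-gated.
fn method_of(req: &[u8]) -> &'static str {
    match req {
        // Destructure the input byte by byte with a rest pattern:
        [b'G', b'E', b'T', b' ', ..] => "GET",
        [b'P', b'O', b'S', b'T', b' ', ..] => "POST",
        _ => "other",
    }
}

fn main() {
    assert_eq!(method_of(b"GET /index.html"), "GET");
    assert_eq!(method_of(b"POST /form"), "POST");
    assert_eq!(method_of(b"PATCH /x"), "other");
}
```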
- I ❤️ PEG
Rust lexer, still no regex.
- How to be able to contribute to languages/compilers?
A huge part of it is working on a compiler that's written in a language that helps avoid mistakes. Even when I was doing C++ regularly I never even tried diving into Clang, because I "knew" I'd be in a mess of complicated manual stuff that I was sure I'd break somehow. But with Rust, I first did a trivial compiler change https://github.com/rust-lang/rust/pull/42275/files#diff-3675ead66a843fefc1a0ac141fac8adeac7899e87979e79d2b4cd2dddd11c2b2, and that was non-terrible enough that I tried a slightly bigger change https://github.com/rust-lang/rust/pull/46264/files#diff-265ef672b5d778c5debaca696bc903a604165df54c44ea4bff07a2369b92e90d, and while I'm far from an expert on the compiler, now I can just go add stuff https://github.com/rust-lang/rust/pull/96376 and it's no big deal.
- Show HN: Mass Dissent – Easily send a letter to U.S. Congress representatives
- "My Reaction to Dr. Stroustrup's Recent Memory Safety Comments"
ICEs are not C++-exclusive; there's plenty of that in rustc. The fact that you hit one while working with C++ is unfortunate, but it could just as well have been your experience with Rust. The only difference is Rust's faster release cadence and more open community/process, so there'd be a chance your issue would be fixed within the next 6-12 weeks.
- Stop Comparing Rust to Old C++
The partial borrow issue is from a desire to assign names to tuple indices so you can access elements without sensitivity to their order. Without that, any change to the arrangement of components in an ECS archetype would affect downstream code attempting to query-iterate those tuples. Ideally I would use an intermediate struct, but as I discovered over the course of this project, rustc/LLVM aren't great about converting between tuples and structs. The only way to be sure you're not taking a perf hit would be to use a code-generated trait to rename the tuple fields, but you can't do that because traits don't give you direct field access and you can't partial borrow from function access.
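The partial-borrow limitation mentioned above can be sketched in a few lines, assuming a hypothetical ECS-ish struct (not the author's code):

```rust
struct World {
    positions: Vec<f32>,
    velocities: Vec<f32>,
}

impl World {
    #[allow(dead_code)]
    fn positions_mut(&mut self) -> &mut Vec<f32> {
        &mut self.positions
    }
}

fn main() {
    let mut w = World { positions: vec![0.0], velocities: vec![2.0] };

    // Direct field access lets the borrow checker see the borrows
    // are disjoint, so this compiles:
    let p = &mut w.positions;
    let v = &w.velocities;
    p[0] += v[0];
    assert_eq!(w.positions[0], 2.0);

    // Going through an accessor borrows all of `w`, so the
    // equivalent code does not compile; this is what "you can't
    // partial borrow through a function" means:
    // let p = w.positions_mut(); // mutably borrows all of `w`
    // let v = &w.velocities;     // ERROR: `w` already borrowed
}
```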
There are three: the official rustc, mrustc (no borrow checker, but it can essentially compile the official rustc), and the GCC front end (which can't compile anything substantial yet). Only rustc is production-ready, though.
- Moving and re-exporting a Rust type can be a major breaking change
By following this issue to this issue comment, I think the reason is the sheer complexity of the tuple struct constructor:
* The syntax of a tuple struct constructor is the same as that of a function call or a constant, so to make life easier the Rust devs made it one: declaring a tuple struct with pub fields also fills the value namespace with a function or constant.
* A named-field struct constructor cannot be mistaken for a function call, so the Rust devs are free to let users construct them through type aliases.
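The "constructor doubles as a function" point can be demonstrated directly. A minimal sketch with a hypothetical tuple struct (not from the linked issue):

```rust
// `Meters` is both a type and, because it is a tuple struct with a
// visible field, a plain function in the value namespace.
struct Meters(f64);

fn main() {
    // As a constructor expression:
    let m = Meters(3.0);
    assert_eq!(m.0, 3.0);

    // As a first-class function value, e.g. passed straight to `map`:
    let all: Vec<Meters> = [1.0, 2.0].iter().copied().map(Meters).collect();
    assert_eq!(all.len(), 2);

    // A type alias names only the type, not the value-namespace
    // function, which is one reason re-exports and aliases behave
    // differently for tuple structs than for named-field structs.
}
```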
spaCy
- Looking for open source projects in Machine Learning and Data Science
You could try spaCy. This is the brains of the operation - an open-source library for advanced NLP in Python. Another is DocArray - it's built on top of NumPy and Dask, and good for preprocessing, modeling, and analysis of text data.
- One does not simply "create a visualization" from unstructured data!
In the example given in the article, I can't just use SQL functions to extract the age and phone number. I guess the phone number could be regexed, but ideally I should use something like spaCy and also record some kind of confidence score. This is where Spark/Dask/etc. really shine. Does Airbyte support user-defined functions in a language like Python?
- Training on BERT without any 'context' just questions/answer tuples?
(1) For large scale processing/tokenizing your data I would consider using something like NLTK or Spacy. That's if your books are already in text form. If they are scans, you'll need to use some OCR software first.
- Has anyone here ever used the seaNMF model for short text topic modeling, and be willing to help me get started with it?
Tokenize with NLTK, SpaCy or CoreNLP
- Transforming free-form geospatial directions into addresses - SOTA?
If you've got a specific area you're looking at, and already have street data, you could:
1. Follow the ArcGIS blog's directions, creating intersection features.
2. Train a classifier (or a specific NER entity type; spaCy would be a good package for that) on the types of cross-street references you're finding in your text. You can see some of the relevant tokens in the examples you provided - "Corner of", "along", and I'd imagine "intersection of" etc. Even simple string lookups could help you bootstrap the training data.
3. Use some sort of embedding similarity to compare the hit terms to potential cross-streets.
- Tell HN: Selling My SaaS
Great question! Short answer: it doesn't.
I did start with a vision of presbot as a self-learning chatbot built to act as an interactive agent representing its owner (primarily B2C) in all sorts of situations. But based on the feedback, I realized that until that interaction is smooth, believable, and closer to an actual dynamic conversation, it provides much less value. I was using a combination of NLTK (https://www.nltk.org/), spaCy (https://spacy.io/), and TextBlob (https://textblob.readthedocs.io/en/dev/) for NLP then.
I pivoted to a rule-based bot focused on lead capture via a linear conversation driven by user-specified questions, more like an interactive version of a static form, with prescribed Q&A (FAQs on the platform).
- Which not so well known Python packages do you like to use on a regular basis and why?
I work mostly in the NLP space, so other libraries I like are spaCy, NLTK, and pynlp lib.
- Is it home bias or is data wrangling for machine learning in python much less intuitive and much more burdensome than in R?
Standout Python NLP libraries include spaCy and Gensim, as well as pre-trained model availability on Hugging Face. These libraries have widespread use in and support from industry, and it shows. spaCy has best-in-class methods for preprocessing text for further applications. Gensim helps you manage your corpus of documents, and contains a lot of different tools for solving a common industry task, topic modeling.
- How to get started with machine learning.
Given your need, I think you'll be better off with libraries like Spacy, which does NLP (rather than just DNN inference). You'll get your app much faster this way.
- There is a framework for everything.
What are some alternatives?
TextBlob - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
carbon-lang - Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
NLTK - NLTK Source
Stanza - Official Stanford NLP Python Library for Many Human Languages
polyglot - Multilingual text (NLP) processing toolkit
BERT-NER - Pytorch-Named-Entity-Recognition-with-BERT
zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
textacy - NLP, before and after spaCy
Jieba - Chinese text segmentation (结巴中文分词)
Nim - Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
CoreNLP - Stanford CoreNLP: A Java suite of core NLP tools.
PyTorch-NLP - Basic Utilities for PyTorch Natural Language Processing (NLP)