| | datefinder | dateparser |
|---|---|---|
| Mentions | 3 | 7 |
| Stars | 625 | 2,467 |
| Growth | - | 0.8% |
| Activity | 0.0 | 6.7 |
| Last updated | about 1 year ago | about 1 month ago |
| Language | HTML | Python |
| License | MIT License | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
datefinder
- Sneller Regex vs Ripgrep
That's with DFA minimization. Also, '\w' has 311 states while '(?-u)\w' has 5 states.
I don't have a precise definition of enormous or impractical. Does it matter? I suppose one obvious one is when DFA construction time starts having a significant impact on total search times.
> Additionally, the results are not the same: the number of matches is not equal to 7882. How could I make `\w` conform to other regex implementations in ripgrep?
By following UTS#18: https://unicode.org/reports/tr18/#word
Most regex engines make \w be ASCII-only by default. But most also have a way to opt into Unicode-aware mode. RE2, Go's regexp and ECMAScript are popular regex engines that have no way to change the interpretation of \w.
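To make the `\w` distinction concrete, here is a minimal sketch using Python's standard `re` module, which is one engine where `\w` is Unicode-aware by default for `str` patterns and the `re.ASCII` flag restricts it to `[a-zA-Z0-9_]`. That flag plays roughly the same role as the `(?-u)` form quoted above. The helper name `word_runs` and the sample text are illustrative assumptions, not taken from the thread.

```python
import re

# Sample text containing two non-ASCII letters, written as escapes so the
# code points are unambiguous: "na\u00efve" and "caf\u00e9".
TEXT = "na\u00efve caf\u00e9 123"


def word_runs(text: str, ascii_only: bool = False) -> list[str]:
    """Return maximal runs of word characters found in `text`.

    With ascii_only=False, the pattern uses Python's default Unicode-aware
    word class. With ascii_only=True, re.ASCII limits it to [a-zA-Z0-9_].
    """
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\w+", text, flags)


unicode_words = word_runs(TEXT)
ascii_words = word_runs(TEXT, ascii_only=True)

print(unicode_words)  # accented letters stay inside the words
print(ascii_words)    # accented letters split or truncate the words
```

The two calls return a different number of matches for the same input, which is the kind of mismatch the question about match counts describes when comparing engines with different `\w` defaults.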
> Fair question how regex compilers handle nefarious regexes. Go does not handle NFA with more than 1000 states, and, as you observed, we added some more restrictions when processing the NFA. It can be an interesting academic exercise to find monstrous regexes, but we haven't encountered useful regexes that hit these limits. But I guess you know some...
It's definitely not academic. People use regexes for lexers. People use big regexes to recognize certain things like email addresses and dates. Here's a real regex used in real software to identify dates in unstructured text for example: https://github.com/akoumjian/datefinder/blob/5376ece0a522c44...
Otherwise, as I hinted at above, the thing that can make regexes very large very quickly is when you mix Unicode classes with counted repetitions. It doesn't take a lot to make them "big."
- Is there a Python library for reading human-written times?
- Tuesday Daily Thread: Advanced questions
Looking at this issue, it seems a recent pull request should fix the strict mode problem. That said, the pull request is still open because of a failing test, so you can either build from source with the pull request applied or, as the comments in the issue suggest, look at dateparser. It might suit your needs.
dateparser
- Guidance on creating a very lightweight model that does one task very well
You don't need an LLM for this. If you're using Python, https://github.com/scrapinghub/dateparser works quite well.
- Everyone is talking about how ChatGPT has improved their workflow. Are you using ChatGPT extensively in your workflow?
In this project, for example, behavior is 90% about what happens when you call the parse function: https://github.com/scrapinghub/dateparser
- Desperately looking for a natural language dates parser module
- How to detect multiple dates or datetime formats and convert them accordingly
- Tuesday Daily Thread: Advanced questions
As I'm not familiar with the library, I won't be the greatest of help with that. The best recommendation I have is to scroll through the settings in the documentation and look through the issues on their GitHub, particularly the closed ones, to see if someone else is looking for the same features you are.
- Scrapinghub/dateparser: Python parser for human readable dates
What are some alternatives?
timefhuman - Convert natural language date-like strings--dates, date ranges, and lists of dates--to Python objects
developer - the first library to let you embed a developer agent in your own app!
Giveme5W1H - Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
Sherlock - Natural-language event parser for Javascript
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
pyate - PYthon Automated Term Extraction
kor - LLM(😽)
sneller - World's fastest log analysis: λ + SQL + JSON + S3
TheAlgorithms - All Algorithms implemented in Python
Crafting Interpreters - Repository for the book "Crafting Interpreters"
rust-memchr - Optimized string search routines for Rust.