DataProfiler
pyWhat
Our great sponsors
DataProfiler | pyWhat | |
---|---|---|
61 | 16 | |
1,357 | 6,346 | |
2.1% | - | |
6.3 | 0.0 | |
4 days ago | 6 months ago | |
Python | Python | |
Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DataProfiler
-
LongRoPE: Extending LLM Context Window Beyond 2M Tokens
It's been possible to skip tokenization for a long time, my team and I did it here - https://github.com/capitalone/DataProfiler
For what it's worth, we actually were working with LSTMs with nearly a billion params back in 2016-2017 area. Transformers made it far more effective to train and execute, but ultimately LSTMs are able to achieve similar results, though slow & require more training data.
- Data Profiler โ What's in your data?
-
Data Profiler 0.9.0 -- offering a massive improvement to memory usage during profiling of large datasets
Great call out -- would you be willing to write up an issue for that on the repo? Thank you! https://github.com/capitalone/DataProfiler/issues/new/choose
- FLiPN-FLaNK Stack Weekly for 20 March 2023
- Release 0.8.3 ยท capitalone/DataProfiler
pyWhat
-
Go Library like PyWhat?
Is there a library written in Go similar to PyWhat? I want to use a subset of the functionality for a simple go program I'm writing. I could just call PyWhat, link to lemmeknow, or even write a simple go implementation myself, but I wanted to ask if there was a pure go implementation. Thanks!
-
lemmeknow v0.7.0 is here with support for identifying bytes with help of regex crate!
Lemmeknow is basically used for identifying text as mentioned in README and video. It is Rust implementation of PyWhat. You can see various usecases there too.
-
lemmeknow - The fastest way to identify anything!
For rarity, we have got the database from pyWhat and the wiki says:
-
lemmeknow - the fastest way to identify anything!
This project was inspired by u/beesec 's pyWhat
- Tips for Making a Popular Open-Source Project in 2021 [Ultimate Guide]
- PyWhat - Identify Anything
- PyWhat - Identify Anything. Easily identify API keys, secrets, cryptocurrency wallets and more.
-
Is there an application or way to find hashes?
Do you mean something like this: https://github.com/bee-san/pyWhat
- Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is
-
IT Pro Tuesday #155 - Carrier Lookup, Network Podcast, Identification Tool & More
pyWhat enables you to easily identify emails, IP addresses and more. Feed it a .pcap file or some mysterious text or hex of a file, and it will tell you what it is. The tool is recursive, so it can identify everything in text, files and more. A shout out to the tool's author for sharing his creation.
What are some alternatives?
ydata-profiling - 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
arkime - Arkime is an open source, large scale, full packet capturing, indexing, and database system.
usaddress - :us: a python library for parsing unstructured United States address strings into address components
BruteShark - Network Analysis Tool
superset - Apache Superset is a Data Visualization and Data Exploration Platform
chepy - Chepy is a python lib/cli equivalent of the awesome CyberChef tool.
XlsxWriter - A Python module for creating Excel XLSX files.
TryHackMe - This is a repository containing TryHackMe Writeups in Somali language on various of rooms & challenges, including notes, files and solutions.
vtuber-livechat-dataset - ๐ VTuber 1B: Billion-scale Live Chat and Moderation Event Dataset
elementary - The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
maltrail - Malicious traffic detection system