warc-parquet VS abracabra

Compare warc-parquet and abracabra to see how they differ.

warc-parquet

🗄️ A simple CLI for converting WARC to Parquet. (by maxcountryman)

abracabra

Eventually a search engine, but currently a filtering pipeline for HTML and soon WARC files. (by hadrianw)
                warc-parquet    abracabra
Mentions        4               1
Stars           99              0
Growth          -               -
Activity        6.8             0.0
Last commit     2 days ago      almost 2 years ago
Language        Rust            Rust
License         -               BSD Zero Clause License
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

warc-parquet

Posts with mentions or reviews of warc-parquet. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-06-24.

abracabra

Posts with mentions or reviews of abracabra. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-01-09.
  • Ask HN: Show me your Half Baked project
    154 projects | news.ycombinator.com | 9 Jan 2021
    Half-baked as in eating it can cause gastric problems, not as in 50% done?

    https://github.com/hadrianw/werf a graphical mouse driven text editor inspired by Plan 9's acme. It can open quite big files, you can WIMP around a bit, but README is just wishful thinking, it can't even save files. Written in C with cairo and fontconfig. Currently for a few years I'm in process of rewriting text buffer, I have something nice, but did not test it enough and did not integrate it. Now I'm thinking of a rewrite in Zig to learn it and also make it easier to test. But that's my wishful thinking again.

    https://github.com/hadrianw/tomatoaster a ChromeOS like Linux distribution based on Void Linux build system, AB partition scheme, building squashfs image without root privileges. Currently I did a nice and almost proper script to handle it and do not need to patch as much to build an image, that runs, but is not entirely useful. Need to clean-up the script and commit. Mostly bash, bunch of patches and config files and a bit of C.

    https://github.com/hadrianw/abracabra a search engine, that will not index pages with ads (all results would be uBlock-Origin clean), that is not yet even a proper pipeline to check whether a page does contain ads or not, no crawler yet at all. I want to go through Common Crawl archives first. I did something in Go first (https://github.com/hadrianw/abracabra-legacy), but now I'm rewriting it in Rust, because of awesome lol_html crate, that will make filtering fast and easy. Currently writing code to filter URLs with Rabin-Karp and a bit of loops. It created an e-mail thread years ago with people wanting to help, but I've been too slow.

    I don't want much help to code things, I would appreciate however a bit of pointers on a couple of things regarding Rust and watchdogs (to recognize a partition as unbootable and reset the system to the previous partition).
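
The post above mentions filtering URLs with Rabin-Karp. For readers unfamiliar with the technique, here is a minimal sketch of single-pattern Rabin-Karp substring search in Rust, applied to a URL blocklist check. It is not abracabra's code: the names `rabin_karp_find` and `url_is_blocked`, the hash constants, and the example fragments are all assumptions chosen for illustration.

```rust
/// Rolling-hash parameters (illustrative choices, not taken from abracabra).
const BASE: u64 = 256;
const MODULUS: u64 = 1_000_000_007;

/// Returns the byte offset of the first occurrence of `pattern` in `text`.
/// Rabin-Karp compares rolling hashes first and verifies bytes only on a hash hit.
pub fn rabin_karp_find(text: &[u8], pattern: &[u8]) -> Option<usize> {
    let (n, m) = (text.len(), pattern.len());
    if m == 0 {
        return Some(0);
    }
    if m > n {
        return None;
    }

    // BASE^(m-1) mod MODULUS: the weight of the byte that leaves the window.
    let mut high = 1u64;
    for _ in 1..m {
        high = high * BASE % MODULUS;
    }

    // Hash of the pattern and of the first window of the text.
    let mut pattern_hash = 0u64;
    let mut window_hash = 0u64;
    for i in 0..m {
        pattern_hash = (pattern_hash * BASE + pattern[i] as u64) % MODULUS;
        window_hash = (window_hash * BASE + text[i] as u64) % MODULUS;
    }

    for start in 0..=(n - m) {
        // Hashes can collide, so confirm with a direct byte comparison.
        if window_hash == pattern_hash && &text[start..start + m] == pattern {
            return Some(start);
        }
        if start + m < n {
            // Slide the window: drop text[start], append text[start + m].
            let leaving = text[start] as u64 * high % MODULUS;
            window_hash = (window_hash + MODULUS - leaving) % MODULUS;
            window_hash = (window_hash * BASE + text[start + m] as u64) % MODULUS;
        }
    }
    None
}

/// Hypothetical filter: true if the URL contains any blocked fragment.
pub fn url_is_blocked(url: &str, blocked_fragments: &[&str]) -> bool {
    blocked_fragments
        .iter()
        .any(|fragment| rabin_karp_find(url.as_bytes(), fragment.as_bytes()).is_some())
}
```

Calling `url_is_blocked("https://example.com/ads/banner.js", &["/ads/"])` would report a match under these assumptions. A real filter with many patterns would more likely hash all patterns of one length together or use a different multi-pattern algorithm; this sketch scans once per fragment for clarity.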

What are some alternatives?

When comparing warc-parquet and abracabra you can also consider the following projects:

sqlite-parquet-vtable - A SQLite vtable extension to read Parquet files

spyglass - A personal search engine: Create a searchable library from your personal documents, interests, and more!

spotblock-rs - A Spotify advertisement muter for Pipewire

privaxy - Privaxy is the next generation tracker and advertisement blocker. It blocks ads and trackers by MITMing HTTP(s) traffic.

observable-state-tree - An observable state tree is a normal object except that listeners can be bound to any subtree of the state tree.

DIY-arcade - How to build your own full-size arcade machine from scratch

dflex - The sophisticated Drag and Drop library you've been waiting for 🥳

abs_cd - CI/CD for the Arch build system with web interface.

pyodide - Pyodide is a Python distribution for the browser and Node.js based on WebAssembly

hacn - A "monad" or DSL for creating React components using Fable and F# computation expressions

pcopy - pcopy is a temporary file host, nopaste and clipboard across machines. It can be used from the Web UI, via a CLI or without a client by using curl.

dupver - Deduplicating VCS for large binary files in Go