| | floc | Yacy |
|---|---|---|
| Mentions | 92 | 115 |
| Stars | 928 | 3,265 |
| Growth | - | 1.0% |
| Activity | 1.1 | 8.7 |
| Last commit | about 1 year ago | 5 days ago |
| Language | Makefile | Java |
| License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
floc
-
Google starts trialing its FLoC cookie alternative in Chrome.
Draft: https://github.com/WICG/floc
- Chrome vulnerability reported for 3.2 billion users
-
[D] Google FLoC and Topics API suspiciously similar.
"The browser uses machine learning algorithms to develop a cohort based on the sites that an individual visits. The algorithms might be based on the URLs of the visited sites, on the content of those pages, or other factors. The central idea is that these input features to the algorithm, including the web history, are kept local on the browser and are not uploaded elsewhere — the browser only exposes the generated cohort." Source: https://github.com/WICG/floc
-
Will a VPN help me? And is Kape Technologies ruining everything?
Google (or other third-party tracking) is also not affected by a VPN. These groups use cookie syncing to assign you a unique ID and then collect this ID again as you browse the internet. That buyerID can then be cross-referenced (even with other buyerIDs) to generate all sorts of different demographic/psychographic information and used to fingerprint your online life for audience targeting. Google is in fact working to take this a step further with the FLoC experiment. FLoC (Federated Learning of Cohorts) actually deprecates the Set-Cookie header in favor of in-browser history scanning. Basically, in a year or two they plan to incorporate Chrome into their adtech stack and have it report your history/behavior to Google (regardless of whether you save history or not). Here is some good info on that: https://github.com/WICG/floc
-
Google Play Services now lets you delete your advertising ID when you opt out of ad personalization
Instead they propose new standards, like HTML Imports or FLoC, and the W3C decides as a whole whether or not they become official standards.
-
Google considers switching FLoC to a topic-based approach
With cross-site cookies, adnetwork.com has full information about what sites you've visited (among sites that incorporate their cookies). This isn't good either! But generally speaking, an individual site using adnetwork.com for advertising won't have or want access to that vector of your interests; many site operators don't even have visibility into what ads win real-time bidding, just that they're receiving money for providing their inventory. Certainly there are players that can provide demographic targeting metadata to site operators, but to my knowledge they are less widely known and certainly not cheap, and I imagine (or hope) any players with wide enough cookie reach would be discouraged from maintaining a database that could associate metadata with PII.
With FLoC, though, the idea was that the browser would provide document.interestCohort() and the individual site's JS could react accordingly: https://github.com/WICG/floc . This means that any site, regardless of its contracts with ad networks, could immediately identify your cohort and associate it with your activity. Web developers working in good faith would be encouraged to have user.cohort or user.topic fields from day one "just so you have it" - imagine all the ways someone could use this in bad faith. Inevitably this data would leak (or be intentionally leaked) and could trivially become a target list for doxxing closeted people. It's a dangerous, dangerous proposal.
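To make the concern concrete, here is a minimal sketch of how little code a page would have needed to read the cohort. It assumes the experimental `document.interestCohort()` call resolved to an object with `id` and `version` fields, as the origin-trial builds did; the interface names and the `readCohort` helper are hypothetical, and the API has since been withdrawn, so treat this as an illustration rather than working browser code.

```typescript
// Assumed shape of the resolved value; the API was experimental, so the
// field names here are an assumption, not a stable contract.
interface InterestCohort {
  id: string;
  version: string;
}

// Minimal structural type (hypothetical) so the sketch also runs outside a browser.
interface CohortCapableDocument {
  interestCohort?: () => Promise<InterestCohort>;
}

// Returns the cohort if the page's document exposes one, otherwise null.
async function readCohort(
  doc: CohortCapableDocument
): Promise<InterestCohort | null> {
  if (typeof doc.interestCohort !== "function") {
    return null; // browser never shipped, or has removed, the API
  }
  try {
    return await doc.interestCohort();
  } catch {
    return null; // the promise may reject, e.g. when no cohort is available
  }
}
```

In a page script, the browser's own `document` would be passed in (cast to the structural type above). Nothing in the call depends on a contract with an ad network, which is the point the comment is making.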
-
Trying to understand Addressability (for native mobile, and in general)
You can't find any info about this because there isn't really any. Josh Karlin, who is the maintainer of the FLoC working document, said at an event that it might make sense to swap to topics. It's essentially just reducing the entropy of the cohorts and giving them a more comprehensible (and probably less useful) taxonomy. That's all the info there is.
-
Apple's Plan to "Think Different" About Encryption Opens a Backdoor to Your Private Life
https://github.com/WICG/floc explains the overall goals.
- Firefox Users Continue to Decrease Despite Proton Update
-
Amazon is blocking Google’s FLoC
It's pretty complicated, my understanding could be wrong, and I'm definitely not an expert. All the stupid CIA-style names that keep changing don't help. Turtledove, Fledge, Sparrow lol.
But from what I think I know, that's kind of right technically, but kind of not in terms of actual real privacy.
Yes, the actual browsing data, e.g. for the basic FLoC cohorts only what Amazon product page you visited, is no longer 'sent' to ad networks (that's a pretty big oversimplification of how ad networks track you, but for brevity). That data is parsed in your browser to generate a cohort ID for you.
But this cohort ID is exposed to the world via document.interestCohort() and is what's used for targeting and tracking.
To me it seems that the cohorts are so small ("thousands of people") that cohort + IP or UA is basically the same as a semi-long-lasting UUID.
Here's an image from google's site.
https://web-dev.imgix.net/image/80mq7dk16vVEg8BBhsVe42n6zn82...
It also seems like Chrome/google might be still defaulting browser settings to give themselves even more data just like they currently do?
https://github.com/WICG/floc#qualifying-users-for-whom-a-coh...
BUT when you layer on the other proposals (Fledge/Turtledove/Dovekey or whatever), which I don't understand that much (maybe someone else can explain), it seems like it basically collects this page/product-level data and makes it available to DSPs etc. for tracking/ad serving (again, if not technically 1:1, basically so in consequence given the sizes of these groups).
Like one of the proposals talks about a 'trusted' key/value server which doesn't seem that different from what already happens? The original proposal wanted to move the entire ad bid/target/serve process into the browser.
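The worry above, that a cohort of "thousands of people" combined with IP or UA behaves almost like a unique ID, can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic: it assumes users are spread evenly across cohorts and that the second signal is independent of the cohort, neither of which is guaranteed in practice. Both function names are hypothetical.

```typescript
// Bits of identifying information revealed by learning that a user belongs to
// a group of `groupSize` out of `population` equally likely users.
function bitsRevealed(population: number, groupSize: number): number {
  return Math.log2(population / groupSize);
}

// Expected number of users left in the anonymity set once the cohort is
// combined with a second signal (e.g. IP range or UA string) that takes
// `distinctValues` equally likely values, independent of the cohort.
function remainingAnonymitySet(
  cohortSize: number,
  distinctValues: number
): number {
  return cohortSize / distinctValues;
}
```

For instance, under these assumptions a cohort of 2,000 users combined with a side signal that takes 500 distinct values leaves an expected anonymity set of only 4 users.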
Yacy
- New ways we're tackling spammy, low-quality content on Search
- YaCy, a distributed Web Search Engine, based on a peer-to-peer network
-
New 60% of OpenAI model's responses contain plagiarism
It turns out you can make it all the way to become president of Harvard [1] while ignoring this rule so it is questionable whether it is as set in stone as you make it out to be, at least in certain disciplines.
In a way these models are a perfect mirror of the current academic climate. They plagiarise without remorse, they follow the latest identity-politics diktat to a point and make up 'facts' when needed to reach a desired narrative. Google Gemini is the latest example [2] of where this leads.
Given that it is plausible that models like these will soon be used in educational settings this is a recipe for disaster. The same goes for the trend to replace search engine results with 'interpreted' results in which LLMs take up the same role as Winston in 1984: Winston works in the Ministry of Truth where he alters historical records to fit the needs of the Party.
It is time for a decentralised distributed search engine which limits itself to pure search, something like YaCy [3]. Something to replace Winstonian search engines like Google and Bing (et al.).
[1] https://www.campusreform.org/article/claudine-gay-is-a-dei-h...
[2] https://news.ycombinator.com/item?id=39465255
[3] https://yacy.net/
-
Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search [pdf]
> Now I just need some kind of open source search engine to run on it ...
Here you go: https://yacy.net
-
Welcome to mwmbl, the free, open-source and non-profit search engine
I remember https://yacy.net/, but the big problem with this project was that it was written in Java and had no implementations in other languages. Imagine if torrent clients existed only in Perl.
-
admarus alternatives - ipfs-search and Yacy
3 projects | 9 Aug 2023
Admarus is similar to Yacy but aims to be distributed, whereas Yacy is federated. Both are made for the web.
- Brave Search launches own image and video search
-
Show HN: DiskerNet – Browse the Internet from Your Disk, Now Open Source
You should check out https://yacy.net: a global, P2P web search engine, where each peer can build and share its own index, etc.
-
How do you organize your data?
I also have an instance of Yacy installed, which I use to index the entire system, giving me my own private, internal search engine.
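A private instance like this can also be queried from scripts. The sketch below only builds the request URL. It assumes a YaCy peer on its default local port 8090 and the JSON search endpoint `yacysearch.json` with the `query`, `resource`, and `maximumRecords` parameters; check these against your own installation. The `yacySearchUrl` helper and its options are hypothetical names.

```typescript
// Hypothetical options for the helper below; none of these names come from YaCy.
interface YacySearchOptions {
  base?: string;       // assumed default: a local peer on port 8090
  localOnly?: boolean; // true: search only this peer's own index
  maxRecords?: number; // number of results to request
}

// Builds a search URL for a YaCy peer's JSON endpoint (assumed: yacysearch.json).
function yacySearchUrl(query: string, opts: YacySearchOptions = {}): string {
  const base = opts.base !== undefined ? opts.base : "http://localhost:8090";
  const localOnly = opts.localOnly !== undefined ? opts.localOnly : true;
  const maxRecords = opts.maxRecords !== undefined ? opts.maxRecords : 10;
  const params = [
    "query=" + encodeURIComponent(query),
    "resource=" + (localOnly ? "local" : "global"),
    "maximumRecords=" + String(maxRecords),
  ];
  return base + "/yacysearch.json?" + params.join("&");
}
```

Defaulting to the local index keeps queries on the machine, which matches the private, internal use described above; passing `localOnly: false` would ask the peer to include the wider P2P network instead.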
- Ask HN: Best search engine alternatives to Google?
What are some alternatives?
bypass-paywalls-chrome - Bypass Paywalls web browser extension for Chrome and Firefox.
Searx - Privacy-respecting metasearch engine
ungoogled-chromium-archlinux - Arch Linux packaging for ungoogled-chromium
MeiliSearch - A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
uBlock - uBlock Origin - An efficient blocker for Chromium and Firefox. Fast and lean.
searxng - SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.
chromium - The official GitHub mirror of the Chromium source
Gigablast - Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.
AmIUnique - Learn how identifiable you are on the Internet
Seeks - Seeks is a decentralized p2p websearch and collaborative tool.
brave-browser - Brave browser for Android, iOS, Linux, macOS, Windows.
Typesense - Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences