Metric | dcs | rum
---|---|---
Mentions | 3 | 11
Stars | 222 | 795
Growth | 2.3% | 1.5%
Activity | 5.1 | 6.6
Last commit | 23 days ago | 2 months ago
Language | Go | C
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dcs
- Code Search Is Hard
- Sourcegraph is no longer Open Source
What is a good open-source system for code search if I want to plug 100 or so git repos into it and have it available over the web? GH search is not desirable because it would search too broadly and would not cover repos on Gitlab etc.
I looked at the Debian code search [1] in the past, but for some reason thought it required a bit too much effort and didn't complete my investigation of it. Though [2] looks pretty approachable.
Sourcegraph mentioned Zoekt [3], but I am not sure how usable it is. If it was pretty good, why did Sourcegraph OSS exist?
Finally, given all the discussion about how Sourcegraph OSS was very behind in the past few years, I guess there is no serious plan to fork it?
[1]: https://github.com/Debian/dcs
[2]: https://github.com/Debian/dcs/blob/main/howto/building.md
[3]: https://github.com/sourcegraph/zoekt
- Building a custom code search index in Go for searchcode.com
rum
- Code Search Is Hard
the rum index has worked well for us on roughly 1TB of pdfs. written by postgrespro, same folks who wrote core text search and json indexing. not sure why rum not in core. we have no problems.
https://github.com/postgrespro/rum
- Is it worth using Postgres' builtin full-text search or should I go straight to Elastic?
If you need ranking, and you have the possibility to install PostgreSQL extensions, then you can consider an extension providing RUM indexes: https://github.com/postgrespro/rum. Otherwise, you'll have to use an "external" FTS engine like ElasticSearch.
- Features I'd Like in PostgreSQL
>Reduce the memory usage of prepared queries
Yes, query plan reuse like every other DB. It still blows me away that PG replans every time unless you explicitly prepare, and that's still per connection.
Better full-text scoring is one that's missing from that list for me: TF/IDF or BM25, please. See: https://github.com/postgrespro/rum
- Ask HN: Books about full text search
for postgres, i highly recommend the rum index over the core fts. rum is written by postgrespro, who also wrote core fts and json indexing in pg.
https://github.com/postgrespro/rum
- Postgres Full Text Search vs. the Rest
My experience with Postgres FTS (I did a comparison with Elastic a couple of years back) is that filtering works fine and is speedy enough, but ranking crumbles when the result set is large.
If you have a large-ish data set with lots of similar data (4M addresses and location names was the test case), Postgres FTS just doesn't perform.
There is no index that helps with scoring results. You would have to install an extension like the RUM index (https://github.com/postgrespro/rum) to improve this, which may or may not be an option (often not if you use managed databases).
If you want the best of both worlds, you could investigate this extension (again, often not an option for managed databases): https://github.com/matthewfranglen/postgres-elasticsearch-fd...
Either way, writing something that indexes your postgres database into elastic/opensearch is a one time investment that usually pays off in the long run.
- Postgres Full-Text Search: A Search Engine in a Database
Mandatory mention of the RUM extension (https://github.com/postgrespro/rum) if this caught your eye. Lots of tutorials and conference presentations out there showcasing the advantages in terms of ranking, timestamps...
You might be just fine adding an unindexed tsvector column, since you've already filtered down the results.
The GIN indexes for FTS don't really work in conjunction with other indices, which is why https://github.com/postgrespro/rum exists. Luckily, it sounds like you can use your existing indices to filter and let postgres scan for matches on the tsvector.
- Postgrespro/rum: RUM access method – inverted index with additional information
- Debugging random slow writes in PostgreSQL
- Show HN: Full text search Project Gutenberg (60m paragraphs)
I suggest having a look at https://github.com/postgrespro/rum if you haven't yet. It solves the issue of slow ranking in PostgreSQL FTS.
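Several of the comments above recommend RUM specifically for ranked search. As a rough sketch of what that setup might look like, the Python helper below only assembles the SQL statements involved: enabling the extension, building a RUM index with the `rum_tsvector_ops` operator class, and ordering matches with the `<=>` distance operator, as described in the RUM README. The helper name, the index naming scheme, the `'english'` configuration, and the `LIMIT 10` are assumptions for illustration and do not come from the comments. The query text is left as a `%s` placeholder for a database driver to bind.

```python
from typing import List


def rum_ranking_statements(table: str, column: str) -> List[str]:
    """Assemble SQL for a RUM-backed ranked full-text search.

    `table` and `column` are placeholder identifiers that are interpolated
    verbatim, so only pass trusted names. The search terms are NOT
    interpolated: each `%s` is meant to be bound by the database driver.
    """
    index_name = f"{table}_{column}_rum_idx"  # hypothetical naming scheme
    return [
        # Requires permission to install extensions (often unavailable
        # on managed databases, as noted in the comments above).
        "CREATE EXTENSION IF NOT EXISTS rum;",
        # RUM index over an existing tsvector column.
        f"CREATE INDEX {index_name} ON {table} "
        f"USING rum ({column} rum_tsvector_ops);",
        # `<=>` yields a distance between tsvector and tsquery:
        # smaller means a better match, so ascending order ranks best first.
        f"SELECT *, {column} <=> to_tsquery('english', %s) AS distance "
        f"FROM {table} "
        f"WHERE {column} @@ to_tsquery('english', %s) "
        f"ORDER BY {column} <=> to_tsquery('english', %s) "
        "LIMIT 10;",
    ]


if __name__ == "__main__":
    # `docs` and `body_tsv` are made-up names for this example.
    for statement in rum_ranking_statements("docs", "body_tsv"):
        print(statement)
```

The statements can be run with any PostgreSQL client. Whether the ordered scan is actually served from the index depends on the planner, so checking with `EXPLAIN` on real data is advisable.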
What are some alternatives?
git-peek - git repo to local editor instantly
postgres-elasticsearch-fdw - Postgres to Elastic Search Foreign Data Wrapper
repo
ora2pg - Ora2Pg is a free tool used to migrate an Oracle database to a PostgreSQL compatible schema. It connects your Oracle database, scan it automatically and extracts its structure or data, it then generates SQL scripts that you can load into PostgreSQL.
sourcegraph-release-train - Sourcegraph Opensource build
zombodb - Making Postgres and Elasticsearch work together like it's 2023