PushshiftDumps VS Nuitka

Compare PushshiftDumps and Nuitka and see how they differ.

PushshiftDumps

Example scripts for the pushshift dump files (by Watchful1)

Nuitka

Nuitka is a Python compiler written in Python. It's fully compatible with Python 2.6, 2.7, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, and 3.11. You feed it your Python app, it does a lot of clever things, and spits out an executable or extension module. (by Nuitka)
              PushshiftDumps   Nuitka
Mentions      40               94
Stars         242              10,921
Growth        -                2.8%
Activity      8.1              10.0
Last commit   17 days ago      2 days ago
Language      Python           Python
License       MIT License      Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

PushshiftDumps

Posts with mentions or reviews of PushshiftDumps. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-15.
  • Pushshift Dumps Help: Only getting submissions, that are named comments
    1 project | /r/pushshift | 26 Nov 2023
    I am trying to get comments and submissions from specific subreddits. So far, I've run the u/watchful1 script combine_folder_mutipleprocess.py and have been able to process a few files.
  • Create and Search In Your Own Reddit Database
    1 project | /r/pushshift | 3 Jul 2023
    FYI, you can use my filter_file.py script to directly extract out submissions with a certain title. There's a place you can put in a file with a list of keywords to filter on if you have a lot of them. Or it would be fairly easy to modify to use a regex. There are also steps listed to export the list of submission ids and then filter a comments file to only comments from those submissions. You can also export directly to CSV, though you would want to use zst files for any intermediate steps. Let me know if anything in there doesn't work.
  • Reddit starting to bring back deleted comments.
    2 projects | /r/RedditAlternatives | 15 Jun 2023
    This repo has good examples of scripts to use them, https://github.com/Watchful1/PushshiftDumps
  • Encountered a non-utf8 character
    2 projects | /r/pushshift | 14 Jun 2023
    def read_redditfile(file: str) -> dict:
        """
        Iterate over the pushshift JSON lines, yielding them as Python dicts.
        Decompress iteratively if necessary.
        """
        # older files in the dataset are uncompressed while newer ones use zstd compression and have .xz, .bz2, or .zst endings
        if not file.endswith('.bz2') and not file.endswith('.xz') and not file.endswith('.zst'):
            with open(file, 'r', encoding='utf-8') as infile:
                for line in infile:
                    l = json.loads(line)
                    yield(l)
        else:
            # code by Watchful1 written for the Pushshift offline dataset, found here: https://github.com/Watchful1/PushshiftDumps
            with open(file, 'rb') as fh:
                dctx = ZstdDecompressor(max_window_size=2147483648)
                with dctx.stream_reader(fh) as reader:
                    previous_line = ""
                    while True:
                        chunk = reader.read(2**24)  # 16mb chunks
                        if not chunk:
                            break
                        string_data = chunk.decode('utf-8')
                        lines = string_data.split("\n")
                        for i, line in enumerate(lines[:-1]):
                            if i == 0:
                                line = previous_line + line
                            comment = json.loads(line)
                            yield comment
                        previous_line = lines[-1]
  • What to do after decompressing the files from academic torrents?
    2 projects | /r/pushshift | 11 Jun 2023
    Just look a folder down in the github repo https://github.com/Watchful1/PushshiftDumps/tree/master/scripts the scripts are still there.
  • What are you using to browse/self host downloaded reddit?
    4 projects | /r/DataHoarder | 7 Jun 2023
    I am working with the ZST files downloaded from Pushshift and sorted into subreddits by the lovely u/watchful1 here. ZST is too compressed to browse on its own, but using scripts like this one you can process them into readable NDJSON files. From there I'm not sure what to do. I would like to have a self-hosted Reddit clone that I can import these dumps into and browse freely.
  • Tell HN: My Reddit account was banned after adding my subs to the protest
    5 projects | news.ycombinator.com | 4 Jun 2023
    The whole reddit (posts and comments separately) from 2005-06 until 2022-12 is on this [1] torrent link; it's very easy to download, extract and use the data [2]. I'm writing my thesis about the connection between a reddit post's type and its comment structure, and I've been working with this data for a few months. It's amazing.

    [1] https://academictorrents.com/details/7c0645c94321311bb05bd87...

    [2] https://github.com/Watchful1/PushshiftDumps

  • Reddit, API calls, and AI - Who does your knowledge belong to?
    1 project | /r/singularity | 2 Jun 2023
    Sure! You can download the compressed data from this torrent, then you can use this project if you want to just decompress and process the data.
  • Script to find overlapping users between subreddits from dump files
    2 projects | /r/pushshift | 25 May 2023
    You can go through the process outlined in that thread to download the subreddits you're interested in, then add them at the top of the new script, run it, and it will output the list of overlapping users. It will likely be faster than the old script, even counting download times for the dumps, since the API was so slow. You are limited to the available 20k subreddits, though.
  • This Reddit Community Has Been Archived
    5 projects | /r/DataHoarder | 3 May 2023
    How do I read the file? First I tried to extract the file, and that worked, but then I got a text file that I can't read. I saw a few people saying it was just a JSON file, so I tried a JSON reader, but it says the JSON data is invalid. Then I tried this program, but nothing happens: no new file is created. Here is a screenshot. Maybe I'm doing something wrong, but I don't know, because the script doesn't have any instructions on how to use it!
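
The reader quoted in the "Encountered a non-utf8 character" post decodes each 16 MB chunk on its own, so a multi-byte UTF-8 character that straddles a chunk boundary can raise UnicodeDecodeError. That may be the cause of the reported error, though the post does not confirm it. The dumps are also newline-delimited JSON (one object per line), which is why a reader expecting a single JSON document rejects them. The following is a minimal sketch, not the repository's own code: the name iter_ndjson is hypothetical, and it swaps in Python's standard incremental decoder to carry partial characters over to the next chunk.

```python
import codecs
import io
import json


def iter_ndjson(stream, chunk_size=2**24):
    """Yield one dict per line from a binary stream of newline-delimited JSON.

    `stream` is any object with a read(n) method that returns bytes
    (hypothetical helper; not part of PushshiftDumps).
    """
    # The incremental decoder holds back an incomplete multi-byte character
    # until the next chunk arrives instead of raising UnicodeDecodeError.
    decoder = codecs.getincrementaldecoder("utf-8")()
    pending = ""  # text after the last newline seen so far
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += decoder.decode(chunk)
        # Everything before the last newline is complete; keep the remainder.
        *lines, pending = pending.split("\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # Flush the decoder; this raises if the stream ends mid-character.
    pending += decoder.decode(b"", final=True)
    if pending.strip():
        yield json.loads(pending)


if __name__ == "__main__":
    # The two-byte character is deliberately split across 7-byte chunks.
    data = '{"body": "caf\u00e9"}\n{"body": "ok"}\n'.encode("utf-8")
    for record in iter_ndjson(io.BytesIO(data), chunk_size=7):
        print(record["body"])
```

For a .zst dump, the same function could presumably be given the object returned by the `dctx.stream_reader(fh)` call shown in the quoted post, since that reader is used there with the same `read(n)` interface.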

Nuitka

Posts with mentions or reviews of Nuitka. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-22.
  • Py2wasm – A Python to WASM Compiler
    4 projects | news.ycombinator.com | 22 Apr 2024
    Thanks for the feedback! I'm Syrus, main author of the work on py2wasm.

    We already opened a PR into Nuitka to bring the relevant changes upstream: https://github.com/Nuitka/Nuitka/pull/2814

    We envision py2wasm being a thin layer on top of Nuitka, as also commented in the article.

    From what we gathered, we believe there is value in keeping py2wasm as a separate package, as py2wasm would also need to ship the precompiled Python distribution (3.11) for WASI (which will not be needed for the other Nuitka use cases), as well as other tools that are not directly relevant for Nuitka.

  • Python Is Portable
    6 projects | news.ycombinator.com | 15 Apr 2024
    This is a good place to mention https://nuitka.net/ which aims to compile python programs into standalone binaries.
  • We are under DDoS attack and we do nothing
    2 projects | news.ycombinator.com | 30 Mar 2024
    For Python, you could make a proper deployment binary using Nuitka [1] (in standalone mode; avoid onefile mode for this). I'm not pretending it's as easy as building a Go executable: you may have to do some manual hacking for more unusual packages, and I don't think you can cross compile. I think a key element you're getting at is that Go executables have very few dependencies on OS packages, but with Python (once you've sorted the actual Python dependencies) you only need the packages used for manylinux [2], which is not too onerous.

    [1] https://nuitka.net/

    [2] https://peps.python.org/pep-0599/#the-manylinux2014-policy

  • Faster Blogging: A Developer's Dream Setup
    4 projects | dev.to | 22 Feb 2024
    glee is rich in blogging features but has some drawbacks. One of the main drawbacks is its limited compatibility across operating systems and system architectures. We lost one potential customer because glee was incompatible with macOS. Another major issue is the deployment time. We built the first version of glee entirely in Python and used Nuitka, which compiles Python programs into a single executable binary file. We need three separate deployment stages to create executable binaries for Windows, Mac, and Linux, and the process takes around 20 minutes to complete.
  • Python 3.13 Gets a JIT
    11 projects | news.ycombinator.com | 9 Jan 2024
    There is already an AOT compiler for Python: Nuitka[0]. But I don't think it's much faster.

    And then there is mypyc[1] which uses mypy's static type annotations but is only slightly faster.

    And various other compilers like Numba and Cython that work with specialized dialects of Python to achieve better results, but then it's not quite Python anymore.

    [0] https://nuitka.net/

    [1] https://github.com/python/mypy/tree/master/mypyc

  • Briefcase: Convert a Python project into a standalone native application
    4 projects | news.ycombinator.com | 3 Aug 2023
    Nuitka deals pretty well with those in general: https://nuitka.net/
  • Ask HN: How does Nuitka (Python compiler) work?
    1 project | news.ycombinator.com | 22 Jul 2023
    Hi HN,

    Has anyone explored Nuitka [1] and developed understanding from a blank slate?

    Is there any toy version of this, so that one can start playing with the language translation concepts?

    Is there any underlying theory/inspiration upon which this project is built?

    Are there any similar projects, in say other languages?

    [1] https://github.com/Nuitka/Nuitka

  • Why not tell people to “simply” use pyenv, poetry or anaconda
    7 projects | news.ycombinator.com | 13 Jun 2023
    That's more of a cultural problem in the Python community.

    If I provide end-user software written in Python to my client (so not a backend, not a lib...), I will compile it with nuitka (https://github.com/Nuitka/Nuitka) and hide the stack trace (https://www.bitecode.dev/p/why-and-how-to-hide-the-python-st...) to provide a standalone executable.

    This means the users don't have to know it's made with Python or install anything, and it just works.

    However, Python is not like Go or Rust, and providing such an installer requires more work, so a huge part of the user base (which includes a lot of non-professional coders) doesn't have the skill, time or resources to do it.

    And few people promote it.

    I should write an article on that because really, nobody wants to set up Python just to use a tool.

  • Python cruising on back of c++
    3 projects | /r/ProgrammerHumor | 18 May 2023
  • Is cython a safe option to obfuscate a python project?
    1 project | /r/learnpython | 13 May 2023
    As for a simpler option, you could use a "compiler": https://github.com/Nuitka/Nuitka
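
One of the posts above mentions hiding the Python stack trace from end users of a compiled executable. The linked article's own method is not reproduced on this page, so the following is only a generic sketch of the idea using the standard sys.excepthook mechanism. The names make_quiet_hook and app_errors.log are hypothetical, and nothing here is specific to Nuitka.

```python
import sys
import traceback


def make_quiet_hook(log_path, stream=None):
    """Build an excepthook that logs the traceback and prints one short line.

    Hypothetical helper for illustration; `stream` defaults to sys.stderr.
    """

    def hook(exc_type, exc_value, exc_tb):
        # Keep the full traceback for the developer in a log file.
        with open(log_path, "a", encoding="utf-8") as log:
            traceback.print_exception(exc_type, exc_value, exc_tb, file=log)
        # Show the end user a single line instead of the stack trace.
        out = stream if stream is not None else sys.stderr
        print(f"Error: {exc_value} (details in {log_path})", file=out)

    return hook
```

Installing it near program startup with `sys.excepthook = make_quiet_hook("app_errors.log")` would route uncaught exceptions through the hook. Writing the traceback to a file rather than discarding it is a design choice made here so that problems can still be diagnosed.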

What are some alternatives?

When comparing PushshiftDumps and Nuitka you can also consider the following projects:

Sketchpad

PyInstaller - Freeze (package) Python programs into stand-alone executables

Pushshift-Importer

pyarmor - A tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts.

RedditLemmyImporter - 🔥 Anti-Reddit Aktion 🔥

PyOxidizer - A modern Python application packaging and distribution tool

zreader - Read compressed NDJSON .zst files easily

py2exe - modified py2exe to support unicode paths

7-Zip-zstd - 7-Zip with support for Brotli, Fast-LZMA2, Lizard, LZ4, LZ5 and Zstandard

false-positive-malware-reporting - Trying to release your software sucks, mostly because of antivirus false positives. I don't have an answer, but I do have a list of links to help get your code whitelisted.

reddit-project-public

py2app