rmarkdown VS List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

Compare rmarkdown and List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words and see how they differ.

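Metric | rmarkdown | List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
Mentions | 38 | 25
Stars | 2,805 | 2,765
Growth | 0.8% | 1.2%
Activity | 7.4 | 0.0
Last commit | 8 days ago | 2 months ago
Language | R | -
License | GNU General Public License v3.0 only | Creative Commons Attribution 4.0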
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

rmarkdown

Posts with mentions or reviews of rmarkdown. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-28.
  • Pandoc
    17 projects | news.ycombinator.com | 28 Jan 2024
    I'm surprised to see no one has pointed out [RMarkdown + RStudio](https://rmarkdown.rstudio.com) as one way to immediately interface with Pandoc.

    I used to write papers and slides in LaTeX (using vim, because who needs render previews), then eventually switched to Pandoc (also vim). I eventually discovered RMarkdown+RStudio. I was looking for a nice way to format a simple table and discovered that rmarkdown had nice extensions of basic markdown (this was many years ago so maybe that is incorporated into vanilla markdown/pandoc).

    The RMarkdown page claims:

    > R Markdown supports dozens of static and dynamic output formats including HTML, PDF, MS Word, Beamer, HTML5 slides, Tufte-style handouts, books, dashboards, shiny applications, scientific articles, websites, and more.

    ...which I think is largely due to using pandoc as the core generator.

    RStudio shows you the pandoc command it runs to generate your document, which I've used to figure out the pandoc command I want to run when I've switched to using pandoc directly.

    This is a bit of a "lazy" way to interact with pandoc. Maybe the "laziest" aspect: when I get a new computer, I can install the entire stack by installing RStudio, then opening a new rmarkdown document. RStudio asks whether I'd like to install all the necessary libraries -- click "yes" and that's it. Maybe that sounds silly but it used to be a lot of work to manage your LaTeX install. These days I greatly favor things that save me time, which seems to get more precious every year.

  • 2023 Lookback
    1 project | dev.to | 26 Jan 2024
    Then, I worked on a Shiny project where I had to learn R Markdown. I was very excited about it because being paid to learn a new technology is something I have always preferred. I also worked with Highcharts graphs, which I hadn’t done for years. It was also the first time I was being paid to design something. I didn’t enjoy that part as much as development, but I cannot say it was a bother either.
  • Why won't my boxplot knit?
    1 project | /r/u_Mundane-Balance-3358 | 8 Nov 2023
    files/figure-latex/unnamed-chunk-2-1.pdf) Try to find the following text in midterm-question.Rmd: ![](midterm-question_ You may need to add $ $ around a certain inline R expression `r ` in midterm-question.Rmd (see the above hint). See https://github.com/rstudio/rmarkdown/issues/385 for more info.
  • new learner to R .. need help
    1 project | /r/RStudio | 16 Jun 2023
  • We’re Washington Post reporters who analyzed Google’s C4 data set to see which websites AI uses to make itself sound smarter. Ask us Anything!
    4 projects | /r/IAmA | 16 May 2023
    We used R Markdown for cleaning and analysis, creating updateable web pages we could share with everyone involved. Similarweb’s categories were useful, but too niche for us. So we spent a lot of time recategorizing and redefining the groupings. We used the token count for each website (how many words or phrases) to measure its importance in the overall training data.
  • Possible to include inline code in a math equation in Org mode?
    1 project | /r/orgmode | 6 May 2023
    In [R Markdown](https://rmarkdown.rstudio.com/) or [Quarto](https://quarto.org/), I can include inline code in a math equation, e.g.,
  • I have to somehow convert this chart into an html file into a file that opens like a website any ideas?
    1 project | /r/RStudio | 5 Mar 2023
    you probably want an rmd file with html output
  • Seeking some markdown help - please redirect me elsewhere if this doesn't belong here
    1 project | /r/rstats | 21 Dec 2022
    GitHub issue code folding
  • Generating PDF 📄 with Python 🐍
    3 projects | /r/learnpython | 15 Dec 2022
    R Markdown / Quarto https://quarto.org/ https://rmarkdown.rstudio.com/ ; can dynamically generate a document and compile it to HTML, PDF, others
  • PYTHON CHARTS: the Python data visualization site with more than 500 different charts with reproducible code and color tools
    3 projects | /r/Python | 18 Oct 2022
    Hi! At this moment I'm not opening the source code, but I can explain the tech used. This site is based on another site I created before named https://r-charts.com/ and it was created with blogdown (HUGO + R Markdown). Hence, each tutorial is an R Markdown file. For PYTHON CHARTS, in order to run Python within an R Markdown file I had to use an R package named reticulate. In addition, the template depends on shuffle.js for filtering and fuse.js for searching.

List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

Posts with mentions or reviews of List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-04.
  • Ask HN: List of Subdomains to Reserve
    4 projects | news.ycombinator.com | 4 Mar 2024
    Good point. I am already checking against the naughty-words list from here:

    https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and...

  • Where is the banned word list so I can integrate it?
    1 project | /r/ecommerce | 27 Jun 2023
    https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words is one
  • We’re Washington Post reporters who analyzed Google’s C4 data set to see which websites AI uses to make itself sound smarter. Ask us Anything!
    4 projects | /r/IAmA | 16 May 2023
    We know that C4 was used to train Google’s influential T5 model, Facebook’s LLaMA, as well as the open source model Red Pajama. C4 is a very cleaned-up version of a scrape of the internet from the non-profit CommonCrawl taken in 2019. OpenAI’s model GPT-3 used a training dataset that began with 41 scrapes of the web from CommonCrawl from 2016 to 2019 so I think it’s safe to say that something akin to C4 was part of GPT-3. (The researchers who originally looked into C4 argue that these issues are common to all web-scraped datasets.) When we reached out to OpenAI and Google for comment, both companies emphasized that they undergo extensive efforts to weed out potentially problematic data from their training sets. But within the industry, C4 is known as being a heavily filtered dataset and has been criticized, in fact, for eliminating content related to LGBTQ+ identities because of its reliance on a heavy-handed blocklist. (https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words ) We are working on some reporting to try to address your last and very crucial question, but it’s an open area of research and one that even AI developers are struggling to answer.
  • TIL there's an official list of profanities ChatGPT is trained to avoid
    1 project | /r/todayilearned | 20 Apr 2023
  • Microsoft's paper on OpenAI's GPT-4 had hidden information
    3 projects | news.ycombinator.com | 23 Mar 2023
    "The Colossal Clean Crawled Corpus, used to train a trillion parameter LM in , is cleaned, inter alia, by discarding any page containing one of a list of about 400 “Dirty, Naughty, Obscene or Otherwise Bad Words”. This list is overwhelmingly words related to sex, with a handful of racial slurs and words related to white supremacy (e.g. swastika, white power) included. While possibly effective at removing documents containing pornography (and the associated problematic stereotypes encoded in the language of such sites) and certain kinds of hate speech, this approach will also undoubtedly attenuate, by suppressing such words as twink, the influence of online spaces built by and for LGBTQ people. If we filter out the discourse of marginalized populations, we fail to provide training data that reclaims slurs and otherwise describes marginalized identities in a positive light"

    from "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? " https://dl.acm.org/doi/10.1145/3442188.3445922

    That list of words is https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and...

  • Rule
    1 project | /r/196 | 17 Mar 2023
    Yeah, this is Shutterstock's one, which they shared
  • If I made a game with a chatroom, what curses and slurs would I ban?
    1 project | /r/gamedev | 3 Mar 2023
    I always turn off the chatfilter, so defo let them choose if they want to have it censored or not. For the actual words themselves, there are plenty of lists out there that you can use (like this one). Although these are just regular words, none of the circumvention methods are included
  • Emad announces a new Stability lab with a new soon model. It looks like a Dall-e 2 style AI to me. Maybe it is our open source Dall-e 2, like KARLO. The images are very interesting. According to Emad "Soon".
    1 project | /r/StableDiffusion | 5 Jan 2023
    That it's very crudely filtered for naughty words. According to the paper, "We removed any page that contained any word on the “List of Dirty, Naughty, Obscene or Otherwise Bad Words”." That list is here. While it contains a lot of unquestionably ugly words, it also contains words like "tit".
  • I made a Stable Diffusion for Anime app in your Pocket! Running 100% offline on your Apple Devices (iPhone, iPad, Mac)
    4 projects | /r/StableDiffusion | 26 Nov 2022
    No problem! I wrote a short json file and Swift script to remove the nsfw words from the prompt during the image generation process, therefore it's not based on the negative prompt. The json file is a txt full with nsfw words so the app can check and remove unwanted prompts, e.g.: https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
  • Lewdle - A daily lewd word game
    1 project | /r/wordle | 27 Jan 2022
    This is the closest I’ve come to finding one. It’s not that great.
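
Several of the posts above describe the same basic use of the list: discard any page, name, or prompt term that matches an entry. The Python sketch below illustrates that idea only. The function names, the placeholder entries, and the whole-word, case-insensitive matching rule are assumptions made for illustration; none of the quoted projects state their exact matching logic here.

```python
import re


def build_matcher(blocklist):
    """Compile one case-insensitive regex that matches any entry as a whole word or phrase.

    Returns None when the blocklist has no usable entries.
    """
    # Normalise entries and try longer phrases first so they win over shorter ones.
    entries = sorted(
        {entry.strip().lower() for entry in blocklist if entry.strip()},
        key=len,
        reverse=True,
    )
    if not entries:
        return None
    alternatives = "|".join(re.escape(entry) for entry in entries)
    # Assumption: an entry only counts when it is not embedded in a longer word.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def contains_blocked(text, matcher):
    """Return True if the text contains at least one blocklist entry."""
    return matcher is not None and matcher.search(text) is not None


def filter_pages(pages, matcher):
    """Page-level filtering: drop every page that contains any blocklist entry."""
    return [page for page in pages if not contains_blocked(page, matcher)]


def scrub_prompt(prompt, matcher):
    """Term-level filtering: remove matching entries and tidy the whitespace."""
    if matcher is None:
        return prompt
    return " ".join(matcher.sub("", prompt).split())


if __name__ == "__main__":
    # Placeholder entries; a real list would be loaded from a file, one entry per line.
    matcher = build_matcher(["badword", "very bad phrase"])
    print(filter_pages(["A clean page.", "This has a BadWord in it."], matcher))
    print(scrub_prompt("a badword cat on a mat", matcher))
```

Page-level filtering of this kind is blunt: as the quoted posts point out, one match removes the whole document, and plain word lists do not cover deliberate misspellings or other circumvention.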

What are some alternatives?

When comparing rmarkdown and List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words you can also consider the following projects:

Pluto.jl - 🎈 Simple reactive notebooks for Julia

google-profanity-words - Full list of bad words and top swear words banned by Google.

jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts

List-of-Dirty-Naughty-Obscene-and

here_here - I love the here package. Here's why.

git-crypt - Transparent file encryption in git

tinytex - A lightweight, cross-platform, portable, and easy-to-maintain LaTeX distribution based on TeX Live

following-instructions-human-feedback

TikZ - Complete collection of my PGF/TikZ figures.

Hashids.java - Hashids algorithm v1.0.0 implementation in Java

blogdown - Create Blogs and Websites with R Markdown

RedPajama-Data - The RedPajama-Data repository contains code for preparing large datasets for training large language models.