cheatsheets
awesome-public-datasets
| | cheatsheets | awesome-public-datasets |
|---|---|---|
| | 60 | 77 |
| Stars | 5,596 | 58,391 |
| Growth | 1.5% | 0.8% |
| Activity | 7.6 | 5.1 |
| Last commit | 5 days ago | 11 days ago |
| Language | TeX | |
| License | Creative Commons Attribution 4.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
cheatsheets
-
Tools a Data Scientist should know:
If you're an R user, stringr + its cheatsheet gets you very close to remembering what to do without needing to look further!
-
JSON to PDF Magic: Harnessing LaTeX and JSON for Effortless Customization and Dynamic PDF Generation
For more information on how to use ggplot2 and create charts consult the ggplot2 official page or the ggplot2 cheat graphic.
-
Best packages to learn?
I'd suggest you have a look at cheatsheets (or download them from GitHub) if you want to get to know your way around a package or set of functions; it saves you a lot of time.
-
How do I make these shapes (pictured below) in ggplot?
You could use geom_hline and geom_vline, geom_abline, or geom_segment for this. (The ggplot cheat sheet is very useful for answering these kinds of questions, BTW.)
-
Why does my scatter plot look like this?
I can't say for sure because I don't know what your ultimate aim is for your visualization. Check out the cheat sheet for ggplot2 here.
-
Import from Excel
Finally, just do your analysis. You should also take a look at the cheat sheet for data importing in the tidyverse package.
-
[Request] How to best visualize percentages with R?
That said, when I’m trying to come up with an interesting way to visualize data, I find the ggplot cheat sheet very helpful: https://github.com/rstudio/cheatsheets/raw/main/data-visualization-2.1.pdf
-
Need help with variables
Here's a cheat sheet: https://github.com/rstudio/cheatsheets/blob/main/strings.pdf
-
Data manipulation in R
The cheat sheet for the stringr package should give you a good overview of string manipulation and regex in R.
-
I'm trying to recreate this plot but I keep failing
I would very highly recommend that rather than trying to get started by translating an existing graph, you check out some documentation about ggplot first. If nothing else, the ggplot cheat sheet from RStudio should help explain what the component parts of the code are, and that might help you figure out what you actually want to do.
awesome-public-datasets
-
How to practice data analytics skills
Merry Christmas buddy.
You'll find a ton of public datasets on GitHub [1].
Maven Analytics offers a monthly data analytics challenge [2] that you can enter for free. See their past competitions for some interesting datasets.
As I'm based in Ireland I'll also recommend the Irish Data Portal [3].
[1] https://github.com/awesomedata/awesome-public-datasets
- Are there people out there who still like Sam Altman - AI IS AT DANGER
-
The Data Engineering Docker-Compose Starter Kit
The “data.csv” file contains historical weather data from Tel Aviv, sourced from another article I wrote. If you wish, you can swap it with a public dataset, for example, from here.
-
suggestions for personal GitHub projects in economics/econometrics
If you want something personal and fun, you will need lots of different data. As such, you can take a look at these publicly available datasets. Maybe you can find out some interesting relationships.
-
Where do you get your data when you have an obscure idea for a dashboard?
Some others I use: https://data.world/search, this GitHub project with links: https://github.com/awesomedata/awesome-public-datasets, and Data.fivethirtyeight.com. r/datasets and similar subreddits can also be of help.
-
Full Stack Data Science Project Ideas
There's a lot in here; it's a good start.
- Where to find big datasets?
- How do you keep track of useful datasets?
-
Complete: D214 - MSDA Capstone
GitHub: Awesome Public Datasets. I didn't find much of use here for me, as much of it was either very specialized or very large datasets. But maybe you'll find something of use here.
-
The internet has been “parametered”!
Of these thousands of open source data sets? https://github.com/awesomedata/awesome-public-datasets
What are some alternatives?
tidytuesday - Official repo for the #tidytuesday project
labelImg - LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.
forcats - 🐈🐈🐈🐈: tools for working with categorical variables (factors)
mostly-adequate-guide - Mostly adequate guide to FP (in javascript)
reddit-top-2.5-million - This is a dataset of the all-time top 1,000 posts, from the top 2,500 subreddits by subscribers, pulled from reddit between August 15–20, 2013.
ggplot2-book - ggplot2: elegant graphics for data analysis
tensorboard - TensorFlow's Visualization Toolkit
mech - 🦾 Main repository for the Mech programming language. Start here!
data - Data and code behind the articles and graphics at FiveThirtyEight
ggplot2 - An implementation of the Grammar of Graphics in R
zsv - zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser