It's true that their packages are heavy on dependencies, and if that is a concern, you have alternatives:
- poorman: no dependencies, same syntax as dplyr, but only includes the basic verbs.
- datawizard: few dependencies, slightly different syntax, has base-R implementations of most dplyr / tidyr functions, plus some other goodies like scaling, mean-centering, rank transforming, ...
- And of course, data.table: 0 dependencies, ultra-fast (everything is written in optimized C under the hood), can handle much bigger data than the tidyverse, and can do everything the tidyverse can when it comes to data wrangling (although the tidyverse sometimes has convenience functions that make an operation shorter than in data.table). The downside is that data.table's syntax takes more effort to learn and is less intuitive to read for neophytes.
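To give a feel for the syntax difference, here is a minimal sketch of a filter + grouped mean in data.table, with the rough dplyr equivalent in a comment. It assumes data.table is installed; the table and column names (`dt`, `group`, `value`) are made up for illustration.

```r
# Assumes the data.table package is installed.
library(data.table)

# Toy data; names are hypothetical.
dt <- data.table(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 4, 5)
)

# Rough dplyr equivalent (shown for comparison only, not run here):
#   df |> filter(value > 1) |> group_by(group) |> summarise(mean_value = mean(value))

# data.table packs the same steps into dt[i, j, by]:
#   i  = which rows to keep
#   j  = what to compute
#   by = grouping columns
result <- dt[value > 1, .(mean_value = mean(value)), by = group]
print(result)
```

Once you read it as "take these rows, compute this, per group", the compact form gets a lot less cryptic, but it is definitely less self-describing than the chain of verbs.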
Perl is fast for regex matching, but there is more to processing text than just regex, and with parLapply you can parallelize the processing. You can also parallelize re2 and basically destroy Perl if your regex contains alternation (`|`).
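For anyone who hasn't used it, a minimal sketch of the parLapply approach looks something like this. It only uses the `parallel` package that ships with R and base `grepl` as a stand-in for the matcher (swapping in re2 is not shown); the input vector, the pattern, and the worker count are made up for illustration.

```r
library(parallel)

# Hypothetical input: a few log-like lines and a pattern with alternation.
log_lines <- c("error: disk full", "ok", "warning: low memory",
               "ok", "error: timeout", "info: started")
pattern <- "error|warning"

# Start a small local cluster; 2 workers is an arbitrary choice.
n_workers <- 2
cl <- makeCluster(n_workers)

# Split the input into contiguous chunks, one per worker, so each
# worker runs a single vectorized grepl() call over its chunk.
idx <- splitIndices(length(log_lines), n_workers)
chunks <- lapply(idx, function(i) log_lines[i])

# Extra arguments to parLapply() are passed on to the worker function.
hits <- parLapply(cl, chunks,
                  function(chunk, pattern) grepl(pattern, chunk),
                  pattern = pattern)
stopCluster(cl)

# Chunks are contiguous, so unlisting preserves the original order.
matched <- log_lines[unlist(hits)]
print(matched)
```

On an input this small the cluster startup cost dominates, so this only pays off once the text is large enough that the matching itself is the bottleneck.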