Metric | simdutf | cpr
---|---|---
Mentions | 11 | 22
Stars | 960 | 6,167
Growth | 4.8% | 1.0%
Activity | 9.1 | 8.4
Last commit | 3 days ago | 5 days ago
Language | C++ | C++
License | Apache License 2.0 | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
simdutf
- Glibc Buffer Overflow in Iconv
- Vectorizing Unicode conversions on real RISC-V hardware
The project was mostly inspired by simdutf [0] which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
[0] https://github.com/simdutf/simdutf
- Cray-1 performance vs. modern CPUs
I'm actually doing something quite similar in my in-progress Unicode conversion routines.
For utf8 validation there is a clever algorithm that uses three 4-bit look-ups to detect utf8 errors: https://github.com/simdutf/simdutf/blob/master/src/icelake/i...
Aside on LMUL, if you haven't encountered it yet: RVV allows you to group vector registers when configuring the vector unit with vsetvl, so that vector instructions operate on multiple vector registers at once. That is, with LMUL=1 you have v0,v1,...,v31. With LMUL=2 you effectively have v0,v2,...,v30, where each vector register is twice as large. With LMUL=4 you have v0,v4,...,v28, and with LMUL=8 you have v0,v8,...,v24.
In my code, I happen to read the data with LMUL=2. The trivial implementation would just call vrgather.vv with LMUL=2, but since we only need a lookup table with 128 bits, LMUL=1 would be enough to store the lookup table (V requires a minimum VLEN of 128 bits).
So I do six LMUL=1 vrgather.vv's instead of three LMUL=2 vrgather.vv's, because no lane crossing is required and this will run faster in hardware (see [0] for a relevant micro benchmark):
# codegen for equivalent of that function
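The register grouping described in the LMUL aside above can be illustrated with a small sketch. This is not the poster's code and it issues no vector instructions; it only models the arithmetic. The helper names `register_groups` and `gathers_needed` are hypothetical, and only the integer LMUL values 1, 2, 4, and 8 mentioned in the post are handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: names of the register groups that can be addressed
// at a given integer LMUL. With LMUL=m, only every m-th register
// (v0, vm, v2m, ...) names a group.
std::vector<std::string> register_groups(int lmul) {
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    std::vector<std::string> names;
    for (int reg = 0; reg < 32; reg += lmul) {
        names.push_back("v" + std::to_string(reg));
    }
    return names;
}

// Hypothetical helper: how many gather instructions are needed when the
// data occupies `data_lmul` registers per group but each gather is issued
// at `gather_lmul`. Each table lookup must be repeated once per sub-group.
int gathers_needed(int lookups, int data_lmul, int gather_lmul) {
    assert(gather_lmul > 0 && data_lmul % gather_lmul == 0);
    return lookups * (data_lmul / gather_lmul);
}
```

Under these assumptions, `register_groups(2)` yields v0, v2, ..., v30, and `gathers_needed(3, 2, 1)` gives the six LMUL=1 gathers the post mentions, versus three when the gathers are issued at LMUL=2.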
- What C++ library do you wish existed but hasn’t been created yet?
UTF-8 normalization, stemming, case-insensitive comparison. https://github.com/unicode-rs is an example for Rust. What are the options for C++?
1. Translate to UTF-16 (https://github.com/simdutf/simdutf) and use ICU: slow.
2. Boost text, https://github.com/tzlaine/text: also slow (because the author doesn't care or couldn't care). We made a lot of patches to make our library faster than Lucene, but this part is still slower than ICU for UTF-16 (and ICU for UTF-16 is also very slow...).
- [Preprint] Transcoding Unicode Characters with AVX-512 Instructions
You can find the corresponding assembly code in this repository. The main branch only contains implementations based on C++ with intrinsics.
- What's everyone working on this week (10/2023)?
The next big thing is making it LSP-compatible. All language servers must implement UTF-16 based character offsets, which is kinda unfortunate considering that files are much more likely to be stored in UTF-8 (I think?). I don't want to do the UTF-8 -> UTF-16 transcoding, so instead I'll use the excellent simdutf library to count how many code units a UTF-8 string would take if it was transcoded into UTF-16, which is much faster than actual transcoding. So this is what I'm going to do this week: rewriting parsers to produce UTF-16 offsets, plus some final benchmarking. After that is done, I'll consider the "research" part of this project completed and will start writing an actual Markdown parser.
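The counting trick mentioned in this post can be sketched in scalar C++. This is an illustration of the idea only, not simdutf's vectorized implementation; the function name `utf16_units_scalar` is hypothetical, and the input is assumed to be valid UTF-8. As far as I know, the library exposes the same computation as `simdutf::utf16_length_from_utf8`.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical scalar sketch: number of UTF-16 code units that a valid
// UTF-8 string would occupy after transcoding, without transcoding it.
std::size_t utf16_units_scalar(std::string_view utf8) {
    std::size_t units = 0;
    for (unsigned char byte : utf8) {
        // Any byte that is not a continuation byte (10xxxxxx) starts a
        // code point, which takes at least one UTF-16 code unit.
        if ((byte & 0xC0) != 0x80) {
            ++units;
        }
        // A lead byte of a 4-byte sequence (11110xxx) encodes a code point
        // above U+FFFF, which needs a surrogate pair: one extra unit.
        if (byte >= 0xF0) {
            ++units;
        }
    }
    return units;
}
```

For an LSP-style column, one would call this on the prefix of the line up to the byte offset of interest, for example `utf16_units_scalar(line.substr(0, byte_offset))`, assuming `byte_offset` falls on a code point boundary.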
- Why would a language not natively support SIMD?
You can find the assembly code here: https://github.com/simdutf/simdutf/tree/clausecker. The corresponding C++ code is in the main branch.
- High speed Unicode routines using SIMD
- text-2.0-rc1 with UTF8 underlying representation is available for testing!
Or via an ultrafast simdutf.
- Simdutf: Unicode validation and transcoding at billions of characters per second
cpr
- What C++ library do you wish existed but hasn’t been created yet?
This one might fit the bill: https://github.com/libcpr/cpr
- [CMake] Can't include external header in .h file
cmake_minimum_required(VERSION 3.15)
project(xrpc++ DESCRIPTION "C++ AT Protocol XRPC library" VERSION 1.0.0 LANGUAGES CXX)

include(FetchContent)
FetchContent_Declare(cpr
    GIT_REPOSITORY https://github.com/libcpr/cpr.git
    GIT_TAG 2553fc41450301cd09a9271c8d2c3e0cf3546b73) # The commit hash for 1.10.x. Replace with the latest from: https://github.com/libcpr/cpr/releases
FetchContent_MakeAvailable(cpr)

FetchContent_Declare(json URL https://github.com/nlohmann/json/releases/download/v3.11.2/json.tar.xz)
FetchContent_MakeAvailable(json)

add_library(${PROJECT_NAME} SHARED
    src/lexicon.cpp
    src/xrpc.cpp
)
target_link_libraries(${PROJECT_NAME} PRIVATE cpr::cpr)
target_link_libraries(${PROJECT_NAME} PRIVATE nlohmann_json::nlohmann_json)
set_target_properties(${PROJECT_NAME} PROPERTIES VERSION ${PROJECT_VERSION})
set_target_properties(${PROJECT_NAME} PROPERTIES SOVERSION 1)
target_include_directories(${PROJECT_NAME} PUBLIC include)
set(CMAKE_BUILD_TYPE debug)
- How to convert libcurl to C++?
There is also the cpr package, which should offer a more C++-focused interface for curl.
- Trying to use libcpr, linking errors - newbie...
So I'm very new to C++ and I'm trying to write a C++ version of a tool that I put together in Python. I'm trying to use libcpr for all my HTTP needs. I've spent the day trying to get it set up and working, but I'm getting a bunch of linking errors when I try to run. I really don't know if I did the building of it correctly, I'm trying to use Visual Studio Community 2022 and the Usage section of their docs talks about CMake and a couple package manager methods.
- How are downloaders made? (examples in the text)
- Standardise a C++ build tool and package manager?
I think vcpkg manifests have solved a really key portion of the "please give me these libraries" problem. Add a couple of lines to a JSON file, pass CMake your vcpkg toolchain script path and triplet, and you're pretty much done with dependencies. I actually used it for a project with libcpr/cpr and a couple of other popular libraries, and I was shocked at how painless it was to get up and running with some web request stuff.
- What are some cool modern libraries you enjoy using?
Libraries like nlohmann's json, cpr, fmt are prime examples of what I'm seeking. Any suggestions?
- I'm getting a 422 Validation Failed from Github API. Only when making a request with the Cpr library.
Basically specifying the language and the repo, and it does work when the request is made from postman or from the browser. However, when using https://github.com/libcpr/cpr, I'm getting the following response:
- how to make a C++ web scraper?
What are some alternatives?
simdutf8 - SIMD-accelerated UTF-8 validation for Rust.
libcurl - A command line tool and library for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, GOPHER, GOPHERS, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, MQTT, POP3, POP3S, RTMP, RTMPS, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET, TFTP, WS and WSS. libcurl offers a myriad of powerful features
DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
C++ REST SDK - The C++ REST SDK is a Microsoft project for cloud-based client-server communication in native code using a modern asynchronous C++ API design. This project aims to help C++ developers connect to and interact with services.
simde - Implementations of SIMD instruction sets for systems which don't natively support them.
Boost.Beast - HTTP and WebSocket built on Boost.Asio in C++11
eve - Expressive Vector Engine - SIMD in C++ Goes Brrrr
cpp-httplib - A C++ header-only HTTP/HTTPS server and client library
Vc - SIMD Vector Classes for C++
curlpp - C++ wrapper around libcURL
simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
POCO - The POCO C++ Libraries are powerful cross-platform C++ libraries for building network- and internet-based applications that run on desktop, server, mobile, IoT, and embedded systems.