There is this for OpenAI: https://github.com/cogentapps/chat-with-gpt/blob/main/app/sr....
Not completely sure, but I think it will likely work as-is for LLaMA.
https://platform.openai.com/tokenizer or the official python library tiktoken https://github.com/openai/tiktoken or this JS port of tiktoken https://github.com/dqbd/tiktoken
Tokenizers seem to be a massive pain in the neck if you are just calling into an API to use your model. The algorithm itself is non-trivial, and they need pretty sizable data to function: the vocabulary and the merge rules, which just sit there, using memory. I'm writing https://github.com/ryszard/agency in Go, and while there's a good library for the OpenAI tokenization, if you want a tokenizer for the HF models the best I found was a library that calls into HF's Rust implementation, which makes it horrible for distribution.
However, at some point I realized that I didn't really need the tokens, just the token count: my most important use case was implementing a Token Buffer Memory (trimming messages from the beginning so that you never exceed the model's context size in tokens). For that, the count doesn't need to be exactly right, just mostly right, as long as I'm OK with slightly suboptimal efficiency (keeping slightly fewer tokens than the model supports). So I took files from Project Gutenberg and compared the token counts from a proper tokenizer against just calling `strings.Split`, and the ratio seems to be remarkably stable for a given model and language (multiply the length of the whitespace-split result by 1.55 for OpenAI and 1.7 for Claude, which leaves a tiny safety margin).
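To make the idea concrete, here's a minimal Go sketch of that approach. The function names and the 1.55/1.7 ratios are taken from the description above as illustrative assumptions (the commenter's measured values, not official figures from OpenAI or Anthropic), and this is not the actual `agency` API:

```go
package main

import (
	"fmt"
	"strings"
)

// EstimateTokens approximates the token count of text by multiplying
// the number of whitespace-separated words by an empirically measured
// ratio (roughly 1.55 for OpenAI models, 1.7 for Claude, per the
// Project Gutenberg comparison described above).
func EstimateTokens(text string, ratio float64) int {
	words := len(strings.Fields(text))
	return int(float64(words) * ratio)
}

// TrimToBudget implements a simple Token Buffer Memory: it drops
// messages from the beginning until the estimated total fits within
// maxTokens. Because the estimate slightly overshoots, it may keep
// fewer messages than strictly necessary, which is the safe direction.
func TrimToBudget(messages []string, maxTokens int, ratio float64) []string {
	for len(messages) > 0 {
		total := 0
		for _, m := range messages {
			total += EstimateTokens(m, ratio)
		}
		if total <= maxTokens {
			break
		}
		messages = messages[1:] // evict the oldest message
	}
	return messages
}

func main() {
	msgs := []string{
		"You are a helpful assistant.",
		"Summarize the following document for me please.",
		"Sure, here is a short summary of the document.",
	}
	trimmed := TrimToBudget(msgs, 20, 1.55)
	fmt.Println(len(trimmed), EstimateTokens(strings.Join(trimmed, " "), 1.55))
}
```

The appeal of this over a real tokenizer is that it needs no vocabulary or merge data at all, at the cost of a small amount of wasted context.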
I'm not throwing shade at this project; just being able to call the tokenizer would've saved me a lot of time. But I hope that if I'm wrong about the estimates being good enough, some good person will point out the error of my ways :)
Yes, this one does
https://github.com/functorism/gpt4-tokenizer-visualizer