Twitter Text Obj
base2048
Metric | Twitter Text Obj | base2048
---|---|---
Mentions | 9 | 18
Stars | 3,056 | 821
Growth | 0.2% | -
Activity | 0.0 | 4.0
Last commit | 7 days ago | 3 months ago
Language | HTML | JavaScript
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Twitter Text Obj
- Why is GPT-3 15.77x more expensive for certain languages?
  I recall that Twitter originally allowed 140 Chinese characters in tweets, but when it switched to 280 ASCII characters, CJK languages were not included. The current documentation says that some languages and emoji use 2 characters per symbol of the 280 limit, limiting those languages to 140. https://developer.twitter.com/en/docs/counting-characters
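The weighted counting described in that comment can be sketched roughly as follows. This is a simplified illustration, not Twitter's implementation: the "light" code point ranges are an assumption based on the published twitter-text configuration, `weighted_length` is a hypothetical helper, and URL shortening and multi-code-point emoji sequences are ignored.

```python
import unicodedata

# Assumed "light" ranges (weight 1); everything else is treated as weight 2.
LIGHT_RANGES = [
    (0x0000, 0x10FF),  # Latin, Greek, Cyrillic, Arabic, and similar scripts
    (0x2000, 0x200D),  # general punctuation spaces
    (0x2010, 0x201F),  # dashes and quotation marks
    (0x2032, 0x2037),  # primes
]


def weighted_length(text: str) -> int:
    """Approximate weighted length of a tweet, counted per code point."""
    text = unicodedata.normalize("NFC", text)
    total = 0
    for ch in text:
        cp = ord(ch)
        is_light = any(lo <= cp <= hi for lo, hi in LIGHT_RANGES)
        total += 1 if is_light else 2
    return total


print(weighted_length("hello"))          # ASCII text counts once per character
print(weighted_length("\u65e5\u672c\u8a9e"))  # CJK ideographs count twice each
```

Under these assumptions, 280 ASCII characters and 140 CJK characters both land exactly on the 280 limit.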
- Mini Musk in making
  I assume you were going for half of the (new) character limit? That wouldn't be bad as a rough guess, except Twitter uses UTF-8 encoding and what counts as a character is a bit complicated. Certain objects, like usernames in a reply or image URLs hosted on Twitter, are not counted either. The maximum size of a modern tweet could therefore be upwards of 280 x 4 = 1,120 bytes.
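The byte arithmetic in that comment follows from UTF-8 using between 1 and 4 bytes per code point. A minimal sketch, where `utf8_size` is a hypothetical helper name and the bound ignores any weighted counting:

```python
def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# One sample code point for each UTF-8 width: a, é, 日, 😀
samples = ["a", "\u00e9", "\u65e5", "\U0001F600"]
sizes = [utf8_size(s) for s in samples]
print(sizes)  # [1, 2, 3, 4]

# Naive upper bound if all 280 counted characters needed 4 bytes each.
worst_case_bytes = 280 * 4
print(worst_case_bytes)  # 1120
```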
- A Rust crate that lets you compress ASCII text to a single Unicode "character"
  Given the examples in this article, it seems like it could potentially be used for this!
- Tweet-counter: A module to calculate the length of a tweet
  It turns out that working this out is non-trivial, as Twitter has a few rules around how it counts characters. These are basically:
- [DISC] The Tsunderedere Girl Getting More and More Dere Day by Day | Day - 13 by @yakitomahawk & @kota2comic
  Japanese, Korean and Chinese were excluded from that increased cap, because they already had a significant advantage in being able to put more information into a single character. (More specifically, it's implemented such that ideograms count as 2 characters now, and emoji do, too.)
- Guinness World Regex
- TIL the assumption that string length does not change when upper-cased is false
  The 280-character limit in a tweet isn't equal to the number of glyphs in a tweet.
  https://developer.twitter.com/en/docs/counting-characters
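Both points in this entry are easy to check in Python, which applies Unicode full case mappings in `str.upper()`. The sketch below is only an illustration; `length_change` is a hypothetical helper.

```python
import unicodedata


def length_change(s: str) -> tuple[int, int]:
    """Return (length before, length after) upper-casing, in code points."""
    return len(s), len(s.upper())


# U+00DF (sharp s) upper-cases to "SS"; U+FB01 (fi ligature) to "FI".
print(length_change("\u00df"))  # (1, 2)
print(length_change("\ufb01"))  # (1, 2)

# One visible glyph can also be more than one code point:
decomposed = "e\u0301"  # "e" followed by a combining acute accent
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
```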
- Tech skill shortage
base2048
- How does Base32 (or any Base2^n) work exactly?
- Show HN: Host a Website in the URL
- What digit bases do you like?
  qntm did a fun project using larger bases, constrained to subsets of Unicode instead of ASCII like base64. It's specifically for social channels where you're constrained by the number of code points, but not bytes, so you want to maximize data per code point. base2048 is pretty impressive, and base32768 is just absurd.
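The "data per code point" idea can be made concrete with a little arithmetic: an alphabet of 2^n symbols carries n bits per symbol. The sketch below is a generic illustration, not the base2048 library's code; `payload_bytes` and `pack` are hypothetical names, and the capacity figure assumes every symbol counts as one unit toward the limit.

```python
def payload_bytes(code_points: int, alphabet_size: int) -> int:
    """Whole bytes that fit in `code_points` symbols of a 2^n alphabet."""
    bits_per_symbol = alphabet_size.bit_length() - 1  # exact for powers of two
    return (code_points * bits_per_symbol) // 8


def pack(data: bytes, bits: int) -> list[int]:
    """Split a byte string into `bits`-wide integers, zero-padding the tail."""
    acc = 0     # bit accumulator
    nbits = 0   # number of unread bits currently held in acc
    out = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= bits:
            nbits -= bits
            out.append((acc >> nbits) & ((1 << bits) - 1))
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        out.append(acc << (bits - nbits))  # pad the final chunk with zeros
    return out


print(payload_bytes(280, 64))    # 210 bytes with 6-bit symbols
print(payload_bytes(280, 2048))  # 385 bytes with 11-bit symbols
print(pack(b"abc", 6))           # [24, 22, 9, 35], the base64 indices of "YWJj"
```

A real encoder would then map each integer to a character from its chosen alphabet; that mapping is what distinguishes base64 from the Unicode-based encodings.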
- Android 15's dessert name is "Vanilla Ice Cream"
  Numbers don't have to loop until 32k or 2048 (or even up to 1,112,064, until such point the Unicode standard allows for more)
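The 1,112,064 figure is the number of Unicode scalar values: the full code space minus the surrogate range. As a quick check:

```python
# Code points run from U+0000 to U+10FFFF inclusive.
total_code_points = 0x110000

# Surrogates U+D800..U+DFFF are reserved for UTF-16 and are not scalar values.
surrogates = 0xDFFF - 0xD800 + 1

scalar_values = total_code_points - surrogates
print(scalar_values)  # 1112064
```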
- Twitter's anti-Mastodon filter evasion
  On a semi-related note, they mention base64 encoding messages to evade filters. There were actually other base{n} methods [1] created specifically for Twitter to be more space optimized though not as readily available to operating systems. I guess this is less useful if they are really expanding the text limit to 4k soon but figured I would add it in the event they add a parser for base64.
  [1] - https://github.com/qntm/base2048
- A Rust crate that lets you compress ASCII text to a single Unicode "character"
  Actually, in the case of Twitter they do some weird counting. It is mostly based on code points, true, but some code points are considered "heavy" and are counted twice; see https://github.com/qntm/base2048
- New Twitter TOS
  I know of one case of Twitter doing client side validation [1]. Maybe there are more?
  [1] - https://github.com/qntm/base2048#note
- Base 2048
- Hacker News top posts: May 7, 2022
  Base 2048 (15 comments)
What are some alternatives?
MarkdownTextView - Rich Markdown editing control for iOS
ecoji - Encodes (and decodes) data as emojis
YYText - Powerful text framework for iOS to display and edit rich text.
Base256 - Encode and decode data in base 256 easily typed words
Iconic - :art: Auto-generated icon font library for iOS, watchOS and tvOS
hatetris - Tetris which always gives you the worst piece
DTCoreText - Methods to allow using HTML code with CoreText
Assemblies-of-putative-SARS-CoV2-spike-encoding-mRNA-sequences-for-vaccines-BNT-162b2-and-mRNA-1273 - RNA vaccines have become a key tool in moving forward through the challenges raised both in the current pandemic and in numerous other public health and medical challenges. With the rollout of vaccines for COVID-19, these synthetic mRNAs have become broadly distributed RNA species in numerous human populations. Despite their ubiquity, sequences are not always available for such RNAs. Standard methods facilitate such sequencing. In this note, we provide experimental sequence information for the RNA components of the initial Moderna (https://pubmed.ncbi.nlm.nih.gov/32756549/) and Pfizer/BioNTech (https://pubmed.ncbi.nlm.nih.gov/33301246/) COVID-19 vaccines, allowing a working assembly of the former and a confirmation of previously reported sequence information for the latter RNA. Sharing of sequence information for broadly used therapeutics has the benefit of allowing any researchers or clinicians using sequencing approaches to rapidly identify such sequences as therapeutic-derived
Atributika - Convert text with HTML tags, links, hashtags, mentions into NSAttributedString. Make them clickable with UILabel drop-in replacement.
TatSu - 竜 TatSu generates Python parsers from grammars in a variation of EBNF
PhoneNumberKit - A Swift framework for parsing, formatting and validating international phone numbers. Inspired by Google's libphonenumber.
DumbIdeas