OpenChem vs pixel
| | OpenChem | pixel |
|---|---|---|
| Mentions | 1 | 4 |
| Stars | 660 | 323 |
| Growth | - | - |
| Activity | 0.0 | 3.8 |
| Last commit | 6 months ago | 2 months ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
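The site does not publish its activity formula, but the description above (recent commits weighted more heavily than older ones) can be sketched as an exponentially decaying sum. Everything here is an assumption for illustration: the half-life parameter, the base-2 decay, and the function name are not the site's actual method.

```python
def activity_score(commit_ages_days, half_life_days=30.0):
    """Recency-weighted commit count (illustrative, not the site's formula).

    Each commit contributes 2 ** (-age / half_life), so a commit made
    today counts ~1.0 and a commit one half-life old counts 0.5.
    """
    return sum(2.0 ** (-age / half_life_days) for age in commit_ages_days)

# Three recent commits outscore three commits from about a year ago.
recent = activity_score([1, 3, 7])
stale = activity_score([300, 330, 360])
```

Any monotone decay would give the same qualitative ranking; the exponential form just makes the "half-life" of a commit's influence explicit.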
-
Image to Code?
https://arxiv.org/abs/2205.06175 - Gato from DeepMind
https://arxiv.org/abs/2207.06991 - An interesting attempt to have a pixel-based language model, which should be inherently multimodal
-
[D] Theoretically, could Computer Vision learn language?
PIXEL Language Modelling from Pixels: https://arxiv.org/abs/2207.06991
-
The Grind a Day: thousands of Apple II floppy disks archived
> LLMs do not do this, and if you're training the LLMs (at cost) to do this, you're already having to do the very same searching out of materials within the corpus related to what you want.
A more reasonable suggestion would be not training an LLM (which one doesn't want to do anyway) but treating it as a retrieval+summarization task: search the corpus for mentions and similar-by-embedding documents, and summarize. LLMs are good at abstractive summarization with minimal hallucination or error. This can serve as an 'annotated bibliography', a first pass for a human writing it themselves, or the collective summaries can be fed back into the LLM for an overall summary.
The main problem here, I guess, is that most of the relevant texts have poor or no OCR, so one can't do that in the first place. But there's a good chance that this will mostly stop being an issue in a few years as 'text' LLMs move to images (see e.g. PIXEL https://arxiv.org/abs/2207.06991 or Kosmos https://arxiv.org/abs/2302.14045 or https://arxiv.org/abs/2010.10648#google https://arxiv.org/abs/2012.14271 https://arxiv.org/abs/2209.14156 ), and they will either OCR, embed, or just process images of complex text directly. So, something to keep an eye on, perhaps: there's never going to be enough humans to do all this archiving properly, but perhaps there may eventually be enough GPUs to do it...
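The retrieval step described in the comment above can be sketched in a few lines. This is a toy illustration only: the bag-of-words `embed` function stands in for a real embedding model, the corpus and query are made up, and the LLM summarization call is omitted entirely.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Rank corpus documents by similarity to the query; the top-k would
    then be passed to an LLM for abstractive summarization."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

# Hypothetical corpus of OCR'd archive documents.
corpus = [
    "notes on Apple II floppy disk imaging and archiving",
    "a recipe for sourdough bread",
    "catalog of archived Apple II software disks",
]
hits = retrieve("Apple II disk archive", corpus)
```

In practice one would replace `embed` with a learned embedding model (which is exactly where the pixel-based models linked above could help, by embedding page images without an OCR step) and feed `hits` to a summarizer.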
- [D] What is some recent ideas/papers that you find most interesting?
What are some alternatives?
DiffSBDD - A Euclidean diffusion model for structure-based drug design.
extreme-bert - ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper “ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT”.
NeuralCDE - Code for "Neural Controlled Differential Equations for Irregular Time Series" (Neurips 2020 Spotlight)
4cade - 100s of games at your fingertips, as long as your fingertips are on an Apple ][
chemicalx - A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)
RATransformers - RATransformers 🐭- Make your transformer (like BERT, RoBERTa, GPT-2 and T5) Relation Aware!
Clairvoyante - Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing
primeqa - The prime repository for state-of-the-art Multilingual Question Answering research and development.
xyz2mol - Converts an xyz file to an RDKit mol object
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
neuralforecast - Scalable and user-friendly neural 🧠 forecasting algorithms.
datasets - 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools