Allen Institute unveils Dolma
PLUS: Code Llama
Good morning, human brains. Welcome back to your daily munch of AI news.
Here’s what’s on the menu today:
Ex-Googlers launch Sakana AI 🤖 🎏
Two OGs join forces to create a more “natural” AI.
The largest open dataset for training LLMs 🌎 🧠
The Allen Institute’s Dolma, unveiled.
Meta plans to release Code Llama 🦙 🖥️
Looking like a Copilot counterpart from Meta.
APPETIZER
Fish in the machine 🤖 🎏
Sakana AI has launched in Tokyo. It’s an AI startup formed by ex-Google AI experts Llion Jones and David Ha.
“Sakana” comes from the Japanese word for “fish.”
Sushi, anyone?
Fresh catch:
1/ Llion is one of the authors behind the groundbreaking 2017 transformer paper, “Attention Is All You Need”. Yes, that Transformer, as in “Generative Pre-trained Transformer.” 😉 He recently departed from Google.
2/ David is the former head of research at Stability AI (at this rate, Stability might make more money selling tea, jeez) and a former Google Brain researcher.
3/ The two started Sakana to create generative AI models for text, images, code, and multimedia.
4/ Based in Tokyo, the team says the name is inspired by the way groups of fish behave as one to create collective intelligence.
5/ They claim current AI is too rigid. Their aim: incorporate natural elements (evolution, adaptivity, and responsiveness) into AI.
Our take: What’s more beautiful than two AI daddies coming together to raise another AI baby? OK, the “child custody” behind AI is a little complicated with research teams, but you get the picture. Let’s see how Jones and Ha fare against the heavy hitters. We’re rooting for ‘em.
BUZZWORD OF THE DAY
Dataset
A structured collection of data used to train, validate, and test machine learning models, especially large language models (LLMs). Datasets typically consist of text, images, or other discrete data (think lists or nested arrays). Generally, the larger and more varied the dataset, the better.
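For the hands-on crowd, here’s a minimal sketch of the “train, validate, and test” part. It assumes the dataset is just a Python list of examples. The function name `split_dataset`, the 80/10/10 split, and the toy sentences are our own illustrative choices, not anything from a specific library or from Dolma.

```python
import random


def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle examples and cut them into train/validation/test parts.

    Hypothetical helper for illustration; the fractions are assumptions.
    """
    shuffled = list(examples)              # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever is left over
    return train, val, test


# A toy "dataset" of ten text examples.
toy_dataset = [f"example sentence {i}" for i in range(10)]
train, val, test = split_dataset(toy_dataset)
print(len(train), len(val), len(test))  # 8 1 1
```

The model learns from the training part, gets tuned against the validation part, and is graded on the test part, which it has never seen.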
FROM OUR PARTNERS
Looking for world-class AI & data science devs?
AE Studio is a development, data science, and design studio. They work with founders and executives to create custom software, machine learning, and BCI solutions.
Whether it’s spinning up an MVP, Enterprise Digital Transformation, or applying AI & ML to your business — AE’s blend of expertise and bleeding-edge pedigree means you’ll be working with the finest.
To be more specific, pedigree is like cutting your teeth in the AI crucibles of Stanford, MIT, Harvard, and Caltech.
And expertise, like producing ROI for Berkshire Hathaway, Point, EVgo, Protocol Labs, and Biocentury.
Lucky for you — AE Studio’s taking on new clients, for a limited time.
MAIN COURSE
Allen Institute releases Dolma 🌎 📜
The Allen Institute unveiled Dolma late last week. It’s a new open-source dataset for training LLMs.
Not the food.
It’s now the largest openly available dataset.
Dolma includes data from the internet, English books, scientific manuscripts, Wikipedia, and GitHub code repositories.
The stuffing:
1/ It contains 3 trillion tokens, which makes it the largest openly available dataset.
2/ It’s part of the OLMo project, which aims to build an open and transparent LLM.
3/ The Allen Institute claims Dolma’s principles are openness, representativeness, size, reproducibility, and risk mitigation.
4/ Users must provide contact info, state their intended use, and agree to its terms.
Our take: With so many privately controlled datasets, Dolma is really exciting news. Bigger doesn’t always mean better, but it seems like the Allen Institute is paying extra attention to the data quality.
Moar open-source LLMs!
A LITTLE SOMETHING EXTRA
Leaked: Meta’s new coding AI 🦙 🖥️
Meta’s planning to launch Code Llama. It’s a new model based on Llama 2.
<spits> codes </spits>
It could be released as early as this week.
Meta claims it’ll be open-source. Similar to GitHub’s Copilot, Code Llama will suggest code to developers in real time as they type.
Our take: Meta tends to bend the rules on what “open-source” means. Will Code Llama be truly open-source? And how will it compete with the current coding-capable LLMs?
YOUR DAILY MUNCH
Think Pieces
AI lists “Ottawa Food Bank” as a tourist hot spot. Microsoft’s AI put it in the top 3 hottest destinations list. Uhm, that must be some good soup. 🤤 🍜
Does every company need a chief AI officer? Probably not. But does every Fortune 500 company need one? Probably yes.
Startup News
Arthur unveils open-source AI tool for businesses to find LLMs. The tool helps users find and effectively use AI models for specific needs.
Perplexity AI rebrands. Its new strategy is to be the whole world’s research assistant.
Elemental Cognition raises $50 million. Ex-IBM Watson team secures the bag.
Research
Paper for TeCH. A text-guided method for reconstructing 3D clothed humans from a single image that’s gained traction in the AI community. Also featured in our Tweet of the Day, below.
How GPT-4 Code Interpreter fares with math. How it synthesizes Python code to enhance reasoning and problem-solving.
Google and University of Michigan’s paper on personalization. Personalizing LLMs to fit users’ specific needs, inspired by multistage writing education.
Tools
AE Studio [Sponsored] — experience unrivaled business growth with AE's world-class team. Efficient MVPs, innovative Enterprise Initiatives for your digital revolution, and ROI-driven AI/ML solutions.
Team Town [Sponsored] — your own in-house design team without those pesky design bottlenecks that slow you down. Trusted by Staples, Built, Jeep, and many others.
ChapaGPT [Sponsored] — turbocharge your Chrome browsing experience with the power of AI. Summarize content, understand context, create tailored prompts, and take your online experience to new heights.
TWEET OF THE DAY
A look at TeCH: a text-guided AI method for 3D human reconstruction. With an image as input, it creates 3D digital representations of clothed humans, including areas unseen in the original image.
Tag us on Twitter @BotEatBrain for a chance to be featured here tomorrow.
RECOMMENDED READING
🏥 Healthcare AI News: 5 minutes or less is all it takes to elevate your healthcare knowledge with this expert-curated weekly AI newsletter. Stay informed and stay ahead.
👨 The Average Joe: Market insights, trends, and analysis to help you become a better investor. We like their easy-to-read articles that cut right to the meaty bits.
💌 Marketing Letter: The newsletter keeping 30k+ marketers in the loop. Read by marketers who work at LinkedIn, TechCrunch, and Disney.
Until next time 🤖😋🧠