

This is only amplified by LLMs




Google Search has long since declined from its peak. 15 years ago we used to play a “game” of describing something as vaguely or strangely as possible, and we would be entertained to find it still on the first page of search results.
SEO and paid ranking caused a steady decline in Google Search performance, and LLMs are the kill shot.
Indexing and ranking were already a solved problem, and conventional algorithms accomplished it much better than LLMs do. Unfortunately, LLMs need to find uses to justify investment, so now they must be used for any task that they can be used for, even if it’s worse than the solutions that we already had.


How about literally anyone competent? Why is this always framed as a choice between the lesser of two bads? Of course Biden is not as bad as Trump, but no, I’m not “longing” for Biden.


Americans longing for Sleepy Joe Biden
Doubt.


It’s true and a real thing, but it’s also just a BS excuse to push for removal of body cameras.


The most important question to ask when evaluating end-to-end encryption: who manages the keys?
If Facebook manages all of the keys and is responsible for telling you which public key belongs to whom, then of course Facebook can read every message.
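
To make that concrete, here is a rough sketch of why the key directory matters. It uses a toy Diffie-Hellman exchange with made-up, insecure parameters, and the `Directory` and `SubstitutingDirectory` classes are hypothetical names for illustration only. It is not how any real messenger is implemented; it only shows that whoever answers "which public key belongs to Bob?" can hand out their own key instead.

```python
import secrets

# Toy Diffie-Hellman parameters (assumption: illustration only, NOT secure).
P = 2**61 - 1  # a Mersenne prime, far too small for real use
G = 3

def keypair():
    """Return (private, public) for the toy scheme."""
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

def shared_secret(my_priv, their_pub):
    """Derive the shared value from my private key and their public key."""
    return pow(their_pub, my_priv, P)

class Directory:
    """Hypothetical honest key directory: returns what users registered."""
    def __init__(self):
        self.keys = {}
    def register(self, name, pub):
        self.keys[name] = pub
    def lookup(self, name):
        return self.keys[name]

class SubstitutingDirectory(Directory):
    """Hypothetical directory whose operator answers every lookup with its own key."""
    def __init__(self):
        super().__init__()
        self.priv, self.pub = keypair()
    def lookup(self, name):
        return self.pub

alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()

honest, operator = Directory(), SubstitutingDirectory()
for d in (honest, operator):
    d.register("alice", alice_pub)
    d.register("bob", bob_pub)

# With an honest directory, Alice and Bob end up with the same secret.
k_honest = shared_secret(alice_priv, honest.lookup("bob"))
honest_ok = k_honest == shared_secret(bob_priv, alice_pub)

# With a substituting directory, Alice unknowingly shares her secret with the operator.
k_bad = shared_secret(alice_priv, operator.lookup("bob"))
operator_can_read = k_bad == shared_secret(operator.priv, alice_pub)

print(honest_ok, operator_can_read)
```

Both checks come out true: the math works either way, and Alice cannot tell the difference unless she verifies Bob's key through some channel the directory operator does not control.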


Quote me in full.
Okay!

You can run at scale, on huawei. You can also run it on a cpu
Yeah, that is absolutely not what you argued.
Anyway, you’ve conceded that I’m correct that you cannot run it at scale on a CPU, because running on CPU is too slow and inefficient, and that they instead use GPU hardware like Huawei GPUs to run the model at scale. That’s good enough for me!


Yes, you can run it at scale.
at scale
Shift those goalposts! We went from “at scale” to “it still runs”


You’ve proved my point that you don’t know what you’re talking about by blindly linking to the git repo. Couldn’t find any source that supports your claim? I wonder why.
Sure, you can serve one request at a time to one patient user at a slow tokens-per-second rate, which makes running locally viable, but there is no RAM that has the bandwidth to run this model at scale. Even flash would be incredibly slow on CPU with multiple requests. You’d need the high bandwidth of VRAM, and running across multiple GPUs in a scalable way requires extremely high-bandwidth interconnects between GPUs.
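
Back-of-the-envelope version of the bandwidth argument, as a rough sketch. It assumes token generation is limited by memory bandwidth and that every generated token has to read all active weights once, and it ignores KV cache, batching, and compute limits. The active parameter count, bytes per parameter, and both bandwidth figures are illustrative assumptions, not measurements of any particular model or hardware.

```python
def max_tokens_per_second(bandwidth_gb_s, active_params_billion, bytes_per_param):
    """Upper bound on single-stream decode speed if memory bandwidth is the only limit."""
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Assumed numbers for illustration only.
ACTIVE_PARAMS_BILLION = 50   # hypothetical active parameters per token for an MoE model
BYTES_PER_PARAM = 1          # hypothetical 8-bit weights

cpu_ram = max_tokens_per_second(100, ACTIVE_PARAMS_BILLION, BYTES_PER_PARAM)    # ~100 GB/s system RAM
gpu_vram = max_tokens_per_second(3000, ACTIVE_PARAMS_BILLION, BYTES_PER_PARAM)  # ~3000 GB/s VRAM

print(cpu_ram, gpu_vram)  # 2.0 60.0
```

Under those assumptions the CPU tops out at about 2 tokens per second in total, before a second user even shows up, which is the "one patient user" case and nothing like serving at scale.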


Nope! You don’t know what you’re talking about. At all. But you can have fun running a 1.6 trillion parameter model on CPU at basically 0 tokens per second at scale, MoE or not.


I mean, sure. You could also run it by drawing marks in sand. It doesn’t make any sense to do either, though.
Accidental fentanyl exposure via the air or skin contact is medically and scientifically impossible.
3 people may have died, but a fairytale story is not what killed them.