• 0 Posts
  • 12 Comments
Joined 2 years ago
Cake day: June 13th, 2024



  • Google Search has long since declined from its peak. Fifteen years ago we used to play a “game”: describe something as vaguely or strangely as possible, and be entertained to find it still on the first page of search results.

    SEO and paid ranking caused a steady decline in Google search performance, and LLMs are the kill shot.

    Indexing and ranking were already solved problems, and conventional algorithms handled them much better than LLMs do. Unfortunately, LLMs need to find uses to justify the investment, so now they must be used for any task they can be used for, even if it’s worse than the solutions we already had.








  • You’ve proved my point that you don’t know what you’re talking about by blindly linking to the git repo. Couldn’t find any source that supports your claim? I wonder why.

    Sure, you can serve one request at a time to one patient user at a low tokens-per-second rate, which makes running locally viable, but no system RAM has the bandwidth to run this model at scale. Even flash would be incredibly slow on CPU with multiple requests. You’d need the high bandwidth of VRAM, and running across multiple GPUs in a scalable way requires extremely high-bandwidth interconnects between the GPUs.
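
To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It assumes decoding is purely memory-bound, so every generated token has to stream all active weights from memory once. The function name and the example figures are hypothetical, picked only for illustration, and it ignores KV-cache traffic, compute limits, and batching.

```python
def max_decode_tokens_per_second(bandwidth_gb_s, active_params_billion, bytes_per_param):
    """Rough upper bound on single-stream decode speed when memory-bound.

    Assumption: each generated token reads every active weight once,
    so throughput is capped at bandwidth / bytes read per token.
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical figures for illustration only, not measurements of any real system:
# 100 GB/s of memory bandwidth, 10B active parameters, 1 byte per parameter.
print(max_decode_tokens_per_second(100, 10, 1))
```

Plug in the bandwidth of whatever memory you have and the active parameter count of the model you care about. The ceiling scales linearly with bandwidth, which is the whole reason VRAM matters here.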