minus-squareBalinares@pawb.socialtoTechnology@lemmy.world•DeepSeek Permanently Reduces The Price Of Its Flagship V4 Model By 75 PercentlinkfedilinkEnglisharrow-up19·5 days agoThey invented a hybrid attention design that drastically reduces the amount of memory needed for the KV cache at inference time. Like, dividing it by 10. And memory is a large part of the cost of inference. linkfedilink
They invented a hybrid attention design that drastically reduces the amount of memory needed for the KV cache at inference time. Like, dividing it by 10. And memory is a large part of the cost of inference.