Hacker News | tuned's comments

No, from my point of view it is about being more domain-focused instead of going full-orthogonal.

Right, this is a proposal that needs to be tested. I started testing it at 30M parameters; then I will move to 100M and evaluate the generation on domain-specific assisting tasks.

> This is obviously not powerful enough to express non-linear relationships - like graph relationships.

The distance metric used is based on energy-informed graphs that encode energy relations in a distribution called taumode; see my previous paper on spectral indexing for vector databases for a complete roll-out.
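The taumode distribution itself is defined in the spectral-indexing paper; as a rough sketch of the generic idea behind an energy-informed graph distance, here is the standard Dirichlet energy of a signal under a graph Laplacian (my own illustrative stand-in, not the paper's exact formulation):

```python
import numpy as np

# Toy sketch: score items by their energy under a graph Laplacian
# (x^T L x, the Dirichlet energy). Smooth signals over the graph
# score low; this is only the generic idea, not taumode itself.
rng = np.random.default_rng(0)
X = rng.random((6, 4))              # 6 vectors, dimension 4

# Build a simple similarity graph (dense here for clarity)
W = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1))
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W      # combinatorial graph Laplacian

def dirichlet_energy(signal, L):
    """Energy of a signal over the graph: s^T L s, always >= 0."""
    return float(signal @ L @ signal)

s = X[:, 0]                          # one feature used as a graph signal
print(dirichlet_energy(s, L))
```

The Laplacian is positive semi-definite, so the energy is a well-behaved non-negative score per signal.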


Also: precomputing a sparse Laplacian for N vectors at dimension D (NxD) is vastly cheaper (using `arrowspace`, from my previous paper) than computing distances on the same full dense vectors billions of times. There are published tests that compute a Laplacian on a 300Kx384 space in 500 seconds on a laptop CPU. So it is a trade-off: potentially a few minutes of pretraining versus hours of dot products on dense matrices.
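To make the trade-off concrete: a kNN Laplacian has roughly N*k nonzero entries, built once, versus N² dense distances recomputed per workload. A small numpy stand-in (not `arrowspace`, which does the neighbor search far more efficiently; the brute-force distance matrix here is only for illustration):

```python
import numpy as np

# Sketch: build a sparse kNN graph Laplacian once. A real index
# (e.g. arrowspace) avoids the O(N^2 D) brute-force step below;
# the point is the size of what you end up storing and reusing.
rng = np.random.default_rng(0)
N, D, k = 200, 16, 8                 # tiny stand-in for 300K x 384

X = rng.standard_normal((N, D))
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)   # brute force, demo only
nn = np.argsort(d2, axis=1)[:, 1:k + 1]         # k neighbors, skip self

W = np.zeros((N, N))
rows = np.repeat(np.arange(N), k)
W[rows, nn.ravel()] = 1.0
W = np.maximum(W, W.T)               # symmetrize the adjacency
L = np.diag(W.sum(1)) - W            # ~N*k nonzeros in sparse form

print(int((W > 0).sum()), "edges kept vs", N * N, "dense distances")
```

Once L is precomputed, queries reuse it instead of re-running dense distance passes, which is where the minutes-versus-hours difference comes from.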

If you have a corpus of code snippets to train the manifold (Laplacian) on (and a good embedding model), it is definitely possible to try something like this.

It made sense to me because it is a very simple idea, I guess: causal self-attention computes QKV distances on the full vectors for Q, K, and V; the topological transformer can provide the same computation using full Q, a scalar K, and full V. Instead of [N², N², N²], [N², N, N²] is used. If generation is confirmed to be on par in terms of quality, the gains are evident.
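A back-of-the-envelope sketch of that storage change, keeping Q and V full while collapsing K to one scalar per position (the scoring rule below is my own guess at a plausible mechanism, not the paper's exact formulation):

```python
import numpy as np

# Standard attention stores full Q, K, V (T x D each); the proposed
# variant keeps Q and V full but replaces K with one scalar per
# position. Whether quality survives this is what is being tested.
rng = np.random.default_rng(0)
T, D = 512, 64
q = rng.random((T, D))
v = rng.random((T, D))
k_scalar = rng.random(T)             # T entries instead of T*D

std_entries = 3 * T * D              # full Q, K, V
topo_entries = 2 * T * D + T         # full Q, scalar K, full V
print(std_entries, topo_entries)

# Hypothetical scoring with scalar keys: a per-position query scalar
# times the key scalar, then the usual softmax over positions.
scores = (q.sum(-1)[:, None] * k_scalar[None, :]) / np.sqrt(D)
attn = np.exp(scores - scores.max(-1, keepdims=True))
attn /= attn.sum(-1, keepdims=True)
out = attn @ v                       # same (T, D) output shape
print(out.shape)
```

The score matrix stays (T, T) either way; the saving is in the stored K tensor, which drops from T*D entries to T.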

It most likely will in terms of runtime performance, as it uses 50% less memory (certainly at inference time, which is the most-used operation in web services), and it can leverage longer T and D if the design is confirmed and the quality of generation is comparable to other models. If this very basic assumption is correct, it means big savings in electricity, as the same GPUs can serve more requests.

By performance, I meant the accuracy of the model, not the runtime/memory characteristics.

Thanks to all who have read. I would be glad to answer further scoped questions on the content of the post and the paper. I have answered some comments that may clarify the ideas behind the redesign.

The idea is to have many "narrow" models to work with RAG instead of one model for all knowledge domains, or also to distill the metadata that currently sits in enterprise Knowledge Graphs.

Exactly, that is the current objective: to prove that generation for a specific domain is on par with causal-attention models.

