More

tuned · 2026-01-19T07:12:35 1768806755

no, from my point of view is being more domain-focused instead of going full-orthogonal.

tuned · 2026-01-19T07:10:56 1768806656

right. this is a proposal that needs to be tested. I started testing it on 30M parameters then I will move to a 100M and evaluate the generation on domain-specific assisting tasks

tuned · 2026-01-19T07:08:22 1768806502

> This is obviously not powerful enough to express non-linear relationships - like graph relationships.

the distance metrics used is based on energy-informed graphs that encode energy relations in a distribution called taumode, see my previous paper on spectral indexing for vector databases for a complete roll-out

tuned · 2026-01-19T07:05:21 1768806321

also: precomputing a sparse Laplacian for N vectors at dimension D (NxD) is infinitely cheaper (if using `arrowspace`, my previous paper) than computing distances on the same full dense vectors billions of times. There are published tests that compute a Laplacian on 300Kx384 space in 500 secs on a laptop on CPU. So it is a trade-off: potentially few minutes of pretaining or hours of dot-product on dense matrices

tuned · 2026-01-19T06:57:51 1768805871

if you have a corpus of code snippets to train the manifold (Laplacian) on (and a good embedding model), it is definitely possible to try something like this.

tuned · 2026-01-19T06:43:59 1768805039

it made sense to me as it is a very simple idea I guess: causal self-attention compute QKV distances computing on the full vectors for Q,K and V; the topological transformer can provide the same computation using Q, scalar K and V. Instead of [N², N², N²] -> [N², N, N²] is used. If generation is confirmed to be on par in terms of quality, the gains are evident.

tuned · 2026-01-19T06:39:16 1768804756

it most-likely will in terms of performance as it uses 50% less memory (for sure it will at inference time that is the most used operation on web services), because it can leverage longer T and D if the design is confirmed and the quality of generation is comparable to other models. If this very basic assumption is correct, it means a lot of savings in electricity as the same GPUs can resolve more requests.

throwaway314155 · 2026-01-20T00:01:59 1768867319

By performance, I meant the accuracy of the model, not the runtime/memory characteristics.

tuned · 2026-01-19T06:32:05 1768804325

Thanks to all that have read. I would be glad to answer further scoped questions on the content of the post and the paper. I answered some comments that may clarify the ideas from the redesign.

tuned · 2026-01-19T06:30:01 1768804201

the idea is to have a lot of "narrow" models to work with RAG instead of one model for all the knowledge domains or also distil the metadata that is currently in enterprise Knowledge Graphs

tuned · 2026-01-19T06:28:20 1768804100

exactly, that is the current objective. To proove that generation for a specific domain is on-par with causal attention models