Hacker News | sagebird's comments

I think that what is missing in this discussion is the latent background that one needs to actually create "ai art" that has merit.

Some people have developed a strange combination of skills for building styles, pipelines, and techniques that produce bespoke, recognizable, artistic output from these tools.

Anyone who is typing a prompt into a publicly available generative model is likely not getting anything of high value from it.

I commend the parascene project for trying to court this special sort of advanced user, who can create custom pipelines and then connect them to the parascene system for anyone to use.

That process, the implied skills to create a custom generator and host it, can all be broken down so that more people can do it, but I don't think enough people realize it is even something they can do. We are so trained to be consumers of ai services.


I am not an expert, but it seems like model distillation could get the behavior you need running on a cheap end-user processor (Raspberry Pi 4/5 class). I chatted with Claude Opus about your project and got the following advice:

For the compute problem, you don't need a Jetson. The approach you want is knowledge distillation: train a large, expensive teacher model offline on a beefy GPU (cloud instance, your laptop's GPU, whatever), then distill it down into a tiny student network like a MobileNetV3-Small or EfficientNet-Lite. Quantize that student to int8 and export it to TFLite. The resulting model is 2-3 MB and runs at 10-20 FPS on a Raspberry Pi 4/5 with just the CPU - no ML accelerator needed. For an even cheaper option, an ESP32-S3 with a camera module can run sub-500KB models for simpler tasks. The preprocessing is trivial: resize the camera frame to 224x224, normalize pixel values, feed the tensor to the TFLite interpreter. The CNN learns its own feature extraction internally, so you don't need any classical CV preprocessing.

Looking at your observations, I think the deeper issue is what you identified: there's not enough signal in single frames. Your validation loss not converging even after augmentation and ImageNet pretraining confirms this. The fix is exactly what you listed in your future work - feed stacked temporal frames instead of single images. A simple approach is to concatenate 3-4 consecutive grayscale frames into a multi-channel input (e.g., 224x224x4). This gives the network implicit motion, velocity, and approach-rate information without needing to compute optical flow explicitly. It's the same trick DeepMind used in the original Atari DQN paper - a single frame of Pong doesn't tell you which direction the ball is moving either.

On the action space: your intuition about STOP being problematic is right. It creates a degenerate attractor - once the model predicts STOP, there's no recovery mechanism. The paper you referenced that only uses STOP at goal-reached is the better design. Also consider that TURN_CW and TURN_CCW have no obvious visual signal in a single frame (which way to turn is a function of where you've been and where you're going, not just what you see right now), which is another reason temporal stacking or adding a small recurrent/memory component would help. Even a simple LSTM or state tuple fed alongside the image could encode "I've been turning left for 3 steps, maybe try something else."

For the longer term, consider a hybrid architecture: use the distilled neural net for obstacle detection and free-space classification, but pair it with classical SLAM or even simple odometry-based mapping for path planning and coverage. Pure end-to-end behavior cloning for the full navigation stack is a hard problem - even the commercial robots use learned perception with algorithmic planning. And your data collection would get easier too, because you'd only need to label "what's in front of me" rather than "what should I do," which decouples perception from decision-making and makes each piece easier to train and debug independently.
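To make the frame-stacking idea concrete, here is a minimal sketch in Python with NumPy. It is not taken from the project being discussed: the `FrameStacker` class, its parameter names, and the choice to pad the buffer with the first frame are all assumptions made for illustration. It also assumes each frame has already been converted to grayscale and resized to 224x224 by whatever camera library is in use.

```python
from collections import deque

import numpy as np


class FrameStacker:
    """Hypothetical helper: keeps the last N grayscale frames as one tensor."""

    def __init__(self, num_frames=4, size=224):
        self.num_frames = num_frames
        self.size = size
        self.frames = deque(maxlen=num_frames)

    def push(self, gray_frame):
        """Add one (size, size) uint8 frame; return a (size, size, N) float32 stack."""
        frame = np.asarray(gray_frame, dtype=np.float32) / 255.0  # normalize to [0, 1]
        if frame.shape != (self.size, self.size):
            raise ValueError(f"expected {(self.size, self.size)}, got {frame.shape}")
        if not self.frames:
            # Assumption: on the first call, repeat the frame so the stack is full.
            self.frames.extend([frame] * self.num_frames)
        else:
            # The deque drops the oldest frame automatically once it is full.
            self.frames.append(frame)
        # Oldest frame ends up in channel 0, newest in the last channel.
        return np.stack(list(self.frames), axis=-1)


stacker = FrameStacker(num_frames=4, size=224)
first = stacker.push(np.zeros((224, 224), dtype=np.uint8))
second = stacker.push(np.full((224, 224), 255, dtype=np.uint8))
```

The returned array would then get a leading batch dimension before being handed to whatever interpreter runs the student model; that step is left out because it depends on the runtime.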


Can you please design a version for kids to ride on?

With a seat and handle similar to "wooden bee ride on" by b. toys?

I want a vacuum that kids can actually drive, ride on, and do real vacuuming with, and that has a minimal level of safety: turning it over halts the vacuum, stairs and ledges are avoided, and there are no rollers or other parts that could snare a kid's hair.

There may be benefits to fusing the child's input signals with the vacuum's supervisory route goals. This would be age dependent; older kids would want full manual control, I think.
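One way to picture that fusion is a simple weighted blend with safety overrides, sketched below in Python. Everything here is hypothetical: the `Drive` type, the `blend_drive` function, the `authority` weight (0 = fully supervised, 1 = full manual), and the two sensor flags are invented for illustration, and a real ride-on machine would need far more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    forward: float  # -1.0 (reverse) to 1.0 (full speed ahead)
    turn: float     # -1.0 (left) to 1.0 (right)


def blend_drive(child, route, authority, ledge_ahead=False, tipped_over=False):
    """Mix the child's command with the route planner's command.

    authority is the child's share of control: 0.0 means the planner
    drives, 1.0 means full manual. Safety flags always win.
    """
    if not 0.0 <= authority <= 1.0:
        raise ValueError("authority must be between 0 and 1")
    if tipped_over:
        # Assumed safety rule: a tipped machine does not move at all.
        return Drive(0.0, 0.0)
    forward = authority * child.forward + (1.0 - authority) * route.forward
    turn = authority * child.turn + (1.0 - authority) * route.turn
    if ledge_ahead:
        # Assumed safety rule: never drive forward toward stairs or a ledge.
        forward = min(forward, 0.0)
    return Drive(forward, turn)


child_cmd = Drive(forward=1.0, turn=0.0)   # kid pushes straight ahead
route_cmd = Drive(forward=0.0, turn=1.0)   # planner wants to turn right
shared = blend_drive(child_cmd, route_cmd, authority=0.5)
```

An older kid would get `authority=1.0`, which passes the child's command through unchanged while keeping the ledge and tip-over overrides in place.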

Kids like to do real jobs, and as a parent I prefer purchasing real items for my kids rather than toy versions if practical.


> Kids like to do real jobs, and as a parent I prefer purchasing real items for my kids rather than toy versions if practical.

Real vacuums have existed for a very long time now :P


Real vacuums are _so_ difficult for kids though; they're the wrong size and way too heavy. A zamboni-vacuum-for-kids is definitely not a general purpose thing, but it does hit a nice balance between functional and kid-friendly.

Like a Zamboni but a vacuum

Oh, I think kids will like it.

>> To misquote Kennedy, “we chose to focus coroutines on generator in C++23, not because it is hard, but because it is easy”.

Appreciate this humor -- absurd, tasteful.


This is an interesting project, congrats. I have a similar project and goals (free, light, fast portable os-like-dev environment) at http://shiro.computer, http://shiro.computer/about ( https://github.com/williamsharkey/shiro ).

You may find bits of Shiro's code useful as it has massive shimming work to get Claude Code, npm/node, git, various grep tools and isomorphic git and git diffs to work, and some weird features like virtual servers that create virtual ports to communicate to frames.

All the unix tools that Claude Code uses are supported as well. It is also a TypeScript project with a similar architecture and an MIT license, so there may be parts you can just straight import without much hassle.

Probably the hardest part to keep architecturally clean is the shimming required in the JS eval environment to make Claude Code and non-browser-native packages run. But it is very nice once you have an agent able to work inside your browser os.

Great job and thanks for sharing Lifo. I am certain this will catch on once the implementations become more solid.


ok do pornhub next


it's not agi until we have browser browsers automating atm machine machining machines, imo


Wes Anderson is positioning cinematographers on American soil with extremely powerful telephoto lenses, filming actors performing in meticulously designed miniature sets across the Canadian border. The film will be titled "The Asymmetrical Tax Avoidance" and will star Bill Murray as a customs agent with daddy issues.


i wouldn’t be surprised if many insults to human health are not relevant in a population that exercises vigorously 5 times a week and has good body composition.

for the average american, maybe we are looking for straws that break camels' backs that are on the edge of breaking anyway


A more compact and beautiful relation exists between integers and finite rooted trees, imo.

David W. Matula found a correspondence between trees and integers using prime factorization, and reported it in 1968 in SIAM: "A Natural Rooted Tree Enumeration by Prime Factorization", SIAM Rev. 10, 1968, p.273 [1]

Others have commented on it before; search the web for "Matula numbers".
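For readers who want to try the correspondence, here is a small Python sketch of the standard Matula numbering: the single-vertex tree is 1, and a tree whose root has subtrees numbered m1, ..., mk is the product of the m1-th, ..., mk-th primes. The function names and the nested-tuple representation of a tree are my own choices for illustration, not something from Matula's paper, and the trial-division helpers are only suitable for small numbers.

```python
from math import prod


def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def nth_prime(k):
    """1-indexed: nth_prime(1) == 2, nth_prime(2) == 3, ..."""
    count, candidate = 0, 1
    while count < k:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


def prime_index(p):
    """Inverse of nth_prime for a prime p: prime_index(11) == 5."""
    return sum(1 for q in range(2, p + 1) if is_prime(q))


def prime_factors(n):
    """Prime factors of n with multiplicity, in ascending order."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def tree_from_number(n):
    """A rooted tree as a tuple of child subtrees; the bare root is ()."""
    return tuple(tree_from_number(prime_index(p)) for p in prime_factors(n))


def number_from_tree(tree):
    """Multiply the primes indexed by each child's own Matula number."""
    return prod(nth_prime(number_from_tree(child)) for child in tree)


# 165 = 3 * 5 * 11 = p(2) * p(3) * p(5): a root with three subtrees.
tree_165 = tree_from_number(165)
```

Each prime factor becomes one child of the root, so 165 yields a root with three subtrees, which are the trees for 2, 3, and 5.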

I independently found this relation when working on a bar code system that was topologically robust to deformation. I wrote a document that explained this relation here[2].

I created an interactive javascript notebook that draws related topological diagrams for numbers. [3]

[1] http://williamsharkey.com/matulaSIAM.png

[2] https://williamsharkey.com/integer-tree-isomorphism.pdf

[3] https://williamsharkey.com/MatulaExplorer/MatulaExplorer.htm...


Sorry - I believe I am off topic as this is not relevant given:

"This indirectly enforces the idea that sets cannot have duplicate elements, as set membership is defined purely by the presence or absence of elements. For example:"

So there is a constraint on what sort of trees are allowed in this forest, which would preclude most finite rooted trees.


From [2]:

> EG: 165 = P5 * P3 * P1

Shouldn’t the last component be P2 (= 3)?


You are exactly correct - thank you for reading and letting me know, appreciate your curiosity!

