Professional programmers like to be rude about visual programming, but it only seems to be getting more popular, so it must be doing something right! The trouble comes when it is applied to problems where it isn't a good fit (as described in the article).
A lot of "boxes and lines" tools are based on a model similar to relational databases, in that "rows" pass along the lines. I worked on a pitch and even a prototype for a product that passes a hybrid of JSON documents and RDF graphs over the lines. A pure relational system has a "spaghetti" problem because you have to use joins heavily. For instance, if you want to do an analysis of customers and their orders, you have to split your data pipeline into separate customer and order streams, then join them back together here and split them apart again there. What seem like "small" changes to your supervisor actually involve changes all over the graph, so you lose the benefits of a visual tool. (A "good" architecture for any software is one where what seems like a small change to your boss really is a small change.)
The graph-over-lines product is much more intuitive for complex pipelines. I figured out most of the math for it, how to execute it with reactive streams, and how to tear the whole thing down carefully at the end so you get correct answers. (I later worked at a place that built such a thing; the people there denied it had an algebra and didn't think about teardown, so they had trouble getting the right answers.)
I talked to a lot of people who use, develop, or invest in "boxes and lines" tools, and they all thought the graph-based system was a terrible idea: modern tools of that sort use a columnar execution model which is really smokin' in terms of speed, and they didn't want to give up performance for convenience. (Might be something like https://en.wikipedia.org/wiki/The_Innovator%27s_Dilemma)
'Graph-over-lines' means that instead of sending a series of rows over a line you send a series of graphs (more or less JSON documents) over the lines. For instance you might send a record that contains both basic data about a customer (name, address) and a list of the orders the customer has made, so an operation in a box can work on the whole thing as a unit.
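For concreteness, a record going over a line might look something like this (the field names are made up, just to illustrate):

    {
      "customer": { "id": 42, "name": "Jane Doe", "rewards": true },
      "orders": [
        { "id": 1001, "items": [ { "department": "garden", "price": 19.95 } ] },
        { "id": 1002, "items": [ { "department": "kitchen", "price": 7.50 } ] }
      ]
    }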
Imagine each box has something like "jq" inside of it.
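So a box that computes, say, total spend per customer is a one-liner over that record; a sketch, assuming the made-up shape above:

    [ .orders[].items[].price ] | add

One filter, one box, because the customer and all their orders arrive together.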
In the prototype I made I used the same RDF tools to define the overall processing graph as well as the little graphs that go over the lines. The folks who make Jena told me I was breaking the warranty on their rules engine, but I found I was able to use production rules to control the process (control plane) that sets up and tears down the reactive streams and other resources (like a key-value store) that are part of the data plane.
Ok, so if I understand you correctly, the data structure you want to pass between boxes is a tree structure, rather than a table.
I think that approach would work well with some data, particularly data encoded in XML or JSON. But a lot of data is 'rectangular' and comes as CSV, Excel or relational tables. I can't imagine it working well with that.
It works fine for rectangular structures; I mean, you can write a JSON document like
{ "a": 1, "b": 2, "c": 3}
you do pay a performance cost, but remember the thing can have a compiler in it that can figure out that the data going over a certain line has a fixed structure and treat it accordingly, e.g. the above could be just
01 02 03
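To make that concrete: once the compiler knows every document on a line has exactly the keys a, b and c, it can drop the keys and ship positional values. A toy jq illustration of the same idea:

    jq -c '[.a, .b, .c]'    # {"a":1,"b":2,"c":3}  becomes  [1,2,3]

and a box downstream can reconstruct the keyed form if it needs it.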
The trouble with the relational model is that you often have to calculate things that involve joins (say you want to know the average selling price of items from the garden supplies department that were bought by people in the rewards program) and you fundamentally can't do that inside one box; instead you have to build a great big network of boxes and lines that is difficult to understand and maintain. What would be 8 boxes with joins could be done in one box if the data were shaped appropriately. That opens up a whole world of possible software reuse (cut and paste one box, not a huge network of boxes and lines).
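With nested records like the one upthread, that garden-supplies question fits in a single box. Roughly, and again assuming the made-up record shape from earlier:

    jq -s '[ .[]
             | select(.customer.rewards)
             | .orders[].items[]
             | select(.department == "garden")
             | .price
           ] | add / length'

The -s flag slurps the whole stream so you can average across all the customers; a streaming box would keep a running sum and count instead.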
For instance, one dataset I worked with was almost like a flat CSV listing information about business registrations; I then merged it with some other flat files and a "knowledge graph" to make a set of specialized databases that backed a web application for browsing registrations.
A key concept here is "structural stability". Yes, 90% of the data comes in rectangular forms. With graphs you can deal with the other 10% with about 10% more effort, but if you are in the rectangular straitjacket it is more like 10x more effort, not 10% more. Pretty quickly most of the headspace in a classical data analysis pipeline goes to the weird, irregular and hard-to-do stuff, even if it's not most of the data.
The relational approach requires flattening JSON or XML into a table. Which is far from ideal, I agree.
What you are suggesting sounds more general purpose, but also significantly more complex. A table of data that passes through transforms is quite easy to explain to non-techies. I don't fancy explaining your approach to a non-techie.