I agree: we do not need much markup beyond the five you specified, plus hyperlinks; those are good enough. And yes, the client should decide how it looks.
I'd include lists and tables for data presentation. We may also want some additional tags to manage footnotes and references (these were in SGMLguid, but didn't make it into HTML).
For pictures, I think you could just link to them, possibly with a hint requesting inlining; the client decides whether to obey or ignore that hint. (I should think this would work the same way whether the picture is part of the document or a separate file, which makes many considerations easier to work with.)
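To make the hint idea a bit more concrete, here is a rough Python sketch of how a client might treat it. Everything in it is made up for illustration: the `Link` class, the `inline_hint` field, and the `render` function are assumptions of mine, not part of any existing spec or browser.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A plain hyperlink; inline_hint is a hypothetical 'please inline' request."""
    href: str
    text: str
    inline_hint: bool = False


def render(link: Link, client_inlines_images: bool) -> str:
    """Return a text stand-in for what the client would display."""
    # The hint is only a request: the client's own setting has the final say.
    if link.inline_hint and client_inlines_images:
        return f"[image: {link.href}]"
    # Otherwise the picture stays an ordinary link the reader can follow.
    return f"{link.text} <{link.href}>"


picture = Link(href="diagram.png", text="Figure 1", inline_hint=True)
print(render(picture, client_inlines_images=True))   # [image: diagram.png]
print(render(picture, client_inlines_images=False))  # Figure 1 <diagram.png>
```

The point of the sketch is that the document only ever contains a link; whether the picture shows up inline is purely a client-side decision, so a text-only client loses nothing.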
I guess the Netscape Navigator 1.1 specs (tables and forms, but before frames and JS) are pretty much what we want (certainly without the blink tag, and possibly without the font tag), augmented by some footnote/reference tags and a reduced subset of CSS.