
There is a lot out here.

I gave a seminar about the overall approach recently, abstract: https://shorturl.at/E7TcA, recording: https://shorturl.at/zBcoL.

This two-part AMA has a lot more detail if you're already familiar with what we do:

https://www.youtube.com/watch?v=UztfweS-7MU

https://www.youtube.com/watch?v=GOGuSJe2C6U


It is shocking, but what's more shocking is that doing it correctly is kind of rocket science. MySQL simply isn't built for picking a truly random row performantly (I'm not sure if any of the common relational databases are?).

If you do something naive like "ORDER BY RAND() LIMIT 1", performance is worse than abysmal. Similarly, a "LIMIT 1 OFFSET RAND() * row_count" is comparably abysmal, since the server still scans past every skipped row. And if you do something performant like "WHERE id >= RAND() * max_id ORDER BY id LIMIT 1", you run into the problem that id gaps left by deleted articles make the articles after large gaps more likely to be chosen.
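For reference, here are the three strategies as concrete queries (a sketch; "articles" and "id" are illustrative names, and the random offset/id would be computed client-side and bound as a prepared-statement parameter):

  // Sorts the entire table just to keep one row: O(n log n).
  const fullSort = "SELECT * FROM articles ORDER BY RAND() LIMIT 1";

  // Still walks past every skipped row before returning one: O(n).
  const offsetScan = "SELECT * FROM articles LIMIT 1 OFFSET ?";

  // Index seek, O(log n), but rows that follow large id gaps are
  // disproportionately likely to be chosen.
  const indexSeek = "SELECT * FROM articles WHERE id >= ? ORDER BY id LIMIT 1";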

There are only two "correct" solutions. The first is to pick ids at random and then retry whenever an id doesn't map to a valid article; the worry here is how sparse the ids might be, and the fact that it might occasionally have to retry 20 times -- unpredictable performance characteristics like that aren't ideal. The second is to maintain a separate column/table that numbers all valid articles in a dense integer sequence, and then pick a random integer that is guaranteed to be valid. But now the performance problem is that every time you delete an article, you have to recalculate and rewrite, on average, half the entire column of integers.
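A minimal sketch of the first approach, assuming a hypothetical async query(sql, params) helper that returns an array of rows (all names here are illustrative):

  // Retry random ids until one lands on a live article. maxId would
  // come from something like "SELECT MAX(id) FROM articles".
  async function randomArticle(
    query: (sql: string, params: unknown[]) => Promise<any[]>,
    maxId: number
  ): Promise<any> {
    for (let tries = 0; tries < 100; tries++) {
      const id = 1 + Math.floor(Math.random() * maxId);
      const rows = await query("SELECT * FROM articles WHERE id = ?", [id]);
      if (rows.length > 0) return rows[0]; // hit a live row
      // miss: the id belongs to a deleted article (or was never used); retry
    }
    throw new Error("table unexpectedly sparse");
  }

Each attempt is a primary-key lookup, so individual tries are cheap; the unpredictability is only in how many tries you need.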

So for something that's really just a toy feature, the way Wikipedia implements it is kind of "good enough". And downsides could be mitigated by recalculating each article's random number every so often, e.g. as a daily or weekly or slowly-but-constantly-crawling batch job, or every time a page gets edited.

You could also dramatically improve it (but without making it quite perfect) by using the process as it exists, but rather than picking the article with the closest random number, select the ~50 immediately lower and higher than it, and then pick randomly from those ~100. It's a complete and total hack, but in practice it'll still be extremely performant.
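A sketch of that hack, reusing the hypothetical query helper from above and assuming the precomputed random value lives in a page_random column (as in MediaWiki's schema):

  // Instead of taking the single closest row, pool ~50 rows on each
  // side of a fresh random value and pick uniformly from the pool.
  async function lessBiasedRandomArticle(
    query: (sql: string, params: unknown[]) => Promise<any[]>
  ): Promise<any> {
    const r = Math.random();
    const above = await query(
      "SELECT * FROM articles WHERE page_random >= ? ORDER BY page_random ASC LIMIT 50",
      [r]
    );
    const below = await query(
      "SELECT * FROM articles WHERE page_random < ? ORDER BY page_random DESC LIMIT 50",
      [r]
    );
    const pool = above.concat(below);
    // A uniform pick over ~100 candidates averages out most of the
    // gap-size bias of the single-closest-row approach.
    return pool[Math.floor(Math.random() * pool.length)];
  }

Both queries are index range scans, so this stays fast even on a large table.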


We’ve used an API style similar to tRPC at Notion, although our API predated tRPC by 4 years or so.

You can build this kind of thing yourself easily using TypeScript’s mapped types, by building an object type where the keys are your API names and the values are { request, response } types. Structure your “server” bits to define each API handler as a function taking APIs["addUser"]["request"] and returning Promise<APIs["addUser"]["response"]>. Then build a client that exposes each API as an async function with those same args and return type.
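For concreteness, here's a minimal sketch of that pattern (not Notion's actual code; the API names, Client, Handlers, and post function are all illustrative):

  // Single source of truth: API names mapped to request/response pairs.
  type APIs = {
    addUser: {
      request: { name: string; email: string };
      response: { userId: string };
    };
    getUser: {
      request: { userId: string };
      response: { name: string; email: string } | null;
    };
  };

  // Server side: one handler per API name, typed off the same map.
  type Handlers = {
    [K in keyof APIs]: (req: APIs[K]["request"]) => Promise<APIs[K]["response"]>;
  };

  // Client side: every API becomes an async function with identical
  // argument and return types; the transport is hidden behind `post`.
  type Client = {
    [K in keyof APIs]: (req: APIs[K]["request"]) => Promise<APIs[K]["response"]>;
  };

  function makeClient(post: (name: string, body: unknown) => Promise<unknown>): Client {
    return new Proxy({} as Client, {
      get: (_target, name) => (req: unknown) => post(String(name), req),
    });
  }

Because the transport is a single function, the same typed surface can run over POST, WebSockets, or IPC; note the types are a compile-time contract only, with no runtime validation.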

We use this strategy for our HTTPS internal API (transport over POST w/ JSON bodies), real-time APIs (transport over WebSockets), Electron<>Webview APIs (transport over Electron IPC), and Native (iOS, Android)<>Webview APIs (transport over OS webview IPC).

For native APIs, the “server” side of things is Swift or Kotlin, and in those cases we rewrite the request and response types from TypeScript by hand. I’m sure at some point we’ll switch to a binary-based format with its own IDL, but for a single cross-language API that grows slowly, the developer-experience overhead of Protobuf or similar hasn’t seemed worth it.


For my job search process I created a custom note type specifically for interview problems. My general process was to go to LeetCode, find a medium/hard problem, hack on it for 30-60 minutes, then look at the solution if I couldn't get there myself. At the end of each problem, regardless of whether I solved it, I'd create an Anki card with the following fields:

Title

Question

Additional Criteria

Example input/output

Insight (1 sentence maximum)

Insight explanation (can be longer/bullet-pointed list)

Key Data Structure (at most 1 data structure; if there are multiple, use the most important one)

Time complexity

Space complexity

Full answer code (can use syntax highlighter add-on)

Source (can provide link to associated question online; can include link(s) to solutions that the insight and/or code come from)

There are 4 cards that are generated from this template, which test the same question in slightly different ways. They individually ask for the insight, the key data structure, and the time and space complexities.

I found this note type to be critical to my success in subsequent interviews. In two cases, I was asked the exact same question I had already added to Anki; I was able to write out the solution from memory in one go. If you'd like to use my note type directly, I've exported an example here. [0]

[0] https://drive.google.com/file/d/12NsYNIBjIPI1Nhq5wE1xPljr9rH...


Famous quote: "Wouldn't it be nice if our machines were smart enough to allow programming in natural language?". Well, natural languages are most suitable for their original purposes, viz. to be ambiguous in, to tell jokes in and to make love in, but most unsuitable for any form of even mildly sophisticated precision. And if you don't believe that, either try to read a modern legal document and you will immediately see how the need for precision has created a most unnatural language, called "legalese", or try to read one of Euclid's original verbal proofs (preferably in Greek). That should cure you, and should make you realize that formalisms have not been introduced to make things difficult, but to make things possible. And if, after that, you still believe that we express ourselves most easily in our native tongues, you will be sentenced to the reading of five student essays. - Dijkstra, EWD952

https://www.cs.utexas.edu/~EWD/transcriptions/EWD09xx/EWD952...

