adriancooney · on Sept 12, 2023

I love your site, I use it all the time and recommend it to all my friends. Thank you.

Do you offer historical exports of data? I’d love to create some visualisations of the housing situation in Ireland over time.

adriancooney · on Aug 18, 2023

I still find `ts-node` gets in my way often with configuration related issues. `tsx` on the other hand has been a dream.

adriancooney · on Aug 8, 2023

Thanks for posting this again! It's a year later and I still haven't touched the web scraper in production, which is great to reflect on. It seems running the YouTube command from the post still produces the exact same data too.

  $ npx puppeteer-heap-snapshot query \
    --url https://www.youtube.com/watch\?v\=L_o_O7v1ews \
    --properties channelId,viewCount,keywords --no-headless
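
To give a rough idea of what querying by property names means, here is a minimal sketch that walks a nested JSON-like value and collects every object carrying all of the requested keys. It is a simplification for illustration and is not how puppeteer-heap-snapshot is implemented: a real heap snapshot is a graph of nodes and edges rather than a plain object tree, and `findObjectsWithProperties` is a hypothetical name.

```typescript
// JSON-like value; an assumed stand-in for data found in an app's memory.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical helper: collect every plain object that has all requested keys.
function findObjectsWithProperties(
  root: Json,
  properties: string[]
): { [key: string]: Json }[] {
  const matches: { [key: string]: Json }[] = [];
  const stack: Json[] = [root];

  while (stack.length > 0) {
    const value = stack.pop() as Json;

    // Primitives and null cannot hold properties, so skip them.
    if (value === null || typeof value !== "object") continue;

    // Arrays are only containers here: descend into their elements.
    if (Array.isArray(value)) {
      stack.push(...value);
      continue;
    }

    // Keep the object if it has every requested property as an own key.
    if (properties.every((p) => Object.prototype.hasOwnProperty.call(value, p))) {
      matches.push(value);
    }

    // Keep searching nested values either way.
    stack.push(...Object.values(value));
  }

  return matches;
}
```

The more distinctive the combination of property names, the fewer unrelated objects match, which is why several properties are passed together in the command above.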

odysseus · on Aug 9, 2023

Did you ever make another blog post about how to choose properties working backward from the visible data on the web page to the data structure containing said data?

Searching the heap manually is not working very well. The data I want is in a (very) long list of irrelevant values within a "strings" key. It might have something to do with the data on the page that I want to scrape being rendered by JavaScript.

adriancooney · on May 21, 2023

I wonder if comparing embeddings could work here? It might be more resilient to cosmetic changes.
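
A minimal sketch of what I mean, assuming you already have an embedding vector for each version of the content (how you produce them is up to you). The `isSimilar` helper and the 0.9 threshold are hypothetical and would need tuning against real data.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and the same length");
  }

  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;

  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: treat two versions as equivalent above a threshold.
// The default of 0.9 is an assumption, not a recommended value.
function isSimilar(before: number[], after: number[], threshold = 0.9): boolean {
  return cosineSimilarity(before, after) >= threshold;
}
```

The idea is that a cosmetic change should move the vector only slightly, so the similarity stays above the threshold, while a real change in content should push it below.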

adriancooney · on Oct 12, 2022

Thanks for the recommendations. Some of my fav ambient(ish) albums that might fit in that list:

- William Basinski - The Disintegration Loops

- Jon Hopkins - Music For Psychedelic Therapy

- Erland Cooper - Music for Growing Flowers

adriancooney · on Aug 15, 2022

Anecdotally I agree with you but doesn't this blog post suggest the reverse - click bait does well? The model was trained on a fairly comprehensive set of HN titles and it scores click-bait-y titles with a high "Good" probability. e.g. `"Beware! Uninstalling this PC game deletes your hard drive"` with a `62.0% Good prob`. There's a ton of hidden complexity involved here but if click-bait was generally downvoted by the HN community, we should expect a low "Good" score, right?

adriancooney · on Aug 10, 2022

Apologies for the shameless self-promotion here but it was for this very problem that I built puppeteer-heap-snapshot. It decouples the scraper from the HTML and instead inspects the booted app’s memory. Not nearly as performant but a lot more reliable. I wrote about it here: https://www.adriancooney.ie/blog/web-scraping-via-javascript...

btzs · on Aug 13, 2022

Hi! Your application looks interesting! I have a question regarding your YouTube example: Where do you get property names like channelId,viewCount,keywords from? Thanks

robk · on Aug 11, 2022

This is amazing thanks for sharing!!

adriancooney · on May 31, 2022

If you haven't already, give the Fall of Civilizations podcast [1] a listen. It's one of my favourites - informative, engaging and peaceful listening - about how civilizations rise and fall. Episode 8 is about the Sumerians in Iraq and might give you a picture of how these people lived (albeit nearly 1500 years earlier).

[1]: https://fallofcivilizationspodcast.com/

kaon123 · on May 31, 2022

YES! I was born in 1989 and have played quite a bit of Age of Empires 1. This podcast gave me goosebumps.

The one on the Assyrians is my favorite podcast episode I've ever heard.

rpastuszak · on May 31, 2022

Sweet, I’ll check it out tonight.

Have you listened to The History of Rome podcast?

I’ve spent 3 years listening to it (twice) and I have to make a conscious effort not to pick it up again. It’s sooo good.

guildan · on May 31, 2022

The History of Rome was the first podcast I ever listened to. It is such a treat to listen to. I tried to continue with The History of Byzantium and it was just not the same. So now I've picked up Revolutions, since I think it was really Mike Duncan's style I appreciated (well, that and the Romans).

vowelless · on May 31, 2022

The two books by Paul (host of that podcast) are fantastic. I cannot recommend them enough.

dakial1 · on May 31, 2022

I love this one, sad that it hasn't had any new episodes for some time now.

Sort of related: I also do recommend the very excellent Pirate History Podcast.

ghastmaster · on May 31, 2022

Thanks for the lead. I tried. I cannot handle the music bed.

adriancooney · on April 29, 2022

If it’s rendered server-side - no. The data likely won’t be loaded into the JS heap when you visit the page (the DOM isn’t included in the heap snapshots). However, you might be in luck if the website executes JavaScript to augment the server-side rendered page. If it does, your data may be loaded into memory in a form you can extract.

adriancooney · on April 29, 2022

Ah thank you for the reminder. Added it now!