adriancooney's comments

I love your site, I use it all the time and recommend it to all my friends. Thank you.

Do you offer historical exports of data? I’d love to create some visualisations of the housing situation in Ireland over time.


I still find `ts-node` gets in my way often with configuration related issues. `tsx` on the other hand has been a dream.


Thanks for posting this again! It's a year later and I still haven't touched the web scraper in production, which is great to reflect on. Running the YouTube command from the post still seems to produce the exact same data too.

  $ npx puppeteer-heap-snapshot query \
    --url https://www.youtube.com/watch\?v\=L_o_O7v1ews \
    --properties channelId,viewCount,keywords --no-headless


Did you ever make another blog post about how to choose properties working backward from the visible data on the web page to the data structure containing said data?

Searching the heap manually is not working very well. The data I want is in a (very) long list of irrelevant values within a "strings" key. It might have something to do with the data on the page that I want to scrape being rendered by JavaScript.


I wonder if comparing embeddings could work here? It might be more resilient to cosmetic changes.
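
A minimal sketch of what that comparison could look like, assuming you already have embedding vectors (plain number arrays) for the old and new content from whatever model you use. `cosineSimilarity` and `closestMatch` are hypothetical helper names for illustration, not part of any library:

```typescript
// Cosine similarity between two embedding vectors: 1 means same direction,
// 0 means orthogonal. Embeddings are assumed to be plain number arrays.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as not similar to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Candidate {
  label: string;
  embedding: number[];
}

// Hypothetical helper: return the candidate whose embedding is closest
// to the reference embedding, along with its similarity score.
function closestMatch(
  reference: number[],
  candidates: Candidate[]
): { label: string; score: number } {
  if (candidates.length === 0) {
    throw new Error("No candidates to compare");
  }
  let best = { label: candidates[0].label, score: -Infinity };
  for (const candidate of candidates) {
    const score = cosineSimilarity(reference, candidate.embedding);
    if (score > best.score) {
      best = { label: candidate.label, score };
    }
  }
  return best;
}
```

The idea is that a cosmetic change should move the embedding only slightly, so the best match stays the same even when the exact text differs. Whether that holds in practice depends on the embedding model, so treat any similarity threshold as something to tune.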


Thanks for the recommendations. Some of my fav ambient(ish) albums that might fit in that list:

- William Basinski - The Disintegration Loops

- Jon Hopkins - Music For Psychedelic Therapy

- Erland Cooper - Music for Growing Flowers


Anecdotally I agree with you but doesn't this blog post suggest the reverse - click bait does well? The model was trained on a fairly comprehensive set of HN titles and it scores click-bait-y titles with a high "Good" probability. e.g. `"Beware! Uninstalling this PC game deletes your hard drive"` with a `62.0% Good prob`. There's a ton of hidden complexity involved here but if click-bait was generally downvoted by the HN community, we should expect a low "Good" score, right?


Apologies for the shameless self-promotion here, but it was for this very problem that I built puppeteer-heap-snapshot. It decouples the scraper from the HTML and instead inspects the booted app’s memory. Not nearly as performant but a lot more reliable. I wrote about it here: https://www.adriancooney.ie/blog/web-scraping-via-javascript...


Hi! Your application looks interesting! I have a question regarding your YouTube example: Where do you get property names like channelId,viewCount,keywords from? Thanks


This is amazing, thanks for sharing!!


If you haven't already, give the Fall of Civilizations podcast [1] a listen. It's one of my favourites - informative, engaging and peaceful listening - about how civilizations rise and fall. Episode 8 is about the Sumerians in Iraq and might give you a picture of how these people lived (if nearly 1500 years earlier).

[1]: https://fallofcivilizationspodcast.com/


YES! I was born in 1989 and have played quite a bit of Age of Empires 1. This podcast gave me goosebumps.

The one on the Assyrians is my favorite podcast episode I've ever heard.


Sweet, I’ll check it out tonight.

Have you listened to The History of Rome podcast?

I’ve spent 3 years listening to it (twice) and I have to make a conscious effort not to pick it up again. It’s sooo good.


The History of Rome was the first podcast I ever listened to. It is such a treat to listen to. I tried to continue with The History of Byzantium and it was just not the same. So now I've picked up Revolutions, since I think it was really Mike Duncan's style I appreciated (well, that and the Romans).


The two books by Paul (host of that podcast) are fantastic. I cannot recommend them enough.


I love this one, sad that it hasn't had any new episodes for some time now.

Sort of related: I also recommend the excellent Pirate History Podcast.


Thanks for the lead. I tried. I cannot handle the music bed.


If it’s rendered server-side - no. The data likely won’t be loaded into the JS heap (the DOM isn’t included in the heap snapshots) when you visit the page. You might be in luck if the website executes JavaScript to augment the server-side rendered page however. If it does, your data may be loaded into memory in a way you can extract it.
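
One rough way to check whether your data made it into the heap is to search the snapshot's top-level `strings` table for a value you can see on the page. This is only a sketch: it assumes the snapshot JSON has already been parsed into an object, and `findStringsContaining` is a hypothetical helper, not something puppeteer-heap-snapshot provides:

```typescript
// Minimal shape of a parsed V8 heap snapshot. Only the top-level
// `strings` table is used here; the real file has more keys.
interface HeapSnapshotLike {
  strings: string[];
}

interface StringHit {
  index: number; // position in the `strings` table
  value: string; // the full string that contained the needle
}

// Hypothetical helper: list every entry in the `strings` table that
// contains the given text, e.g. a value visible on the rendered page.
function findStringsContaining(
  snapshot: HeapSnapshotLike,
  needle: string
): StringHit[] {
  const hits: StringHit[] = [];
  snapshot.strings.forEach((value, index) => {
    if (value.includes(needle)) {
      hits.push({ index, value });
    }
  });
  return hits;
}
```

If a visible value never shows up in `strings`, the page was most likely rendered server-side with no copy of the data held in memory. If it does show up, the matching entries give you a starting point for finding the object and property names that hold it.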


Ah thank you for the reminder. Added it now!

