Ep 70 – April 13 – Scraping data like a pro

Martin Uncut

Today I’m doing a slightly different type of show. I want to talk about some recent tech news and add a bit of commentary on each story.

Multiple sources report that as many as 42 episodes of the Joe Rogan show have been taken down by Spotify. When Spotify made the podcast exclusive, the deal allowed for some episodes to be pulled. The removed episodes mainly cover topics that could be seen as controversial. I think this is quite interesting, though of course it has nothing to do with free speech. Rogan himself has said that he doesn’t really care, and he also confirmed that it was part of the agreement. Spotify, however, has not commented at all on which episodes were deleted or why. The question I would like to ask, though I don’t really know whom to ask, is: how much control does Spotify actually have over the content created and published through its exclusive podcasts?

Libsyn is one of the larger podcast hosting platforms. Yesterday they announced that they have acquired the podcast monetization platform Glow.fm. A link to the blog article on Libsyn’s website can be found in the show notes. Glow is similar to Patreon and provides an alternative to ads as a monetization stream. The idea is to offer premium content to listeners through a membership. Glow also supports monthly or one-time donations from listeners.

Even though Libsyn now owns the service, you can still use it regardless of where you host your podcast. Personally, I think this is a great move on Libsyn’s part. I didn’t know much about Glow.fm before, but it ties in well with the services that Libsyn provides. The next natural tool in the toolbox would be a way to offer ads, similar to what Anchor does. I think this would make Libsyn more attractive in the eyes of podcasters.

Business Insider reports that 1.3 million user records have been scraped from Clubhouse and made public online. You will find a link to the article in the show notes. Clubhouse states that the data does not come from a hack; all of it is publicly available through the app or its APIs. This is still a problem, because the data can be used for impersonation attacks, which rely on personal details.

In a related story, Wired reports that 500 million LinkedIn user records were also scraped and made available on hacker forums. Again, all the data in both the Clubhouse and LinkedIn cases was made available by the users themselves and was accessible to anyone on these platforms. Facebook recently had the same issue, with around the same number of user records scraped from its platform, in this case together with potentially sensitive information. There are some indications that the contact import function was the culprit and made it easy for the attackers to get hold of the data.

Be careful about what you put out there when it comes to your personal life and your personal data. Even if you think only your limited circle of contacts can see the information you publish, others may gain access to it. Combined, all of this can be used to build a picture of a person and their network, which in turn can enable more or less automated phishing and impersonation attacks.