See linked posting. I’ve commented there with a link to a CLI tool in Python that allows downloading of IA collections. I’ve submitted a patch to enable specifying start and end points so that it’s easier to resume downloading a huge collection, or to allow multiple people to split up the work.
https://archive.org/details/georgeblood
https://archive.org/details/78rpm_bowling_green
F*ck the RIAA and absurdly long copyright.
EDIT: There is more than one collection of 78s on IA, so I updated the title.
The issue with these collections is that they’re absolutely HUGE. And yes, IA offers torrents for them, but as a separate torrent for every. single. album. And the torrents have all data in them – FLAC, fixed-rate MP3, VBR MP3, PDF liner notes, etc. etc… there may be some extremely hardcore data-hoarders out there who want everything, but IMHO, as these are scratchy old 78 records, FLAC is overkill if you just want the audio in a listenable format. The George Blood collection, just the VBR MP3s, is looking to be about 6TB. With ALL data it might be over 40TB! I can’t afford that many hard drives :)
So, my approach at the moment is to save just the VBR MP3s (they seem to be done at up to 320kbps VBR) and the JPEG album cover. If I have a chance and any storage left afterwards, I can make a separate pass to get the album liner PDFs…
Tool used: https://github.com/jjjake/internetarchive
Patch to allow setting start and end item indices for downloads: https://github.com/jjjake/internetarchive/pull/605
Example usage to grab just the VBR MP3 and record label JPG for each (note the --start-idx and --end-idx arguments):
#ia download --start-idx=4001 --end-idx=8000 -a -i --format="VBR MP3" --format="JPEG" --search collection:georgeblood
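For anyone who would rather not run a patched CLI, roughly the same slice can probably be taken from Python using the library's `search_items` and `download` functions. This is only a sketch under a few assumptions: indices are treated as 1-based and inclusive like the example above, and the search is assumed to return items in the same order on every run, which I have not verified. The names `download_slice`, `search_identifiers` and `ia_fetch` are made up for this example and are not part of the tool.

```python
from itertools import islice


def download_slice(identifiers, start_idx, end_idx, fetch):
    """Call fetch() on items start_idx..end_idx (1-based, inclusive).

    Returns the number of items that were handed to fetch().
    """
    if start_idx < 1 or end_idx < start_idx:
        raise ValueError("need 1 <= start_idx <= end_idx")
    count = 0
    # islice is 0-based with an exclusive stop, hence the start_idx - 1.
    for identifier in islice(identifiers, start_idx - 1, end_idx):
        fetch(identifier)
        count += 1
    return count


def search_identifiers(query):
    """Yield item identifiers for a search query (hypothetical helper)."""
    # Imported lazily so download_slice can be tried without the package.
    from internetarchive import search_items
    for result in search_items(query):
        yield result["identifier"]


def ia_fetch(identifier):
    """Fetch only the VBR MP3 and JPEG files of one item (hypothetical helper)."""
    from internetarchive import download
    download(identifier, formats=["VBR MP3", "JPEG"], ignore_existing=True)
```

With the package installed, `download_slice(search_identifiers("collection:georgeblood"), 4001, 8000, ia_fetch)` should cover the same block as the command above. Passing `fetch` in as a parameter means you can swap in `print` first to see which items a block would cover before downloading anything.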
I’m going to concentrate on the George Blood collection for now… I’m starting at item 1. It would be great if others started at index 50,000, 100,000, 150,000, … and others started at the end and worked backwards in similarly-sized chunks, so that it’s assured someone gets each of them.
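To make it easier to claim blocks, here is a rough sketch that lists the index ranges for a given chunk size and prints the matching command for each. It assumes the indices are 1-based and inclusive (so chunks start at 1, 50,001, 100,001, … rather than at exactly 50,000), and the commands only work with the `--start-idx`/`--end-idx` patch applied. `chunk_ranges` and `ia_command` are made-up helper names, and the total is something you pass in yourself, since I don't have an exact item count.

```python
def chunk_ranges(total_items, chunk_size, first_idx=1):
    """Yield (start, end) pairs, 1-based and inclusive, covering all items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start = first_idx
    while start <= total_items:
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size - 1, total_items)
        yield start, end
        start = end + 1


def ia_command(start, end, collection="georgeblood"):
    """Build the download command for one block (needs the patched ia CLI)."""
    return (
        f"ia download --start-idx={start} --end-idx={end} -a -i "
        f'--format="VBR MP3" --format="JPEG" --search collection:{collection}'
    )


def print_plan(total_items, chunk_size):
    """Print one command per block so people can pick one each."""
    for start, end in chunk_ranges(total_items, chunk_size):
        print(ia_command(start, end))
```

For example, `print_plan(400_000, 50_000)` would list eight blocks, using the rough "over 400k" figure for the collection size as a placeholder.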
You don’t need to censor yourself…
Fuck the RIAA, bunch of absolute cunts.
Yeah, you’re right, Fuck em.
FYI I’m currently on 4001-8000 of the ‘Great 78 Collection’. Looks like I’ll need about 6TB to get it all, yikes! (Just the VBR MP3 files, not the FLACs. Holy Hell.)
collection:georgeblood
https://archive.org/details/georgeblood
If everyone would take blocks of it, say 4000 each, we can eventually create torrents for each one or something so it can all be reassembled if/when the IA has to take it down.
Yup. Torrents are the way forward to archive such collections.
I wish the IA would offer a single torrent of the overall collection, but it’s over 400k separate torrents, one for each album. And they contain FLACs, fixed- and VBR MP3s, PDF jacket notes, JPGs … it’s just too much for one person (I am OK with buying an 8TB drive or two, but not a dozen!)
I’m trying to at least grab the VBR MP3s (these are old scratchy records after all… I don’t know how much FLAC will really preserve). Maybe if I can get most of those, I’ll do a second pass and get the album cover JPGs, then liner PDFs… depending on if/how long the collection stays up.
The IA has torrents for everything they upload already.
Normally I would just fetch the torrent, yes, but this particular collection is huge – over 400k separate items (each of which is its own torrent on IA). Is there a way to get an aggregate, but filtered, torrent with just, say, the album JPG and VBR MP3 files for each? I don’t think I can afford the entire collection, as each item also has the FLACs.
which block are you on now?
around 5500… gonna take a while. My ISP says there’s no monthly cap but I wonder if I really should dl this much…