Everyone here knows the importance of archive sites. They fill in the memory hole. That alone makes them incredibly valuable. But they're centralized, which means once they're down all that filling gets dug up again. Plus it costs money to host that much data.
What do you guys think of a program that essentially does the same thing as archive sites, but it downloads the archive to your computer instead? I'm a software developer and could create it myself rather easily.
Now I know what you're thinking: archive sites are valuable because they're probably not tampered with, because tampering means they can be dismissed by their opponents as fake. Local archives can be tampered with, making them useless. However, this program would encrypt the archives such that they can only be viewed if first decrypted by that same program. The user would have zero power over the encrypted archives, and would act entirely as a host for them.
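To give a rough idea of the mechanism (just a Python sketch using the cryptography package's Fernet, not the actual implementation; the key here is generated at runtime as a stand-in for an obfuscated baked-in constant):

```python
from cryptography.fernet import Fernet

# Stand-in for the baked-in key. A real build would hide an obfuscated
# constant in the binary instead of generating one at import time.
EMBEDDED_KEY = Fernet.generate_key()

def archive_encrypt(page_bytes: bytes) -> bytes:
    """Encrypt a downloaded page so it can only be opened through this tool."""
    return Fernet(EMBEDDED_KEY).encrypt(page_bytes)

def archive_decrypt(blob: bytes) -> bytes:
    """Decrypt an archive blob created by archive_encrypt."""
    return Fernet(EMBEDDED_KEY).decrypt(blob)
```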
Considering how tight the iron grip of progressivism around the throat of the internet has become, it's only a matter of time before archive sites are made illegal, or at least taken down under some bullshit excuse. Local archives would make this less of an issue, because each archive would continue to exist as long as at least one person still has it.
Do you think people would use such a tool? It would be completely free, but couldn't be open source for security reasons.
I know anyone can already just download a page and encrypt it themselves, but most people wouldn't even think to do that, much less know how. This program would make encrypted local backups normie-friendly and standardized, because the important thing is having as many as possible.
ArchiveBox is probably the closest thing to this that exists today.
Instead of encrypting them I'd suggest signing the output/page. Then people can decide whether or not they trust the signer. Combine that with a web of trust public key infrastructure and it might be pretty slick.
The encryption is to make it so nobody can simply say "this was tampered with by nazis" when confronted with harsh reality. Archives are for people who know the memory hole exists, but they're also for normies who don't, and normies aren't going to trust anything the news didn't tell them to trust unless they're really sure about it.
You could argue it's an unnecessary level of validation, but going above and beyond is necessary when people's default reaction to anything you say is extreme distrust. You have to prove to them that you're not lying. It's unfair, but it's reality.
A signed hash of the file accomplishes the same thing though, and doesn't make the file unreadable by other programs/browsers.
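Something like this, to sketch it out (assuming Python's cryptography package and an Ed25519 key pair; none of this is from the proposed tool):

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

signing_key = ed25519.Ed25519PrivateKey.generate()  # the archiver's key pair
verify_key = signing_key.public_key()                # published for verification

def sign_archive(path: str) -> bytes:
    """Sign the SHA-256 of the saved page; the page itself stays readable."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    return signing_key.sign(digest)

def verify_archive(path: str, signature: bytes) -> bool:
    """Anyone with the public key can check the file wasn't altered."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    try:
        verify_key.verify(signature, digest)
        return True
    except InvalidSignature:
        return False
```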
The problem is this part:
They won't. They'll be told not to by Experts and Fact Checkers. And they don't know what a web of trust is, nor do they care. As long as you allow them to choose who to believe, they will choose "not you", because you're not on television. The idea is to not give them the choice in the first place. Sure, there will always be a subset of people who are so NPC that even a hundred different layers of security aren't enough for them, but those people can't be helped in the first place.
There is no way to prove it hasn't been tampered with without the original, unedited source. It's all based on trust.
I use ArchiveBox and it works well enough for individual pages, but not whole websites. Grab-site is good for that. Saving as a WARC is the closest you're going to get to proving authenticity since it saves request headers.
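Capturing a page as a WARC is only a few lines if you use the warcio package (a rough sketch; the URL is just an example):

```python
from warcio.capture_http import capture_http
import requests  # per warcio's docs, import requests after capture_http

# Everything requests fetches inside this block is written to the WARC,
# including request and response headers.
with capture_http("archive.warc.gz"):
    requests.get("https://example.com/some-article")
```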
See this reply. If you have the time and resources you can defeat anything, but it would be far more complex than a simple redirect. There is physically no way to solve this problem while also achieving absolute data security, so the options are a program that's good enough or no program at all.
The key would have to be baked into the code in order for this system to be totally local, i.e. not relying on a webserver. It obviously wouldn't just be a string stored somewhere. It would be obfuscated many times over, including during encryption/decryption. Again, there's no such thing as literally tamperproof software, but if it's enough of a pain in the ass nobody is going to bother.
The other alternative would be a program that sends a request to a server that does all this for you, but then you have no way of proving that the server didn't tamper with the results somehow, or that whoever's hosting the server isn't just fabricating data. It's arguably worse than a purely local solution because it's less trustworthy, and trustworthiness is the entire point.
I would suggest publishing the documents with IPFS if authenticity is your concern, since it already has tamper-proof checksums built into its addressing mechanism. Open source btw.
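Publishing is roughly this, assuming a local IPFS daemon and the ipfshttpclient package (the library trails newer daemon releases, so treat it as illustrative):

```python
import ipfshttpclient

# Connects to the local daemon's API port (default /dns/localhost/tcp/5001/http).
client = ipfshttpclient.connect()

# Adding the file returns its content-addressed identifier (CID); any change
# to the file would produce a different CID, which is the tamper-evidence.
res = client.add("archive.warc.gz")
print(res["Hash"])  # share this; anyone who pins it becomes a backup node
```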
You would have a local utility or dApp to generate and store the initial archive, and then anyone you share the link with can also participate as a backup node using your software.
u/WhoIsThatMaskedMan
You would probably need to decentralize it further, by having people host nodes of this program and letting a majority of nodes determine trustworthiness, or something similar. I'm uncertain whether you can encrypt it to such a degree that it can't be modified in runtime memory. I presume the weak point would be an attack during the download process, between the download and the encryption, where you could then "modify the content". Alternatively, what about just feeding it wrong data and then claiming it's correct?
What is your threat model - who is trying to censor the content you're archiving? Is it Facebook, Google, Amazon, governments, etc? If so, don't you think that these tech companies might be able to outvote you with raw computing power?
This program would only support connections with TLS, which isn't a problem considering every site relevant to the problem the program is solving supports it. We'd mostly be targeting news sites and social media. TLS certificates can be spoofed, of course, but that alone should deter most attackers.
That wouldn't be the only check, of course, but assuming I do end up making this program, the less said the better. TLS is a pretty obvious feature though.
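For illustration, recording the server certificate's fingerprint alongside an archive only needs the standard library (a sketch, not the actual check the program would do):

```python
import hashlib
import socket
import ssl

def cert_fingerprint(host: str, port: int = 443) -> str:
    """Connect over TLS (verified against system CAs) and hash the server cert."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der_cert = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der_cert).hexdigest()

# Stored with the archive metadata so the connection can be audited later.
print(cert_fingerprint("twitter.com"))
```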
Unfortunately if someone is dedicated enough, there's nothing you can do to make it one hundred percent secure. But the idea is that if you can make it secure enough, it lends an implied level of trust to the final content. Not even Fort Knox is 100% secure, but it's close enough for the government to store most of its gold there.
Yes, someone could potentially create their own fake Twitter server complete with a darknet certificate and spoofed IP address, not to mention a slew of other ways to defeat the program's validation checks, all so they can create a fake archive of a fake tweet, but that's a pretty insane degree of dedication for something that probably wouldn't amount to much.
You could create an archive of a site and then put it on LBRY (/Odysee). Once it's published any modifications to the content leaves a transaction record on the LBC blockchain.
There’s Webrecorder which apparently has a Chrome/Chromium add-on, along with pywb for playback. I’ve used the software before in projects but I haven’t tried the add-on.
If I were going to write such a program, I would simply have it save the files without decrypting them in the first place, possibly with an additional layer of encryption to store metadata like the archive date. It would only provide a tamper-secure archive of HTTPS sites, and it would only be a trustworthy archive until the current security certificate expired (at which point, even if the program kept its own copy of the certificate to allow indefinite reading, the expired certificate would no longer be verifiably identical to the original site's certificate). But it would give you a bundle you could send to anyone, still signed with the original creator's digital signature.
This program would include TLS as part of the chain. On its own it's generally good enough, but adding another layer of encryption on top of that would ensure that people can't spoof the certificate and defeat the entire purpose. A deadbolt on top of a knob lock, as it were.
I generally like the idea.
Though, wouldn't this result in a lot of data stored on your phone or laptop?
I used to archive tonnes of webpages, saved as PDFs, which took a surprisingly large amount of space.
I have not only thought of this before, but about 20 (oh God) years ago I had a program that downloaded entire sites, so it maintained the entire infrastructure and all the data, replicating the entire online experience.
Great program, but I no longer remember what it was or whether it would still work with Internet 2.0 and all that may entail.
I think at minimum you need a (somehow) trusted source that would (somehow) ignore dynamic content, extract every news story posted, and record the hash of that story. Then any archive, no matter how untrusted, could be verified. Of course, this source would become "untrusted" the second it verified a story that the left wants memory holed.
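Rough sketch of the idea (assuming requests and BeautifulSoup; the URL is made up):

```python
import hashlib
import requests
from bs4 import BeautifulSoup

def story_hash(url: str) -> str:
    """Hash only the readable text of a story, ignoring scripts and styling."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):  # drop dynamic content
        tag.decompose()
    text = " ".join(soup.get_text().split())  # collapse whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The trusted source records story_hash(url) once; any later archive of the
# same story can be re-hashed the same way and compared against that record.
print(story_hash("https://example.com/news/story"))
```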