About #Snowdenfiles & how they were obtained: a thought
DISCLAIMER: This article is pretty much about "IT COULD HAVE HAPPENED LIKE THIS"; so just to make sure, I am not making any promises over here. In addition, I am not making any allegations, nor am I endorsing statements from one side or the other. I take no responsibility for anyone's ability to sleep after reading this either. This is just some primitive analysis.
Well, first of all, there are most likely 3 vectors available for this, but only 1 is really feasible – with a twist.
In no particular order, they are: 1) a DB dump, 2) use of the SPS indexer/reporter etc., OR 3) automation with wget-like tools (PowerShell) OR manual work with a browser. It would be nearly impossible to work with a browser alone, since the alleged amount of data is sizable. So in its lowest form, it is possible that the alleged data was accessed and downloaded with very simple techniques.
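To make the automation vector concrete: at its core, crawling a portal is just a breadth-first walk over links, which is what wget's recursive mode or a short PowerShell loop does. Here is a minimal Python sketch; the `fetch` callback is a placeholder for whatever actually retrieves a page (wget, `Invoke-WebRequest`, etc.), so nothing here is specific to any real system.

```python
from collections import deque
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collects href targets from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_page, fetch):
    """Breadth-first crawl. fetch(url) returns the page's HTML.
    Returns a dict of url -> page content for everything reached."""
    seen = {start_page}
    queue = deque([start_page])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)          # placeholder: wget / Invoke-WebRequest / ...
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The point is how little machinery is needed: a queue, a visited set, and a link extractor is enough to walk everything the credentials can see.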
For reference, you might want to check this: http://arstechnica.com/information-technology/2014/02/nsa-let-snowden-crawl-all-over-its-intranet-unchecked/ – I do have my doubts, so let it be like that.
I am not saying all the data resides within SPS DBs (that is, SharePoint Portal Server using MSSQL), but they are heavy users of such technology (and it is a corporate standard). Snowden was working in that IT technology area anyway, so it could be a natural match.
The best part of the thing is that they could have used SPS as a metacrawler for other sites and systems to create "Collections", as they are called within SPS. Who knows? Just a wild guess that all information was made available under the same roof.
I try to imagine what kind of environment (No) Such Agency & others in the same sandbox have. Most likely they do have some sort of reverse-proxying capability with TLS off-loaders. Along with that, they most likely have a clustered environment with a few other Microsoft technologies in front of it. It is publicly disclosed information that they make extensive use of certs/PKI in such an environment.
Most likely the portal server (SPS), or the in-front-of server/LB stack (or all of them), would request certificate authentication from any standard-issue browser. Without a cert, no access, and the alarm bells whistle. For 'WGET', the same deal. Basically, it would have been too problematic to play with such a channel – not to mention bending or circumventing the IRM/DLP protection layer. I just cannot buy it.
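What "cert or no access" means at the TLS level can be sketched in a few lines. This is a generic illustration of a front-end demanding client certificates, not a claim about any specific product; the file names are hypothetical placeholders.

```python
import ssl

def make_server_context():
    """Sketch of a TLS off-loader / front-end that demands a client cert.

    With verify_mode set to CERT_REQUIRED, any client that cannot present
    a certificate signed by the trusted CA fails the handshake outright –
    which is exactly the "no cert, no access" behavior described above.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED
    # In a real deployment (paths are placeholders):
    # ctx.load_cert_chain("server.pem")          # the server's own cert
    # ctx.load_verify_locations("clients-ca.pem") # CA that issued client certs
    return ctx
```

A failed handshake like this happens before any HTTP request is seen, so a wget-style crawler without the right cert never even gets to ask for a page – and the failure is exactly the kind of event a front-end logs.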
With the DB dump vector, he would encounter encryption, which secures data at rest. I cannot tell which technology (I can imagine, with some sophisticated guesses, what they are using, but I leave that out), but it relies a) heavily on "adding" a key for crypto operations and b) on AD integration. The crypto mechanism is policy-controlled and is established at a lower level to protect the data from untrusted 3rd parties, in transfer, and so on.
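The essence of such a policy-controlled, directory-integrated scheme is that the decryption key is only released when the requesting account's group memberships satisfy the policy. Here is a toy sketch of that gatekeeping logic; the policy structure, group names, and key names are all invented for illustration and stand in for whatever AD-backed mechanism is really in use.

```python
# Hypothetical policy table: which directory group may receive which key.
KEY_POLICY = {
    "intel-reports-key": {"required_group": "rpt-readers"},
}

def release_key(key_name, account_groups, keystore):
    """Hand out a decryption key only if the account's groups satisfy
    the policy for that key. Raises PermissionError otherwise."""
    policy = KEY_POLICY[key_name]
    if policy["required_group"] not in account_groups:
        raise PermissionError("policy denies key release")
    return keystore[key_name]
```

This is why a raw DB dump alone is useless: the ciphertext comes out, but the key release is a separate, policy-checked (and logged) transaction.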
Now, DB dump access & exports would be something that is automatically logged and would be visible. AND the data would still have been kept secure. There is no practical way on the premises that an unauthorized party would be able to obtain the keys.
Sounds funny, but I do believe it maneuvered quite well in this case. It protected access to the data at rest, and that is the reason why such an attack vector was never selected. It worked.
2 OPTIONS LEFT
So – that leaves 2 options: automation with wget/PowerShell etc. (OR by hand), or going through the SPS indexer process by using the indexer's credentials.
Well, with an automation tool or by hand, there is a great chance of getting compromised, as the reverse proxy should have caught this already.
If the accesses are done through the indexer, most likely the event/log details are very sparse, as it happens all the time. And that, my friends, gives an excellent answer to our question of HOW IT WAS DONE…
…HOWEVER: the indexer may be executed through a separate proxy and does not necessarily need such authentication. It is even possible that the search routine is executed on a separate server stack without ever going through the "security checkpoints" or network controls.
It is quite possible that someone modified the indexing process so that, despite the controls, it would download the data from SPS and the DBs to an external location. The modifications may be overly simple: just a piece of ASP.NET code telling it, instead of purely crawling & indexing, to take the data and pack it with ZIP to a specific location. All the data-at-rest security would then be legitimately passed, as the credentials and keying match. And in addition, IRM/MS DLP would not hamper the work of system components like indexers/reporters etc.
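How small such a tweak could be is worth seeing. The original speculation is about ASP.NET inside the indexer pipeline; this Python sketch is only a stand-in for the same idea – the crawler has already fetched each document with its own legitimate credentials, and the "modification" merely diverts the bytes into a ZIP instead of (or alongside) the index. All names here are invented.

```python
import io
import zipfile

def pack_instead_of_index(documents, drop_buffer):
    """Hypothesized 'evil indexer' tweak.

    documents:   iterable of (name, bytes) pairs the indexer has already
                 fetched and decrypted under its own credentials.
    drop_buffer: a writable file-like object at some drop location.
    """
    with zipfile.ZipFile(drop_buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in documents:
            zf.writestr(name, data)   # would normally be handed to the index
    return drop_buffer
```

Note that nothing in this sketch breaks any cryptography or authentication – the component is simply doing an extra, unauthorized thing with data it is fully authorized to read. That is what makes it so hard to spot.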
From that external location, I have no clues yet as to how the data got out. There are many options. Maybe someone scripted the work so that it created such small chunks of data that they were easier to hide.
REFERENCE ENVIRONMENT / TESTS / LONE WOLF?
Whatever happened – if it happened with SPS, I am quite sure one needed a test/reference environment to run it prior to the active "collection". Without it, there was too great a chance of getting caught.
Well – could one do this by himself? Yes, with a relatively easy setup. It may take time to secure the operation, but yes. Easily.
WHAT DID NOT WORK
First and foremost: such an attack would have been very difficult to prevent, except by really, really rigorous compartmentalization of the data. And how would you disseminate such (f.ex. intel) data after that? You do not, at least not efficiently.
Multilayer security (and #crypto) is essentially difficult to deploy; there is no end-2-end solution available to implement such (I assume) security. Not even with such enormous power.
Second: Microsoft SPS is a remarkably complex system with a number of different reporting capabilities and auxiliary functions. The structure of such a system in an organization like this is about QUANTITY OF SERVERS; I am talking about tens or even hundreds of servers running specific tasks within the SPS environment. It is quite possible they have 15 servers in their search (indexer) cluster performing different kinds of tasks.
Microsoft technology is VERY difficult to handle in the manner that is needed in compartmented environments.
I am quite convinced that, whether or not someone accessed & downloaded the data from such a system, there is a major hole in many organizations' security posture through this kind of relatively simple gap – even without any software bugs or vulnerabilities, that is.
Third: how come there was no tracking of foreign objects planted in such a system (let's assume the indexer)?
Fourth: they have got to have some sort of SIEM or SEM system in place. Logs alone do not do any good, as they are obviously forensic tools. How come their S(I)EM system did not detect the unusual behavior our alleged indexer hole created? Oh yeah, there is no such technology able to do it at massive scale 🙂
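Even without fancy analytics, a crude volume baseline would have lit up on an indexer suddenly exfiltrating gigabytes. A toy sketch of such a SIEM-style check – the baseline window and threshold factor are arbitrary assumptions, not anyone's real tuning:

```python
from statistics import mean

def flag_anomalies(hourly_bytes, baseline_hours=24, factor=5.0):
    """Toy SIEM-style volume check.

    hourly_bytes:   bytes transferred by the indexer per hour, oldest first.
    baseline_hours: how many leading hours form the 'normal' baseline.
    factor:         multiple of the baseline mean that counts as anomalous.

    Returns the indices of hours exceeding factor * baseline mean.
    """
    baseline = mean(hourly_bytes[:baseline_hours])
    return [
        i
        for i, b in enumerate(hourly_bytes[baseline_hours:], start=baseline_hours)
        if b > factor * baseline
    ]
```

The hard part at agency scale is not the math – it is collecting and correlating the per-component counters across hundreds of servers in the first place, which is precisely the gap being pointed at here.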