Some observations on the Great Firewall of China

Suddenly, without any warning:

Aug  8 02:04:32 neolith1 sshd[12862]: Bad protocol version identification 'jN\321\203\267\237\317a-\310VS9\305\343\242\230\370\232W\255\224c\223\021\251\001\230 \335\256\315\037\374\367\347`\312\tI\037\024__~\363i\031\b8P\262\247\002s\322\324\177\022\27430v\020\313\030\253\346\312\001\301c\002&-r*\350\037i\241\335\274\316\364S]\035%\242\332\262\037\270xiO\243P\230' from 58.144.66.30

Say what?

Aug 20 05:57:34 neolith1 sshd[14803]: Bad protocol version identification 'E\253l\233\242\243R\246\356#5\314\347K\327;\025\305?\224\367\212$\221q\276&\337>\037\366G\363B<\340"O\327f\233\037\371G\a\222e$\220\002E\203\300<\251\0050g\235\200`4\204\365\226d\305\027\030\003%Q\364\001\032\350\257P\250\363\352\344\002g\327\304\370\251\21149RLdeaG*x\335' from 123.184.190.106

Again?!

I work as a security officer at the National Supercomputer Centre at Linköping University. It is my job to be paranoid.

I pay a lot of attention to our ssh logs. We have something like fifty thousand ssh logins per day, and anywhere up to half a million failed login attempts. I don't like seeing things in my logs that I don't understand. It makes me twitch.

I certainly didn't understand those log entries that suddenly started appearing. From out of the blue, Chinese IP addresses would connect to the ssh port on one of our systems and throw what looked like random bytes at it. Each address would connect just once. We had never seen those IP addresses before, and they would go away again, never to return.

The plot thickens: Moreover, it seemed that nobody had seen these addresses in any relevant context. I checked all of my usual sources for information on botnets, on ssh brute-force scanners, on machines exhibiting a generalized anti-social behaviour. Nothing. These addresses were clean. Nobody had seen so much as an insulting blog post comment from them.

I started discreetly contacting colleagues at other sites, to check if they saw anything similar. Again, nothing. Oh, sometimes people would see some "Bad protocol version identification" messages, but on closer look there was always some simple explanation. Perhaps somebody had ran an nmap or Nessus scan against them - these scans are easily recognizable. Sometimes, somebody was scanning for open http proxies. But this stuff? Nope.

Except, a few select colleagues did see the same thing. Chinese addresses, making one-shot connections to port 22 on a small number of target systems, sending pseudo-random binary data. To this date, I am aware of four sites around the world that have received these probes. In all reason, there must be more targetted sites, but I haven't found them.

This really made my admin senses tingle. Were we compromised? Were these weird payloads triggers for some kind of backdoor on our system? I will spare you the details, but the level of scrutiny I subjected the system to was, well, scrutinous. I found nothing. Unfortunately, this didn't necessarily mean that we weren't compromised. It might just mean that I hadn't looked hard enough. So I looked some more. Once again, nothing.

We captured some of the probe payloads in full. I could see no pattern in the data, no kind of protocol that I could recognize. Was this some kind of 0day sshd exploit? I threw the data at a carefully monitored sshd process, to see if it would trigger some unexpected code path. Nothing.

Eureka: Then, almost a year after I saw that first probe, I was looking at the log data from yet another angle to try to figure this out, and I finally saw the pattern. Epiphany. I felt like a genius, and at the same time like the world's greatest idiot.

The blindingly obvious thing I had been missing for a year was that each and every one of these probes was followed after 5-20 seconds by an ssh login attempt from China, and the attempts were mostly successful and apparently legitimate. The legitimate logins came from somewhere completely else, but still in China.

What can I say? We have lots of Chinese users. We have lots of logins from China, nothing strange about that. In our systems, 5-20 seconds means several pages of log data to scroll through. Still, I should have caught this much faster.

My colleagues at the other targetted sites confirmed my observations. Once they knew what to look for, they saw the same thing. My only consolation is that those very clever people also hadn't found this correlation.

So, to more precisely describe what we have found: a small subset of the ssh logins from Chinese IPs to two of our systems are preceded by one or two connections from unrelated Chinese IP addresses, in which opaque binary data is thrown at sshd. These addresses come from all over China, from all sorts of networks.

I see no particular pattern in which users are targetted, or what IP addresses they log in from and I see no pattern or structure in the opaque data, except for the following:

For a few weeks around May/June 2011, the probe payloads actually looked like SSL handshakes,

We see lots and lots of ssh brute-force attacks from Chinese IP addresses. However, these ssh login attempts do not appear to trigger any probes.

So, what is this all about? Well, I have a theory.

Conjecture: This is the Great Firewall of China, working to protect its citizens from unlawful foreign influences.

My hypothesis is that just over a year ago, a new function in the firewall went into limited beta test, where a sample of outgoing ssh connections from China is carefully selected for secondary screening.

I don't know what the selection criteria are, but apparently the probing only involves certain Chinese source networks (hence the lack of probes in connection to brute-force attacks) and certain target hosts.

For the selected ssh connections, the target system is probed from one or two IP addresses under the control of the Chinese government. These may be otherwise innocent addresses that are spoofed at the level of the great firewall, or they may be actual computers under remote control by the government - I have no way to tell.

I don't know what the probes are supposed to accomplish. My only guess is that the government is looking for certain services it doesn't approve of, like open proxies or Tor relays, and that precise fingerprinting may be too expensive. Instead, they resort to an inspection method similar to fuzzing, where pseudo-random data is thrown at the server, just to see what happens.

In some cases, the legitimate ssh connections are unsuccessful; they appear to be interrupted. This may be a result of the firewall deciding the target system to be unsuitable and injecting RST packets into the TCP stream to kill it.

The last few weeks, the frequency of the probing has increased. This might mean the beta test period is nearing its end, and that this function is about to become more widely deployed.

Conclusion: I do believe my theory above is more or less correct. I have no way to prove it, but it does fit all observed facts (including some which I am unable to disclose), and I have no tenable alternative explanations. It also matches the known repulsive censorship the Chinese government subjects its citizens to. I strongly dislike this probing of our systems that the Chinese government appears to be performing.

Of course, I may be mistaken. Any feedback or alternative explanations are welcome. You can reach me at nixon@nsc.liu.se.

- Leif Nixon, 2011-11-07