The Bird in Borrowed Feathers [Update]

Update:

I published the original post 48 hours ago. It was a very valuable experience to see how the companies would handle this. The spotlight they were in was very small. We’re not a big company, and we’re not working in an industry with a huge audience like, for example, the gaming industry. So obviously we wouldn’t generate much “buzz”. But those were the perfect conditions! Companies usually don’t do anything about a problem unless they really need to, right? I mean, go and ask your internet provider.
So I couldn’t wait to see how the companies would react to my post. I was sure that they’d rather remove our content than link back to us, but it turned out to be the exact opposite. Only one company removed our content – all the others added links to the source material. Not everyone did so entirely in accordance with our policy, but hey – we didn’t want to push it.

And I don’t want to push it now, either. While my first idea was to write down how each company behaved afterwards, I decided not to. I will only show you some numbers:

Five out of five companies either removed our content or named us as the source within 12 hours of the initial blog post.
Three out of five companies contacted us by email to apologize in a more personal way, which is much appreciated.
Two out of five companies apologized on Twitter (as well), which is also appreciated of course.
One out of five companies did not apologize at all.

Special thanks to Nominalia Internet, S.L., who made a real effort to rectify their mistake.

 

 

Original post

Dear Heart Internet Ltd,
dear Easyspace Ltd,
dear Nominalia Internet, S.L.,
dear Mainstream d.o.o. (mCloud),
dear Total Publishing Network, S.A.

The lines you are reading now are my 16th or 17th attempt not only to write a blog post about copyright infringement, but also to convey the emotions connected with the issue at hand. Over the course of 2015, during which we have proven that we provide reliable data, a lot of people told us to stop providing the data for free and sell it instead. They asked us over and over again how we could ignore the money to be made with nTLDStats. We started to ask ourselves whether we were naive to believe that our way of running nTLDStats was the right way of doing it. Our conclusion was: Yes, it is the way. The idea that all the data nTLDStats provides should be available for free has been part of the concept from the very beginning.

Of course we thought about third parties using the data to make money with it. So we decided to offer the data as well as an API to gather it automatically, but with the restriction that you would never be able to make a copy of nTLDStats.

I don’t know what you thought nTLDStats would be, but it is a project that was created because nothing like it was available at the time. Stefan, the owner of greenSec Solutions, already had more than enough customers. nTLDStats was not something that had to be done in order to make money. It started as a side project, and it still is one. Of course it grows and grows and it consumes more time than we’d like to admit. But that is also the reason why Stefan isn’t taking care of nTLDStats alone anymore. There are 4 people working on nTLDStats now.

Did we believe that people would steal content from us? Of course we did. But we guessed that it would mostly be individuals taking data for their websites and such. The last thing we wanted to do was annoy them with lawyers, copyright and whatnot. Let them have some stats on their site. We even received evidence, sent to us anonymously (which is kinda crazy, but nice), that our statistics had been used in presentations not available to the public. Domain brokers created selling points with our data, our charts and figures. And thus, logically, made money. But even that was okay. It is how the world works after all and we can’t change that. We didn’t want to limit access for so many people just because of a handful of others who aren’t citing nTLDStats as their source. We accepted it, because it would cost so much time to dig through the internet on a regular basis to find people who use our data without naming us as the source.

And then, a few days ago, I stumbled across a slide that was part of a presentation. It contained our chart and there was no backlink, no mention of nTLDStats, nothing. But the slide had been shared eight times, had 282 views and six downloads, and had been used on other websites 61 times. Now those numbers don’t seem to be that high, or do they? You tell me, but I think that it’s quite a lot, given that the domain industry, or specifically the new gTLD industry, isn’t so big (people-wise, not money-wise!) compared to others.

Finding that slide angered me. While for all the past months I mostly shrugged off reports of people using our stuff, this bothered me a lot. Not only because there was no source given, and not only because people would think that the creator of the slide actually had the knowledge that we at nTLDStats worked our asses off for, spending so much time optimizing data and fiddling with our crazy large databases for hours, days and sometimes weeks. What bothered me was that an actual big company did it. A big company comes along, picks up your stuff and displays it as their own. Really now?

Slide from Heart Internet

 

Easyspace

I wanted to know: Are there other companies doing this? Well, yes there are. And the most interesting part is that they apparently did it shortly after nTLDStats launched. Our research showed no copyright infringement in 2015. Wherever we found our material, it had already been taken in 2014, back when we had just launched. One could think that the content was taken because the nTLDStats of 2014 didn’t have any reputation yet and no one would care anyway.

I do believe that writers can overlook that whole copyright thing. Or shrug it off maybe. Why bother contacting some small website for an article I’m writing in the name of some big company? I get that, I really do. I also believe that most requests from one big company to another to use their content are made just to avoid bad press and lawyers shouting at each other. It’s not happening because big company A respects company B’s copyright so much that they humbly ask to use it. Big companies are not an open source community. I mean, it’s capitalism all over the place, right?

Now don’t worry, there won’t be any lawyers. We decided against it, because believe it or not: We are nice people. But just so you know: In this case, our lawyers would mop the floor with your lawyers.

 

 

Nevertheless we are giving you the opportunity to make things right:

  1. Either remove our content or add us as the source of the material in question in accordance with our Linking and Content Usage Policy

  2. Our PayPal address is payment@greensec.de – feel free to send us any amount that you see fit as adequate compensation for your infringement of our copyright. What would it be worth to your company? What would someone who violated your copyright for a year have to pay (apart from your lawyers’ fees)? If you don’t have PayPal, drop us a mail at mail [that “a” sign with the circle around it] ntldstats.com and we’ll give you our bank details.

mCloud

Nominalia

Nominalia

Total Publishing Network

Looking at spam

Opening a blog post with a meme – check. Fancy title – check. While you might think that this blog post is a joke, I have to tell you that it might be one of the most serious ones so far. Not “>>Houston, we have a problem<< and everyone already knows that he’s not gonna make it back”-serious, but more like “You still haven’t done the dishes?”-serious.

Whenever I scroll through seemingly endless lists of domain names, one particular type of them catches my attention. Let me give you a few examples:

8c5968b9kspxkwd.club
fxdee.science
jb1bd.webcam
yposx.party
tsy809.science
vokvbn.science
7kks.science
tzjv.party
j3dv.science
bl9nl.party
iij3e.xyz
qetyyu.xyz
16872.xyz
wshod.party
sus430.xyz
c8356el.xyz
avvb.science
mskcx.space
vxur.webcam
wq6e.party
h9jb.webcam
y8g1x.science
520v1v484.top
seosepll.xyz
papuuod.xyz
ggzcx.party
nx5l1f.science
v29qkt30dxh5.link

Of course you could just shrug it off, but that would be boring. Instead, I added another item to our already infinitely long to-do list: “Write blog posts about spam TLD stuff!”

“Why would you register those domains?”, the naive internet user asks. Well, that question is partially answered by our categorization of those domains: “Spam” and “Filler”. “Spam” domains are used to create spam websites, while “Filler” domains are, well, apparently there to fill up the pool of domains for that particular TLD.

 

Spam

Most of the domains timed out, like one would expect. When we started to check those still responding, we quickly realized that the overwhelming majority have the same content: scripted websites with automatically generated links, in Chinese. Those websites mostly link to other spam-TLDs, but sometimes even include ccTLDs or old gTLDs. The purpose? Sales spam. The websites are basically a big list of links to products. Interestingly, you can’t order any of the advertised products online – you have to call a number (can you imagine? In 2015?). And since I only speak the rare Chinese Huizhou dialect from southern Anhui (think of it as the Chinese counterpart of “Texas English”), I couldn’t get more information about the order process.

Back to the topic: Another set of spam domains shows a page full of products as well. Regardless of which link you follow, you will be redirected to a modified version of the Chinese news agency CNNB.com.cn. Even though you are accessing the CNNB domain and servers, the articles have modified names (“European Casino”, etc.).

One could argue about how successful all of this is, but we already know that people willingly send money to the Nigerian government/lottery/whatever on the promise of receiving 2 million dollars in return. So, uhm, yeah. Also, it’s Asia. While checking those domains, I stumbled across the website of a Japanese dentist. It was full of pictures of him and his staff. Doing victory signs. With their patients. During their dental treatment. And that’s why I absolutely don’t know whether this kind of marketing spam works in Asia or not, because I don’t understand the people living there (haha, unintentional pun).

 

Filler

While it is easy to make sense of spam domains, I feel that “Filler”-domains are a more complex matter. Here is a list of some of them:

aabo.wang
aabp.wang
aabq.wang
aabr.wang
aabs.wang
aabt.wang
aabu.wang
aabv.wang
aabw.wang
aabx.wang
aaby.wang
aabz.wang
aaca.wang
aacb.wang
aacd.wang

I cut it down to 15, because this would go on for another few hundred entries. Here is another excerpt:

lalala001.xyz
lalala002.xyz
lalala003.xyz
lalala004.xyz
lalala005.xyz
lalala006.xyz
lalala007.xyz
lalala008.xyz
lalala009.xyz
lalala010.xyz
lalala011.xyz
lalala012.xyz
lalala013.xyz
lalala014.xyz
lalala015.xyz

Again, let’s start with the simple question: Why would you register those domains? I am unable to answer that question. All of the domains time out and all of them are whois-protected. I could dig deeper, but that would go beyond the scope of this blog post. Looking at the numbers, though, I am pretty sure that those domains aren’t just failed attempts at whatever, but a hard-nosed business strategy.
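For the curious, here is a minimal Python sketch of how one could spot such continuous names in a list of domains. To be clear, this is not how nTLDStats classifies anything. The function names, the two patterns (a trailing number, or a letters-only label read as a base-26 counter) and the `min_length` threshold are all assumptions made up for this illustration.

```python
import re
from collections import defaultdict


def sequence_key(domain):
    """Return (group, position) for a domain, or None if it fits neither pattern.

    Both patterns are assumptions for this sketch, not an official rule set.
    """
    label, _, tld = domain.lower().partition(".")
    match = re.fullmatch(r"([a-z]*?)(\d+)", label)
    if match:
        # e.g. lalala007.xyz -> group ("lalala", "digits", "xyz"), position 7
        return (match.group(1), "digits", tld), int(match.group(2))
    if re.fullmatch(r"[a-z]+", label):
        # e.g. aabo.wang -> read the label as a base-26 number (a=0 ... z=25)
        value = 0
        for char in label:
            value = value * 26 + (ord(char) - ord("a"))
        return (len(label), "letters", tld), value
    return None


def find_runs(domains, min_length=3):
    """Group domains and return every run of consecutive positions."""
    groups = defaultdict(list)
    for domain in domains:
        key = sequence_key(domain)
        if key is not None:
            groups[key[0]].append((key[1], domain))

    runs = []
    for members in groups.values():
        members.sort()
        current = [members[0]]
        for item in members[1:]:
            if item[0] == current[-1][0] + 1:
                current.append(item)  # continues the sequence
            else:
                if len(current) >= min_length:
                    runs.append([name for _, name in current])
                current = [item]  # gap found, start a new candidate run
        if len(current) >= min_length:
            runs.append([name for _, name in current])
    return runs
```

Fed with the two excerpts above, this would report the lalala block as one run and split the .wang block wherever a name is missing. Random-looking names like fxdee.science end up alone in their group and are never reported.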

My thoughts travel in a lot of different directions here.

  • What would be the plan with this kind of domain?
  • Would you use them for spam?
  • If so, is there some difference between using randomized names and continuous names, or am I just getting overly excited?
  • Who profits from those registrations short-, mid- and long-term?
  • Why are none of those domains accessible?
  • Why aren’t they being renewed?
  • Why are they whois-protected?

Unfortunately, I do plan on leaving you alone with those questions, because I fear that doing a follow-up on all of them would risk the impartial status that we at nTLDStats are so proud of. But before you start your journey into the dangerous world of conspiracy theories, take two more charts from me. Yes, they may answer some questions, but they might just as well raise as many new ones. *flashlight face vanishes in the dark*

(If you have input on this, feel free to mail me at [this author’s name] at [this website’s name] dot [this website’s TLD] (ha! take this, e-mail-address crawling spam bots!))

Classification of spam / non-spam TLDs by registrar

Classification of spam / non-spam by TLD

Classification of spam / non-spam by registrar

 

WHOIS validation, anyone?

We did it. Mark your calendars, since today is the day on which we are releasing our WHOIS validation tool. The super-serious description would be Registration Data Directory Service Specifications Validator (add “9000” for extra tension while saying it) – or you can just call it Brian. Why Brian? Because I feel like Brian suits the secret purpose of why we are releasing this tool. But I’ll talk about that later in this post.

Let me explain its official purpose first:

The first part is the validator itself. You can now check whether the WHOIS output of a specific registry complies with ICANN’s Registration Data Directory Service Specifications set in various Registry Agreements.

The second part is an interactive rules-guide for the above-mentioned specifications. When (formatting) errors are found, our tool points them out and lets you read up on them in detail.
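To give you a rough idea of what “validating WHOIS output” can mean, here is a minimal Python sketch. It is not Brian’s actual code and it does not reproduce ICANN’s rule set. The three entries in `REQUIRED_KEYS`, the “Key: value” line pattern and the two checks are assumptions picked for illustration only.

```python
import re

# Hypothetical mini rule set, for illustration only. The real
# specifications list far more fields and formatting rules.
REQUIRED_KEYS = ["Domain Name", "Creation Date", "Name Server"]

# A line is assumed to look like "Key: value" (the key contains no colon).
LINE_PATTERN = re.compile(r"^([^:]+):\s*(.*)$")


def validate_whois(text):
    """Return a list of (line_number, message) tuples.

    line_number is None for problems that concern the whole output.
    """
    errors = []
    seen_keys = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        if line != line.lstrip():
            errors.append((number, "line starts with whitespace"))
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            errors.append((number, "not in 'Key: value' form"))
            continue
        seen_keys.add(match.group(1))

    for key in REQUIRED_KEYS:
        if key not in seen_keys:
            errors.append((None, f"missing required key: {key}"))
    return errors
```

Returning a list of findings with line numbers, instead of stopping at the first problem, is what makes it possible to point out every error in one go and link each of them to the matching rule.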

 

That’s it. That is its purpose. Nothing more. That is also the boring part, because now I am gonna explain to you the secret purpose of our new tool. And please don’t tell anyone, because – you know – it’s a secret.

Imagine you handle about 100,000 (sometimes five times more, other times a fifth of that) database records per day. You process them automatically, because otherwise you’d need about as many employees as Walmart has. We obviously have the money, but we just don’t feel like hiring so many people. Which leaves us with the only thing that makes sense: Process insane amounts of data automatically.

Which of course is what we are doing. We programmed a nice piece of software that validates everything. And by everything I mean everything. It goes through the nTLDStats database and checks for bugs, burglars and insurance agents. A few minutes after we ran it for the first time, it died. The reason was an unexpected output format provided by a registry. So we added an exception for that and ran it a second time. Again, it died shortly after we started it. Same reason. So we added another exception. It went on and on like this for a while – basically forever. At this point, I am not sure whether you can even remotely comprehend the frustration we felt in the office. Let’s just say some tables, keyboards and coffee cups as well as one USB fan had to be replaced. But we are okay now.
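In hindsight, the less keyboard-destroying approach is to let a run survive bad input and count the failures per source instead. Here is a minimal Python sketch of that idea. The pipe-separated record format, `parse_record` and `process_all` are invented for this example and have nothing to do with our real database layout.

```python
from collections import Counter


def parse_record(raw):
    """Hypothetical parser: expects 'domain|created', raises ValueError otherwise."""
    domain, created = raw.split("|")  # wrong number of fields -> ValueError
    if "." not in domain:
        raise ValueError(f"no TLD in {domain!r}")
    return {"domain": domain, "tld": domain.rsplit(".", 1)[1], "created": created}


def process_all(records):
    """Parse (source, raw) pairs; collect failures per source instead of dying."""
    parsed = []
    failures = Counter()
    for source, raw in records:
        try:
            parsed.append(parse_record(raw))
        except ValueError:
            failures[source] += 1  # remember who sent the odd format, keep going
    return parsed, failures
```

The `failures` counter is the interesting by-product here: once you tally odd formats per source, you are only a chart away from an error distribution.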

Error Distribution

Only 16 TLDs manage to deliver a WHOIS output following ICANN’s specifications.

We now have Brian.

Brian will display which registries do not comply with the Registration Data Directory Service Specifications set in the Registry Agreement for their own gTLD. Brian will also show you a graph detailing which registries’ WHOIS output has errors and how many there are. We will even send emails, manually(!), to the registries with the highest error count in their WHOIS output to make them feel bad!

 

Fear Brian!

*Runs away laughing maniacally*

P.S.: On a more serious note: ICANN is planning to do what we’re doing starting in 2016, but with real consequences for the particular registry. So love us or hate us, we’re doing you a favour.

 

 

=> See the new nTLDStats.com WHOIS Validation Tool and be sure to check out the interactive rules-guide as well.