While checking the Whois records of my domains, I noticed that the whois for this blog showed a link from AboutUs.org.
I have mixed feelings about this one.
Here is why:
It seems rather good to have your website listed on a site whose popularity is rising sharply (its Alexa rank for the last week is around 4000).
However, doesn’t it also appear to be a breach of privacy, copyrights and more?
Text from your site is scraped by the bot and displayed on the wiki.
Text from my about page has been copied as is. If the text on your site is copyrighted (which is the case for many blogs today), then AboutUs.org directly infringes your copyright!
Your contact information is made public. My email address was publicly visible (via a graphic). I also noticed that the entire contact information, including my phone number, was visible for another of my sites. They have a page on how to protect your email address, but why does it get listed in the first place?
While many may argue that all this information is available anyway via a simple whois, I am of the opinion that I willingly submitted my information when registering my domain. I have not submitted the information to AboutUs.org, and hence it is not right for them to display this information publicly.
A private domain registration can protect your information, but it comes at an additional cost.
Freely Editable Information:
AboutUs is a wiki. As a result, anybody can edit or delete the information about your website. This means that you will have to continuously track your website details to ensure that nobody has put any false data about your website.
What is even worse is that you can edit the information without even signing up! :O
That means you can use a good proxy server, edit an article and not bother about being tracked for any incorrect changes!
Ray King’s (the man behind AboutUs) User Talk already has a few angry comments, and I am sure these will just increase with time.
What to do about it?
As of now, there isn’t any way to delete your website from the listing.
So, the best thing to do is create an account, log in and edit the information on the pages. Remove all contact information that you don’t like.
I also suggest adding your other websites that aren’t already listed and editing their information.
This will ensure that your contact information doesn’t suddenly turn up because the bot or someone else decides to add your site.
Do tell me: what are your opinions on this issue? Do you agree with the concept of AboutUs.org? Or are you against it? Or do you think we should just not bother?
Update 1: View Ray’s Reply
Update 2: You can now use robots.txt to block the bot (Thanks Daniel)
Add the following code to your robots.txt files. Read more about the AboutUsBot.
Update 3: Ray’s efforts are being appreciated. View Tazzy and Piggy’s comments.
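The robots.txt snippet mentioned in Update 2 isn’t reproduced here. As a rough sketch only, assuming the crawler identifies itself with the user-agent token `AboutUsBot` (check the AboutUsBot page for the real string), the rules below would block it from the whole site, and Python’s standard `urllib.robotparser` can show how a compliant crawler reads them:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. "AboutUsBot" is an assumed user-agent
# token; use whatever name the bot actually sends.
ROBOTS_TXT = """\
User-agent: AboutUsBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler obeying robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/archives/2006/08/25/"
    # Matching ignores case and anything after "/", so a versioned
    # user-agent such as "AboutUsBot/1.0" is caught by the same rule.
    print(is_allowed(ROBOTS_TXT, "AboutUsBot/1.0", page))  # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", page))       # True
```

Other crawlers are unaffected because there is no `User-agent: *` group. Keep in mind that robots.txt is only a request: a bot that ignores it will not be stopped by these lines.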
I had no idea about this wiki site; it would be nice if they asked first. I use a contact form, but I’m sure a lot of people use text for displaying their email address, and web bots are going to pick it up.
I think they are scraping the data from Alexa and from DomainTools… but hey… my sites are small and they had every one listed there.
[…] More info: http://ajaydsouza.com/archives/2006/08/25/aboutusorg-is-it-ethical/ […]
they’ll also swipe your logo if you have an image titled as such.
the best part is the notice on the editing page warning you not to contribute copyrighted information.
I stumbled upon it by chance. Was a bit happy at first but then realized the implications.
@adam, I too was quite amused by that. So much for warning against copyright infringement, when the site does just that!
Well, it’s a wiki, and editable as such. I just went and deleted all of the information about my domain.
I just ran across a reference to this site a few minutes ago, and searched for my domain. Sure enough, there I was. Their bot didn’t manage to glean much info from my site, though: there’s a mail link on my front page which is generated with an anti-spam-tastic script, and AboutUs didn’t manage to winkle that out. Happily, I don’t keep my blog in the root directory of my domain, so they’ve missed the bulk of my site altogether.
Two strange things about the listing – it capitalised the P in nexistepas, which I have never done, anywhere.
But the oddest thing was the list of sites that allegedly link to my domain. Out of the eight they listed (considerably fewer than actually do link to my blog), I’d never heard of three of them, and when I checked those out, they had no such link. Eh well.
And yes, the dire warning against posting copyright info is pretty funny, considering.
As to the ethics of the site, I’m not so sure I can get too excited about it: WHOIS data is in the public domain, is it not? If someone wants to root out my identity, it can be done more effectively than by visiting AboutUs…
AboutUs.org telling you not to publish copyrighted work made me shake my head. I’ve deleted everything on my domain’s pages and only left a comment saying that I never gave permission to them copying my content.
I think it’s okay for a search engine or sites like Technorati to display parts of my content because other user cannot edit it. But on a wiki there is no easy way to tell whether the information is correct or not (at least for an average user who doesn’t fully understand the concept – and problems – of a wiki).
Yeah. Unhappy about this. The “Related Domains” list must look for sites linked to from the site being profiled. Mine showed up as “related” on some that I am certainly not related to and would rather not be grouped with. This should be an opt-in thing so people who want to can allow their “bot” to gather information.
I just checked and one of my sites listed showed copyrighted material, a screenshot, my home address (old address, I’ve moved since) and my cell phone number.
I’m not sure where it’s pulling that information from, except maybe an old whois cache, but that’s been private for almost a year. Then again, the screenshot was also at least 9 months old.
I didn’t need to log in to delete the information. I just hit edit and removed the full contents of the article. I’ll have to keep an eye on it to see if it’s reverted or re-added.
I do feel a bit more vulnerable now that I’ve seen that, though… almost like the people who found out AOL had released their search logs.
I think AboutUs.org will argue that, just like phone numbers, your contact info is not copyrightable. There is that famous Yellow Pages book case. Regardless, I don’t see the benefit of having my details on AboutUs. Why would I want to be listed?
I’m not a fan. When I searched my site, About Us had not scraped it yet, so all I saw were the related links, which I don’t have a problem with. I absolutely have a problem with them infringing copyrights. If they do infringe, I will consider going after them.
I just visited; they had copied a rather large part of one of my webpages without authorship information or attribution. This is much different from simply listing my domain information.
The problem, Craig, is that it’s a wiki and as such has a restorable history. In fact, your domain’s information is still there on the site and is fairly easily discoverable, even if nobody ever bothers to revert the page back to the original version.
@pilgrim, the bot seems to have some kind of capitalization issue. There is a solution: I went in and edited a good list of sites. I don’t even know where that related list came from, though some of it was from Alexa.
I have asked Ray King to comment on this issue. While he has been prompt to reply via email, I do hope he replies to this post.
Well, sorry to say, but once you register a domain, your personal information such as email and phone number is already public (that’s if you filled it in correctly and aren’t using any kind of identity guard service). People can easily go to a whois service and find a way to contact you (or expose you, for that matter). No big deal.
Like everyone else, I found my site on there and did not appreciate it. I mean, I do publish stuff online for people to view and normally don’t care if it’s archived or not. For some reason, and I’m unsure why, I really didn’t like this at all.
I appreciate the feedback from you and your community. The intention of AboutUs is to be a valuable and free resource that allows users to share their thoughts and knowledge of various websites. That said, your concerns are well expressed and here are my comments:
Whois data is readily and publicly available from many sources, as is historical whois data, which often pre-dates a domain being switched to “private”. I am not aware of any whois data service that first asks the registrant for permission to provide it (not that this wouldn’t be good, but it seems impractical). We list contact information because it is valuable to users wanting to contact website owners. The information can be changed simply by hitting the edit button.
Search engines also scrape some content from sites in order to give their readers a sense of what is on the site so the reader can decide if it’s worth going to the site. AboutUs does the same thing. The goal is to present some information as a “stub” so that readers can easily recognize the site. It can be edited from there.
On capitalization, we algorithmically try to figure out where the word breaks are and unfortunately it is not always accurate. So my apologies for that and thanks for helping to correct some of these errors.
AboutUs is a wiki, and as such, its content is openly editable. Some site owners may feel the desire to check AboutUs to see if anyone else has modified the page referring to their site or commented positively or negatively — just as they might want to know if someone made a blog or del.icio.us entry about their site. Watching for commentary on one’s own site is par for the course if you publish a website in the first place. To make this easier however, it is our plan to add the ability to get an e-mail alert for any pages users put on their watch list. This will be coming as soon as we are able.
I am a big believer in wikis and their power to create collaborative works which would be impossible to do with fewer people, yet they do pose some new problems to think about. We will be watching the changes made on our site carefully to keep the development of AboutUs moving in a positive direction.
I will also do my best to respond to continued thoughts. Thanks – Ray.
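Ray’s note on capitalization says the word breaks are found algorithmically but doesn’t say how. One common technique, shown here purely as a hypothetical sketch and not AboutUs’s actual code, is dictionary-based segmentation with dynamic programming: split the domain name into known words, then capitalise each piece. With a tiny invented word list, it also shows how a name like pilgrim’s can pick up a stray capital when part of it happens to match a dictionary word:

```python
def segment(label: str, words: set[str]) -> list[str]:
    """Split a domain label into pieces, preferring known dictionary words.

    best[i] is the best split of label[:i] as (uncovered_chars, piece_count,
    pieces). Candidates are compared on the first two fields, so we minimise
    characters not covered by a known word first, then the number of pieces.
    """
    n = len(label)
    best = [None] * (n + 1)
    best[0] = (0, 0, [])
    for i in range(1, n + 1):
        for j in range(i):
            uncovered, count, pieces = best[j]  # every shorter prefix is already solved
            piece = label[j:i]
            cost = 0 if piece in words else len(piece)
            candidate = (uncovered + cost, count + 1, pieces + [piece])
            if best[i] is None or candidate[:2] < best[i][:2]:
                best[i] = candidate
    return best[n][2]


def camel_case(label: str, words: set[str]) -> str:
    """Capitalise each piece found by segment() and join them."""
    return "".join(piece.capitalize() for piece in segment(label, words))


# A tiny, made-up word list; a real system would use a much larger one.
WORDS = {"about", "us", "pas"}

print(camel_case("aboutus", WORDS))     # AboutUs
print(camel_case("nexistepas", WORDS))  # NexistePas: "pas" is known, "nexiste" is not
```

Because unknown letters are grouped into one piece, the segmenter only inserts a break where a dictionary word forces one, which is exactly where a wrong guess becomes visible.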
[…] There’s a lot of buzz today about AboutUs.org and whether or not their practice of posting the whois information of people’s sites to their wiki is ethical. […]
How convenient that they managed to suck the content off pages of two of my sites, but not the very explicit copyright notices at the bottom of them.
Hilarious that when I went in to edit the pages about my site, there’s a big message about not posting copyrighted content.
I’m just horrified that this site even exists.
Thank you for your comments Ray.
Really appreciated. I guess AboutUs does have a long way to go and a lot of queries to be answered as it gets there.
I went and had a look, and my site wasn’t on there. So I decided to see how the automated system worked.
On the front page, I typed my domain name into the “add missing domain” box, and told it to create the page. It took a second and did so, and then let me have a look see.
The resulting page is rather bereft of information. It has a screen shot and some unusual text for a description, but it’s all more or less correct. Funnily enough, I recognize where all this information came from.
Near as I can figure, it basically scraped Alexa and possibly Technorati. The screen shot of my page came from somewhere I do not know. Alexa’s screen shot of my page is different, and the screen shot it found is out of date, showing some older material. But the description is straight from Alexa, as is the email address and name and such.
Looking at my server logs, it doesn’t appear to have actually hit my site at all, although I need to check the apache logs directly to be certain of that.
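Checking the Apache logs directly mostly means looking for the bot’s user-agent string. Here is a minimal sketch that assumes the standard Apache “combined” log format and assumes the bot’s user-agent contains the token `AboutUsBot`; both are assumptions, so adjust the pattern and token to what your server actually records:

```python
import re

# Apache "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# This simple pattern does not handle escaped quotes inside fields.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bot_hits(lines, token="AboutUsBot"):
    """Yield (host, time, request) for entries whose user-agent contains token.

    The default token is an assumption; lines that don't parse are skipped.
    """
    needle = token.lower()
    for line in lines:
        match = COMBINED.match(line)
        if match and needle in match.group("agent").lower():
            yield match.group("host"), match.group("time"), match.group("request")


# Made-up sample entries (documentation IP ranges), not real traffic.
SAMPLE = [
    '203.0.113.5 - - [25/Aug/2006:10:00:00 -0700] "GET / HTTP/1.1" 200 5120 "-" "AboutUsBot/0.9"',
    '198.51.100.7 - - [25/Aug/2006:10:05:00 -0700] "GET /about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

for hit in bot_hits(SAMPLE):
    print(hit)
```

To scan a real log, pass an open file object to `bot_hits`; where the access log lives depends on the server setup. An empty result would be consistent with the data coming from Alexa rather than from the site itself, though it wouldn’t prove it.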
They are using the new Alexa web search thingy, where they give you access to all their info for a small fee. If you look up your site on alexa.com and compare the info to what’s listed on aboutus.org, you will notice that it’s the same.
What is your comment on the problem summarized (again, not the first time this came up) by Lisa regarding copying our content, leaving copyright notices out, and, in some kind of weird irony, saying on the edit page not to use copyrighted content?
You said search engines also use the content, sure, but they don’t enable anybody to change this content!
[…] Following up on what I wrote about the AboutUs.org uproar yesterday, I noticed that Ray King (owner of aboutus.org) left a rather lengthy reply on Ajay D’Souza’s blog. […]
My biggest concern is that even if I edit away my copyrighted content, it’s still there in the History section.
AboutUs should be “opt in” (in the same way that many sites like Technorati won’t index your site unless you “ping” them) and should not be editable by all and sundry. Plus, it needs to be possible to purge your history, so that copyrighted material is actually removed.
It would be unethical, in my opinion, if it started listing information not publicly available about you and your website. This is very much possible since it allows people to edit any website’s information.
Scenario: John B. Average becomes miffed with Jane B. Average (his competition) and decides to post all of her private information on the site. Jane B. Average has no idea this site exists, but starts wondering why strange people are e-mailing her, calling her, or even driving by her home. See where I’m going with this?
As it stands now, I don’t believe AboutUs.org is unethical, but I don’t care for it.
It’s not a registrar or whois look-up service; the owner should give website owners the ability to opt out of appearing in the list. Google, Yahoo!, MSN, and even the Internet Archive give webmasters a way to opt out of their directories, and so should AboutUs.org.
My 2 cents.
I’d like a way to specify sections that AboutUs shouldn’t look at, à la the robots.txt file or something similar.
At last: there is now some clarification on how the bot behaves at http://www.aboutus.org/AboutUsBot
It’s sad I had to create the following robots.txt for just this bot.
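The robots.txt used here isn’t shown, but the per-section approach asked for above can be sketched: `Disallow` lines under a bot-specific `User-agent` group keep that bot out of chosen directories while the rest of the site stays crawlable. The `AboutUsBot` token and the `/about/` and `/contact/` paths are assumptions for illustration; Python’s `urllib.robotparser` shows how a compliant crawler would read the rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep an assumed "AboutUsBot" out of two example
# sections only. Disallow values are path prefixes, so "/about/" also
# covers everything beneath it.
ROBOTS_TXT = """\
User-agent: AboutUsBot
Disallow: /about/
Disallow: /contact/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/about/", "/contact/form.html", "/archives/2006/08/"):
    verdict = "allowed" if parser.can_fetch("AboutUsBot", path) else "blocked"
    print(f"{path}: {verdict}")
```

This prints “blocked” for the first two paths and “allowed” for the archive path; crawlers with other user-agents are not affected by this group.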
[…] But I digress; some other bloggers (notably Ajay, who posted the original write-up that brought this to my attention and now has a nice little reply from Ray King, the guy responsible for AboutUs, and Paul O’Flaherty) have been lending their voices to the debate. […]
My gawd! Talk about being horrified after reading this post!
Like others, I checked my own entry and was disgusted that such info was publicly available.
I emailed Ray and surprisingly got a reply within minutes stating that he’s drafting a new policy ‘within the day’.
The real danger I think some have overlooked is that the wiki’s search function allows the entry of an individual’s NAME and then lists info relating to the domains they own. Pretty dangerous, I would have thought, when it comes to some people (such as battered wives who may be blogging about stuff they’d rather a husband didn’t know, for example).
I’m absolutely appalled by this site.
Thanks for alerting us all to its existence. I’ve also blogged about it, albeit less eloquently than yourself, advising my readers to check and, if required, edit any information already there.
Ok, I appreciate the very candid commentary and have taken some steps:
1) There is now a way to get rid of the history and recreate a shorter article that doesn’t contain descriptive content from the site or contact information. The shorter article will prevent the page from being recreated by the bot if someone tries to add it again. Unfortunately, I don’t yet have an automated way to authenticate domain owners, so I ask that anyone who wants this done send me an e-mail ([email protected]) from the whois address listed (or send some other evidence of domain ownership) and I’ll take care of them manually for now.
2) Also, the bot will now take much less descriptive information from the sites it looks at, label that text as “Excerpted from the website description” and point back to the site for more information. For examples of this, look at the “Sites recently added” link from the main page.
3) The bot now also respects robots.txt, and there is an article on the site that describes several ways to make that happen. There will be examples of that in “Sites recently added” also.
I know this doesn’t answer all of the questions posed here, but I will continue to listen and adjust as best I can. – Ray.
Hi Piggy and Tazzy, I didn’t really mean to scare anybody, just to make them aware and to find out what others think about it.
Got your pingback from your post. See it below 🙂
It is great to see all the efforts you are putting in to address our issues.
Justifies the green I spend on private domain registration:)
I actually like the idea. I see the problems it can cause, but I do feel the best intentions are meant. I also think that this could happen to any one of us at any time. I see postings fairly regularly about ideas and content being ‘nicked’.
But I think most of all I like the idea of bringing the sites I manage under one roof so to speak in this sort of concept.
Yes I have doubts but I wish Ray all the best with the site idea and with sorting out some of the issues people might have.
Thanks for listening.
Maybe you totally missed reading the copyright statement on my website:
What you have done is illegal. It’s called copyright infringement. And your wiki is of no use other than for your own private purposes. I believe in wikis, such as Wikipedia, but they do not scrape material off the web with a robot. And they do not use the site to sell advertising.
You have a number of my sites on the AboutUs website. Remove them. I have no problem getting lawyers to work for me, pro bono, because of the international rare cancer agency I run.
I would suggest that you hire a fleet of lawyers to defend yourself against copyright infringement. You’ll need them!
I think it best to write to Ray directly from the AboutUS.org site. I don’t think he will be reading the comments out here, especially after so many months.
You can also edit your site information in the Wiki directly.
I have read some of the posts on aboutus.org and I agree with everything I read, but the one comment I hoped to read I did not see, and that is this: apparently their bot extracts all or many of the pages from your site and places them on link pages with 1,000+ other links, making it nearly impossible to find the link. That link page is picked up by Google (maybe others, but I haven’t found them), so when you search Google by site name it brings up all the link pages on aboutus.org with a link to your site. This can double the number of search listings for your site, with half of them going to AboutUs link pages and nowhere near your site, unless people take the time to read the address after the link. Example: I have a page on one of my sites named Animals, and there is a listing for Animals-about us going to their link page(s). Those with 100 pages on their site will in time have 200 listings, and half of them will go to aboutus.org. I have contacted Google and am waiting for a response, as I consider this whole thing a fraud and damaging to every website listed in searches. I would hope Google has enough money that it does not stoop to turning people away from the legitimate sites on the web; if not, my intent is to contact the prosecutors in their home states, as I consider this fraud, and we will see if I am right.
Thanks for allowing me to add my 2 cents’ worth. Also, check your searches in Google to see if you have been victimized by this.
I only recently found my sites were included on this wiki, when I saw the aboutus bot has visited my site. Once I was on the site, there was no way to remove my listing. This is infuriating. Not only was my private information listed, but I am now forced to participate in a wiki that I have no wish to participate in.
I can remove information, but I can’t remove my listing. Anyone at any time can post erroneous, flaming or spammy information on my listing, therefore I must be constantly vigilant and participate in this site.
I have no problem with the idea of a wiki for websites, but it should be opt in. I don’t want to wake up one day and find my family’s address and phone number listed on some wiki! And aboutus should give us a way to really opt out, beyond merely removing information under our listing. We should be able to remove our own listing altogether.
Does anyone know if there is any way to take action against Ray? So far my inquiries and comments to Ray have been answered by yes-men who assure me in a condescending tone that I ‘just don’t yet see the value of the site’. Oh, I see the value of the site alright. I see just how valuable it is for ole Ray to exploit all the hard work we put into our sites.
I posted a number of ideas on how to fight back at
so I won’t repost them here, but I would like to explore what we can do. If there are no laws against it, maybe we need to talk to the people who make the laws, as this in no way seems right to me.
This business of ‘scraping’ (aka STEALING) other people’s web content is really a low-down scurvy way to try to capitalize on other people’s creativity because the scrapers lack any creativity themselves.
I suggest everyone who is outraged by this do the following:
1. File complaint with Attorney General for your state and for California or wherever those aboutus.org jerks are.
2. POST a BIG NOTICE on the aboutus.org site indicating that the ABOUTUS.ORG PEOPLE HAVE VIOLATED YOUR COPYRIGHTS & PRIVACY BY LIFTING YOUR WEB CONTENTS AND WHOIS INFO AND PUBLICLY PUBLISHING THEM, WHICH IS ILLEGAL.
3. FILE a legal complaint and hopefully Class Action Lawsuits will soon be gathering on the horizons to put a stop to this BS!
PS The jerks at Alexa and that so-called Biblo thing in Egypt are even worse, because they cough up OLD OLD CONTENT that you have deleted from your web projects and they still make it accessible –which could cause serious privacy problems and even lead to dangerous situations for folks who are being stalked and trying to fly under the radar.
I have a robots.txt file which was totally ignored. They still intruded upon my website and published it. Looks like I have to use specific text in my robots.txt file; it seems AboutUs thinks that “go away” commands do not apply to them.
Well, it seems controversial, but what I’m more concerned about is that my website screenshot is not being updated. Anyway, it’s a good place to have our websites exposed to the world.
Be positive about this free exposure!
I must say I was taken aback at how well it scraped data from my sites when I submitted them. The default listing, dare I say it, actually looked quite good.
While agreeing with all of what you say above, I don’t consider it much different to other search engines which scrape your content and keep a copy for themselves.
So long as they obey the robots.txt standard, I don’t see the problem.
Please, don’t get into a flap; just use the correct robots.txt commands and all will be fine.
Too many lawyers and too few crackerjacks in this world.
If you don’t want something crawled just don’t publish it, use an intranet or implement an authentication system.