Last Saturday, about an hour after this blog was sent out to readers, I received two strangely similar emails. The first was from CrunchBase and landed in my inbox at 9:11am. The subject line read, “CrunchBase Needs What You Know.” The second email, which arrived only 25 minutes later at 9:36am, was from CB Insights and carried the subject line, “top 10 most popular research briefs this week.”

Here’s the copy that kicked off each message:

From CrunchBase: “In case you didn’t know, you can add information and create profiles for people, startups, companies, investment firms, products, events and more. Crunchbase relies on people like you (more than 90,000 people contributed to CrunchBase in the past year) to keep the crowd-sourced Crunchbase dataset up-to-date.”

From CB Insights: “VC firms can now edit their portfolio company data & board data with The Editor.”

Both CrunchBase and CB Insights are well-respected financial database companies that know a thing or two about the value of information. And while CrunchBase relies much more heavily on crowdsourced content from users, both companies proactively gather content from the trove of unstructured data floating around the Internet, compile it in their own databases, and make it useful. This is a kinder way of saying they “scrape the web.” (BTW, I’m not implying that what they do isn’t kosher.)

It’s pretty clear from their emails that they’d like to augment their web scraping efforts by enlisting you to help them. Why? Well, if you happen to be in the business of information (like I am), data that users contribute directly makes for a better, more valuable, and more defensible business.

It’s with this in mind that I decided to put together what I’m calling “The Hierarchy of Data Crappiness.” I have to give credit to Boris Wertz, who got me thinking about this concept several weeks ago when I read his blog post on how companies can define their data strategy.