Accel Makes Big Commitment To Big Data With $100M Fund - Venture Capital Dispatch - WSJ

By Deborah Gage

Accel Partners has set aside $100 million to invest in start-ups trying to harness the power of big data, the firm will announce at the Hadoop World conference in New York this morning. The move aims to consolidate Accel’s position as one of the earliest investors in these companies as the amount of data generated by businesses and government agencies continues to grow.

The money for the fund, called the Accel Big Data Fund, comes from several funds Accel has already raised, including Accel XI, which closed in June at $475 million; Accel Growth Fund II, which closed in June at $875 million; and the firm’s global funds, according to Accel partner Ping Li.

Li will invest from the fund alongside fellow Accel partners Rich Wong, Kevin Efrusy, and Andrew Braccia in the U.S.; Bruce Golden in London; and Subrata Mitra in India, among others.

“Consumer companies (like Facebook) are very visible and we’re happy to be in those, but there’s a real undercurrent of picks-and-shovels innovation and how you harness all the data that’s being generated,” Li said. “We’ve all read articles about these new types of data that are unstructured and breaking the traditional data platforms…We’ve all grown to love them at Oracle, but they’re not built for the big data world.”

Li said the idea for the new fund arose because some of Accel’s consumer portfolio companies, including Facebook, Rovio Entertainment (maker of Angry Birds), and Dropbox, generate enormous amounts of data, and the partners saw firsthand the importance of extracting, sorting, and managing all that data to understand who the users are.

The firm will partner with one of its current portfolio companies, Cloudera, which commercializes the Hadoop big data framework, to understand what business customers are doing with big data and where the technology is going. Accel took part in a $40 million Series D round for Cloudera, announced Monday.

Accel is also planning a big data conference in Silicon Valley and has named several advisers for the big data fund to help guide Accel through what could be “uncharted territory,” Li said. These include Cloudera Architect Doug Cutting, Cloudera Chief Scientist Jeff Hammerbacher, Bitly Chief Scientist Hilary Mason and former SolarWinds Chief Product Strategist Kenny Van Zant (currently at Asana). “You could drown in this big data wave as well,” he said.

Investments will focus on infrastructure, storage, security and enterprise applications, all the way up to business intelligence, mobile apps, financial trading apps and more. “I think we’ll look back 10 years from now and see great companies…that are just as important as the companies created 10 years ago,” Li said.

The critical role data quality plays in the ROI of a data warehouse

http://ibmdatamag.com/2012/11/get-the-most-out-of-your-data-warehouse/

Only a decade ago, 10 million records would have been considered a large volume of data. Today, the amount of data stored by enterprises is often in the petabyte or even exabyte range. This explosion is not limited to structured data—in fact, most of the added volume comes from unstructured sources, such as email, images, and documents. Companies use data warehouses to manage these large data volumes, providing users with fast access to important information to help them gain insight and drive innovation. But a data warehouse’s value depends on the completeness, accuracy, timeliness, and understanding of the data that is put into it.

Data with abundant errors, excessive duplication, too many missing values, or conflicting definitions leads to cost overruns, missed deadlines, and most important, users who do not trust the information they are provided. According to a report from Ventana Research, only 3 in 10 organizations view their data as always reliable. More than two-thirds (69 percent) of organizations spend more time preparing data for use than actually using it. When an organization doesn’t trust the data in its warehouse, different parts of the company may act independently and create their own projects to get the information they need—diminishing the value and return on investment (ROI) of the warehouse. According to a Forbes study cited by Bloor Research, “data-related problems cost the majority of companies more than $5 million annually. One-fifth estimate losses in excess of $20 million per year.”


The importance of data quality

An organization can have hundreds or even thousands of different systems. Information can come from numerous places—such as transactions, document repositories, and external information sources—and in many formats, including structured data, unstructured content, and streaming data.

An organization must be able to manage its supply chain of information, and then integrate and analyze it to make business decisions (see Figure 1). Unlike a traditional supply chain, an information supply chain has many-to-many relationships between data sources and the reports and applications that consume them. For example, data about the same person can come from multiple places, since that person may be a customer, an employee, and a partner, and the information can end up in various reports and applications. Given this complexity, integrating information, ensuring its quality, and interpreting it correctly are crucial tasks that enable organizations to use the information for making effective business decisions. The underlying systems must be cost-effective and easy to maintain, and they must perform well for the workloads they need to handle, even as information continues to grow at exponential rates.

Figure 1: The information supply chain.
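To make the many-to-many idea concrete, here is a minimal, purely illustrative Python sketch (every system, person, and application name is hypothetical): one person appears in several source systems under different roles, and each source feeds several downstream consumers.

# A minimal sketch of an information supply chain's many-to-many structure.
# The same person shows up in several source systems under different roles,
# and each source feeds several reports and applications. All names are
# hypothetical.
sources = {
    "crm":     {"jane_doe": {"role": "customer"}},
    "hr":      {"jane_doe": {"role": "employee"}},
    "partner": {"jane_doe": {"role": "partner"}},
}

consumers = {
    "revenue_report":  ["crm", "partner"],
    "payroll_app":     ["hr"],
    "churn_dashboard": ["crm"],
}

# Integrating correctly requires recognizing that these source records all
# describe one person, then routing a consistent view to every consumer.
for consumer, feeds in consumers.items():
    roles = [sources[s]["jane_doe"]["role"] for s in feeds]
    print(consumer, "sees jane_doe as:", roles)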


Essential data quality capabilities for a data warehouse

The success of a data warehouse hinges on robust data quality. Organizations realize the greatest value when they can leverage software that provides end-to-end data quality capabilities, enabling them to act on their data in the following ways.


Defining a common business language

Having a common business language is critical for aligning technology with business goals. In addition to a controlled vocabulary, hierarchies and classification systems provide needed business context.
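As a rough sketch of the idea, rather than any particular product's model, a common business language can be represented as a controlled vocabulary whose terms carry definitions and sit in a hierarchy; all term names below are illustrative.

# A minimal sketch of a business glossary: a controlled vocabulary in which
# each term has a definition and an optional parent, forming the hierarchy
# that supplies business context. All term names are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Term:
    name: str
    definition: str
    parent: Optional[str] = None  # name of the broader category, if any

glossary = {
    "Party": Term("Party", "Any person or organization the business deals with"),
    "Customer": Term("Customer", "A party that purchases goods or services", parent="Party"),
    "Employee": Term("Employee", "A party employed by the organization", parent="Party"),
}

def ancestry(term_name):
    """Walk up the hierarchy to show a term's full business context."""
    chain = []
    while term_name:
        chain.append(term_name)
        term_name = glossary[term_name].parent
    return chain

print(ancestry("Customer"))  # ['Customer', 'Party']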


Understanding data and their relationships

For most organizations, data discovery is a manual, error-prone process requiring months of human involvement to discover business objects, sensitive data, cross-source data relationships, and transformation logic. Organizations need an automated data discovery process that addresses single-source profiling, analysis of cross-source data overlap, discovery of matching keys, automated transformation, and prototyping and testing for data consolidation.
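The sketch below shows, in plain Python, the kind of column-level statistics that single-source profiling produces automatically; the sample rows, column names, and candidate-key heuristic are all invented for illustration.

# A minimal sketch of single-source profiling: for each column, count nulls
# and distinct values, and flag candidate keys (columns that are unique and
# fully populated). The data and the heuristic are illustrative only.
rows = [
    {"id": "1", "email": "a@example.com", "country": "US"},
    {"id": "2", "email": "b@example.com", "country": "US"},
    {"id": "3", "email": None,            "country": "DE"},
]

def profile(rows):
    report = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "candidate_key": len(non_null) == len(values) == len(set(non_null)),
        }
    return report

for col, stats in profile(rows).items():
    print(col, stats)
# 'id' is flagged as a candidate key; 'email' is not, because it has a null.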


Analyzing and monitoring data quality

Rules-driven analysis reduces the risk of proliferating incorrect or inaccurate data. It is a key data assessment capability that extends an organization’s ability to compare, evaluate, analyze, and monitor expected data quality, using rules that test data against focused, user-defined conditions.
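As a minimal sketch of this idea, a data quality rule can be written as a named predicate that is evaluated against every record, with pass rates aggregated for monitoring; the rules and records below are invented for illustration.

# A minimal sketch of rules-driven analysis: each rule is a named predicate
# applied to every record, and the pass rate per rule can be tracked over
# time. The rules and records are hypothetical.
import re

rules = {
    "email_format": lambda r: r["email"] is not None
        and re.match(r"[^@]+@[^@]+\.[^@]+", r["email"]) is not None,
    "age_in_range": lambda r: r["age"] is not None and 0 <= r["age"] <= 120,
    "country_code": lambda r: r["country"] in {"US", "DE", "GB", "IN"},
}

records = [
    {"email": "a@example.com", "age": 34, "country": "US"},
    {"email": "not-an-email",  "age": 41, "country": "DE"},
    {"email": "c@example.com", "age": -5, "country": "XX"},
]

def evaluate(records, rules):
    """Return the fraction of records passing each rule."""
    return {name: sum(1 for r in records if rule(r)) / len(records)
            for name, rule in rules.items()}

print(evaluate(records, rules))
# Each rule passes 2 of 3 records here, so every pass rate is about 0.67.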


Cleansing, standardizing, and matching data

Organizations need to create and maintain an accurate view of master data entities, such as customers, vendors, locations, and products. A complete data cleansing solution includes data standardization, record matching, data enrichment, and record survivorship.
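A minimal sketch of standardization and matching using only the Python standard library follows; the abbreviation table and the 0.85 similarity threshold are illustrative choices, and production matching engines use far richer logic (phonetic keys, address parsing, probabilistic weights).

# A minimal sketch of cleansing and matching: standardize text, then pair
# records whose standardized forms are sufficiently similar. The threshold
# and abbreviation table are illustrative, not any product's defaults.
from difflib import SequenceMatcher

ABBREVIATIONS = {"st": "street", "rd": "road", "inc": "incorporated"}

def standardize(text):
    tokens = text.lower().replace(".", "").replace(",", "").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def is_match(a, b, threshold=0.85):
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio() >= threshold

print(standardize("123 Main St."))                 # 123 main street
print(is_match("Acme Inc.", "ACME Incorporated"))  # True
print(is_match("Acme Inc.", "Apex Industries"))    # False

A full solution would then enrich matched records (for example, appending verified addresses) and apply survivorship rules to decide which attribute values the surviving master record keeps.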


Maintaining data lineage

A centralized and holistic view across the entire landscape of data quality processes, with visibility into data transformations that operate inside and outside of data quality and data integration systems, arms organizations with critical information that can lead to sound decisions.
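One way to picture lineage is as a directed graph running from source fields through transformations to warehouse fields, so that tracing backward answers where a value came from. The sketch below uses hypothetical system and field names.

# A minimal sketch of data lineage as a directed graph: each warehouse field
# records its upstream fields and the transformation that produced it.
# All system and field names are hypothetical.
lineage = {
    "warehouse.customer.full_name": {
        "inputs": ["crm.contact.first_name", "crm.contact.last_name"],
        "transform": "concatenate with a space",
    },
    "warehouse.customer.revenue": {
        "inputs": ["erp.orders.amount"],
        "transform": "sum grouped by customer_id",
    },
}

def trace(field, depth=0):
    """Print a warehouse field's upstream lineage, one level per indent."""
    node = lineage.get(field)
    suffix = f"  <- {node['transform']}" if node else ""
    print("  " * depth + field + suffix)
    for upstream in (node["inputs"] if node else []):
        trace(upstream, depth + 1)

trace("warehouse.customer.full_name")
# warehouse.customer.full_name  <- concatenate with a space
#   crm.contact.first_name
#   crm.contact.last_name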


A comprehensive data quality platform for data warehousing

For IBM customers, the IBM® InfoSphere® Information Server data quality suite is a fully integrated software platform that provides all of the capabilities outlined above. It facilitates the collaboration needed to develop and support a data warehouse, helping organizations to maximize their technology ROI.


http://blogs.hbr.org/cs/2012/12/what_a_big-data_business_model.html

The rise of big data is an exciting — if in some cases scary — development for business. Together with the complementary technology forces of social, mobile, the cloud, and unified communications, big data brings countless new opportunities for learning about customers and their wants and needs. It also brings the potential for disruption and realignment. Organizations that truly embrace big data can create new opportunities for strategic differentiation in this era of engagement. Those that don't fully engage, or that misunderstand the opportunities, can lose out.

There are a number of new business models emerging in the big data world. In my research, I see three main approaches standing out. The first focuses on using data to create differentiated offerings. The second involves brokering this information. The third is about building networks to deliver data where it's needed, when it's needed.

Differentiation creates new experiences. For a decade or so now, we've seen technology and data bring new levels of personalization and relevance. Google's AdSense delivers advertising that's actually related to what users are looking for. Online retailers are able to offer — via FedEx, UPS, and even the U.S. Postal Service — up-to-the-minute tracking of where your packages are. Map services from Google, Microsoft, Yahoo!, and now Apple provide information linked to where you are.

Big data offers opportunities for many more service offerings that will improve customer satisfaction and provide contextual relevance. Imagine package tracking that allows you to change the delivery address as you head from home to office. Or map-based services that link your fuel supply to availability of fueling stations. If you were low on fuel and your car spoke to your maps app, you could not only find the nearest open gas stations within a 10-mile radius, but also receive the price per gallon. I'd personally pay a few dollars a month for a contextual service that delivers the peace of mind of never running out of fuel on the road.

Brokering augments the value of information. Companies such as Bloomberg, Experian, and Dun & Bradstreet already sell raw information, provide benchmarking services, and deliver analysis and insights from structured data sources. In a big data world, though, these proprietary systems may struggle to keep up. Opportunities will arise for new forms of information brokering and new types of brokers that address new unstructured, often open data sources such as social media, chat streams, and video. Organizations will mash up data to create new revenue streams.

The permutations of available data will explode, leading to sub-sub specialized streams that can tell you the number of left-handed Toyota drivers who drink four cups of coffee every day but are vegan and seek a car wash during their lunch break. New players will emerge to bring these insights together and repackage them to provide relevancy and context.

For example, retailers like Amazon could sell raw information on the hottest purchase categories. Additional data on weather patterns and payment volumes from other partners could help suppliers pinpoint demand signals even more closely. These new analysis and insight streams could be created and maintained by information brokers who could sort by age, location, interest, and other categories. With endless permutations, brokers' business models would align by industries, geographies, and user roles.

Delivery networks enable the monetization of data. To be truly valuable, all this information has to be delivered into the hands of those who can use it, when they can use it.

This means, first, ample opportunities for the arms dealers — the suppliers of the technologies that make all this gathering and exchange of data possible. It also suggests a role for new marketplaces that facilitate the spot trading of insight, and deal room services that allow for private information brokering.

The most intriguing opportunities, though, may be in the creation of delivery networks where information is aggregated, exchanged, and reconstituted into newer and cleaner insight streams. Similar to the cable TV model for content delivery, these delivery networks will be the essential funnel through which information-based offerings will find their markets and be monetized.

Few organizations will have the capital to create end-to-end content delivery networks that can go from cloud to devices. Today, Amazon, Apple, Bloomberg, Google, and Microsoft show such potential, as they own the distribution chain from cloud to device and some starter content. Telecom giants such as AT&T, Verizon, Comcast, and BT also have an opportunity to provide infrastructure; however, we haven't seen significant movement beyond voice and data services. Big data could be their opportunity.

Meanwhile, content creators — the information providers and brokers — will likely seek placement and distribution in as many delivery networks as possible. Content relevancy will emerge as a strategic competency for delivering offers in ad networks based on context: role, relationship, product ownership, location, time, sentiment, and even intent. For example, large wireless carriers can map traffic flows down to the cell tower. Using this data, carriers could work with display advertisers to optimize advertising rates for the most popular routes on football game days based on digital foot traffic.

There are many possible paths to monetize the big data revolution ahead. What's crucial is to have an idea of which one you want to follow. Only by understanding which business model (or models) suits your organization best can you make smart decisions on how to build, partner, or acquire your way into the next wave.