Thursday, 20 June 2013

Ten rules for building clouds

1. Aim for 100 percent automated provisioning: Part of the reason for installing a cloud is that you want to speed up provisioning of new compute power. Putting in authorization checkpoints slows down this process. Sure, a cost center owner may need to confirm the cost of the new compute power, but that aside, everything should happen automatically wherever possible.
2. Aim for 100 percent automated testing of new/revised catalog entries: Cloud catalogs contain a list of types of compute power (Linux, Red Hat, Windows) and application add-ons (accounting software, analytics software) that users want. The IT function will have populated that catalog after exhaustive testing. But things change and that catalog will need to be kept up to date using automated testing techniques to handle new releases. That way the testing is consistent and less onerous. This helps to reduce the support costs and protects the enterprise. Automate the deployment of patches and fixes to the deployed systems in the cloud, too.
3. Reuse “Lego-like” building blocks using SOA concepts to build the cloud catalog: If you have more than one catalog entry that requires Windows 7 as the operating system, then try to have only one Windows 7 image in your catalog with constructed workflows that add the applications on top. That way you have a smaller number of components to manage and keep up to date, reducing your costs in the process.
4. Design your cloud to help transform your business: Cloud computing is about reducing costs and making things happen. So instead of waiting weeks – or months – to get new compute power installed, the wait is minutes or hours. That means users have far more power and control over how the compute power they need is accessed. Business users have another tool at their disposal and therefore the role of IT changes. How this is all implemented takes thought. Without it, cloud is just another IT project that has limited value. Form the cloud vision early and manage it.
5. Get cloud governance up and running early: The cloud vision, and the benefits it can realize, needs to be owned by the organization. So governance needs to be in place early on in the development phase to ensure that the vision is true and achievable, and that changes in requirements or the solution are properly assessed and accepted. When the cloud is live, this governance should ensure that it is managed properly, using measures in the form of Key Performance Indicators (KPIs) and change control to keep the cloud true to the vision.
6. Do not automate manual processes: In the non-cloud world, there will be various processes with manual steps and authorizations required to provide new compute power. All of this takes time and money. In the cloud, none of these real-world constraints exist. So take the time to step back and really work out what is needed from a process point of view. The challenge is to have no more than one manual authorization step for provisioning compute power. Make it as fast and as snappy as possible to provide a fantastic and responsive service to the business users.
7. Only monitor, report and manage things that matter: Cloud governance processes will manage the cloud for the benefit of the organization. They will need information to do that, matched to the KPIs. But only measure the minimum to enable both governance and systems management. Do not put huge amounts of effort into measuring things that have no value in the management of the cloud.
8. The cloud is self-documenting: With physical things in the non-cloud world, documentation and records need to be kept of what is where, as well as what is connected to what. Most cloud management software provides a lot of reporting facilities, which the cloud uses to effectively document itself. Therefore, there is little value in duplicating these features and spending lots of effort in keeping records outside of the cloud up to date. Let the cloud do it for you and use the power of the built-in features as much as possible.
9. Clouds are used by business users who should be protected from technical detail: Business users are good at running the business and not that knowledgeable about IT. IT people are good at managing IT but not at managing the business. So set the cloud up to use common language rather than jargon. This is so that business users do not need to understand the technical detail of the cloud. This is particularly true of the cloud catalog where the entries for selection by business users need to be readily understandable.
10. Use out of the box features as much as possible: It is tempting to think that the cloud should provide some features you deem more desirable than anything else. But proceed with caution. Any add-ons or changes you make will reduce the ease of updating the cloud software when the vendor releases updates. Similarly, a lot of effort, and expense, will be used to adapt the cloud, which delays the return on investment. These extras mean retaining (potentially) expensive knowledge in the enterprise, at a cost. So use as many out of the box features as possible and resist the urge to tweak, extend and replace.
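
Rule 3 above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's catalog format: the class names, the entry titles and the "SP1" version string are all hypothetical. The only idea taken from the rule is that several catalog entries share one base image, and a workflow adds applications on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseImage:
    """One operating system image kept in the catalog (hypothetical model)."""
    name: str
    version: str


@dataclass(frozen=True)
class CatalogEntry:
    """A catalog entry built from a shared base image plus add-ons."""
    title: str
    base: BaseImage
    add_ons: tuple = ()

    def provisioning_steps(self):
        # The workflow: deploy the shared image first, then layer each add-on.
        steps = [f"deploy {self.base.name} {self.base.version}"]
        steps += [f"install {add_on}" for add_on in self.add_ons]
        return steps


def distinct_base_images(catalog):
    """Return the set of images that actually need testing and patching."""
    return {entry.base for entry in catalog}


# Hypothetical catalog: two entries, one Windows 7 image between them.
WINDOWS_7 = BaseImage("Windows 7", "SP1")
CATALOG = [
    CatalogEntry("Finance desktop", WINDOWS_7, ("accounting software",)),
    CatalogEntry("Analyst desktop", WINDOWS_7, ("analytics software",)),
]

if __name__ == "__main__":
    for entry in CATALOG:
        print(entry.title, "->", entry.provisioning_steps())
    print("images to maintain:", len(distinct_base_images(CATALOG)))
```

However many entries are added on top of the same image, the number of images to keep up to date stays at one, which is where the cost reduction in rule 3 comes from.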

Tuesday, 18 June 2013

Accel Closes $100M For Big Data Fund 2 To Invest In The ‘Second Wave’ Of Big Data Startups

The tech industry has been buzzing about “big data” for years now. And according to venture capital firm Accel Partners, the excitement around the big data space is not set to die down any time soon — it’s just about to enter into a new phase.
Accel is announcing tonight that it has closed on $100 million for a new investment fund called Big Data Fund 2. The fund is the same size as Accel’s first big data focused fund, which launched with $100 million back in November 2011.
As part of the new fund, Accel is also adding QlikView CTO Anthony Deighton and Imperva CEO Shlomo Kramer to its Big Data Fund Advisory Council, which Accel has said is meant to serve as a “guiding light” to help think through investments and track entrepreneurs doing interesting things in the space.
Despite the nearly identical name, Accel’s Big Data Fund 2 will mark a definite shift in focus from the firm’s first big data fund, partner Jake Flomenberg said in a phone call today. “Over the past few years, we’ve focused a tremendous amount of attention on what people like to call the ‘three Vs’ of big data: variety, volume and velocity,” he said. “We now believe there is a fourth V, which is end user value, and that hasn’t been addressed to the same extent,” and that is where Big Data Fund 2 will be focusing the bulk of its investment and attention.
Specifically, Accel believes that the “last mile” for big data will be served largely by startups focused on data-driven software, or “DDS.” These startups have largely been made possible through the hardware and infrastructure technology innovations that defined big data’s first wave, Flomenberg says. In a prepared statement from Accel, Facebook engineering VP Jay Parikh, who also serves on Accel’s Big Data Advisory Council, explained it like this:
“The last mile of big data will be built by a new class of software applications that enable everyday users to get real value out of all the data being created. Today’s entrepreneurs are now able to innovate on top of a technology stack that has grown increasingly powerful in the last few years – enabling product and analytical experiences that are more personalized and more valuable than ever.”
One example Flomenberg pointed to of a “fourth V” DDS startup is RelateIQ, the “next generation relationship manager” software startup that launched out of stealth last week with some $29 million in funding from Accel and others.
Accel’s existing portfolio of big data investments also includes Cloudera, Couchbase, Lookout, Nimble Storage, Opower, Prismatic, QlikView, Sumo Logic, and Trifacta.

Sunday, 9 June 2013

How the feds are using Silicon Valley data scientists to track you

Tech companies like Facebook, Apple, and Google are not the only ones helping U.S. intelligence agencies track citizens.
For years, data scientists have been brought in to brief the National Security Agency.
The NSA has a massive team of analysts and a huge wiretapping program called PRISM, but it is eager to take advantage of the newest “big data” and machine learning technologies, so it can more easily make sense of millions of phone calls, emails, and text messages.
The goal is to track suspicious activity and create a complex “alerts system” for acts of terrorism, said Sean Gourley, a data scientist and founder of Silicon Valley-based Quid, which provides big-data analysis services, mostly for government customers.
Some of the most innovative technology designed to cope with massive data streams has come out of Silicon Valley. For this reason, the CIA’s venture arm, In-Q-Tel, has an office on Palo Alto’s famous Sand Hill Road. It makes strategic investments in “big data” startups, like Recorded Future, whose products may come in useful for various government agencies.
“The NSA is naturally interested in data mining; I know of data scientists in Silicon Valley who have helped them,” said Mike Driscoll, chief executive of Silicon Valley-based big data startup Metamarkets.
“They appeal to our sense of patriotism,” said Driscoll.
Driscoll was not surprised by today’s news exposing the government’s PRISM program, which caused a furor among civil liberties activists and the media. He referred to the Echelon Project, the NSA’s clandestine data mining project and spy program that we’ve known about for years, as a precedent.
To recap: The Washington Post reported today that tech companies are participating in a top secret data mining program for the FBI and NSA, dubbed PRISM. Since the news broke, the companies named in the report have almost universally issued statements to the press that they do not provide direct access to their servers.
However, the government is a third party. Facebook’s terms of service, for instance, state that it can share your information with third parties. The assumption most Facebook users make is that the wording refers to marketers or advertisers, not the government.
“We don’t mind little bits of manipulation, but we do mind if it’s on this scale,” said Gourley.
According to Gourley, who regularly works with federal agencies, the NSA is most interested in real-time systems for data analysis. It’s not just what you say — but who you know. In other words, you’ll be flagged if you’ve communicated with a person of interest, or if you share a suspicious tweet.
“The NSA is essentially looking for a needle in a massive, massive haystack,” he said.
Given that technology exists for sophisticated analysis of social networks, “you could be on the list by association,” he warns.

Thursday, 6 June 2013

VMware unveils vCloud Hybrid Service

VMware has revealed its VMware vCloud Hybrid Service, an infrastructure as a service (IaaS) platform.
“VMware’s mission is to radically simplify IT and help customers transform their IT operations,” said Pat Gelsinger, CEO of VMware.
“Today, with the introduction of the VMware vCloud Hybrid Service, we take a big step forward by coupling all the value of VMware virtualisation and software-defined data centre technologies with the speed and simplicity of a public cloud service that our customers desire.”
vCloud Hybrid Service will extend VMware software, currently being used by hundreds of thousands of customers, into the public cloud. This means customers will be able to utilise the same skills, tools, networking and security models across both on-premise and off-premise environments.
“As a source of competitive advantage for our international business, our operations and IT department needs the agility and efficiency the public cloud promises,” says Julio Sobral, senior VP of business operations at Fox International.
“However, we don’t have the luxury of starting from scratch; we see in the vCloud Hybrid Service a potential solution to enable Fox International to have a more elastic platform that will support future deployments around the world. Working with technology partners like VMware gives us the best of both worlds by extending our existing infrastructure to realise the benefits of public cloud.”
According to the company, the vCloud Hybrid Service will allow customers to extend their data centres to the cloud and will support thousands of applications and more than 90 operating systems that are certified to run on vSphere. This means customers can get the same level of availability and performance running in the public cloud, without changing or rewriting their applications.
Built on vSphere, vCloud Hybrid Service offers automated replication, monitoring and high availability for business-critical applications, leveraging the advanced features of vSphere, including VMware vMotion, High Availability and vSphere Distributed Resource Scheduler.
“Our new VMware vCloud Hybrid Service delivers a public cloud that is completely interoperable with existing VMware virtualised infrastructure,” said Chris Norton, regional director at VMware for southern Africa.
“By taking an ‘inside-out’ approach that will enable new and existing applications to run anywhere, this service will bridge the private and public cloud worlds without compromise.”
According to VMware, the vCloud Hybrid Service will be available this month through an early access programme.

Monday, 3 June 2013

The Real Reason Hadoop Is Such A Big Deal In Big Data

Hadoop is the poster child for Big Data, so much so that the open source data platform has become practically synonymous with the wildly popular term for storing and analyzing huge sets of information.
While Hadoop is not the only Big Data game in town, the software has had a remarkable impact. But exactly why has Hadoop been such a major force in Big Data? What makes this software so damn special - and so important?
Sometimes the reasons behind something's success can be staring you right in the face. For Hadoop, the biggest motivator in the market is simple: Before Hadoop, data storage was expensive.
Hadoop, however, lets you store as much data as you want in whatever form you need, simply by adding more servers to a Hadoop cluster. Each new server (which can be commodity x86 machines with relatively small price tags) adds more storage and more processing power to the overall cluster. This makes data storage with Hadoop far less costly than prior methods of data storage.

Spendy Storage Created The Need For Hadoop

We're not talking about data storage in terms of archiving… that's just putting data onto tape. Companies need to store increasingly large amounts of data and be able to easily get to it for a wide variety of purposes. That kind of data storage was, in the days before Hadoop, pricey.
And, oh what data there is to store. Enterprises and smaller businesses are trying to track a slew of data sets: emails, search results, sales data, inventory data, customer data, click-throughs on websites… all of this and more is coming in faster than ever before, and trying to manage it all in a relational database management system (RDBMS) is a very expensive proposition.
Historically, organizations trying to manage costs would sample that data down to a smaller subset. This down-sampled data would automatically carry certain assumptions, number one being that some data is more important than other data. For example, a company depending on e-commerce data might prioritize its data on the (reasonable) assumption that credit card data is more important than product data, which in turn would be more important than click-through data.

Assumptions Can Change

That's fine if your business is based on a single set of assumptions. But what happens if the assumptions change? Any new business scenarios would have to use the down-sampled data still in storage, the data retained based on the original assumptions. The raw data would be long gone, because it was too expensive to keep around. That's why it was down-sampled in the first place.
Expensive RDBMS-based storage also led to data being siloed within an organization. Sales had its data, marketing had its data, accounting had its own data and so on. Worse, each department may have down-sampled its data based on its own assumptions. That can make it very difficult (and misleading) to use the data for company-wide decisions.
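
The down-sampling problem described in this section can be sketched in Python. This is a minimal illustration under made-up assumptions: the event records, the field names and the retention rule are all hypothetical, chosen only to mirror the e-commerce example (credit card data kept, click-through data discarded).

```python
# Hypothetical raw events, as they might arrive before any down-sampling.
RAW_EVENTS = [
    {"kind": "credit_card", "customer": "a", "amount": 40},
    {"kind": "product", "customer": "a", "sku": "X1"},
    {"kind": "click", "customer": "a", "page": "/sale"},
    {"kind": "click", "customer": "b", "page": "/sale"},
]

# The original assumption: only these kinds of data are worth storing.
KEPT_KINDS = {"credit_card", "product"}


def down_sample(events, kept_kinds):
    """Keep only the events the original assumptions considered important."""
    return [event for event in events if event["kind"] in kept_kinds]


def clicks_per_page(events):
    """A question the business asks later, after its assumptions change."""
    counts = {}
    for event in events:
        if event["kind"] == "click":
            counts[event["page"]] = counts.get(event["page"], 0) + 1
    return counts


retained = down_sample(RAW_EVENTS, KEPT_KINDS)

if __name__ == "__main__":
    print("from raw data:     ", clicks_per_page(RAW_EVENTS))
    print("from retained data:", clicks_per_page(retained))
```

With the raw events the later question has an answer; with the retained subset it returns nothing, because the data it depends on was discarded when the first set of assumptions was applied.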

Hadoop: Breaking Down The Silos

Hadoop's storage method uses a distributed filesystem that maps data wherever it sits in a cluster on Hadoop servers. The tools to process that data are also distributed, often located on the same servers where the data is housed, which makes for faster data processing.
Hadoop, then, allows companies to store data much more cheaply. How much more cheaply? In 2012, Rainstor estimated that running a 75-node, 300TB Hadoop cluster would cost $1.05 million over three years. In 2008, Oracle sold a database with a little over half the storage (168TB) for $2.33 million - and that's not including operating costs. Throw in the salary of an Oracle admin at around $95,000 per year, and you're talking a total cost of about $2.62 million over three years - roughly 2.5 times the cost, for just over half of the storage capacity.
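
The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that takes the quoted figures at face value; the variable names are illustrative, and the per-terabyte numbers are derived here rather than quoted from either vendor.

```python
# Figures quoted in the paragraph above.
HADOOP_COST_3YR = 1_050_000   # 75-node, 300TB cluster over three years
HADOOP_TB = 300
ORACLE_PRICE = 2_330_000      # purchase price, operating costs excluded
ORACLE_TB = 168
ADMIN_SALARY = 95_000         # per year
YEARS = 3

# Purchase price plus three years of admin salary.
oracle_cost_3yr = ORACLE_PRICE + ADMIN_SALARY * YEARS

# How the two options compare on cost and on capacity.
cost_ratio = oracle_cost_3yr / HADOOP_COST_3YR
capacity_ratio = ORACLE_TB / HADOOP_TB

# Derived figures (not in the article): three-year cost per terabyte.
hadoop_per_tb = HADOOP_COST_3YR / HADOOP_TB
oracle_per_tb = oracle_cost_3yr / ORACLE_TB

if __name__ == "__main__":
    print(f"Oracle three-year cost: ${oracle_cost_3yr:,}")
    print(f"Cost ratio (Oracle / Hadoop): {cost_ratio:.2f}")
    print(f"Capacity ratio (Oracle / Hadoop): {capacity_ratio:.2f}")
    print(f"Hadoop cost per TB: ${hadoop_per_tb:,.0f}")
    print(f"Oracle cost per TB: ${oracle_per_tb:,.0f}")
```

The sum comes to $2,615,000, which rounds to the $2.62 million quoted, and the cost ratio is just under 2.5. Note that the two figures come from different years and may not cover the same cost components, so the ratio is indicative rather than exact.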
Price savings of this kind mean Hadoop lets companies afford to hold all of their data, not just the down-sampled portions. Fixed assumptions don't need to be made in advance. All data becomes equal and equally available, so business scenarios can be run with raw data at any time as needed, without limitation or assumption. This is a very big deal, because if no data needs to be thrown away, any data model a company might want to try becomes fair game.
That scenario is the next step in Hadoop use, explained Doug Cutting, Chief Architect of Cloudera and an early Hadoop pioneer. "Now businesses can add more data sets to their collection," Cutting said. "They can break down the silos in their organization."

More Hadoop Benefits

Hadoop also lets companies store data as it comes in - structured or unstructured - so you don't have to spend money and time configuring data for relational databases and their rigid tables. Since Hadoop can scale so easily, it can also be the perfect platform to catch all the data coming from multiple sources at once.
Hadoop's most touted benefit is its ability to store data much more cheaply than can be done with RDBMS software. But that's only the first part of the story. The capability to catch and hold so much data so cheaply means businesses can use all of their data to make more informed decisions.