MIT-backed "Taco" system could deliver big data breakthrough: Researchers at MIT, the French Alternative Energies and Atomic Energy Commission and Adobe have collaborated on a new system that tackles the problem of "sparse" data when running big data-analysis workloads. Here's how MIT's news office describes the system:
Imagine, for instance, a massive table that mapped all of Amazon’s customers against all of its products, with a “1” for each product a given customer bought and a “0” otherwise. The table would be mostly zeroes.
With sparse data, analytic algorithms end up doing a lot of addition and multiplication by zero, which is wasted computation. Programmers get around this by writing custom code to avoid zero entries, but that code is complex, and it generally applies only to a narrow range of problems.
The system is called Taco, for tensor algebra compiler. ... If that Amazon table also mapped customers and products against the customers’ product ratings on the Amazon site and the words used in their product reviews, the result would be a four-dimensional tensor.
[Taco] offers a 100-fold speedup over existing, non-optimized software packages. And its performance is comparable to that of meticulously hand-optimized code for specific sparse-data operations, while requiring far less work on the programmer’s part.
A publicly released tensor from Amazon that associates customer ID numbers with purchases and terms from reviews would take up 107 exabytes of data if stored in full, zeros included, but Taco's compression can squeeze that massive data set down to an almost trivial 13 gigabytes, according to the researchers.
POV: The MIT post goes into much more detail about Taco's capabilities and how it was conceived. On paper, the researchers' work shows considerable promise, notes Constellation VP and principal analyst Doug Henschen.
"The idea behind Taco is to save processing time in big data scenarios by avoiding churning through tensors (arrays) of data that are sparse, meaning full of null values that don't impact a calculation," Henschen says. "It's a common trick in big-data analyses to avoid churning through data that's irrelevant to a calculation."
Columnar databases, for example, let you skip the columns of data that aren't relevant to a query, he adds. Compression schemes, another example, let you skip across or summarize redundant values of data. "Whether it's typical or even an extreme case, the example cited in the article of turning 107 exabytes of data into 13 gigabytes of relevant information bodes well for the Taco approach," Henschen adds.
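To make the zero-skipping idea concrete, here is a minimal hand-written Python sketch of one common sparse technique, the compressed sparse row (CSR) layout, applied to a matrix-vector product. It is an illustration of the general approach, not Taco's generated code or its API; per the description above, the point of a compiler like Taco is to spare programmers from writing specialized loops such as `csr_matvec` by hand. The purchase table, the prices, and all function names are hypothetical.

```python
# Hypothetical illustration: compare a dense matrix-vector product, which
# multiplies every entry including zeros, with a CSR product that touches
# only the stored nonzero entries. Not Taco's output or API.

from typing import List, Tuple


def to_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i]..row_ptr[i+1] brackets the nonzeros of row i
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def dense_matvec(dense: List[List[float]], x: List[float]) -> Tuple[List[float], int]:
    """Naive dense product: rows * cols multiplications, zeros included."""
    mults = 0
    y: List[float] = []
    for row in dense:
        acc = 0.0
        for a, b in zip(row, x):
            acc += a * b
            mults += 1
        y.append(acc)
    return y, mults


def csr_matvec(values: List[float], col_idx: List[int], row_ptr: List[int],
               x: List[float]) -> Tuple[List[float], int]:
    """CSR product: exactly one multiplication per stored nonzero."""
    mults = 0
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
            mults += 1
        y.append(acc)
    return y, mults


if __name__ == "__main__":
    # Hypothetical customer-by-product table: 1 = bought, 0 = not bought.
    purchases = [
        [1, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 1],
    ]
    # Hypothetical per-product prices; the product gives each customer's spend.
    prices = [10.0, 25.0, 3.0, 7.5, 12.0, 4.0]

    y_dense, n_dense = dense_matvec(purchases, prices)
    y_sparse, n_sparse = csr_matvec(*to_csr(purchases), prices)

    print(y_dense == y_sparse)  # True: same answer
    print(n_dense, n_sparse)    # 24 4
```

Both versions return the same totals, but the dense loop performs 24 multiplications (4 rows by 6 columns) while the CSR loop performs 4, one per purchase. The gap widens as the table gets larger and sparser, which is the effect the researchers describe at far greater scale.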
Google beefs up its Dedicated Interconnect service: Earlier this year, Google announced Dedicated Interconnect, a premium service that provides faster, private connections to its cloud platform, with an eye on hybrid cloud deployment scenarios.
Dedicated Interconnect is now generally available under a 99.9 percent or 99.99 percent service-level agreement, and as part of the GA launch Google has also added several new features.
One is global routing support in Cloud Router, which lets Interconnect customers connect on-premises workloads to any Google Cloud Platform subnet in the world. This adds network flexibility and robustness, but it carries cost differences compared with the default setting of regional dynamic routing. Google claims that global routing support is "unique among leading cloud providers," which may be true for now; the question is how long that will last.
Google has also added Dedicated Interconnect to four more regions: Mumbai, Munich, Montreal and Atlanta. Dedicated Interconnect works by directly connecting a customer's network to Google's in a co-location center. Google says it's also working with Equinix to add more Dedicated Interconnect locations around the world.
POV: The success or failure of Dedicated Interconnect is worth watching closely, given that it is going GA at a time when enterprises are more concerned about security than ever, while hybrid cloud deployment models continue to gain significant momentum. Amazon Web Services and Microsoft Azure already offer similar services, in the form of Direct Connect and ExpressRoute, respectively. Now that Google has caught up to the competition, enterprises have a third option for private connections, but they should carefully weigh each service's merits on matters such as management complexity and the finer details of pricing.
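The difference between the 99.9 percent and 99.99 percent SLA tiers is easier to weigh when expressed as downtime. The sketch below converts an availability percentage into the maximum downtime it permits, assuming a 30-day month. The function name is hypothetical, and Google's actual SLA terms define their own measurement windows, exclusions and credits, so treat the output as rough orientation rather than contractual figures.

```python
# Hypothetical helper: translate an availability percentage into the
# downtime it allows over a period. Assumes a 30-day month by default;
# real SLA measurement windows and exclusions are not modeled here.


def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime permitted by `availability_pct` over `period_days`."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for sla in (99.9, 99.99):
        print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per 30-day month")
```

Under that assumption, 99.9 percent allows about 43.2 minutes of downtime per month, while 99.99 percent allows about 4.3 minutes, a tenfold tighter budget.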
Halloween nightmare as SCO-IBM case lives on: Yes, the litigation between SCO and IBM is still alive, somehow. The zombie-like case over alleged improprieties by IBM involving SCO's UNIX code has been dragging on since 2003, and now a federal appeals court has partially ruled in favor of SCO, as Ars Technica reports:
Last year, US District Judge David Nuffer had ruled against SCO (whose original name was Santa Cruz Operation) in two summary judgment orders, and the court refused to allow SCO to amend its initial complaint against IBM.
SCO soon appealed. On Monday, the 10th US Circuit Court of Appeals found that SCO’s claims of misappropriation could go forward while also upholding Judge Nuffer's other two orders.
In essence, SCO has argued that IBM essentially stole, or misappropriated, its proprietary code (known as UnixWare System Release 4, or SVr4) in the May 4, 2001 release of the "Monterey operating system," a new version of UNIX designed for IBM’s "Power" processors.
POV: The lawsuit has had remarkable legs, even surviving SCO's bankruptcy. Estimates of SCO's legal expenditures vary but have been pegged as high as $100 million. There are said to be billions of dollars at stake, however, so expect SCO to take its fight to the bitter end. When that end will actually come is anyone's guess.