Getting those answers to the user even a few seconds faster might not seem wildly important. But for Amazon, a company that relies on capturing a user's interest at exactly the right moment to close a sale, it's worth driving that response time as close to zero as possible, cultivating the expectation that Amazon can give you the answer you need immediately — especially if, in the future, it's a product you're likely to buy. Amazon, Google and Apple are at the point where users expect technology that works, and works quickly, and users are probably less forgiving of them than of other companies still wrestling with hard problems like image recognition (like, say, Pinterest).
This kind of hardware on the Echo would probably be geared toward inference: taking inbound information (like speech) and executing a ton of calculations really, really quickly to make sense of it. Many of these tasks boil down to a conceptually simple operation from a branch of mathematics called linear algebra, but one that requires a very large number of calculations — and a good user experience demands they happen very quickly. The promise of chips customized for this kind of work is that they could be faster and less power-hungry, though they come with plenty of other challenges. A bunch of startups are experimenting with ways to build them, though what the final product ends up looking like isn't entirely clear (pretty much everyone is pre-market at this point).
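To make the linear-algebra point concrete, here's a minimal sketch (in NumPy, with made-up layer shapes) of why inference is such a natural target for specialized silicon: a forward pass through a network is essentially one matrix multiply per layer, repeated for every query.

```python
# Hypothetical sketch: the heart of neural-network inference is repeated
# matrix multiplication, the operation inference accelerators are built
# to do quickly and at low power. All shapes here are illustrative.
import numpy as np

def relu(x):
    """A common activation function: clamp negatives to zero."""
    return np.maximum(x, 0.0)

def infer(layers, features):
    """One forward pass; each layer is a (weights, bias) pair."""
    activation = features
    for weights, bias in layers:
        # One matrix-vector multiply per layer -- this line is where
        # nearly all of the arithmetic (and the latency) lives.
        activation = relu(weights @ activation + bias)
    return activation

# Toy model: three dense layers mapping a 256-dim audio feature
# vector down to 10 output scores.
rng = np.random.default_rng(0)
sizes = [256, 128, 64, 10]
layers = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes, sizes[1:])]

scores = infer(layers, rng.standard_normal(256))
print(scores.shape)  # (10,)
```

The work is simple but voluminous: even this toy model does tens of thousands of multiply-adds per query, which is why shaving microseconds off each one matters at Echo scale.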
In fact, this makes a lot of sense simply by connecting the dots of what's already out there. Apple has designed its own custom GPU for the iPhone, and moving those kinds of speech recognition processes directly onto the phone would help it parse incoming speech more quickly, assuming the models are good and sitting on the device. Complex queries — the kinds of long-as-hell sentences you'd say into the Hound app just for kicks — would definitely still require a round trip to the cloud to walk through the entire sentence tree and determine what information the person actually wants. But even those queries might get faster and easier as the technology improves and becomes more robust.
The Information's report also suggests that Amazon may be working on AI chips for AWS, which would be geared toward machine training. While this makes sense in theory, I'm not 100 percent sure it's a move Amazon would throw its full weight behind. My gut says that the wide array of companies working off AWS don't need bleeding-edge training hardware, and would be fine training models a few times a week or month and getting the results they need. That could probably be done with a cheaper Nvidia card, without having to solve the problems that come with custom hardware, like heat dissipation. That being said, it does make sense to dabble in this space a little given the interest from other companies, even if nothing comes of it.
Amazon declined to comment on the story. In the meantime, this seems like something to keep close tabs on, as everyone seems to be trying to own the voice interface for smart devices — either in the home or, in the case of the AirPods, maybe even in your ear. Thanks to advances in speech recognition, voice turned out to actually be a real interface for technology, the way the industry long thought it might be. It just took a while for us to get here.
There's a pretty big number of startups (by startup standards) experimenting in this space, with the promise of creating a new generation of hardware that can handle AI problems faster and more efficiently while potentially consuming less power — or even less space. Companies like Graphcore and Cerebras Systems are spread all around the world, with some nearing billion-dollar valuations. A lot of people in the industry refer to this explosion as Compute 2.0 — at least if it plays out the way investors are hoping.