
Next-generation, high-performance processor unveiled

April 24, 2007

The prototype for a revolutionary new general-purpose computer processor, which has the potential of reaching trillions of calculations per second, has been designed and built by a team of computer scientists at The University of Texas at Austin.

The new processor, known as TRIPS (Tera-op, Reliable, Intelligently adaptive Processing System), could be used to accelerate industrial, consumer and scientific computing.

For the past seven years, professors Stephen Keckler, Doug Burger and Kathryn McKinley have been working on the underlying technology that culminated in the TRIPS prototype. Their research team designed and built both the hardware prototype chips and the software that runs on them.

“The TRIPS prototype is the first on a roadmap that will lead to ultra-powerful, flexible processors implemented in nanoscale technologies,” said Burger, associate professor of computer sciences.

TRIPS is a demonstration of a new class of processing architectures called Explicit Data Graph Execution (EDGE). Unlike conventional architectures that process one instruction at a time, EDGE can process large blocks of information all at once and more efficiently.

Current “multicore” processing technologies increase speed by adding more processors, which individually may not be any faster than previous processors.

Adding processors shifts the burden of obtaining better performance to software programmers, who must assume the difficult task of rewriting their code to run well on a potentially large number of processors.

“EDGE technology offers an alternative approach when the race to multicore runs out of steam,” said Keckler, associate professor of computer sciences.

Each TRIPS chip contains two processing cores, each of which can issue 16 operations per cycle with up to 1,024 instructions in flight simultaneously. Current high-performance processors are typically designed to sustain a maximum execution rate of four operations per cycle.
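The figures above lend themselves to a quick back-of-the-envelope calculation. The sketch below multiplies cores, issue width and clock rate to get a peak operation rate. The article does not state a clock frequency, so `ASSUMED_CLOCK_HZ` is a purely hypothetical value chosen for illustration, and the function names are likewise invented here. Peak rates are upper bounds; sustained performance would be lower.

```python
import math

# Figures taken from the article.
TRIPS_CORES_PER_CHIP = 2
TRIPS_OPS_PER_CYCLE = 16          # per core
CONVENTIONAL_OPS_PER_CYCLE = 4    # typical high-performance core, per the article

# ASSUMPTION: the article gives no clock rate; 1 GHz is a placeholder only.
ASSUMED_CLOCK_HZ = 1_000_000_000


def peak_ops_per_second(cores: int, ops_per_cycle: int, clock_hz: int) -> int:
    """Upper bound on operations per second: every core issues at full width every cycle."""
    return cores * ops_per_cycle * clock_hz


def cores_needed(target_ops_per_second: int, ops_per_cycle: int, clock_hz: int) -> int:
    """Smallest whole number of cores whose combined peak rate reaches the target."""
    return math.ceil(target_ops_per_second / (ops_per_cycle * clock_hz))


if __name__ == "__main__":
    chip_peak = peak_ops_per_second(
        TRIPS_CORES_PER_CHIP, TRIPS_OPS_PER_CYCLE, ASSUMED_CLOCK_HZ
    )
    print(f"Peak per chip at the assumed clock: {chip_peak:.2e} ops/s")
    print(f"Issue width vs. conventional core: "
          f"{TRIPS_OPS_PER_CYCLE // CONVENTIONAL_OPS_PER_CYCLE}x")
    print(f"16-wide cores needed for 1e12 ops/s at the assumed clock: "
          f"{cores_needed(10**12, TRIPS_OPS_PER_CYCLE, ASSUMED_CLOCK_HZ)}")
```

Under these assumptions a two-core prototype falls well short of a trillion operations per second, which is consistent with the article's framing of that figure as a potential rather than a measured result.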

Though the prototype contains two 16-wide processors per chip, the research team aims to scale this up with further development.

Source: University of Texas at Austin


43 thoughts on “Next-generation, high-performance processor unveiled”

BJS
February 7, 2008 at 11:14 am
I suspect it is linked on some high-traffic site somewhere. Also, it ranks high in search engines for the phrase ‘next generation processor.’
Weird.
Reply
Anonymous
February 7, 2008 at 6:55 am
This “Next-generation, high-performance processor unveiled” has been posted as one of the day’s “Top stories” since April 2007!
Does someone have a vested interest?
Reply
Anonymous
February 2, 2008 at 10:27 pm
http://en.wikipedia.org/wiki/TRIPS_architecture
If these are able to reach 1 teraflop, which is 200 times faster than today’s fast computers, RISC, ARM and CISC will all be dinosaurs.
Reply
Anonymous
January 31, 2008 at 3:17 am
Reading some of these comments, it is quite clear to me that there is often an astoundingly common correlation between the amount of knowledge a person has in a specialized area [such as CPU design], and their inability to conceptualize or accept possibly superior solutions. That isn’t to say I think this will be the next big thing; I don’t know enough about CPU architecture to make such a claim. I just find the lack of knowledge people have about this new architecture, combined with the assumptions they are making about its viability, to be somewhat onerous.
Reply
kjellstrom
January 28, 2008 at 7:11 am
It seems to me that your new computer should be very good for “simulated evolution” such as, for instance, “Gaussian adaptation”, because you may test all individuals in a population of 1000 individuals in parallel.
But the relatively small number of individuals in a population limits the number of degrees of freedom in the process, because the elements of the moment matrix of the Gaussian must be determined with sufficient statistical certainty.
Gkm
Reply
Anonymous
November 30, 2007 at 4:03 pm
TRILLIONS of calculations in mathematical equations is very minor compared to 100,000 to 999,000 times in data calculations that is the true supremacy in data communications.
Reply
Anonymous
October 24, 2007 at 6:42 pm
Intel started demonstrating a chip capable of delivering a teraflop of performance last winter. (see http://techresearch.intel.com/articles/Tera-Scale/1449.htm)
Theirs isn’t x86 compatible either, which means it wouldn’t have mass market appeal, even if Intel offered it as a product (which they don’t plan to do).
Sustaining performance in the TF and PF domains takes more than a clever core architecture. Memory capacity and bandwidth, packaging and software all play key roles. The UT charts don’t say much about any of these aspects of their design.
Reply
MainFragger
October 23, 2007 at 1:44 am
Why aren’t there processors that can compute in straight hex? There has to be a way to make a 0 state and 15 clean voltage ranges to create digital hex. It just seems to me that processing FFFF at an address of FFFF is a lot faster than processing 111111111 at an address of 111111111 or whatever.
And for optical media, the pits can be circular and match the maximum size readable by the beam. Then have 0 be empty, 1-8 be ascending pit sizes, and 9-F be inverse concentric circles left unpitted. 9 would be the next circle edge size down from the outer edge of the pit, unpitted. F would be the outer ring of the pit with nothing pitted inside that ring.
Reply
xgeorgio
October 22, 2007 at 12:38 am
Advanced branch prediction requires many cycles and a large in-core cache, both of which mean very expensive h/w compared to current CPU technologies. Furthermore, this can only be truly effective when automatic deep branch prediction is required, like when using very high-level programming languages (usually for AI).
The current trend in desktop h/w is quite the opposite, that is to embed as many parallel cores as possible inside the PC, including general-purpose programmable GPU (graphics card) h/w. This can ease the burden of compatibility of instruction sets from classic x86 while exploiting the current CPU technologies to the max via massive parallelism.
Also, it should be noted that most heavy-processing applications today, like climate simulations, weather prediction, molecular dynamics, pattern recognition, etc, are designed DSP-like, mainly focusing on simple math instructions that can be easily ported to parallel or vector machines. Hence, a new instruction set for graph-like branch prediction seems too specialized and cost-inefficient for now.
Reply
Anonymous
October 23, 2007 at 11:28 am
MainFragger,
One reason why we use binary systems to perform mathematical operations is relative immunity to noise. Having a system that has only two states, high and low, means that there is a very large difference between the two states that allows noise to be ignored. To be more explicit, anything above a set threshold is high, and anything below is low. Thus, when noise, an inevitable intruder, is found on the signal, it can be amazingly high before it causes error. But, in a system that is trying to use multiple levels, such as your hexadecimal example, any noise that is greater than 1/32nd of the full scale will cause an error. The greater the precision required of a given signal, the smaller a given noise can be that will cause error. Thus, the system is very likely to be error prone.
Another reason binary systems are used is cost. It is far easier to make on/off switches than it is analog amplifiers with the accuracy needed for higher modulo math systems. Analog computers were developed before binary computers proved to be far more economical. (As a youngster, I built such a simple analog computer, just for kicks, after reading about them in an electronics hobby book from a few decades earlier.)
–Candice H. Brown Elliott
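The 1/32 figure in the comment above can be reproduced with a small model. The sketch below is illustrative only and rests on an assumption not spelled out in the comment: the full voltage range is divided into equal-width bins, one per symbol, and each symbol is transmitted at the center of its bin. The function names `noise_margin` and `decode` are hypothetical.

```python
from fractions import Fraction


def noise_margin(levels: int) -> Fraction:
    """Largest noise, as a fraction of full scale, that cannot push a
    bin-centered signal across a bin boundary (equal-width bins assumed)."""
    if levels < 2:
        raise ValueError("need at least two signal levels")
    # Each bin is 1/levels wide; the center sits half a bin from either edge.
    return Fraction(1, 2 * levels)


def decode(voltage: float, levels: int, full_scale: float = 1.0) -> int:
    """Map a voltage to the index of the bin it falls in, clamped to the valid range."""
    index = int(voltage / full_scale * levels)
    return min(max(index, 0), levels - 1)


if __name__ == "__main__":
    print("binary margin:", noise_margin(2))        # fraction of full scale
    print("hexadecimal margin:", noise_margin(16))  # fraction of full scale

    noise = 0.04  # slightly more than 1/32 of full scale
    hex_center = (5 + 0.5) / 16   # symbol 5 of 16
    bin_center = (0 + 0.5) / 2    # symbol 0 of 2
    print("hex symbol 5 read as:", decode(hex_center + noise, 16))
    print("binary symbol 0 read as:", decode(bin_center + noise, 2))
```

In this model the same noise that corrupts a 16-level symbol leaves a binary symbol intact, because the binary margin is eight times wider. Real signaling schemes choose levels and thresholds differently, so the exact fractions depend on the model.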
Reply
Anonymous
April 25, 2007 at 7:56 am
In the future, we won’t even need or use ‘processors’.
All this in the future nonsense is nothing short of a car commercial that has the words “announcing” & “the all new” mantras. dime a dozen and I might buy one
Reply
Ilya Rosenberg
April 25, 2007 at 4:37 pm
I read the introductory PDF on their web site. Interesting stuff.
What these guys are doing is trying to replace the superscalar architecture (which eats up a lot of transistors and power on architectures such as the x86 for things such as register renaming and out-of-order execution). The advantage this has over x86s is that you could cram more cores on the same die because you’d be wasting less chip real estate, and you should be less sensitive to delays due to cache misses. You might also use the ALUs better. The advantage this has over GPU-type processors is that GPUs typically want to repeat the same operations on similar data (for example, processing 16 pixels in parallel), and if branches are taken, GPUs want to branch the same way for all the pixels. This architecture doesn’t have that same requirement.
I think it’s a nice idea, but I doubt they could get anywhere close to the performance of x86 chips or GPUs any time soon unless they get major funding and access to the high end fabs. Still, it’s interesting research… cool to see people trying a different approach.
Reply
Anonymous
April 25, 2007 at 7:00 am
. simulation (medical, climate, vr)
. ray-traced gaming at LAST!
. compiling my linux kernel on the fly :-) … long live my hypervisor !
. real-time 3D effects
. computer-aided sensorial
. mind-control !!! Woehoe!
. advanced weapon systems
. top speed navigation in space
. AI
. 3D radar systems
. weather prediction
. Performing multi-dimensional analyses on the TORA
. Pi^2
. forget about huge clusters, think pizza-box supercomputing
and many many more
Reply
Anonymous
April 27, 2007 at 5:50 pm
Better than ‘on ice’… or ‘on junk food’.
Reply
DuLac
April 26, 2007 at 2:18 am
You got it right!
It resembles a GPU.
Actually you already have its human-made brother on your PC.
The difference is that while the GPU already has things optimized for a specific task (graphics)… this stuff is specialized in automatically and locally building circuits for new specific tasks (like a GPU).
This means circuits will not need people to build something like a GPU… but for many different specialized tasks that will be optimized temporarily in hardware (circuit programming) to get a similar result.
Number-cracking is another specific task being introduced in our PCs… BUT they are general as they should be.
Now imagine the following scenario:
You have a cipher to crack (just an example)… you have millions of chips specialized in AES… any variation of AES will be a problem… you lose a lot of time programming a new chip…
You need to adapt your high-level programming, optimization, change circuits. Such solutions have existed for decades. But they are expensive and local to certain services… OTHER services also have need of them… MORE available power is needed… And cheaper!
NOW you can have a cheap solution to be integrated in higher quantities… and easily ported to other different tasks. Naturally this has a lot of uses, though not for the needs of common users, who are already using GPUs. They are already using this thing (limited) in their PCs.
So this is not a new generation CPU, just the generalization of what exists in a cheaper way. Naturally this is a very powerful tool… especially (and this is the interest of it) because it will be cheaper and available in wider quantities.
The result of a more vast and/or increased use depends on the usage given to it… And the power of it increases the power of the way it is used. Personally, while I find it promising in some areas, it’s also scary in others. That’s mankind’s usual problem: Power! and the lack of wisdom to only use it right.
Cheers.
P.S. – I did digress, sorry.
Since I did, I’ll add a note to another user who suggested many good, valuable applications… and some bad ones and/or some that just seem silly:
The user mentioned Mind-Control… Well, that’s not so silly if we look from the right angle. For example: We do mind control all the time. Just ask a mind doctor… or an advertising maker… or a political campaign expert… or… You get the idea, so let’s keep it simple.
The fact is that people are very limited to what is familiar to them… and since they are mostly familiar with what is GIVEN to them (e.g. TV, NEWS), that is one problem that propaganda exploits. There are other problems, but this is not the place. Just consider that a fish in the water does not see it. And that a fairly intelligent person is easy to fool with their own words and their own limitations, because the words are felt to be important and the limitations are not recognized. We also make our own waters… and churches/school/media show how fragile we are to the social environment that builds our belief system.
Best wishes.
Reply
amanfromMars
April 25, 2007 at 11:54 pm
“Their research team designed and built the hardware prototype chips and the software that runs on the chips.”
Hmmm. A Micro Operating System for Beta Use of any Macro Operating System? ….. which is Really a Roadmap to Route Highly Enriched Information to and from Root Sources/Intelligent Servers?…….. SMARTer chips?
Or merely more Intelligent Programmers in AI Environments/Virtual Domains? [Intelligent as in Viably Imaginative]
Reply
Anonymous
April 26, 2007 at 10:12 am
is there port of slaka on this architecture?
Reply
Anonymous
April 25, 2007 at 10:04 am
What’s the difference from an ordinary vector CPU, e.g. MIPS?
Reply
Anonymous
April 25, 2007 at 9:24 am
Web servers don’t care about x86 compatibility as long as the software is ported to the platform.
Reply
Anonymous
April 25, 2007 at 6:08 am
I’m wondering: most of today’s (binary) logic problems have a single input flow of instructions, so what is the use of having so many calculations side by side? It is powerful, but where, in what fields, would this be required? I don’t think a normal PC would require it (programs with 1,024 threads are unlikely). I can imagine it would be handy in computational biochemistry, but what would the other target fields be?
Reply
Anonymous
April 24, 2007 at 6:31 pm
Did someone say pie? I love pie…
Reply
Anonymous
April 26, 2007 at 12:17 pm
comments is on fire!
Reply
James Snell
April 24, 2007 at 4:50 pm
I think the key consideration when designing an architecture for general use is the programmers. If you can make something which provides a simple interface to users at the lowest level (that is, provides a very straightforward instruction set), then I’d expect adoption to be highly encouraged. Of course, there also need to be economic benefits all the way around. If the chip is expensive as hell, hard to obtain in bulk, unstable, unscalable and so on, then there’s going to be no real reason to adopt it at all.
Given that Intel announced today that they’re opening things up, I suspect that it will be easier for alternatives such as this to be adapted to work with existing PCs. If this CPU could provide even some sort of x86 emulation at the low level while keeping alternative features readily available, then it stands a great chance of being a success.
I wish them the best, I may have to consider Austin now for my master’s, I can’t wait to really get hard core with this stuff… :)
Reply
RichWargo
April 25, 2007 at 3:35 am
I just want to say something to the naysayers. If you read more closely, this work is being funded by DARPA. If that is your criterion for predicting failure, just look to the Internet, another DARPA-funded creation. Reading further on, the research team is not just creating another “academic” solution, but is pursuing all avenues that will provide an end-user usable solution: hardware, software, the whole package. It’s taken them about 7 years to get this far, and if DARPA is still funding them after 7 years, then there must be something real to this.
Reply
ParoX
April 24, 2007 at 11:36 pm
Um… A wide (and ridiculously long?) processor pipe does not invalidate the usefulness of a processor.
Specialized chips are seen everywhere; you DO know what memory controllers and GPUs do, right? Please?
Programmable Logic Circuits are not evil; they’re used in everything from your car to your oven.
You are insanely paranoid and should go away :D
Reply
Lightning
April 25, 2007 at 5:01 am
EDGE doesn’t “issue” instructions in the same way that a typical processor does, but instead issues data to a set of execution units that have been pre-issued with instructions. This allows it to offer considerable parallelism without difficult programming. It would be worthwhile reading the resources available at the home web site here http://www.cs.utexas.edu/~trips/ before posting more.
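The idea of sending data to execution units that already hold their instructions can be pictured with a toy dataflow model. The sketch below is not the TRIPS instruction set or its hardware; the `Unit` class, `build_block` and `run_block` are hypothetical names invented for illustration. Each unit is preloaded with an operation and fires as soon as all of its operands have arrived, forwarding its result to the units that depend on it.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Unit:
    """A toy execution unit preloaded with one operation."""
    op: Callable[..., int]
    arity: int
    targets: list  # (unit_name, operand_slot) pairs that receive the result
    operands: dict = field(default_factory=dict)


def build_block() -> dict:
    """Preload a block computing (a + b) * (c - d)."""
    return {
        "add": Unit(operator.add, 2, [("mul", 0)]),
        "sub": Unit(operator.sub, 2, [("mul", 1)]),
        "mul": Unit(operator.mul, 2, []),
    }


def run_block(units: dict, inputs: list) -> dict:
    """Deliver operands; a unit fires once all of its operand slots are filled."""
    pending = list(inputs)  # (unit_name, operand_slot, value)
    results = {}
    while pending:
        name, slot, value = pending.pop()
        unit = units[name]
        unit.operands[slot] = value
        if len(unit.operands) == unit.arity:
            args = [unit.operands[i] for i in range(unit.arity)]
            results[name] = unit.op(*args)
            # Forward the result as data to dependent units.
            pending.extend((t, s, results[name]) for t, s in unit.targets)
    return results


if __name__ == "__main__":
    inputs = [("add", 0, 2), ("add", 1, 3), ("sub", 0, 10), ("sub", 1, 4)]
    print(run_block(build_block(), inputs))
```

The order in which operands arrive does not change the result, and `add` and `sub` have no dependency on each other, so nothing in the program itself forces them to run one after the other. That independence is where the parallelism comes from in this simplified picture.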
As for making it open source… ever tried to work on something for seven years without corporate backing?
Reply
Mario.
April 25, 2007 at 4:30 am
This looks like today’s graphics card chips, like the G80, or more like the RV650, RV600, RV630 and RV610. The R600 can reach ~0.5 TFLOPS.
Reply
DuLac
April 24, 2007 at 5:58 pm
Basically this is an expansion of the programmable circuits used for decades by the NSA to crack ciphers. The news is that it allows an automatic programming of the circuits.
The result is that only very repetitive tasks are optimized in circuit programming. This works well with cipher cracking and war-weather-economic-social simulation/prediction, which has been used for decades.
A U.S. general once said about the end of WW-II: The Germans lost the War but the NoZIs won it! … This is a better tool for them to do more efficiently what they have been doing for the last 50 years. The military-industrial complex will appreciate it!
To the public this tool is useless. So take your ideas out!
You won’t need it. Period. It is not for you!
Cheers.
Reply
Anonymous
April 24, 2007 at 10:44 pm
Explicit?
I believe that there are more NOPs eating the cache like any VLIW processor!!!
Reply
