Next-generation, high-performance processor unveiled

April 24, 2007

The prototype for a revolutionary new general-purpose computer processor, which has the potential to reach trillions of calculations per second, has been designed and built by a team of computer scientists at The University of Texas at Austin.

The new processor, known as TRIPS (Tera-op, Reliable, Intelligently adaptive Processing System), could be used to accelerate industrial, consumer and scientific computing.

Professors Stephen Keckler, Doug Burger and Kathryn McKinley have spent the past seven years working on the underlying technology that culminated in the TRIPS prototype. Their research team designed and built the hardware prototype chips and the software that runs on the chips.

“The TRIPS prototype is the first on a roadmap that will lead to ultra-powerful, flexible processors implemented in nanoscale technologies,” said Burger, associate professor of computer sciences.

TRIPS is a demonstration of a new class of processing architectures called Explicit Data Graph Execution (EDGE). Unlike conventional architectures that process one instruction at a time, EDGE can process large blocks of information all at once and more efficiently.
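To make the contrast with one-instruction-at-a-time execution concrete, the following is a toy sketch of dataflow-style execution: every instruction in a block fires as soon as its operands are available, so independent instructions can run in the same step. The `run_block` function, the block layout and the instruction names are all hypothetical illustrations; this is not the TRIPS instruction set or the team's implementation.

```python
import operator


def run_block(block, inputs):
    """Execute a block of instructions in dataflow order.

    block:  dict mapping a result name to (function, [operand names])
    inputs: dict mapping already-available operand names to values
    Returns the final values and the list of instructions fired per step.
    """
    values = dict(inputs)
    pending = dict(block)
    steps = []
    while pending:
        # An instruction is ready once all of its operands have values.
        ready = [name for name, (_, srcs) in pending.items()
                 if all(src in values for src in srcs)]
        if not ready:
            raise ValueError("block contains operands that are never produced")
        # All ready instructions fire together in this step.
        results = {name: pending[name][0](*(values[s] for s in pending[name][1]))
                   for name in ready}
        values.update(results)
        for name in ready:
            del pending[name]
        steps.append(sorted(ready))
    return values, steps


# Hypothetical three-instruction block: out = (a + b) - (c * d)
block = {
    "t1": (operator.add, ["a", "b"]),
    "t2": (operator.mul, ["c", "d"]),
    "out": (operator.sub, ["t1", "t2"]),
}
values, steps = run_block(block, {"a": 1, "b": 2, "c": 3, "d": 4})
print(values["out"], steps)
```

In this sketch the addition and the multiplication fire in the same step because neither depends on the other, while the subtraction waits for both results.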

Current “multicore” processing technologies increase speed by adding more processors, which individually may not be any faster than previous processors.

Adding processors shifts the burden of obtaining better performance to software programmers, who must assume the difficult task of rewriting their code to run well on a potentially large number of processors.

“EDGE technology offers an alternative approach when the race to multicore runs out of steam,” said Keckler, associate professor of computer sciences.

Each TRIPS chip contains two processing cores, each of which can issue 16 operations per cycle with up to 1,024 instructions in flight simultaneously. Current high-performance processors are typically designed to sustain a maximum execution rate of four operations per cycle.

Though the prototype contains two 16-wide processors per chip, the research team aims to scale this up with further development.
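The figures quoted above can be combined into a rough back-of-the-envelope calculation. The sketch below uses only the numbers given in the article (two cores, 16 operations per cycle, 1,024 instructions in flight, four operations per cycle for conventional processors). The clock frequency is not stated in the article, so `ASSUMED_CLOCK_HZ` is a purely hypothetical value chosen for illustration.

```python
# Figures taken from the article.
CORES_PER_CHIP = 2
ISSUE_WIDTH = 16          # operations issued per cycle, per core
IN_FLIGHT = 1024          # instructions in flight, per core
CONVENTIONAL_WIDTH = 4    # typical peak for current high-performance processors

# Hypothetical value: the article does not give a clock frequency.
ASSUMED_CLOCK_HZ = 1_000_000_000


def peak_ops_per_cycle(cores, width):
    """Peak operations per cycle for a whole chip."""
    return cores * width


def cycles_to_fill_window(in_flight, width):
    """Cycles of full-width issue needed to fill the in-flight window."""
    return in_flight // width


def peak_ops_per_second(cores, width, clock_hz):
    """Peak operations per second at a given (assumed) clock frequency."""
    return peak_ops_per_cycle(cores, width) * clock_hz


chip_peak = peak_ops_per_cycle(CORES_PER_CHIP, ISSUE_WIDTH)
window_cycles = cycles_to_fill_window(IN_FLIGHT, ISSUE_WIDTH)
width_ratio = ISSUE_WIDTH // CONVENTIONAL_WIDTH
ops_per_second = peak_ops_per_second(CORES_PER_CHIP, ISSUE_WIDTH, ASSUMED_CLOCK_HZ)

print(chip_peak, window_cycles, width_ratio, ops_per_second)
```

Under these assumptions a chip peaks at 32 operations per cycle, each core is four times wider than a four-wide conventional design, and the in-flight window corresponds to 64 cycles of full-width issue. These are peak figures only; sustained rates would depend on the workload and on the actual clock speed.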

Source: University of Texas at Austin


43 Responses to Next-generation, high-performance processor unveiled

  1. Anonymous December 1, 2009 at 7:16 pm #

  2. Anonymous May 5, 2008 at 10:48 am #

    There is a discussion presently going on regarding a comparison of FPGA, GPU and the Cell/B.E processors for the purpose of computational computing. See here:

    I am interested in your opinions on GPU versus Cell processor.

  3. BJS February 7, 2008 at 11:14 am #

    I suspect it is linked on some high-traffic site somewhere. Also, it ranks high in search engines for the phrase ‘next generation processor.’


  4. Anonymous February 7, 2008 at 6:55 am #

    This “Next-generation, high-performance processor unveiled”
    has been posted as one of the day’s “Top stories”
    since April 2007 !

    Does someone have a vested interest?

  5. Anonymous February 2, 2008 at 10:27 pm #

    If these will be able to deliver 1 teraflop, which is 200 times faster than today’s fast computers, RISC, ARM, CISC will all be dinosaurs.

  6. Anonymous January 31, 2008 at 3:17 am #

    Reading some of these comments, it is quite clear to me that there is often an astoundingly common correlation between the amount of knowledge a person has in a specialized area [such as CPU design], and their inability to conceptualize or accept possibly superior solutions. That isn’t to say I think this will be the next big thing; I don’t know enough about CPU architecture to make such a claim. I just find the lack of knowledge people have about this new architecture, combined with the assumptions they are making about its viability, to be somewhat onerous.

  7. kjellstrom January 28, 2008 at 7:11 am #

    It seems to me that your new computer should be very good for “simulated evolution” such as, for instance, “Gaussian adaptation” because you may test all individuals in a population of 1000 individuals in parallel.

    But the relatively small number of individuals in a population limits the number of degrees of freedom in the process, because the statistical certainty in the elements of the moment matrix of the Gaussian must be determined with sufficient precision.


  8. Anonymous December 10, 2007 at 10:59 pm #

    Well, if you are going to quote figures…at least be correct. The X86 architecture did NOT just recently pass 100 million. In 2007 ALONE, there were 257.5 MILLION PCs sold worldwide and OVER 90% of those PCs sold are based on the X86 architecture.

    Over 90% of the world’s PCs and servers use X86

    Worldwide PC shipments increased to 257.5 million in 2007

    The installed base of Internet users (internet or not) is well over 1 Billion.
    again…over 90% of those users are on X86 machines

    There are over 2 billion cell phones in use in the world…however, they are based on several different CPU architectures, plus there are older phones still in use in many developing countries that are based on proprietary chips that aren’t even in use any more. (Unlike old PCs which were STILL X86 based.)

    Based on almost 1/4 of a billion X86 machines shipped in 2007, seems pretty obvious that in the DECADES of X86 processors being on the market, they passed a BILLION a long time ago, much less 100 million.

    Cell phones, while great devices for communication and an ever increasing number of multimedia functions are still not “General computing” devices, which is what this processor is supposed to be.

  9. Anonymous November 30, 2007 at 4:03 pm #

    TRILLIONS of calculations in mathematical
    equations is very minor compared to 100,000 to 999,000 times in data calculations that
    is the true supremacy in data communications.

  10. Anonymous October 24, 2007 at 6:42 pm #

    Intel started demonstrating a chip capable of delivering a teraflop of performance last winter. (see

    Theirs isn’t x86 compatible either, which means it wouldn’t have mass market appeal, even if Intel offered it as a product (which they don’t plan to do).

    Sustaining performance in the TF and PF domains takes more than a clever core architecture. Memory capacity and bandwidth, packaging and software all play key roles. The UT charts don’t say much about any of these aspects of their design.

  11. MainFragger October 23, 2007 at 1:44 am #

    Why aren’t there processors that can compute in straight hex? There has to be a way to make a 0 state and 15 clean voltage ranges to create digital hex. It just seems to me that processing FFFF at an address of ffff is a lot faster than processing 111111111 at an address of 111111111 or whatever..

    And for optical media, the pits can be circular and match the maximum size readable by the beam. Then have 0 be empty, 1-8 ascending pit sizes, 9-F being inverse concentric circles unpitted. 9 would be the next circle edge size down from the other edge of the pit unpitted. F would be the outer ring of the pit with nothing pitted inside that ring.

  12. xgeorgio October 22, 2007 at 12:38 am #

    Advanced branch prediction requires many cycles and large in-core cache, both mean very expensive h/w compared to current CPU technologies. Furthermore, this can only be truly effective when automatic deep branch prediction is required, like when using very high-level programming languages (usually for AI).

    The current trend in desktop h/w is quite the opposite, that is to embed as many parallel cores as possible inside the PC, including general-purpose programmable GPU (graphics card) h/w. This can ease the burden of compatibility of instruction sets from classic x86 while exploiting the current CPU technologies to the max via massive parallelism.

    Also, it should be noted that most heavy-processing applications today, like climate simulations, weather prediction, molecular dynamics, pattern recognition, etc, are designed DSP-like, mainly focusing on simple math instructions that can be easily ported to parallel or vector machines. Hence, a new instruction set for graph-like branch prediction seems too specialized and cost-inefficient for now.

  13. Anonymous October 23, 2007 at 11:28 am #


    One reason why we use binary systems to perform mathematical operations is relative immunity to noise. Having a system that has only two states, high and low, means that there is a very large difference between the two states that allows noise to be ignored. To be more explicit, anything above a set threshold is high, and anything below is low. Thus, when noise, an inevitable intruder, is found on the signal, it can be amazingly high before it causes error. But, in a system that is trying to use multiple levels, such as your hexadecimal example, any noise that is greater than 1/32nd of the full scale will cause an error. The greater the precision required of a given signal, the smaller a given noise can be that will cause error. Thus, the system is very likely to be error prone.

    Another reason binary systems are used is cost. It is far easier to make on/off switches than it is analog amplifiers with the accuracy needed for higher modulo math systems. Analog computers were developed before binary computers proved to be far more economical. (As a youngster, I built such a simple analog computer, just for kicks, after reading about them in an electronics hobby book from a few decades earlier.)

    –Candice H. Brown Elliott

  14. Anonymous September 5, 2007 at 10:15 am #

    Yeah, just like T. Rex. All of the other dinosaurs that came along afterward were unable to supplant the King. Oh…wait…all the dinosaurs are gone and replaced by newer developments. The fact that x86’s have not yet been supplanted doesn’t provide any evidence that they won’t be. Better technology and a reasonable price will prevail.

  15. Anonymous April 24, 2007 at 1:35 pm #


  16. MajorBytes April 24, 2007 at 4:04 pm #

    ditto on the vista…;)

  17. Obvious April 24, 2007 at 3:57 pm #


    *afk making patent*

  18. Anonymous April 24, 2007 at 3:35 pm #

    Almost like a pie, but not quite.

  19. Anonymous April 24, 2007 at 3:21 pm #

    Ok.. so it can ISSUE 16 instructions per clock, but how many clocks does it take to actually process each instruction? Is VERY long instruction processing time the reason for ‘1024 instructions in flight’? That looks like 64 or more clocks to complete a given instruction. If there are variable completion rates, how the heck do you sync up the many threads you have ‘flying’?

  20. Anonymous April 25, 2007 at 7:56 am #

    In the future, we won’t even need or use ‘processors’.
    All this in the future nonsense is nothing short of a car commercial that has the words “announcing” & “the all new” mantras. dime a dozen and I might buy one

  21. Ilya Rosenberg April 25, 2007 at 4:37 pm #

    I read the introductory PDF on their web site. Interesting stuff.

    What these guys are doing is trying to replace the superscalar architecture (which eats up a lot of transistors and power on architectures such as the x86 for things such as register renaming and out-of-order execution). The advantage this has over x86s is that you could cram more cores on the same die because you’d be wasting less chip real estate and you should be less sensitive to delays due to cache misses. You might also use the ALUs better. The advantage this has over GPU-type processors is that GPU processors typically want to repeat the same operations on similar data (for example, processing 16 pixels in parallel), and if branches are taken, GPUs want to branch the same way for all the pixels. This architecture doesn’t have that same requirement.

    I think it’s a nice idea, but I doubt they could get anywhere close to the performance of x86 chips or GPUs any time soon unless they get major funding and access to the high end fabs. Still, it’s interesting research… cool to see people trying a different approach.

  22. Anonymous April 25, 2007 at 7:00 am #

    . simulation (medical, climate, vr)
    . ray-traced gaming at LAST!
    . compiling my linux kernel on the fly :-) … long live my hypervisor !
    . real-time 3D effects
    . computer-aided sensorial
    . mind-control !!! Woehoe!
    . advanced weapon systems
    . top speed navigation in space
    . AI
    . 3D radar systems
    . weather prediction
    . Performing multi-dimensional analyses on the TORA
    . Pi^2
    . forget about huge clusters, think pizza-box supercomputing

    and many many more

  23. Anonymous April 27, 2007 at 5:50 pm #

    Better than ‘on ice’… or ‘on junk food’.

  24. DuLac April 26, 2007 at 2:18 am #

    You got it right!

    It resembles a GPU.
    Actually you already have its human-made brother on your PC.

    The difference is that while the GPU already has things optimized for a specific task (graphics)… this stuff is specialized in automatically and locally building circuits for new specific tasks (like a GPU).

    This means circuits will not need people to build something like a GPU… but for many different specialized tasks that will be optimized temporarily in hardware (circuit programming) to get a similar result.

    Number-cracking is another specific task being introduced in our PCs… BUT they are general as they should be.

    Now imagine the following scenario:
    You have a cipher to crack (just an example)… you have millions of chips specialized in AES… any variation of AES will be a problem… you lose a lot of time to program a new chip…

    You need to adapt your high-level programming, optimization, change circuits. Such a solution has existed for decades. But it is expensive and local to certain services… OTHER services also have need of it… MORE available power is needed… And cheaper!

    NOW you can have a cheap solution to be integrated in higher quantities… and easily ported to other different tasks. Naturally this has a lot of uses. Though not for the needs of common users, who are already using GPUs. They are already using this thing (limited) in their PCs.

    So this is not a new generation CPU, just the generalization of what exists in a cheaper way. Naturally this is a very powerful tool… especially (and this is the interest of it) because it will be cheaper and available in wider quantities.

    The result of a more vast and/or increased use depends on the usage given to it… And the power of it increases the power of the way it is used. Personally, while I find it promising in some areas, it’s also scary in others. That’s mankind’s usual problem: Power! and the lack of wisdom to only use it right.


    P.S. – I did digress, sorry.
    As I did, I’ll add a note to another user who suggested many good, valuable applications… and some bad and/or some that just seem silly:
    The user mentioned Mind-Control… Well that’s not so silly if we look from the right angle. For example: We do mind control all the time. Just ask a mind doctor… or an advertisement maker… or a political campaign expert… or… You get the idea, so let’s keep it simple.

    The fact is that people are very limited to what is familiar to them… and since they are mostly familiar with what is GIVEN to them (e.g. TV, NEWS) that is one problem that propaganda exploits. There are other problems, but this is not the place. Just consider that a fish in the water does not see it. And that a fairly intelligent person is easy to fool with their own words and their own limitations, because their words are felt to be important and the limitations are not recognized. We also make our own waters… and churches/schools/media show how fragile we are to the social environment that builds our belief system.

    Best wishes.

  25. amanfromMars April 25, 2007 at 11:54 pm #

    “Their research team designed and built the hardware prototype chips and the software that runs on the chips.”

    Hmmm. A Micro Operating System for Beta Use of any Macro Operating System? ….. which is Really a Roadmap to Route Highly Enriched Information to and from Root Sources/Intelligent Servers?…….. SMARTer chips?

    Or merely more Intelligent Programmers in AI Environments/Virtual Domains? [Intelligent as in Viably Imaginative]

  26. Anonymous April 26, 2007 at 10:12 am #

    Is there a port of slaka on this architecture?

  27. Anonymous April 25, 2007 at 10:04 am #

    What’s the difference from an ordinary vector CPU, e.g. MIPS?

  28. Anonymous April 25, 2007 at 9:24 am #

    Web servers don’t care about x86 compatibility as long as the software is ported to the platform.

  29. Anonymous April 25, 2007 at 6:08 am #

    I wonder: most of today’s (binary) logic problems have a single input flow of instructions. So what is the use of having so many calculations side by side? It is powerful, but where, in what fields, would this be required? I don’t think a normal PC would require it (programs with 1024 threads are unlikely). I can imagine it would be handy in computational biochemistry, but what would the other target fields be?

  30. Anonymous April 24, 2007 at 6:31 pm #

    Did someone say pie? I love pie…

  31. Anonymous April 26, 2007 at 12:17 pm #

    comments is on fire!

  32. Anonymous April 24, 2007 at 3:05 pm #

    a processor that can run vista ;-)

  33. Negafox April 24, 2007 at 2:54 pm #

    And if this is not an x86-based processor, then it will likely never gain widespread acceptance. Too many “next-gen” processors have come and gone in the past two decades, and yet, the x86 processor still reigns supreme.

  34. Carsten April 24, 2007 at 2:53 pm #

    instead of giving this away to some privately held company.

    That way, all participants may profit from their development instead of just one…

  35. James Snell April 24, 2007 at 4:50 pm #

    I think the key consideration when designing an architecture for general use is the programmers. If you can make something which provides a simple interface to users at the lowest level (that is, provides a very straightforward instruction set) then I’d expect adoption to be highly encouraged. Of course, there also need to be economic benefits all the way around. If the chip is expensive as hell, hard to obtain in bulk, unstable, unscalable and so on, then there’s going to be no real reason to adopt it at all.

    Given that Intel announced today that they’re opening things up, I suspect that it will be easier for alternatives such as this to be adapted to work with existing PCs. If this CPU could provide even some sort of x86 emulation at the low level while keeping alternative features readily available, then it stands a great chance of being a success.

    I wish them the best, I may have to consider Austin now for my master’s, I can’t wait to really get hard core with this stuff… :)

  36. RichWargo April 25, 2007 at 3:35 am #

    I just want to say something to the naysayers. If you read more closely, this work is being funded by DARPA. If that is your criterion for predicting failure, just look to the Internet, another DARPA-funded creation. Reading further on, the research team is not just creating another “academic” solution, but is pursuing all avenues that will provide an end-user usable solution – hardware, software, the whole package. It’s taken them about 7 years to get this far, and if DARPA is still funding them after 7 years, then there must be something real to this.

  37. Anonymous April 25, 2007 at 12:33 am #

    As long as GCC can target the new processor, GNU/Linux and a huge amount of software will be nearby. Thanks to free software, it’s potentially a very different scenario now compared to when the i386 was just introduced and everyone focused on x86 code.

  38. ParoX April 24, 2007 at 11:36 pm #

    Um… A wide (and ridiculously long?) processor pipe does not invalidate the usefulness of a processor.

    Specialized chips are seen everywhere; you DO know what memory controllers and GPUs do, right? Please?

    Programmable Logic Circuits are not evil; they’re used in everything from your car to your oven.

    You are insanely paranoid and should go away :D

  39. Lightning April 25, 2007 at 5:01 am #

    EDGE doesn’t “issue” instructions in the same way that a typical processor does, but instead issues data to a set of execution units that have been pre-issued with instructions. This allows it to offer considerable parallelism without difficult programming. It would be worthwhile reading the resources available at the home web site here before posting more.

    As for making it open source… ever tried to work on something for seven years without corporate backing?

  40. Mario. April 25, 2007 at 4:30 am #

    This looks like today’s graphics card chips, like the G80 or more like the RV650, RV600, RV630 and RV610. The R600 can make ~0.5 TFLOPS.

  41. DuLac April 24, 2007 at 5:58 pm #

    Basically this is an expansion of the programmable circuits used for decades by the NSA to crack ciphers. The news is that it allows an automatic programming of the circuits.

    The result is that only very repetitive tasks are optimized in circuit programming. This works well with cipher cracking and the war-weather-economic-social simulation/prediction that has been used for decades.

    A U.S. general once said about the end of WW-II: The Germans lost the War but the NoZIs won it! … This is a better tool for them to do more efficiently what they have been doing for the last 50 years. The military-industrial complex will appreciate it!

    To the public this tool is useless. So take your ideas out!
    You won’t need it. Period. It is not for you!


  42. Anonymous April 24, 2007 at 9:10 pm #

    Asserting that “the x86 processor still reigns supreme” shows a very narrow view of computing. There are twice as many mobile phone users as Internet users; very few of these phones run x86 processors. If you look at the number of deployed processors, x86 is a long way behind ARM. In 2004, there were 3/4 billion ARM processors shipped, x86 has only recently reached 100 million. In many fields, x86 is as irrelevant as it always has been.

  43. Anonymous April 24, 2007 at 10:44 pm #


    I believe that there are more NOPs eating the cache like any VLIW processor!!!
