Comments on the “Great ISA Debate”

20 May, 2010

As computer architectures have grown and developed over time, many competing designs have been proposed. This paper discusses the competition between the IBM System/360 and the Burroughs B5000, as well as its legacy in the debate between the Reduced Instruction Set Computer (RISC) and the Complex Instruction Set Computer (CISC). It closes with comments on the current state of computer architecture and where things are headed.

IBM System/360 vs Burroughs B5000

The System/360 and the B5000 differed in many fundamental ways. Figure 1 shows a table of just some of the differences:

Figure 1. Key decisions made by the IBM System/360 and the Burroughs B5000

Feature | IBM System/360 [1] | Burroughs B5000 [4]
Address size | 24 bits | Program Reference Table with 1024 entries
Character size | 8 bit byte (4 bit binary coded decimal digits) | 6 bit (8 characters per 48 bit word)
FP size | 32/64 bit | 48 bit
Instruction size | Variable: 16/32/48 bit | 12 bit
Integer size | 32 bit | 48 bit
Register style | General purpose registers | Stack based

The IBM System/360 made the significant decision to keep its whole line of machines binary compatible, with every model working off a common instruction set [1]. Binary compatibility remains important to this day: the x86 architecture on desktop PCs allows programs written ten years ago to run on machines built yesterday.

Another significant introduction from the IBM System/360 was byte addressability, whereas the B5000 was word addressed [1]. Byte addressability decoupled the address size from the word size and made it possible to use characters stored at any location.
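To make the addressing difference concrete, here is a minimal sketch in Python. It assumes the figures from the table (48 bit words holding eight 6 bit characters); the function names and the left-to-right packing order are illustrative choices, not taken from either machine's documentation.

```python
WORD_BITS = 48
CHAR_BITS = 6
CHARS_PER_WORD = WORD_BITS // CHAR_BITS  # 8 characters per word


def pack_word(codes):
    """Pack up to eight 6 bit character codes into one integer word.

    The first code ends up in the most significant bits (an assumed layout).
    """
    word = 0
    for code in codes:
        word = (word << CHAR_BITS) | (code & 0x3F)
    return word


def read_char_word_addressed(memory, char_index):
    """Word-addressed machine: fetch the whole word, then shift and mask."""
    word = memory[char_index // CHARS_PER_WORD]
    slot = char_index % CHARS_PER_WORD
    shift = (CHARS_PER_WORD - 1 - slot) * CHAR_BITS
    return (word >> shift) & 0x3F


def read_char_byte_addressed(memory, address):
    """Byte-addressed machine: every character has its own address."""
    return memory[address]


# Sixteen characters stored both ways (codes 1..16 are arbitrary sample data).
word_memory = [pack_word(range(1, 9)), pack_word(range(9, 17))]
byte_memory = bytes(range(1, 17))
```

Reading character 10 takes a divide, a shift, and a mask on the word-addressed layout, but a single indexed load on the byte-addressed one.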

The B5000 used a stack based register style [4], an approach still in use today by the Java Virtual Machine. No registers are available to the assembly programmer, and anything that needs to be referenced must be pushed onto the stack. Another important feature that accompanied the stack based architecture was the B5000 descriptor system, which gave each memory reference a base and a limit. This, along with the separation between instructions and data, removed the possibility of stack overflow attacks. The same ideas allow modern virtual machines, like the JVM, to offer a more secure sandbox than binary compatible systems that lack the stack architecture.
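The stack based style is easiest to see in a toy interpreter. The sketch below is in Python and uses a made-up three-instruction set ("push", "add", "mul"); it is not the B5000 or JVM instruction set, only an illustration of operands living on a stack instead of in named registers.

```python
def run_stack_program(program):
    """Evaluate a list of (opcode, *operands) tuples on an operand stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# (2 + 3) * 4 with no registers: every operand is pushed before it is used.
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
result = run_stack_program(program)
```

Note that the instructions never name a destination. That is part of why stack instructions can be short, such as the 12 bit instructions listed in Figure 1.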

The B5000’s descriptor based memory scheme is today called segmented virtual memory and is used by many machines. Burroughs was also the first to start discussing how multiple processors would work in a system [4]. It was not long before the IBM System/360 series supported multiple processors as well.
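The base and limit check mentioned above can be sketched in a few lines of Python. The `Descriptor` class and `translate` function are hypothetical names for illustration; the real B5000 descriptor format carried more fields than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    base: int   # first word of the segment in memory
    limit: int  # number of words in the segment


def translate(descriptor, index):
    """Turn a segment-relative index into an absolute address.

    Any index outside [0, limit) is rejected before memory is touched,
    so a reference cannot run past the end of its segment.
    """
    if not 0 <= index < descriptor.limit:
        raise IndexError(f"index {index} outside segment of {descriptor.limit} words")
    return descriptor.base + index


array = Descriptor(base=1000, limit=16)
last_word = translate(array, 15)
```

Because the check happens on every reference, an out-of-range write fails at the access itself instead of silently overwriting neighboring data.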

RISC vs. CISC

In the early 1980s, David Patterson and David Ditzel proposed a reversal of direction for instruction set architectures, from the Complex Instruction Set Computer to their Reduced Instruction Set Computer (RISC).

At the time, the processor field was dominated by the VAX-11/78x systems. These systems used a CISC architecture and provided the best performance [3]. Patterson and Ditzel believed a reduced instruction set could top CISC by taking advantage of the memory hierarchy and the memory market’s improving performance per dollar, by letting chip designers work faster, and by optimizing the instructions used most often instead of creating special instructions for edge cases.

RISC took notice of Moore’s Law and the exponential improvements in memory, and set out to reduce the number of instructions available while accepting that programs would execute more of them [5]. A RISC instruction set was expected to take chip designers far less time to implement than a CISC one; in the extra time a new CISC design needed, top of the line performance would have doubled and the gains from that design would be worthless [5]. These arguments had their detractors. Clark and Strecker of VAX Systems Architecture argued that the gap between instruction performance and main memory performance was large, and that this gap continued to drive requests for instruction sets that favored high level languages, similar to the design of the B5000 [2].

Patterson and Ditzel cited Amdahl’s Law: optimizing the most heavily used parts of a system has the largest effect. In doing so, they pulled work away from the processor and gave it to the compiler. This allowed the CPU to get smaller and made room for other components on the chip. Time has borne out this argument, as the dominant CPU for mobile devices today is the Advanced RISC Machine (ARM) chip.
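Amdahl’s Law is short enough to write out. The Python sketch below uses the standard form of the law; the two example workloads (a common case covering 90% of execution time and an edge case covering 5%) are invented numbers for illustration, not figures from the papers.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is sped up.

    fraction_enhanced: share of the original run time that benefits (0..1)
    speedup_enhanced:  how much faster that share becomes
    """
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Doubling the speed of the common case (90% of the time).
common_case = amdahl_speedup(0.90, 2.0)

# Making a rare special instruction (5% of the time) ten times faster.
edge_case = amdahl_speedup(0.05, 10.0)
```

Under these assumed numbers, a modest improvement to the common case yields roughly 1.8x overall, while a tenfold improvement to the edge case yields under 1.05x, which is the intuition behind optimizing what is used most.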

One of the main arguments for RISC was that its reduced instruction set would allow implementers to innovate at a much higher pace and surpass the developments of CISC implementers. This claim has not held its ground, as Intel has continued to use the x86 architecture (CISC) in its desktop CPUs and has been able to keep up with the pace of innovation seen in other ISAs.

Neither paper was clear about the definition of RISC or CISC. Over time, RISC has remained a load/store architecture, but the complexity of both has increased.

In Conclusion

The IBM System/360 was designed to have an ISA that was good for microcoded processors, paving the way for the CISC, while the Burroughs B5000 was about pipelining and VLSI, leading to RISC.

The key difference between RISC and CISC is the separation of hardware and software responsibilities. RISC’s goal was to move complexity away from the hardware and into software, where bugs can be fixed more cheaply and quickly. This also gave compilers more work, since they had to choose more efficient instructions. CISC’s goal was to make the hardware more intelligent and make the job of the compiler writer easier.

Today, RISC is used in embedded devices and high-end servers, whereas CISC dominates the desktop market and the lower-end server market.

References

  1. Amdahl, Blaauw, and Brooks. Architecture of the IBM System/360. IBM Journal of Research and Development, 8(2):87-101, April 1964.
  2. Clark, Douglas W. and Strecker, William D. Comments on “The Case for the Reduced Instruction Set Computer,” by Patterson and Ditzel.
  3. Hennessy, John L. and Patterson, David A. Computer Architecture: A Quantitative Approach. 4th Edition. Page 3. 2006.
  4. Lonergan and King. Design of the B5000 system. Datamation, vol. 7, no. 5, pp. 28-32, May, 1961.
  5. Patterson, David A. and Ditzel, David R. The Case for the Reduced Instruction Set Computer.

An Evaluation of the Future of Computing

13 May, 2010

This article was written by Brendan Grebur and Jared Wein.

Current technology can no longer support the ever shrinking size of transistors and switches. To combat this limitation, researchers are focused on novel approaches that either replace current MOSFET transistors or supplement their abilities. Many notable technologies have been proposed to solve the issue, but none has yet been realized on a large scale. The articles explore two areas of technology aimed at resolving our computational situation. First, manufacturing hardware on the molecular scale presents new challenges as the peculiarities of quantum mechanics begin to reveal themselves. Advances in nanotube technology and an ever growing knowledge of quantum physics present new opportunities and even challenge our classical view of computation. Second, the actual means of computing are explored as the electron is abandoned for photons or nanofluids.

This paper will cover different research areas that intend to pave the way for the future of computing. The first part of this paper will describe, compare, and contrast the techniques being researched. The second part of this paper will describe two techniques that are predicted for adoption within the authors’ lifetime. The last part of this paper will cover the research area that offers the greatest improvement and largest chance to fundamentally change and improve computing.

Trillion Crummy Components [5]

The ability to perform in the face of massive component failure and faults deeply concerned system architects and operators in the early days of computing. A fortuitous advancement into integrated circuits rendered the issue moot. However, as the scales approached nanometers, the problems resurfaced. This paper looks into a method for developing FPGAs with a nanoswitch/CMOS combination.

In manufacturing ever smaller components, failure rates inevitably increase as little room is left for error. Researchers explored building nanoscale systems that contain faulty components and attempted to cope with them dynamically. Wires connected by nanoswitches on the scale of 15 nanometers were produced through nanoimprint lithography (NIL) techniques.

The design suffers from failed or faulty nanoswitches, which connect the CMOS transistors to form the configurable logic system. Scanning the system for these faulty components yields a map of the functional components, which can then be dynamically connected through the FPGA configuration. Replacing components typically implemented in CMOS with nanoswitches freed a significant amount of space, enough to obtain an eightfold increase in logic density, which translates to an improvement of multiple chip generations. In the end, even with 50% of switches nonoperational, production yield remained at 99.7%. The nonfunctional components affected the performance of the FPGAs, but only slightly.
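The scan-then-configure idea can be sketched in Python as follows. This is a simplified illustration, not the authors' algorithm: it assumes each logical connection has a fixed list of redundant candidate switches, that candidates are not shared between connections, and that defects are independent. All names are hypothetical.

```python
def configure(connections, defective):
    """Pick one working switch for every logical connection.

    connections: dict mapping a connection name to its candidate switch ids
    defective:   set of switch ids found faulty during the scan
    Returns a dict of chosen switches, or None if some connection has no
    working candidate (the chip would be discarded).
    """
    chosen = {}
    for name, candidates in connections.items():
        working = [switch for switch in candidates if switch not in defective]
        if not working:
            return None
        chosen[name] = working[0]
    return chosen


def connection_failure_probability(defect_rate, redundancy):
    """Chance that every redundant candidate for one connection is defective,
    assuming independent defects."""
    return defect_rate ** redundancy


connections = {"a": [0, 1, 2], "b": [3, 4, 5]}
layout = configure(connections, defective={0, 3, 4})
unusable = configure(connections, defective={0, 1, 2})
```

Under the independence assumption, a 50% defect rate with eight candidates per connection leaves about a 0.4% chance that a given connection cannot be made, which shows how redundancy can keep yield high despite a very high defect rate.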

The authors argue that faulty components can be overcome through redundancy. The paper gives the initial impression that faulty components can easily be routed around through switches, but it leaves many important questions unanswered, such as the minimum amount of redundancy needed for each of the necessary functional units. The authors assume that the CMOS transistors will not fail, only the wires or switches. When this is extended to CMOS transistors, or their replacements, how many of these will need to be replicated? This could essentially undo the advantage of reducing component sizes, as more components are now necessary to maintain functionality.

Carbon Nanotubes [3]

Single-walled carbon nanotubes (SWNT) exhibit a variety of advantages over silicon when used in transistors. Some of these include reduced electron scattering, better heat dissipation, and less chemically reactive material. Researchers have already produced p-type and n-type transistors which outperform their silicon based counterparts in speed and mobility.

A hurdle for implementation remains in correctly placing and orienting the SWNTs within the system, as a single nanotube can measure just 1 nm in width. However, a ring oscillator was constructed on a single nanotube and sustained operation in excess of 70 MHz, affirming the potential of the technology. Any further abilities will depend on advancements in the construction of nanotubes.

The other application lies in constructing switches from nanotubes for nonvolatile memory. Inherent characteristics could allow impressive switch density, along with switching speeds in excess of 100 GHz, all with minimal power consumption.

Restrictive production methods have prevented full exploitation, but novel implementations could resolve these problems. One company used mats of criss-crossing nanotubes capable of being locked in certain directions, resulting in a simple but quite effective approach with lifetimes of 50 million cycles and state switching in under 3 ns.

Further exploiting SWNT characteristics, nanotubes can replace copper connections, as they do not exhibit the limitations of copper at the nanoscale. The bandwidth of the nanotubes is more than three orders of magnitude larger than that of copper, so bundles of nanotubes can be used to reduce the losses in electrical current seen with copper wires.

In addition, quantum computing also benefits from the utilization of nanotubes. Singled-out electrons can be ‘caged’ within the carbon structure to represent a quantum bit. An increased spin relaxation time for the particles within the nanotube dots is advantageous for constructing quantum computers.

This paper looks into completely replacing our dependency on silicon and copper for constructing circuits. SWNTs offer superior performance in almost all aspects, but have not become ubiquitous due to large-scale manufacturing issues. The material itself is so versatile and atomically abundant, in addition to being inherently scalable, that once manufacturing hurdles are overcome we are sure to see it overwhelm the market.

Molecular, Chemical, and Organic [6]

Chemical computing currently operates on the macroscopic level to implement complex algebraic operations. Signals including optical, electrical, or chemical are used to represent data. Fluorescence cues have been the most successful, but implementing on a reduced scale remains a crux.

One approach attempts to use single molecules as temporary bridges between electrodes, but accurate measurements of its effectiveness are rarely obtained.

Scaling remains an issue, even at the theoretical level, for constructing circuits from organic molecules. Researchers must deal with quantum mechanical effects when operating at the molecular level. The theorized behavior may be an inadvertent artifact of the approximations in their calculations.

Some attempts to embrace the quantum phenomena have found success. A cascading effect seen when lining up carbon monoxide molecules results from the tunneling of vibrational energy from one molecule to another. The positioning of molecules can represent binary encoding and perform logic operations.

Traditional electron usage is abandoned here for the natural interactions of varying molecules to represent data or perform useful work. Again, other materials are researched for their potential to outperform current technology. Many issues arise from quantum phenomena and from accurately placing or observing individual molecules. Such limitations will inevitably inhibit the widespread application of organic computing.

Quantum Computers [2]

Quantum computing represents an inevitable shift in our quest for ever smaller computation scales. Only recently has the combination of quantum theory and computer science been accepted as reality. Beginning with Peter Shor’s discovery, quantum computing entered the scene by undermining one of the most important assumptions of cryptography, the difficulty of factoring.

Much of the theoretical framework regarding quantum computing was almost rendered useless by the introduction of imprecision and quantum noise. Since computation occurs on infinite precision data represented by the amplitudes of configurations, any errors that enter the system would propagate and corrupt the results. However, it has been shown that error-correcting codes, coupled with redundancy, can prevent such events from destroying the configuration.

Further discoveries solidified the feasibility of quantum computers, specifically the threshold theorem. The only hindrance lies in the implementation of quantum bits and the manner in which they are controlled.

One of the greatest advancements to stem from this field was the introduction of quantum key distribution. The unconditional detection of eavesdropping allows a level of security unparalleled by traditional methods. Implementation details are surprisingly tolerant of imperfections in device endpoints and error rates. Such technology is currently marketed and actively investigated by many large technology companies.

Quantum algorithms have forced us to reconsider standard approaches to classical computation problems. Algorithms ranging from database searching to estimating Gaussian sums have runtimes significantly smaller than any current classical approaches.
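To give a sense of scale for one such runtime gap, the Python sketch below compares query counts for unstructured search. It assumes the database problem referred to here is the one addressed by Grover's algorithm, which the text does not name, and it uses the textbook estimate of about (π/4)·√N Grover iterations for a single marked item among N. It counts queries only and does not simulate a quantum computer.

```python
import math


def classical_expected_queries(n):
    """Expected lookups to find one marked item among n by checking items
    one at a time in random order."""
    return (n + 1) / 2


def grover_iterations(n):
    """Textbook estimate of Grover iterations for one marked item among n."""
    return math.floor(math.pi / 4 * math.sqrt(n))


entries = 1_000_000
classical = classical_expected_queries(entries)
quantum = grover_iterations(entries)
```

For a million entries this is roughly half a million classical lookups against fewer than a thousand quantum iterations, a quadratic rather than exponential improvement.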

Quantum computing has certainly altered every aspect of computer science. Not only will circuit technology be completely redesigned, but the manner in which information can be represented opens up endless possibilities. The functionality is there, but much is still needed to actually access it.

Optical Computing [1]

By replacing electrons with photons, computers would no longer suffer common sources of failure including electromagnetic disturbances and short circuiting. Even more advantageous attributes provided by light are the immense bandwidth, low-loss transmissions, and unprecedented capacity when implemented for storage. Devices dealing with optics tend to be less expensive to manufacture and operate at higher frequency resulting in superb computational speeds.

Currently constructed logic gates operate on the order of nanoseconds, while photonic switches can reach the femtosecond range. From figures like these, it would seem only natural for the technology to completely replace our silicon implementation. However, some important factors continue to prevent a full optical solution.

One issue concerns the immense difficulty of cascading large numbers of logic gates together. Half-adders can currently be produced, but extending past that seems to present a greater challenge. Another issue relates to the materials used for constructing switches, which demand generous amounts of photons to function properly. Such optical power generates counterproductive effects, including stimulated Brillouin scattering and self-phase modulation.

By exploiting particles that operate at the universal speed limit, we have reached our limit for transmission speed. In addition, production of photons is far more efficient than that of electrons, while potentially decreasing thermal output. The technology to effectively link the necessary number of optical logic gates together remains to be found.

Nanofluidic Computing [4]

Fluid interaction on the micro-scale can be manipulated to perform computational logic or to serve as a means of memory storage. These operations are already known to be slower than current silicon technology, so the aim is simply to supplement it.

One type of microfluid implemented logic gates without electrical control by using the resistance of fluid flows. Such an approach yielded the advantage of simultaneously running multiple logic gates from the same flow. Cascading gates could now be easily implemented.

Another type used a hybrid approach where an electrical current would restrict the flow of potassium ions between silicon dioxide plates. A binary state could then be represented and logic gates created from them. Essentially, the standard transistor is replaced by fluid.

Due to limitations in the technology, its most likely role in standard computing systems would be as a reliable communication line. Applications could also extend to analyzing other fluids and performing actions based on the presence of a substance. Since microfluid systems would only require small amounts of test material, they may find a place in blood tests or in testing hazardous samples.

This area of research is at an early stage, and seems to be limited by computational speed and the ability to cascade components. Perhaps the parallel capability and small sample size could prove useful in the future, but for now it appears there are no available applications.

Summary

A Trillion Crummy Components attempts to resolve a problem that we are likely to face soon. By adding redundant components accessible through the crossbar technique, we can easily compensate to maintain a functional system in spite of failure. Here we only deal with a scaling issue, while maintaining the CMOS technology. All other papers looked to completely replace CMOS, most with the goal of ground-up nanoscale construction. Some simply used various materials enabling advantages to complete the same tasks, only using different signals to represent data, i.e. photons, chemicals, fluids. One paper, quantum computing, explored an area that could completely revolutionize everything we know about computing. Quantum computing challenged the very concept of data representation with superpositions and infinite precision variables.

Out of all these radically different approaches, some can certainly be defined as more viable than others. Carbon nanotubes offer nothing but advantages over our silicon devices. Their inherently miniature size supports massive density, coupled with power efficiency, greater heat dissipation, and astounding mechanical switching speed. With more research to perfect the manufacturing process, full-scale realization should be upon us shortly.

Optical computing also offers a formidable technology we are likely to soon see. Optics are already heavily used in the computing industry for networking, data storage (CD-ROM), and scanning. Replacing communication and data representation with light significantly decreases latency and operates flawlessly in the face of electromagnetic interference. Photons offer massive bandwidth as a result of overlapping frequencies without corruption. Optical computing offers such an increase in computational speed (10⁷) that it would prove an invaluable commodity, worthy of any development cost. Cascadability appears to be the biggest challenge, but exploration into useful materials could easily resolve the problem.

The techniques discussed for crummy computing could well appear as transitions are made to the aforementioned devices, but for now they are restricted to traditional silicon devices. The future appears to indicate a radical departure from the norm. Organic and nanofluid computing are hindered by a lack of speed or reliability. Finally, quantum computing suffers from a lack of large-scale bit implementations and from the difficulty of properly applying quantum error correction. We are sure to see the birth of quantum computers, but not within this lifetime.

Future Impact

One technology stands apart from the rest to forever change computation. Quantum computing has already provided a security technology guaranteed by physics to be unbreakable. A technology that allows for accurate representation of an infinite precision variable is, without doubt, an incredible advancement. No other technology visited provides such a feature. With such claims as solving NP-complete problems in P time, computation now becomes an afterthought. Many other traditional computing problems have been redesigned to run on quantum computers, experiencing substantial decreases in runtime. With such raw computational power, humans are offered the chance to look further into the physics governing our world through simulation. Quantum computers are sure to offer us a wealth of information and possibilities to change the way we live and think.

References

[1] Abdeldayem, H. and Frazier, D. O. “Optical computing: need and challenge.” Communications of the ACM. Special Issue: Beyond Silicon: New Computing Paradigms (Sep 2007): 60 – 62.

[2] Bacon, D. and Leung, D. “Toward a world with quantum computers.” Communications of the ACM. Special Issue: Beyond Silicon: New Computing Paradigms (Sep 2007): 55 – 59.

[3] Kong, J. “Computation with carbon nanotube devices.” Communications of the ACM. Special Issue: Beyond Silicon: New Computing Paradigms (Sep 2007): 40 – 42.

[4] Marr, D. W.M., and Munakata, T. “Micro/nanofluidic computing.” Communications of the ACM. Special Issue: Beyond Silicon: New Computing Paradigms (Sep 2007): 64 – 68.

[5] Robinett, et al. “Computing with a trillion crummy components.” Communications of the ACM. Special Issue: Beyond Silicon: New Computing Paradigms (Sep 2007): 35 – 39.

[6] Stadler, R. “Molecular, chemical, and organic computing.” Communications of the ACM. Special Issue: Beyond Silicon: New Computing Paradigms (Sep 2007): 43 – 45.

Another semester down, two more to go

12 May, 2010

I’ve finished my third semester at Michigan State for my Masters in Computer Science. This past semester I took a special course on Natural Language Processing that looked at Language and Interaction and a separate course on Advanced Computer Architecture.

I plan on taking Design & Theory of Algorithms and Artificial Intelligence in the Fall, and Advanced Operating Systems and Translation of Programming Languages (aka Compilers) in the Spring.

I’m on track to finish my degree by May, 2011.

The Thirteen Dwarfs

12 May, 2010

This article was written by Brendan Grebur and Jared Wein.

Introduction

Adapting to the next step in processor technology requires a reevaluation of current computational techniques. Many-core architecture presents a fundamentally different perspective on both software and hardware systems. With no clearly effective model to follow, we are left to our own devices for advancement. A dwarf’s approach provides guidance, but does not guarantee an effective realization of the underlying issues. However, it provides a starting point to explore the interactions of data at extreme levels of parallelism. Of course, as with any method, pitfalls may arise.

This paper is a critique of the research presented by [Asanovic 06]. We present the motivation for their work, a comparison between their proposal and present-day solutions, as well as a look at advantages and disadvantages of their proposal.

Motivation

The 16 years from 1986 to 2002 showed tremendous growth in processor performance. Around 2002, the industry hit a “brick wall” when it ran into limits in memory, power, and instruction level parallelism [Hennessy and Patterson 07]. While performance gains slowed, performance requirements continued to grow at a higher pace than before.

As an example, Intel has been forecasting an “Era of Tera”. Intel claims that the future holds a time where there is demand for teraflops of computing power, terabits per second of communication bandwidth, and terabytes of data storage [Dubey 05]. To prepare for this era, Intel has claimed a “sea change” in computing and put their resources on multi- and many-core processor architectures [Fried 04].

Previously, these multi- and many-core computer architectures have not managed to stay in the mainstream [Asanovic 06]. Some reason that one of the causes of poor adoption is the increased complexity of working with parallel computing. [Asanovic 06] developed thirteen independent paradigms that represent the active areas of parallel computing. After compiling their list, they compared it with Intel’s list and noticed heavy overlap between the two. This reassured [Asanovic 06] that they were on the right path.

Advantages of the Dwarfs

The first advantage to the dwarfs that have been proposed is that there exists only a relatively small number of dwarfs to be concerned with. Psychology research has shown that humans can cope with understanding about seven, plus or minus two, items very easily [Miller 56]. The number of dwarfs, at 13, is not too far off from this desirable number, and allows the test space to be conceivably small. Had the number of dwarfs proposed been in the hundreds, it may become too costly to determine if a system can handle all the dwarfs.

Another advantage focuses on benchmarking software. Some of the programs for SPEC benchmarks work on computational problems that are no longer current research areas and have less of an application when determining the system performance. The dwarf approach has a goal to focus on active areas within computer science, and to adapt the focus areas over time to stay relevant.

The years since assembly programming have shown that higher level languages can be created that give software developers increased productivity while maintaining software quality. The goal of the parallel programming dwarfs is to extend these common problems into programming frameworks that help software developers program at an even higher level while maintaining efficiency and quality. The dwarfs present themselves in a way that lets them be regarded much like object-oriented design patterns. One study was able to use the dwarfs to easily find places within existing sequentially-programmed applications where refactoring could make use of parallel programming solutions [Pankratius 07]. The researchers in this study were able to achieve performance improvements of up to 8x using autotuning and the parallel programming patterns described by the dwarfs.

Disadvantages of the Dwarfs

There are a number of questions that haven’t been answered by the Berkeley researchers. We present a couple of the issues below that we are either unsure of or believe are disadvantages of the dwarf paradigm.

First, it is not well explained how a combination of dwarfs is supported on a given system. When a system claims to support two dwarfs independently, there does not appear to be a definite answer as to whether the system will support the combination of those dwarfs. Further, the interaction between dwarfs does not appear to be well documented.

Next, [Asanovic 06] states the power of autotuning when working with sequential programming, but admits that there have not been successful implementations of an autotuner for parallel programs. The problem space for parallel programs is much larger than that of sequential programs, and various factors will have to be elided to make development of an autotuner reasonable. Further, it is unclear whether the parameters chosen during autotuning will be the optimal parameters when the actual software runs on datasets vastly different from the one used for tuning.
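The dataset concern can be illustrated with a toy autotuner in Python. Everything here is hypothetical: the `autotune` helper, the block size candidates, and especially `modeled_cost`, which is a made-up cost model whose best block size grows with the dataset. A real autotuner would time actual runs instead.

```python
import math


def autotune(candidates, cost):
    """Exhaustively try every candidate and return (best, best_cost)."""
    best, best_cost = None, float("inf")
    for candidate in candidates:
        measured = cost(candidate)
        if measured < best_cost:
            best, best_cost = candidate, measured
    return best, best_cost


def modeled_cost(block_size, dataset_size):
    """Made-up cost model: cheapest when block_size equals sqrt(dataset_size)."""
    return abs(block_size - math.isqrt(dataset_size))


block_sizes = [16, 32, 64, 128, 256]

# Tune on a small dataset, then on a much larger one.
tuned_small, _ = autotune(block_sizes, lambda b: modeled_cost(b, 1024))
tuned_large, _ = autotune(block_sizes, lambda b: modeled_cost(b, 65536))

# Cost of reusing the small-dataset choice on the large dataset.
penalty = modeled_cost(tuned_small, 65536)
```

In this model the parameter tuned on the small dataset is far from the best choice for the large one, which is exactly the risk when tuning data and production data differ. Exhaustive search also scales poorly as parameters multiply, which is one reason the larger parallel problem space is harder to tune.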

Last, the area of parallel programs presents a virtual graveyard of companies and ideas that have failed to reach the market successfully. The multi- and many-core architectures have failed at producing the same programmer productivity and quality that the uniprocessor delivered. Only time will tell how the field of software development reacts and if it can adopt the shift in computing paradigms.

Conclusion

Computing performance improvements have recently hit a “brick wall”, and major chip makers have pushed a shift in computing towards multicore systems. There are many complexities with multi- and many-core systems that can leave performance gains unrealized. Some of the problems may reside in the complexity of implementing common parallel computing patterns.

The researchers in the Par Lab at University of California at Berkeley have been attacking the multi- and many-core problems since 2006. From their research, they have created a list of 13 dwarfs that represent active areas in parallel computing. These dwarfs provide common patterns that can be reproduced and used for benchmarking and the creation of programming frameworks that offer an abstraction of complex problems.

Citations

Asanovic et al. The Landscape of Parallel Computing Research: A View from Berkeley. Dec 2006.

Fried, I. For Intel, the future has two cores. ZDNet. 2004. http://news.zdnet.com/2100-9584_22-138241.html

Dubey, P. Recognition, Mining and Synthesis Moves Computers to the Era of Tera. Technology@Intel Magazine. Feb 2005.

Hennessy, J. and Patterson, D. Computer Architecture: A Quantitative Approach, 4th edition, Morgan Kaufmann, San Francisco, 2007.

Miller, G. A. “The magical number seven, plus or minus two: Some limits on our capacity for processing information”. Psychological Review. Vol 63, Iss 2, pp 81–97. 1956.

Pankratius et al. Software Engineering for Multicore Systems – An Experience Report. Institute for Program Structures and Data Organization. Dec 2007.

Moving Objects Through A Terrain Using Tiled Velocity Fields

16 December, 2009

In 2004, Stephen Chenney of the University of Wisconsin published a paper in SIGGRAPH Animation titled “Flow Tiles” that brought the idea of terrain editing to scenarios that require objects to flow through a multi-dimensional space.

As part of my Advanced Computer Graphics course (CSE872) at Michigan State University, two classmates and I implemented this paper. Please watch our video below and let me know what you think of it in the comments section (you can compare it to Stephen’s results):

The video was recorded with TechSmith’s Jing Pro and I used a free text-to-speech system called vozMe to create the narrative of the video since I didn’t have a microphone at hand. I then used ffmpeg to merge the audio and video together to create the final output video.

The source code for this project is available at Google Code. If you would like to help reduce some of the technical debt that exists in the code, I would be happy to help you with any questions ;)
