Research and Theses Completed with
Duane Bailey
Williams College provides students an opportunity to work on research
with a faculty member, often as part of an honors thesis project. These
theses were completed with Duane Bailey, and are available through interlibrary
loan from Williams College.
Jump to author: Becker, [Bernheim] Brush, Burbage, de Carvalho, Chaffin, [Minnick] Cornett, Cox, Cyll, Effinger-Dean, Grassia, Healy, Hirshman, Kang, Leonard, Lindsey, Liu, McLaughry, Munson, Rubin, Sandys, Templeton '16, Templeton '20, Timilsina, Trepte, Zhu
A Private, Associative Memory Alternative for RISC Systems
Madeline Burbage
The concealed inner workings of today's memory hierarchies lead to surprising side effects, in security and performance, for otherwise simple programs on general-purpose machines. For situations where this is untenable, developers may prefer access to an alternative memory system where software has control over the movement of its data. To explore this concept, we designed and implemented the Scratchpad Accelerator, a configurable, associative scratchpad memory for the RISC-V Rocket Chip. We evaluated the security and performance of the device after adding it to lowRISC, a version of the Rocket Chip that has been programmed to an FPGA board and runs full-stack Linux. We show that the Scratchpad Accelerator provides data isolation and uniform memory access, providing a protected, predictable memory system for developers. Additionally, its associativity and structure inspire effective memory models for common data structures in software development. The general-purpose memory acceleration provided by the Scratchpad Accelerator endows software with the important ability to gain freedom and control over its data.
PDF
SOAR: a Self-Optimizing Adaptive SoC on FPGAs
Minwoo Kang
With the power utilization wall imposed on modern microprocessors,
architectural research has turned to the use of specialized
hardware as a source of continued performance scaling. As a
result, System-on-a-Chip (SoC) architectures containing hardware
accelerators have gained signficant traction over the past years.
Unfortunately, each accelerator has a specific set of tasks it can
execute, and accelerator-based systems are often incapable of
out-performaning general-purpose processors outside their target
application domains. In this work, we propose SOAR, a
reconfiguratble architecture that extends the range of hardware
acceleration by adaptively tailoring its configuration of
acceleartators to the running workload. The SOAR hardware is
built as a 64-bit RISC-V SoC with a collection of RocCC
accelerators attached to a scalar, in-order Rocket core.
Evaluation of the design implemented on a Xiling Artix-7 FPGA
under RSIC-V Linux demonstrates that our RoCC accelerators can
return speedups up to 10x and energy-efficiency improvements up to
170x over equivalent software implementations. Along with SOAR
hardware, we also introduce a software infrastructure that infers
dynamic power demands and acceleartor utilization frequencies to
determine the optimal configuration of on-chip accelerators. We
propose the use of dynamic dispatch for switching function
execution between using hardware and using software library calls.
Finally, we explain how our proposed framework can be customized
to enhance either energy-efficiency or performance, or any
arbitrary optimization goal.
PDF
Inherently Interpretable Sparse Word Embeddings through Sparse Coding
Adly Templeton '20
Word embeddings are a powerful natural language processing
technique, but they are extremely difficult to interpret. In
order to create more interpretable word embeddings, we transform
pretrained dense word embeddings into sparse embeddings. We
construct these embeddings through sparse coding. By inherently
interpretable, we mean a system where each dimension is
mechanically associated with some hint that can capture the
meaning of that dimension. We show that models trained using our
sparse embeddings can achieve good performance and are
interpretable.
PDF
Sharing Silicon: Improving FPGA Accessibility
Matheus de Carvalho '18
Field Programmable Gate Arrays (FPGAs) are
hardware circuits that can be programmed to perform any digital
circuit functionality. This flexibility makes them a useful tool in
improving system and application throughput and energy efficiency.
In this way, FPGAs can be part of the solution in rethinking our
current model of computation to better support high performance
computing and avoid Dark Silicon. In this work, we attempt to
improve FPGA accessibility and offer motivation for a more
widespread adoption of thise circuits by building on previous
efforts in this area, with special emphasis to the Xilinx PYNQ
FPGA+CPU development board. We explain in detail how to generate a
working FPGA circuit for the PYNQ board from a software-specified
function using the Xilinx Vivado Design Suite. We offer advice on
efficient FPGA design, and demonstrate the acceleration potential of
FPGAs: we find that an FPGA-implemented function runs up to 120x
faster than its Python counterpart. We also demonstrate the
potential of FPGAs to implement conservation cores by explaining the
design of an FPGA circuit that supports five simultaneous cores that
can be switched at runtime.
PDF
A Game of Life on Penrose tilings
with Kathryn Lindsey
We define rules for cellular automata played on quasiperiodic tilings of the plane arising from the multigrid method in such a way that these cellular automata are isomorphic to Conway's Game of Life. Although these tilings are nonperiodic, determining the next state of each tile is a local computation, requiring only knowledge of the local structure of the tiling and the states of finitely many nearby tiles. As an example, we show a version of a "glider" moving through a region of a Penrose tiling. This constitutes a potential theoretical framework for a method of executing computations in non-periodically structured substrates such as quasicrystals.
PDF@arXiv:1708.09301
Potential Adaptations to Artificial Chemistry
Adly Templeton '16
We propose the creation of cells within a modified version of Hutton's
enzyme artificial chemistry. First, we define some types of atoms as
active, and some as inactive. This allows us to create genetic
material distinct from enzymes. Different types of atoms, which
were originally used to encode genetic information, can now be used to
encode the target of the enzymes: inside the cell, outside the cell,
or in the membrane.
We also increase the number of reactions stored per atom. This sharply
cuts the number of genes needed and, therefore, the computation
required. This change also opens up possibilities for horizontal gene
transfer between organisms. A single atom, analogous to a plasmid of
bacteria, which contains all enzymes needed for a behavior, can be
passed as a unit between organisms. Similarly, we can construct
malicious viruses and other organisms of this sort.
Mutation is fundamental to evolution, and the mechanism of mutation
needs to be both powerful and directed. The new genetic mechanisms
allow us to tightly control mutation when reaction data is copied. The
exact mechanism of mutation is well suited to experimentation, but
there are some factors to consider. Each field of the reaction can
have a specific chance of mutation. Some prediction of useful
reactions may also be helpful: for example, the value for
post-reaction states can frequently mutate to the pre-reaction
state. Another potentially useful factor is the ability to inactivate
a reaction: this allows for increased genetic variability through
silent mutations, similar to unexpressed biological genes.
With these modifications, we hope to create a vibrant virtual
ecosystem. This system allows many sorts of competition, information
transfer, predation and parasitism. Combining these elements together
will create a system that models with some complexity the interactions
of real organisms.
PDF
Faux Vector Processor Design Using FPGA-Based Cores
Diwas Timilsina '16
In this work, we exploit these advantages of FPGAs to design an
architecture that uses general purpose processor tightly coupled with
a vector processor implemented on an FPGA. Previous work by Leonard
demonstrated possible energy efficiency with FPGA coprocessor design.
We extend his work to design a power sensitive vector processor.
We believe that an FPGA-based vector processor will achieve energy
savings comparable to conservation cores. While FPGA-based design
seems promising, the engineering work remains a prohibitive obstacle
to success. In previous works, one of the main challenges has been the
usability of the software supporting the FPGAs. We aim to create
an architecture that can run and synthesize circuits on FPGA without
relying on traditional workflow.
PDF
Computation in the Wild: Reconsidering Dynamic Systems in
Light of Irregularity
Tony Liu '16
This work would pursue a deeper exploration of irregular computing
CAs. We plan on conducting a series of experiments on a spectrum of
automata with irregular grids and connectivities, quantifying their
behavior and comparing them to traditional CAs. As
computation appears to emerge at a point of criticality somewhere in
between the ordered and chaotic automata and Langton's $\Lambda$
parameter, representing the relative order of a CA, is used to
parametrize the possible automata space. $\Lambda$ is important because
it is an indicator of the class of CAs that have the ability to
compute. However, this measure is tuned specifically for static
neighborhood definitions and uniform grids. With the experimentation
on irregular automata, the goal is to explore and develop a more
general notion of $\Lambda$ and other metrics so that we can better understand
and perhaps even quantify the conditions in which computation can
emerge in noisy systems. Another important question we hope to address
with this work is whether there is a qualitative difference between
regular and irregular grid patterns and connectivities: are there
cases where uniform CA models are sufficient for representing
biological systems?
We believe that the study of CA behavior in irregular environments is
critical to achieve a greater understanding of how biological systems
combat imperfections. Ultimately, the contribution of work on natural
computational systems is twofold: not only can we achieve a better
understanding of how some biological processes operate, but knowledge
of how these systems work can inspire alternative computing
methods. The hope is to illuminate how nature is able to perform
complex computation in noisy environments and apply these lessons to
advance future computing models.
PDF
Using Reconfigurable Hardware to Fight Dark Silicon
Isaiah Leonard '15
After four decades of exponential improvement, computing performance
has hit a wall. Power constraints and physical limitations have led to
the rise of dark silicon: most modern processors can run only 3\% of
their transistors at maximum frequency. Pairing reconfig- urable
hardware with a general purpose processor (GPP) to support partnered
computa- tion can yield performance and efficiency
improvements. Because software processors waste roughly 90\% of their
energy on instruction interpretation, customized hardware can signif-
icantly decrease the energy utilization of targeted computations. In
this work, we explore the possibility of using a GPP tightly coupled
to a dynamically reconfigurable co-processor to provide efficient,
general purpose, partnered computation.
PDF
HYDRA: A CUDA-Based Tool Supporting Error-Resilient GPU Computation and General Purpose Heterogeneous Programming
Gregory Becker '15
The large-scale supercomputers popular in high performance computing
require support for error-resiliency to protect against soft errors.
We present Hydra, a scalable, CUDA-based tool supporting reslience, as
well as programming tools that leverage that framework for other
heterogeneous programming tasks. Hydra executes CUDA kernels
redundantly on both CPU and GPU to leverage the parallelism of the
heterogeneous architecture. To support redundancy with minimal
additional programmer effort, Hydra provides wrappers for the CUDA
functions that transfer data between GPU and CPU. Our results
indicate that Hydra redundancy can be an efficient model for
error-resilience in heterogeneous architectures, as well as a useful
tool in heterogeneous programming.
PDF
Visualizing Large Communication Graphs
Steven S. Rubin '11
Communication graphs induced by parallel programs can be overwhelming in size, but are highly structured. For example, trees, grids, and n-cubes are common. In order to understand program behavior, we seek informative graph visualizations that reduce visual complexity. We leverage the concept of ellipses (...)
, an intuitive symbol that abstracts redundant structure.
Using graph grammars coupled with specific procedural methods, we create large, structured graphs of parallel program communication. These graphs that have prac- tical realizations in two dimensions can be visually simplified using an algorithm that finds contextual balls around important nodes. Graphs that are harder to visualize in two-space are more difficult to simplify. This work presents metrics and layout tech- niques that may aid in the visualization of such graphs. We evaluate these metrics by developing an ideal for want to see in abstracted visualizations of large, structured graphs. The metrics and layout techniques are implemented in Gephi, an open-source large graph visualization tool.
Our large graph visualizations are important tools for helping programmers understand interprocess communications of their parallel programs. This work establishes some prototypes for visualization that in the future may harnessed in the design of effective parallel programming environments.
PDF
The Empire Problem in Penrose Tilings
Laura Effinger-Dean '06
Nonperiodic tilings of the plane exhibit no translational symmetry. Penrose tilings are a remarkable class of nonperiodic tilings for which the set of
prototiles consists of just two shapes. The pentagrid method, introduced by
N.G. de Bruijn, allows us to generate Penrose tilings by taking a slice of the
integer lattice in five-dimensional space. The empire problem asks: Given a
subset of a Penrose tiling, what tiles appear in all tilings that include that sub-
set? We present a new approach to the empire problem that uses the pentagrid
method to identify elements of the empire.
PDF
Virtual Machines: Features and Futures
Brian Hirshman '06
As yet, there is no formal science of designing virtual
machines. Instead, virtual machine design has been ad hoc, a reaction
to the current mismatch between hardware and software. This thesis
develops a specification for a thin, general-purpose virtual machine
that is sensitive to the needs of programmers as well as hardware
designers. Some readers may consider this thesis to be a blueprint for
a novel target virtual machine. Others may see this specification as a
model for future hardware. By studying the ++VM virtual machine,
introduced in later chapters, this thesis examines the strengths and
establishes goals of virtual machine design.
PDF
Cache-Conscious Dynamic Memory Allocation
Topher Cyll '04
Today, almost all software uses dynamic memory allocation. In some
programming languages, it is impossible to even write "Hello World!"
without implicitly calling the allocator. However, the costs of
dynamic memory allocation on cache performance are relatively
unstudied. This thesis explores the e ects of dynamic memory
allocation on cache performance, as well as methods to make memory
allocators cache-conscious.
We use profling techniques to build custom memory allocators. These
allocators contain statically generated memory layouts that arrange
memory in a cache-conscious manner for previously analyzed
workloads. Using a variety of techniques to build memory layouts, we
demonstrate that such layouts can dramatically a ect program speed and
cache miss rate. We nd that some layout strategies speed up certain
programs by as much as 28% while others cause certain programs to run
up to 2.6 times slower. Additionally, we provide an open source
foundation which is fully compatible with all standard C allocation
functions in order to aid future research.
PDF
Search for the Aperiodic Tile
Feng Zhu, '02
In 1996, Gummelt identified a single decagon that
covering the
plane aperiodically. The covering allowed tiels to overlap in five
different ways. In this work we develop a partitioning of the regions
of the decagon such that points in different partitions never collide
in an overlap. Such a partioning is achieved by a special coloring process
based on infinite decomposition of the Robinson tiles in a Canto-like
construction. We show that the tile has positive measure everywhere and
that the final tiling is aperiodic and dense in the plane.
PDF
A related paper, by Bailey and Zhu, PDF.
Caching and the Java Virtual Machine
M. Art Munson, '01
Most of the research done to improve the performance of Java Virtual Machines
(JVM's) has focused on software implementations of the JVM specification. Very
little consideration has been given to how Java programs interact with hardware
resources and how hardware components can be used to improve Java performance.
We generated and analyzed opcode and memory access trace for eight different
Java benchmarks, finding that each Java opcode typically causes multiple
memory accesses. We next investigated the effectiveness of unified and split
caches at caching Java programs, using the performance of traditional compiled
programs as a basis of comparison.
Motivated by the subdivision of memmory by the JVM specification into heap memory,
constant pool data, and operand stacks, we examined the possible benefits of
adding small specialized caches to hold constant pool data and stack data.
We found that constant pool accesses have very low locaality of reference and
that the addition of a constant cache is detrimental to cache performance for
large main caches. The 24% of memory access made to operand stacks were
found to be efficiently cached in a 64 byte cache. We simulated the cache
performance of Java programs in the presence of registers by adding a register
cache to unified and split caches. We found that given fair hardware resources,
Java programs cached at least as well as traditional programs.
PDF
Automatic Generation of Penrose Empires
Jason B. Healy '00
Penrose tiles are infinite tilings that cannot tile the plane in a periodic
manner. They are of interest to researchers because they may model
a new type of matter called \emph{quasicrystals}. Being able to
easily generate and manipulate Penrose tiles makes them easier to study,
and that is the primary goal of this research. Although Penrose tiles are
infinite, we would like to be able to store them in a finite device,
such as a computer, so that they may be more easily studied. The
primary aim of this research is to attempt to discover some of the
underlying structure of Penrose tiles, so that the tilings can be reduced
and stored in an efficient, finite manner.
In addition to being able to store tilings, this research also aims
to learn more about constructing the tilings themselves. Because we are
reducing an infinite amount of data to a finite amount of
information, there will necessarily be some computation required to
reconstruct the information that is not explicitly stored. This
research will investigate methods of constructing Penrose tiles from a
small amount of initial information.
The algorithms developed and discussed in this work allow us to represent
Penrose tiles as a small set of finite variables and then compute any arbitrary
information about the tiling on demand. In this way, we can reconstruct
as much of any tiling that we desire, without needing to store the entire
tiling. It is hoped that the methods used will allow more detailed study
of Penrose tiles to take place in future research.
PDF
(Jason's more recent work on generating empires is found here.)
Exploring the Danzer Tiling
Benjamin C. Chaffin, '98
In 1989, Danzer presented a set of four tetrahedra which, along with
their mirror images, form aperiodic tilings of space when assembled according
to certain matching rules. Though not the first known aperiodic set in
three dimensions, the Danzer tetrahedra are attractive for a number of
reasons, including several similarities to the two-dimensional Penrose
tiles and Robinson triangles.
We present an expanded version of Danzer's paper[Danzer89], giving explanations
and proofs and removing some of the notation to provide a more readable
and comprehensible exposition of these tiles and their basic properties.
The tetrahedra in a Danzer tiling always meet face-to-face and vertex-to-vertex.
Thus the next fundamental unit of the tiling beyond the tiles themselves
is the vertex configuration, a ball of tiles clustered about and sharing
a central vertex. Danzer's paper states that there are exactly 27 vertex
configurations. We give a practical algorithm to compute a complete ``atlas''
of vertex configurations, and show that there are, in fact, 174.
Every patch in a Danzer tiling is duplicated an infinite number of times
in every tiling, and the distance to the nearest identical copy is within
a bounded multiple $\gamma$ of the radius of the patch. Using a computational
approach, we show how to place a bound on $\gamma$.
PDF
Generalized Forcing in Aperiodic Tilings
Linden [Minnick] Cornett, '98
An aperiodic tiling is one which uses a finite set of prototiles to
tile space such that there is no translational symmetry within any one
tiling. These tilings are of interest to researchers in the natural sciences
because they can be used to model and understand quasicrystals, a recently-discovered
type of matter which bridges the gap between glass, which has no regular
structure, and crystals, which demonstrate translational symmetry and certain
rotational symmetries. The connection between these two areas of research
helps to create a practical motivation for the study of aperiodic tilings.
We plan to study both theoretical and applied aspects of several types
of aperiodic structures, in one, two, and three dimensions. Penrose tilings
are tilings of the plane which use sets of two prototiles (either kites
and darts, or thin rhombs and thick rhombs) whose edges are marked in order
to indicate allowed arrangements of the tiles. There is a small number
of legal vertex configurations, and most of these force the placement of
other tiles which may or may not be contiguous with the original ones.
It is somewhat intuitive that in most cases the placement of a group of
tiles forces the placement of some set of adjacent tiles. It is much more
difficult to believe that many groups of tiles also force the placement
of infinitely many non-adjacent tiles. This phenomenon has implications
which may be important in designing and implementing an efficient data
structure to explore and store information about aperiodic tilings.
For the most part, we plan to study forced tiles in two-dimensions,
although we believe that this theory may be generalized to higher dimensions.
We will use musical sequences and Ammann bars, concepts established by
Conway and Ammann, to develop an algorithm for predicting the placement
of fixed bars given an initial sequence of parallel bars. This, in turn,
may be used to fix the positions of tiles and clusters of tiles within
any legal tiling. Our goal is to determine a method for predicting the
position of infinitely many forced tiles given an initial cluster of tiles.
These descriptions will be partly grammar-based inflations of musical sequences,
and partly algebraic solutions, and will themselves be aperiodic.
PDF
Exploring Aperiodic Tilings with Inflation
Forrest Trepte, '96
This work focuses on realities of implementing an abstract datatype
that traverses infinite aperiodic tilings of the plane and three-space.
The approach is to develop a generalized mechanism for specifying finite
sets of prototiles that tile space nonperiodically. When the tiling exhibits
an invertible inflation property, tiles may be addressed in a way that
supports arbitrary, possibly oriented, wanderings through space. Techniques
are presented that can be used to discover lines of symmetry in a tiling
and to construct an atlas of its local configurations. Also developed are
a number of theoretical results about this class of tilings, including
a concise statement of a local isomorphism theorem. This pretty result
suggests that any finite portion of a specific tiling by prototiles appears
infinitely often in all other tilings by the same set of tiles.
Our analysis is inspired by the index sequences that are commonly used
to uniquely specify tilings. An index sequence identifies a tiling about
a point by locating the point within an infinite hierarchy of self-similar
tilings known as inflations. Our work builds an addressing system with
index sequences of finite rather than infinite length and uses addresses
to represent the dual graph of a patch of tiles. An $n$-digit address identifies
a tile within n inflations of some prototile. We prove that any tiling
represented in our form can tile the plane or space and must do so nonperiodically.
An algorithm is constructed to find the addresses neighboring any given
tile. Appending digits to an address extends the currently represented
region, placing it within a larger patch to simulate the exploration of
an infinite expanse of tiles. The information maintained grows as needed
and the number of digits stored is logarithmic in the number of tiles in
the patch represented. Our system could be used to investigate very large
quasiperiodic tilings and the principles behind the data structures illuminate
many beautiful properties of the tilings themselves.
PDF
A Debugger-Friendly Tuple Space
A. J. [Bernheim] Brush, '96
Many of the performance features of parallel programs, including massive
concurrency and nondeterminism, make debugging difficult. To study
this issue we have continued to extend Linder, a public domain programming
environment supporting Gelernter's tuple space programming model.
The first section of this thesis develops protocols to support effective
distribution of tuples among disparate processes. This distribution
has potential performance advantages, but also complicates the coordination
of processes.
In the second section, we present protocols that support centralized
debugging of tuple space programs. The Linder Tuple Space Debugger
(LDB) is a textually-based interactive debugging environment. LDP
allows users to set breakpoints on tuples, query the tuple space, and examine
the communication history of the system. Users can graphically animate
their program's history at breakpoints and after execution using ORNL's
ParaGraph, a graphical visualization system. With its unique mixture
of history information and interactive control we believe novice users
will find LDB a conductive debugging environment.
TS++: Communication Specification using Path
Expressions in the Tuple Space Programming Model
Stephen McLaughry, '95
The tuple space model provides a simple, flexible means of specifying
interprocess communication for parallel programming. However, this
flexibility can result in incorrect programs, that are then extremely difficult
to debug. We discuss a few extensions to the tuple space model, that
will permit the programmer to specify information about the indented communication
structure of a program. This will reduce the chance of error and
simplify debugging tasks, without greatly changing the paradigm.
Support for Dynamic Itinerary-Based Communication
Sean David Sandys, '94
Current technologies for programming massively parallel machines do
not provide the programmer with the appropriate abstractions for communication.
The Canister System was developed to allow programmers to compose point-to-point
messages into communication patterns, called itineraries. While programmers
may find this abstraction more appropriate for many algorithms, itineraries
suffer from the fact that they must be statically determined at compile
time.
This work presents an extension to itinerary-based specification in
the Canister System that allows the programmer to specify dynamic process
instantiation and itinerary extension in a controlled manner. This
mechanism, while not universal, allows programmers to specify more algorithms
safely using a more appropriate level of communication abstraction.
GraPPLe: A Graphical Parallel Programming Language
[Frank] Sebastian Grassia, '93
Program Composition Notation (PCN), a language for parallel programming,
provides a precise, compact, and efficient paradigm for parallel programming.
However, several aspects of programming in PCN are nonintuitive, even to
experienced programmers. We believe that a visual programming language
that draws on PCN as its paradigm can address PCN's non-intuitiveness and
exploit graphical methods to make realization of communication channels
and algorithms much easier.
This research focuses on the specification and implementation of a new
graphical parallel programming language (GraPPLe). Key considerations
in the design include: the exploitation of graphics to evince the fact
that parallel programming is actually a necessary first step in sequential
programming, the goal of providing a framework that maximizes implicit
parallelism, and the interchangeability of visual and textual forms of
expression. Our hope is that, due to the strong correspondence between
GraPPLe and its underlying textual language, the system will be of use
as a teaching aid, as well as a viable development tool for parallel programmers.
Parallel Algorithm Animation
Michael Cox, '91
This thesis is concerned with the animation of algorithms for tightly
coupled multiple-instruction, multiple-data computers for the purpose of
visualizing the communication patterns which occur during execution of
the algorithm. In order to understand how a tightly coupled algorithm
operates it is essential to understand the interprocess communication which
occurs during execution. An animation of the algorithm can provide
a global view of these patterns in a form easily comprehended by the user.
There have been recent attempts to organize the record of communication
into a hierarchy of abstract user-defined communication events to increase
the viewer's ability to comprehend the display. However these systems
contain a great deal of complexity due to their attempt to describe the
algorithm at a global level. The nondeterminism inherent in parallel
programs makes the specification of a global event hierarchy very difficult.
This thesis proposes an alternative paradigm for specifying the event hierarchy,
attempting to reduce the complexity of the specification by working on
a per-process basis, creating a global view of the algorithm only after
the high level events have been recognized.