MY SCIENTIFIC
AND TECHNICAL PRODUCTION
I can summarize my HPC experience as a non linear trip from
problem formulation to efficient implementation on various
platforms, with different algorithmic and computation paradigms
in between. Keep in mind that there is always an effort (typical and/or personal) to
switch for a moment to another fundamental approach or context
application. At the price of the consequent step-by-step
intellectual and time overhead, due to the need to move from an
intermediate level to a more mature state for each, the trip
reveals rewarding for me so far, since all these skills are indeed
additive. By the way, although being a general consensus, learning
and investigating really appear incremental to me.
Back
to Homepage
-
Claude Tadonki, and Bernard
Philippe, Parallel multiplication of a vector by a Kronecker product of
matrices, Parallel
Distributed Computing Practices PDCP, volume 2(4),
1999.
Abstract. We provide a parallel algorithm for the Kronecker product of matrices based on a cyclic partition of loops. Our general model unifies one scheme with redundant computations but no communication and an opposite scheme without redundant computation but with interprocessor communications. Both schema can be mixed with a certain balance influenced by the volume of floating point operations and the characteristics of the target architecture. This work has valuated experimental validations ({\small CENJU and INTEL PARAGON}).
-
Claude Tadonki, and Bernard
Philippe, Parallel multiplication of a vector by a Kronecker product of
matrices (part II), Parallel Distributed Computing Practices PDCP, volume 3(3),
2000.
Abstract. This paper presents a revised and generalized version of our previous parallel algorithm for the Kronecker product of matrices. We show how to map the computation on any number of processors that is a divisor of the product of the matrices sizes (instead of being a prefix of this product as previously). Moreover, we show that the minimum number of parallel communications with p processors is log(p) whatever the algorithm, and that our algorithm achieves this optimal performance. The work is validated by significant efficiencies obtained from experimental measures on the {\small CRAY} machine.
-
Sanjay Rajopadhye, Tanguy
Risset, et Claude Tadonki, Algebraic Path Problem on linear arrays,
Techniques et Sciences Informatiques TSI, 20 (5), 2001.
Abstract. We seek a linear SPMD implementation of the Warshal algorithm for the Algebraic Path Problem (Unified model of the transitive closure, shortest paths, Gauss elimination, ...). Our parallel algorithm is systolic-like in its original version, and offers the important advantage of being able to run on a shorter number of processors by a natural round- robbing remapping. We derive and validate a blocked version for standard distributed memory parallel machines, as the cost of data communication would be severe otherwise. We show experimental validations on the {\small CRAY} machine.
-
Claude Tadonki, A
Recursive Method for Graph Scheduling, Scientific Annals of Cuza University,
Vol 11, p.121-131, 2002
Abstract. This paper presents a recursive graph scheduling method. Our paradigm applies on any acyclic graph that can be partitioned into isomorphic subgraphs. Indeed, many common problem domains are 3D graphs that can be organized as a chain of isomorphic 2D subgraphs. Starting from any valid schedule of the source subgraph of the chain, we use the generic isomorphism to systematically derive a valid global schedule, with a local pipeline between two consecutive subgraphs. Our technique is then characterized by the nature of the partition and the source schedule. We discuss the impact of these characteristics on the complexity on the generated parallel schedules.
-
C. Beltran, C. Tadonki and J.-Ph. Vial,
Solving the p-median problem with a semi-Lagrangian relaxation, Computational Optimization and
Applications, Volume 35(2), October 2006. (pdf)
Abstract. This paper deals with operation research and non differentiable optimization. The so-called P-median problem is the problem of locating P "facilities" relative to a set of "customers" such that the sum of the shortest demand weighted distance between "customers" and "facilities" is minimized. Indeed, this a classical combinatorial optimization problem with a huge set of potential solutions. Using a semi-Lagrangian relaxation, we tackle the problem in its associated continuous formulation and report our non-smooth convex optimization engineering results.
-
F. Babonneau, C. Beltran, A. Haurie, C. Tadonki and J.-P. Vial,
Proximal-ACCPM: a versatile oracle based optimization
method, Computational and Management
Science, Volume 9, 2007. (pdf)
Abstract. Oracle Based Optimization (OBO) conveniently designates an approach to handle a class of convex optimization problems in which the information pertaining to the function to be minimized and/or to the feasible domain takes the form of a linear outer approximation revealed by an oracle. We show how difficult problems can be cast in this format, and then solved within our context. We present our method, so-called Proximal-ACCPM, to trigger the OBO approach and give a snapshot on numerical results. This paper summarizes several contributions with the OBO approach and aims to give, in a single report, enough information on the method and its implementation to facilitate new applications.
-
C. Tadonki,
Mathematical and Computational Engineering in X-Ray Crystallography, International Journal of Advanced Computer Engineering, volume 1(2) 2008.
Abstract. The main purpose of X-Ray Crystallography is to predict a macromolecular structure using from X-Ray synchrotron radiation. Among existing paradigms to achieve this task, analytical approaches come up as good candidates for automation trough mathematical approximations and computational engineering. In addition to the later, statistical processing are required in order to refine the data according to the physical model and the boarding effects of the experiments. We revisit the basis of the problem and focus on a more precise effect, so-called radiation damage, for which it has been also proven that it can be artificially managed to become instructive.
-
T. Saidani, L. Lacassagne, J. Falcou, C. Tadonki, Samir Bouaziz, Parallelization Schemes for Memory Optimization on the Cell Processor : A Case Study on the Harris Corner Detector, Transactions on High-Performance Embedded Architectures and Compilers, volume 3(3) 2008.
Abstract. The Cell processor is a typical example of heterogeneous multiprocessor on-chip architecture that uses several levels of parallelism to deliver high performance. Although its efficiency potential, the execution mode and part of hardware specificities make it being non trivial to deal with. Indeed, reducing the gap between peak performance and effective performance is the challenge for compiler design and efficient implementations. Image processing and media applications are typical "main stream" applications one could consider while investigating on Cell benchmarks. Our investigations, trough various implementation of the Harris detection algorithm, reveal that the impact of DMA controlled data transfers and synchronizations between SPEs are key points for global performance.
-
C. Tadonki,
Integer Programming Heuristic for the Dual Power Setting Problem in Wireless
Sensors Networks, Int. Journal of Advanced Research in Computer Engineering, vol 3(1) 2009.
Abstract. We seek an integer programming based heuristic for solving the dual power management problem in wireless sensor networks. For a given network with two possible transmission powers (low and high), the problem is to find a minimum size subset of nodes such that if they are assigned high transmission power while the others are assigned low transmission power, the network will be strongly connected. The main purpose behind this efficient setting is to minimize the total communication power consumption while maintaining the network connectivity. In a theoretical point of view, the problem is known to be difficult to solve exactly. An approach to approximate the solution is to work with a spanning tree of clusters. Each cluster is a strongly connected component when consider low transmission power. We follow the same approach, and we formulate the node selection problem inside clusters as an integer programming problem which is solved exactly using specialized codes. Experimental results show that our algorithm is efficient both in execution time and solution quality.
-
C. Tadonki, G. Grodidier, O. Pene,
An efficient CELL library for lattice quantum chromodynamics, ACM SIGARCH Computer Architecture News, vol 38(4) 2011.
Abstract. Quantum chromodynamics (QCD) is the theory of subnuclear physics, aiming at modeling the strong nuclear force, which is responsible for the interactions of nuclear particles. Numerical QCD studies are performed through a discrete formalism called LQCD (Lattice Quantum Chromodynamics). Typical simulations involve very large volume of data and numerically sensitive entities, thus the crucial need of high performance computing systems. We propose a set of CELL-accelerated routines for basic LQCD calculations. Our framework is provided as a unified library and is particularly optimized for an iterative use. Each routine is parallelized among the SPUs, and each SPU achieves it task by looping on small chunk of arrays from the main memory. Our SPU implementation is vectorized with double precision data, and the cooperation with the PPU shows a good overlap between data transfers and computations. Moreover, we permanently keep the SPU context and use mailboxes to synchronize between consecutive calls. We validate our library by using it to derive a CELL version of an existing LQCD package (tmLQCD). Experimental results on individual routines show a significant speedup compare to standard processor, 11 times better than a 2.83 GHz INTEL processor for instance (without SSE). This ratio is around 9 (with QS22 blade) when consider a more cooperative context like solving a linear system of equations (usually referred as Wislon-Dirac inversion). Our results clearly demonstrate that the CELL is a very promising way for high-scale LQCD simulations.
Conférence/Symposium/Workshop Papers (grouped by topic)
1. Parallel Computing (algorithm, scheduling, complexity, implementation, dynamic system)
Overview. This subset of my outputs is related to general purpose parallel computing. It includes adhoc parallel algorithms, methodology for systematic parallel scheduling, efficient parallel implementation, and parallel dynamic systems. My actual focus in this topic is on the design and analysis of powerful methodologies specific to multicore processors and accelarators based architectures (CELL, GPU, ...), both for domain specific considerations and a wider audience. I keep investigating on fundamental aspects, since new hypothesis came up with emergent architectures and the increasing and pervasive HPC demand.
-
Claude Tadonki,
Système
d'équations récurrentes et multiplication parallèle d'un vecteur par un
produit tensoriel de matrices,
Rencontres
Francophones de Parallelisme Renpar'11,
Rennes (France), 1999.
-
Sanjay Rajopadhye, Tanguy
Risset, et Claude Tadonki,
The algebraic path problem revisited,
European Conference on Parallel Computing
Europar99,
Toulouse (France), Lncs Sringer-Verlag, N° 1685, p. 698-707, August 1999.
-
Claude Tadonki,
Ordonnancements
canoniques,
Renpar12,
Rencontres Francophones de Parallelisme, Besançon (France), Juin 2000.
-
Claude Tadonki,
Parallel
Cholesky Factorization,
Parallel
Matrix Algorithms and Appliations PMAA
Worshop, Neuchatel (Switzerland), August 2000.
-
Claude Tadonki, et Bernard
Philippe,
Méthodologie de conception d'algorithmes efficaces pour le
produit tensoriel,
CARI2000,
Tananarive (Madagascar), Octobre 2000.
-
Patrice Quinton, Claude
Tadonki, et Maurice Tchuente,
Un échéancier systolique et son
utilisation dans l'ATM,
CARI2000,
Tananarive (Madagascar), Octobre 2000.
-
Claude Tadonki,
Complexité
des ordonnancements canoniques et dérivation d'architecture,
Rencontres Francophones de Parallelisme Renpar13,
Paris (France), Avril 2001
( get it! ).
-
Claude Tadonki,
A
Recursive Method for Graph Scheduling,
International Symposium on
Parallel and Distributed Computing (SPDC),
Iasi, Romania, July 2002 ( get it! ).
-
R. Ndoundam, C. Tadonki, and M. Tchuente,
Parallel chip firing game associated with n-cube orientation,
International Conference on Computational Science,
ICCS04 (LNCS/Springer), Krakow, Poland, June
2004 .
-
T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, and D. Etiemble,
Algorithmic Skeletons within an Embedded Domain Specific Language for the CELL Processor,
Parallel Architectures and Compilation Techniques (PACT),
PACT09, Raleigh, North Carolina (USA), September 12-16, 2009. (pdf)
-
C. Tadonki, G. Grosdidier, and O. Pene,
An efficient CELL library for Lattice Quantum Chromodynamics,
International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART) in conjunction with the 24th ACM International Conference on Supercomputing (ICS), pp. 67-71,
Epochal Tsukuba, Tsukuba, Japan, June 1-4, 2010. (ACM Computer Architecture News)
-
C. Tadonki, L. Lacassagne
T. Saidani, J. Falcou, K. Hamidouche,
The Harris algorithm revisited on the CELL processor ,
International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART) in conjunction with the 24th ACM International Conference on Supercomputing (ICS), pp. 97-100,
Epochal Tsukuba, Tsukuba, Japan, June 1-4, 2010. (ACM Computer Architecture News)
-
C. Tadonki,
Ring pipelined algorithm for
the algebraic path problem on the CELL Broadband Engine,
Workshop on Applications for Multi and Many Core Architectures (WAMMCA 2010) in conjunction with the International Symposium on Computer Architecture and
High Performance Computing (SBAC PAD 2010),
Petropolis, Rio de Janeiro, Brazil, October 27-30, 2010. (IEEE digital library) - abstract - slides - pdf - code
-
C. Tadonki,
Large Scale Kronecker Product on Supercomputers,
2nd Workshop on Architecture and Multi-Core Applications (WAMCA 2011) in conjunction with the International Symposium on Computer Architecture and
High Performance Computing (SBAC PAD 2011),
Petropolis, Rio de Janeiro, Brazil, October 27-30, 2010. (IEEE digital library) - abstract - slides - pdf - code
2. Operation Research (algorithm, modeling, method, tool)
Overview. The main concern here is operation research and convex optimization. My work on this topic covers the design and implementation of efficient solvers for both continuous optimization and discrete optimization. The connection between the two universes through geometric, analytic, and algebraic techniques (global optimization, semidefinite programming, spectral theory, ...) is something which makes the topic very exciting, since this synergy has proven to be a good way of tackling difficult combinatorial problems.
-
L. Drouet, A. Dubois, A. Haurie and C. Tadonki,
A MARKAL-Lite Model for Sustainable Urban Transportation,
Optimization days, Montreal, Canada, May, 2003.
-
Claude Tadonki,
ProxAccpm: A convex optimization solver,
International Symposium on Mathematical Programing,
ISMP2003, Copengagen, Danmark, August
2003
( get it! ).
-
O. Briant, C. Lemaréchal,K.
Monneris,N. Perrot,C. Tadonki,F.
Vanderbeck,J.-P. Vial,C. Beltran,P. Meurdesoif,
Comparison of various approaches for column
generation,
Eigth Aussois Workshop on Combinatorial Optimization, 5-9 january 2004.
-
Claude Tadonki and Jean-Philippe Vial,
Efficient algorithm for linear pattern separation,
International Conference on Computational Science,
ICCS04 (LNCS/Springer), Krakow, Poland, June
2004 .
-
Cesar Beltran, Claude Tadonki, Jean-Philippe Vial,
Semi-Lagrangian relaxation ,
Computational Management Science Conference and Workshop on Computational Econometrics and Statistics,
Link, Neuchatel, Switzerland, April
2004 .
-
Claude Tadonki, Cesar Beltran and Jean-Philippe
Vial ,
Portfolio management with integrality constraints,
Computational Management Science Conference and Workshop on Computational Econometrics and Statistics,
Link, Neuchatel, Switzerland, April
2004 .
-
C. Beltran, C. Tadonki and J.-Ph. Vial,
The p-median problem solved by semi-Lagrangian
relaxation,
First Mathematical Programming Society International Conference on Continuous Optimization (ICCOPT
I), Troy, USA, August 2-4, 2004.
3. Scientific and Technical Computing (Sensors network, power aware computing, program comprehension, data analysis)
Overview. This group is related to specialized algorithms, program optimization and data analysis. My investigations on sensors networks mainly focus on the network topology (disk graph) and the cooperation among sensors (distributed algorithms). Concerning the power aware computing topic, the problem is to reduced the energy dissipated by the execution of a given program, particularly in a context where the energy is a critical resource (embedded systems). My contribution includes combinatorial and analytical methodologies to achieve the task of modeling energy complexity and how to reschedule the algorithm accordingly. Regarding data refinement, this is related to statistical and approximation approaches to improve the matching between experimental data and the model. Further steps in experimental research are sensitive to this agreement.
-
Claude Tadonki, Mitali Singh, Jose Rolim and Viktor K. Prasanna,
Combinatorial Techniques for Memory Power State Scheduling in Energy Constrained
Systems,
Workshop on Approximation and Online Algorithms
(WAOA),
WAOA2003 (LNCS/Springer), Budapest, Hungary, September
2003 .
-
Claude Tadonki and Jose Rolim ,
An analytical model for energy minimization,
III Workshop on Efficient and Experimental Algorithms,
WEA04 (LNCS/Springer), Angra dos Reis, Rio de Janeiro, Brazil, May
2004.
-
Claude Tadonki ,
Universal Report: A Generic Reverse Engineering Tool
,
12th IEEE International Workshop on Program
Comprehension,
IWPC 2004 (IEEE), University of Bari, Bari, Italy , June
2004 .
-
Claude Tadonki and Jose Rolim,
An integer
programming heuristic for the dual power management problem in wireless
sensor networks,
2nd International Workshop on Managing Ubiquitous Communications and Services,
MUCS2004, Dublin, Ireland,
December 13, 2004.
-
Claude Tadonki,
Refinement experiments with RADDAM data,
EMBL bilateral meeting,
Hamburg, Germany,
June 26-28, 2006.
-
Claude Tadonki,
Off-line settings in
wireless networks,
3rd International Symposium on Computational Intelligence and Intelligent Informatics,
ISCIII2007, Agadir, Morocco,
March 28-30, 2007.
-
Claude Tadonki, Système
d'équations récurrentes et multiplication parallèle d'un vecteur par un
produit tensoriel de matrices, Rencontres
Francophones de Parallelisme Renpar'11,
Rennes (France), 1999.
-
Claude Tadonki,
Système
d'équations récurrentes et multiplication parallèle d'un vecteur par un
produit tensoriel de matrices,
Rencontres
Francophones de Parallelisme Renpar'11,
Rennes (France), 1999.
-
Sanjay Rajopadhye, Tanguy
Risset, et Claude Tadonki, The algebraic path problem revisited,
European Conference on Parallel Computing
Europar99,
Toulouse (France), Lncs Sringer-Verlag, N° 1685, p. 698-707, August 1999.
-
Claude Tadonki, Ordonnancements
canoniques, Renpar12,
Rencontres Francophones de Parallelisme, Besançon (France), Juin 2000.
-
Claude Tadonki, Parallel
Cholesky Factorization, Parallel
Matrix Algorithms and Appliations PMAA
Worshop, Neuchatel (Switzerland), August 2000.
-
Claude Tadonki, et Bernard
Philippe, Méthodologie de conception d'algorithmes efficaces pour le
produit tensoriel, CARI2000,
Tananarive (Madagascar), Octobre 2000.
-
Patrice Quinton, Claude
Tadonki, et Maurice Tchuente, Un échéancier systolique et son
utilisation dans l'ATM, CARI2000,
Tananarive (Madagascar), Octobre 2000.
-
Claude Tadonki, Complexité
des ordonnancements canoniques et dérivation d'architecture, Rencontres Francophones de Parallelisme Renpar13,
Paris (France), Avril 2001
( get it! ).
-
Claude Tadonki, A
Recursive Method for Graph Scheduling, International Symposium on
Parallel and Distributed Computing (SPDC),
Iasi, Romania, July 2002
( get it! )
.
-
L. Drouet, A. Dubois, A. Haurie and C. Tadonki, A MARKAL-Lite Model for Sustainable Urban Transportation,
Optimization days, Montreal, Canada, May, 2003.
-
Claude Tadonki, ProxAccpm: A convex optimization solver,
International Symposium on Mathematical Programing,
ISMP2003, Copengagen, Danmark, August
2003
( get it! ).
-
Claude Tadonki, Mitali Singh, Jose Rolim and Viktor K. Prasanna, Combinatorial Techniques for Memory Power State Scheduling in Energy Constrained
Systems,
Workshop on Approximation and Online Algorithms (WAOA),
WAOA2003 (LNCS/Springer), Budapest, Hungary, September
2003 .
-
O. Briant, C. Lemaréchal,K.
Monneris,N. Perrot,C. Tadonki,F.
Vanderbeck,J.-P. Vial,C. Beltran,P. Meurdesoif, Comparison of various approaches for column
generation,
Eigth Aussois Workshop on Combinatorial Optimization, 5-9 january 2004.
-
Claude Tadonki and Jean-Philippe Vial, Efficient algorithm for linear pattern separation,
International Conference on Computational Science,
ICCS04 (LNCS/Springer), Krakow, Poland, June
2004 .
-
R. Ndoundam, C. Tadonki, and M. Tchuente, Parallel chip firing game associated with n-cube orientation,
International Conference on Computational Science,
ICCS04 (LNCS/Springer), Krakow, Poland, June
2004 .
-
Cesar Beltran, Claude Tadonki, Jean-Philippe Vial, Semi-Lagrangian relaxation ,
Computational Management Science Conference and Workshop on Computational Econometrics and Statistics,
Link, Neuchatel, Switzerland, April
2004 .
-
Claude Tadonki, Cesar Beltran and Jean-Philippe
Vial , Portfolio management with integrality constraints,
Computational Management Science Conference and Workshop on Computational Econometrics and Statistics,
Link, Neuchatel, Switzerland, April
2004 .
-
Claude Tadonki and Jose Rolim ,
An analytical model for energy minimization,
III Workshop on Efficient and Experimental Algorithms,
WEA04 (LNCS/Springer), Angra dos Reis, Rio de Janeiro, Brazil, May
2004.
-
C. Beltran, C. Tadonki and J.-Ph. Vial, The p-median problem solved by semi-Lagrangian
relaxation, First Mathematical Programming Society International Conference on Continuous Optimization (ICCOPT
I), Troy, USA, August 2-4, 2004.
-
Claude Tadonki ,
Universal Report: A Generic Reverse Engineering Tool
, 12th IEEE International Workshop on Program
Comprehension,
IWPC 2004 (IEEE), University of Bari, Bari, Italy , June
2004 .
-
Claude Tadonki and Jose Rolim, An integer
programming heuristic for the dual power management problem in wireless
sensor networks, 2nd International Workshop on Managing Ubiquitous Communications and Services,
MUCS2004, Dublin, Ireland,
December 13, 2004.
-
Claude Tadonki, Refinement experiments with RADDAM data, EMBL bilateral meeting,
Hamburg, Germany,
June 26-28, 2006.
-
Claude Tadonki, Off-line settings in
wireless networks, 3rd International Symposium on Computational Intelligence and Intelligent Informatics,
ISCIII2007, Agadir, Morocco,
March 28-30, 2007.
Technical Reports
-
Claude Tadonki, and Bernard
Philippe, Parallel multiplication of a vector by a Kronecker product of
matrices, IRISA report n°
1194, 1998.
Optimal Parallel Algorithm for the Kronecker Product in log(p) communication steps
-
Patrice Quinton, Claude
Tadonki, et Maurice Tchuente, Un
échéancier systolique et son utilisation dans l'ATM, IRISA report
n° 1348,
2000.
-
Claude Tadonki, Synthèse
d'ordonnancements parallèles par reproduction canonique, IRISA report n°
1349, also INRIA report n° 3996, 2000.
-
David Cachera, Sanjay
Rajopadhye, Tanguy Risset, and Claude Tadonki, Parallelization of
the algebraic path problem on linear simd/spmd arrays, IRISA report n°
1409, 2001.
-
Claude Tadonki and
Jean-Philippe Vial, The linear separation problem revisited with accpm
, Cahier de Recherche n° 2002.11, University of Geneva, June 2002.
( get it! )
-
F. Babonneau, C. Beltran, O. du Merle, C. Tadonki and J.-P. Vial,
The proximal analytic center cutting plane method,
Technical report, Logilab, HEC, University of Geneva, 2003.
-
Cesar Beltran, Claude Tadonki, and Jean-philippe Vial,
Semi-Lagrangian relaxation,
Technical report, Logilab, HEC, University of Geneva, 2004.
Book Chapters
-
Ordonnancement pour l'informatique parallele, A. Moukrim and C. Picouleau (Edt), Hermes,
( details! ).
Unpublished/Submited material
-
A.
Dubois, A. Haurie, C. Tadonki, and D. Zachary, An Operational Energy
Modelling System, 2003
( get it! )
-
R.
Ndoundam, C. Tadonki, and M. Tchuente, Parallel chip firing game
associated with n-cube orientation, 2000
( get it! ).
-
F. Babonneau, C. Beltran, O. du Merle, C.
Tadonki, and J.-P. Vial, The proximal analytic center cutting
plane method
, 2003
( get it! ).
-
C.
Tadonki, Universal Report: A generic reverse engeneering tool, 2003
(
get it! ).
-
C.
Tadonki, M. Singh, R. Jose,
and V. Prasanna, Combinatorial technic for memory power state scheduling in
energy-constrained system, 2003
( get it! ).
-
D.
Cachera, S. Rajopadye, T. Risset, and C. Tadonki, Algorithmic
tiling for efficient parallel APP implementation, 2003
( get it! ).
-
C. Tadonki and J.-P. Vial,
Portfilio Selection with Cardinality and Bound Constraint, 2003
( get it! ).
-
C. Tadonki and J. Rolim,
An analytical model for energy minimization, 2003
( get it! ).
-
C. Tadonki and J. Rolim,
An integer
programming heuristic for the dual power management problem in wireless
sensor networks, 2004 ( get it! ).
Work in progress
-
Portfolio Selection with
Cardinality Constraints
-
Efficient Matrix
Computations in Cutting Planes Algorithms
-
Algorithmic Technics of
Designing Energy Efficient Algorithms
-
Structural Method for
Lower Bounds in Complexity
-
Improved Graph Model for
Optical Network of Sensors
-
Dynamic Behavior of
Parallel Chip Firing Game on Regular Graphs
-
Chapter of a book on
"Parallel Scheduling" published by Hermes
-
Book in
"Scientific Computation"
How
to use CPLEX with Matlab
Softwares
Download
my CV in PDF
The Kronecker product, also called tensor product, is
a fundamental matrix algebra operation, which is widely
used as a natural formalism to express a convolution of
many interactions or representations. Given a set of matrices,
we need to multiply their Kronecker product by a
vector. This operation is a critical kernel for iterative algorithms,
thus needs to be computed efficiently. In a previous
work, we have proposed a cost optimal parallel algorithm
for the problem, both in terms of floating point computation
time and interprocessor communication steps. However, the
lower bound of data transfers can only be achieved if we
really consider (local) logarithmic broadcasts. In practice,
we consider a communication loop instead. Thus, it becomes
important to care about the real cost of each broadcast.
As this local broadcast is performed simultaneously
by each processor, the situation is getting worse on a large
number of processors (supercomputers). We address the
problem in this paper in two points. In one hand, we propose
a way to build a virtual topology which has the lowest
gap to the theoretical lower bound. In the other hand, we
consider a hybrid implementation, which has the advantage
of reducing the number of communicating nodes. We illustrate
our work with some benchmarks on a large SMP
8-Core supercomputer.