Several open-source projects that I authored or contributed to.
Roaring bitmaps are fast and lightweight sets of unsigned 32 bits integers. They can be several orders of magnitude more efficient than classical implementations (e.g. hash tables, balanced binary trees, uncompressed bitmaps), both in terms of time and memory consumption. Implementations exist in several languages, including C and Python:
- CRoaring is a C implementation. I implemented several parts of the library: range constructor, select query, subset test.
- PyRoaring is a Python wrapper for the C implementation. It allows to use roaring bitmaps transparently in Python, with the speed of C. I implemented most of the library, using the Cython programming language.
Performance tests on Grid’5000.
On a near-daily basis, I run micro-benchmarks on hundreds of Grid’5000 nodes. Several statistical tests are then performed on several metrics (e.g. CPU performance, frequency, temperature, power consumption). I implemented the tests in Python, it consists in estimating the likelihood to observe a new value (or a sequence of new values) given the old observations.
These tests allowed to detect several events affecting significantly the nodes performance. These events went unnoticed by both the staff and the users. With further investigations, I could pinpoint the root cause most of the time (cooling issue, faulty memory, power instability, BIOS upgrade).
I implemented this library during my PhD thesis, more information can be found in the chapter 11 of my thesis.
Python module to compute a segmented linear regression.
It is implemented in Python, using a two-phase greedy algorithm. Several objective functions are available: Ordinary Least Square, Weighted Least Square (implemented with a closed formula), Logarithmic Least Square (implemented with a gradient descent).
This module gave much better fits than other alternatives I tested on my data, like Datadog’s piecewise or Cubist.
I implemented this library during my PhD thesis, more information can be found in the chapter 5 of my thesis (section 5.3.3).
Experiment engine for Grid’5000.
This library allows to write experiments as readable and concise Python scripts. Common tasks (e.g. node deployment, resource monitoring) are factorized in the engine. Experiment data and meta-data are automatically collected in an archive.
I implemented this library during my PhD thesis, more information can be found in the chapter 9 of my thesis.
Simgrid is a 20 year old free software to simulate a wide range of distributed systems: grid, P2P, cloud, MPI applications. Its simulation models have been theoretically and experimentally assessed.
I mostly used and contributed to SMPI a module of Simgrid which can realistically simulate unmodified MPI applications. I implemented several features to improve the simulation efficiency. I also greatly improved the calibration procedure for modeling an existing platform.
Platypus is a modular and open source question answering framework. Its main purpose is to answer to questions asked in a natural language (e.g. in English). For instance, it answers correctly to the question:
What is the birth date of the president of the United States?
For more information, read the report or the slides. This work was initially a student project. It was done as part of the master’s first year at the ENS Lyon, in a team of seven. The project has then been sold to Lexistems and is no longer actively maintained by ourselves.