[Job posting] MSc thesis projects at Genevention

Genevention GmbH is a bioinformatics startup that develops algorithms and databases to identify biomarkers from high-throughput experimental data. We apply cutting-edge machine learning techniques to the large amounts of available omics data to gain insights into the biological mechanisms that drive diseases and agricultural loss. We aim to provide tools that help medical researchers, pharmaceutical engineers and agricultural companies understand and combat these mechanisms. We are offering several thesis projects at the Master's level:

Identification of disease-specific biomarkers

Today, thousands of datasets from genomic and transcriptomic high-throughput experiments covering many diseases are available to the public. The molecules and molecule signatures that are relevant to a particular disease, so-called biomarkers, are often identified from a single dataset of healthy and diseased subjects. Because this approach neglects information on molecule signatures from other datasets and diseases, the resulting signatures are often not very specific to one disease and overlap with the signatures of other diseases.

In this project, we want to identify highly disease-specific molecule signatures by taking into account the large number of publicly available transcriptomic datasets. For this purpose, different data mining and machine learning techniques will be evaluated on molecule profiles from different diseases, and the results will be provided as a database of disease-specific molecule signatures.
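As a rough illustration of what "disease-specific" could mean in practice, the following minimal Python sketch keeps only those markers that appear in the signature of a single disease. It is not Genevention's method: the function name `specific_signatures`, the `max_shared` threshold, and the gene and disease labels are all made up for this example, and a real project would evaluate statistical and machine learning approaches on actual expression profiles.

```python
from collections import Counter


def specific_signatures(signatures, max_shared=1):
    """Keep, per disease, the markers found in at most `max_shared` signatures.

    `signatures` maps a disease name to a set of marker names.
    Both the function and the threshold are hypothetical.
    """
    # Count in how many disease signatures each marker occurs.
    counts = Counter(
        marker for markers in signatures.values() for marker in set(markers)
    )
    return {
        disease: {m for m in markers if counts[m] <= max_shared}
        for disease, markers in signatures.items()
    }


# Toy input with placeholder names, not real biomarkers.
signatures = {
    "disease_A": {"GENE1", "GENE2", "GENE3"},
    "disease_B": {"GENE2", "GENE4"},
    "disease_C": {"GENE2", "GENE3", "GENE5"},
}

specific = specific_signatures(signatures)
for disease in sorted(specific):
    print(disease, sorted(specific[disease]))
```

In this toy input, GENE2 and GENE3 occur in more than one signature and are therefore dropped, which mirrors the overlap problem described above.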

Development of rich databases

High-throughput experiments in the life sciences are characterized not only by their raw data, but also by the information describing the context in which the data was obtained, the so-called metadata. In genomics and transcriptomics, this metadata comprises information about subjects (e.g. age, gender), protocols (e.g. chemicals, instruments) and experimental conditions (e.g. diseases, gene knockouts). These descriptions are valuable for comparing data across experiments and make large experiment collections searchable by particular criteria. However, although standardized vocabularies ("ontologies") exist for most of these metadata categories, experimental information is often stored only as free text in lab notebooks.

Genevention develops databases that facilitate organizing and retrieving experimental meta-information. Customized data acquisition forms, high-performance database technologies and domain-specific ontologies are combined to collect and harmonize the data, flag missing information and allow its semi-automatic completion and easy retrieval. In this project, we plan to extend the database framework with modules for seamless integration of relevant data from scientific literature and external biological databases. For this purpose, literature databases such as PubMed (pubmed.org) will be analyzed with text mining techniques for their relevance to experimental information. Furthermore, relevant connections between metadata categories and external databases will be included dynamically in the database system. Challenges to be addressed in this project include database optimization (e.g. query speed-up) and security (e.g. efficient database encryption), while adhering to the standard software development lifecycle.
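To make the idea of harmonizing metadata and flagging missing information more concrete, here is a minimal Python sketch. It is an assumption-laden toy, not the actual database framework: the required fields, the synonym table and the function name `harmonize` are hypothetical, and a real system would look terms up in a domain-specific ontology rather than in a hand-written dictionary.

```python
# Hypothetical list of metadata fields every experiment record should have.
REQUIRED_FIELDS = ("age", "sex", "disease")

# Hypothetical synonym table standing in for an ontology lookup.
SYNONYMS = {
    "sex": {"f": "female", "m": "male"},
    "disease": {
        "t2d": "type 2 diabetes mellitus",
        "type ii diabetes": "type 2 diabetes mellitus",
    },
}


def harmonize(record):
    """Return (harmonized_record, missing_fields) for one free-text record."""
    harmonized, missing = {}, []
    for field in REQUIRED_FIELDS:
        raw = record.get(field)
        if raw is None or str(raw).strip() == "":
            # Report the gap so it can be completed later.
            missing.append(field)
            continue
        key = str(raw).strip().lower()
        # Map known synonyms to one preferred label; unknown values are
        # kept in their normalized (trimmed, lowercased) form.
        harmonized[field] = SYNONYMS.get(field, {}).get(key, key)
    return harmonized, missing


record = {"sex": " F ", "disease": "T2D"}
harmonized, missing = harmonize(record)
print(harmonized)
print(missing)
```

Returning the list of missing fields separately keeps incomplete records usable while still showing which information has to be completed.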

Further project outlines are available on request and we are also happy to discuss individual ideas.

Desired Skills:

Linux/shell scripting/data management
Java/Python/R
SQL/NoSQL databases
Machine learning/data mining/bioinformatics basics
Proactive teamwork and good communication and documentation skills

Information and contact:

Dr. Thomas Lingner

Genevention GmbH

Rudolf-Wissell-Str. 28

37079 Göttingen

lingner@genevention.com

0176/23970603