GSoC Project Ideas

Below, you can find some ideas on the directions in which we could jointly push Polypheny forward. Please consider them as starting points for your proposal. Of course, if you have other ideas, we would be thrilled to hear them. Please do not hesitate to contact us beforehand to get feedback on what you plan to do.

Simply copying and pasting one of the ideas will not work. On the other hand, creating an entirely new project idea without first consulting the mentors might be difficult as well.

Driver for PHP, NodeJS, Ruby, …

Currently, there is a JDBC driver, a C++ driver, a .NET driver, a Go driver and a Python connector for Polypheny. In this project, support for other languages or frameworks shall be added. This project is explicitly for developers who have experience interacting with databases in a specific language or framework. Feel free to link references to experience with that language or framework in your proposal.

Expected outcome: A driver for a not yet supported programming language or framework that allows querying Polypheny from this language or framework.

Difficulty: medium
Size: medium (~175 hours)
Skills: Good knowledge of the programming language
Mentor: Yiming, Martin

Natural Language Interface for Polypheny’s Workflow Engine

With the next release, Polypheny will introduce a workflow engine for modeling ETL (Extract, Transform, Load) workflows. This project aims to simplify workflow creation by developing a Natural Language Processing (NLP) interface, allowing users to describe operations in plain language. The system will interpret user input, generate corresponding workflow configurations, and suggest optimizations, making ETL design more accessible to non-experts. The project involves building a robust NLP model to process ETL-related instructions, mapping them to workflow components, and ensuring seamless integration with Polypheny’s engine. Handling ambiguities, refining user input, and maintaining accuracy will be key challenges.

Expected outcome: A functional NLP-powered interface that enables users to configure ETL workflows using natural language.

Difficulty: hard
Size: large (~350 hours)
Skills: Natural Language Processing (NLP), Java
Mentor: David

Enhancing Presentation Mode for Polypheny Notebooks

Polypheny Notebooks provide an interactive environment for data analysis, visualization, and insight sharing. A previous GSoC project introduced a presentation mode, allowing users to transform notebooks into dynamic, slide-based presentations. While the initial implementation laid a strong foundation, further improvements are needed to enhance usability, functionality, and overall user experience. This project aims to refine and expand the existing presentation mode, ensuring a seamless transition between data exploration and structured presentations. Enhancements include improving slide customization, refining navigation controls, optimizing performance, and addressing any usability gaps. The goal is to make the presentation mode more intuitive and robust, empowering users to effectively communicate complex data insights.

Expected outcome: A refined and improved presentation mode in Polypheny Notebooks, offering better usability and enhanced features for presenting notebooks as structured slides.

Difficulty: medium
Size: small (~90 hours)
Skills: TypeScript, Angular
Mentor: Marco

LDAP Query Interface

LDAP (Lightweight Directory Access Protocol) is a widely used, open, vendor-neutral, industry-standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. It provides a common interface for accessing and manipulating directory information, such as usernames and passwords, email addresses, and other directory-based information. By adding support for querying Polypheny using LDAP, Polypheny can be seamlessly integrated into applications that use LDAP.

Expected outcome: A new query interface that allows retrieving data managed by Polypheny using basic LDAP queries.
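
To give an idea of what a "basic LDAP query" looks like on the client side, here is a minimal sketch that assembles an LDAP search filter in the string form defined by RFC 4515. The helper names and the example attributes (`objectClass`, `uid`) are illustrative assumptions; how such a filter would map onto data managed by Polypheny is exactly what the project would need to design.

```python
# Characters that must be escaped inside a filter value (RFC 4515).
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape a value so it can be embedded safely in a search filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def equals(attribute: str, value: str) -> str:
    """Build an equality match such as (uid=alice)."""
    return f"({attribute}={escape_filter_value(value)})"


def all_of(*filters: str) -> str:
    """Combine several filters with a logical AND."""
    return "(&" + "".join(filters) + ")"


# Hypothetical example: find the person entry whose uid is "alice".
search_filter = all_of(equals("objectClass", "person"), equals("uid", "alice"))
print(search_filter)  # (&(objectClass=person)(uid=alice))
```

A query interface would receive such a filter together with a base DN and a search scope and would have to translate all three into a query that Polypheny can execute.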

Difficulty: medium-hard
Size: large (~350 hours)
Skills: Java
Mentor: Heiko

CouchDB-like HTTP Query Interface

CouchDB is a popular document-oriented database system. It features an HTTP query interface that allows querying and manipulating data. The idea of this project is to build a query interface for Polypheny that adheres to the specification of the CouchDB query API. This would make it possible to seamlessly replace a CouchDB database with Polypheny or to use applications written for CouchDB with Polypheny.

Expected outcome: A new query interface that allows retrieving data managed by Polypheny using the CouchDB query syntax.
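
As a rough illustration of the kind of request such an interface would have to accept, the sketch below builds (but does not send) a CouchDB-style `_find` request with a JSON selector. The base URL, the database name `movies`, and the field names are assumptions made up for this example; whether and where Polypheny would expose such an endpoint is up to the project.

```python
import json
from urllib.request import Request


def build_find_request(base_url, database, selector, fields=None, limit=25):
    """Build a POST request for CouchDB's /{db}/_find endpoint."""
    body = {"selector": selector, "limit": limit}
    if fields:
        body["fields"] = fields
    return Request(
        url=f"{base_url}/{database}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example: all movies released after 2010, returning two fields.
request = build_find_request(
    "http://localhost:5984",  # assumed address, not a documented Polypheny port
    "movies",
    {"year": {"$gt": 2010}},
    fields=["title", "year"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

An application written for CouchDB would send this request unchanged, so the interface would need to parse the selector and return results in the response format that CouchDB clients expect.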

Difficulty: medium-hard
Size: large (~350 hours)
Skills: Java
Mentor: Isabel

RADIUS Query Interface

RADIUS (Remote Authentication Dial-In User Service) is a networking protocol that provides centralized Authentication, Authorization, and Accounting (AAA) management for users who connect to and use a network service. It is commonly used by ISPs and corporations to manage access to the Internet or internal networks, authenticate users, and keep track of their actions. Adding a query interface that adheres to the RADIUS protocol to Polypheny would allow integrating Polypheny (and thus the data managed by it) into such applications without an intermediate layer.

Expected outcome: A new query interface that allows retrieving data managed by Polypheny using RADIUS requests.
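
For orientation, the following sketch shows the basic wire layout of a RADIUS packet as specified in RFC 2865: a 20-byte header (code, identifier, length, authenticator) followed by type-length-value attributes. It is deliberately incomplete: a real Access-Request also carries a password attribute and a NAS identifier, and the function names are illustrative assumptions rather than part of any existing Polypheny code.

```python
import os
import struct

ACCESS_REQUEST = 1  # packet code defined in RFC 2865
USER_NAME = 1       # attribute type defined in RFC 2865


def encode_attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute: type, length (including the 2 header bytes), value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_access_request(identifier: int, user_name: str) -> bytes:
    """Build a simplified Access-Request that carries only a User-Name."""
    attributes = encode_attribute(USER_NAME, user_name.encode("utf-8"))
    authenticator = os.urandom(16)  # 16-byte random request authenticator
    length = 20 + len(attributes)   # header is always 20 bytes
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, length)
    return header + authenticator + attributes


packet = build_access_request(identifier=7, user_name="alice")
code, identifier, length = struct.unpack("!BBH", packet[:4])
print(code, identifier, length)  # 1 7 27
```

A RADIUS query interface would decode packets of this shape, look up the referenced user in data managed by Polypheny, and answer with an Access-Accept or Access-Reject packet.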

Difficulty: medium
Size: large (~350 hours)
Skills: Java
Mentor: Martin

Streamlining Polypheny’s UI

Over time, multiple refactorings in Polypheny’s frontend have led to inconsistencies in the user interface, affecting both usability and visual coherence. This project aims to streamline the look and feel of the entire Polypheny UI, ensuring a more polished, consistent, and user-friendly experience. The primary focus will be on aligning styles, refining UI components, and improving the overall usability of different sections, including Polypheny Notebooks. Special attention will be given to the notification system, enhancing how system events and user actions are communicated. This includes improving clarity, visibility, and handling of notifications to provide better feedback and ensure users stay informed without unnecessary interruptions. As an optional extension, this project can also introduce support for background activity notifications, allowing long-running tasks to execute in the background while keeping users updated with real-time status messages. Implementing this feature would increase the project size from medium to large.

Expected outcome: A visually refined and more consistent UI across Polypheny, specifically focusing on the schema management and the notebooks. The notification system will be enhanced for better usability, with optional support for background activity notifications to improve user experience.

Difficulty: easy-medium
Size: medium (~175 hours) or large (~350 hours)
Skills: TypeScript, Angular
Mentor: David, Martin

Server-side Query-to-File

For some applications, especially for those making use of the multimedia and file storage capabilities of Polypheny, it is useful to represent and interact with a table (or the result of an arbitrary query) as a file system. With Query to File, we already have a prototype implementation of this using FUSE and running on the client computer. The idea of this project is to integrate this concept directly into Polypheny. Instead of an application running on the local machine, Polypheny should provide an FTP or WebDAV share that could then be mounted on other machines.

Expected outcome: A new query interface that exposes a mountable file system containing the result of a specifiable query.

Difficulty: medium-hard
Size: large (~350 hours)
Skills: Java
Mentor: Isabel

Queryable Multimodel Information Schema for Polypheny

This project aims to introduce a queryable information schema to Polypheny, akin to the information schema found in traditional relational databases. The information schema is a critical meta-database that provides access to metadata about the schema. Because Polypheny supports multiple data models (relational, document, and graph), the goal is to create a unified information schema that provides comprehensive metadata across these data models, reflecting the structure, relationships, and specifics of each model within a single schema. This schema will enable users to query metadata about tables, documents, columns, nodes, edges, datatypes, and their interrelations. Implementing this feature involves designing a flexible schema that accommodates the nuances of multimodel data, populating it with accurate metadata, and providing efficient querying capabilities. This feature will facilitate better database understanding, optimization, and integration with other tools and frameworks by providing another method for accessing database metadata.

Expected outcome: A virtual schema that can be queried to retrieve information on the database schema.

Difficulty: medium
Size: medium (~175 hours)
Skills: Java
Mentor: Marc
