GSoC Project Ideas
Below, you can find some ideas on the directions in which we could jointly push Polypheny forward. Please consider them as starting points for your proposal. Of course, if you have other ideas, we would be thrilled to hear them. Please do not hesitate to contact us beforehand to get feedback on what you plan to do.
Simply copying and pasting one of the ideas will not work. On the other hand, creating an entirely new project idea without first consulting the mentors might be difficult as well.
Presentation Mode for Polypheny Notebooks
Polypheny Notebooks serve as an interactive environment for data analysis, visualization, and comprehension. This project aims to elevate the utility of Polypheny Notebooks by integrating an advanced presentation mode, transforming notebooks into dynamic, slide-based presentations. This mode will enable users to seamlessly transition between in-depth data exploration and a structured, narrative presentation format, ideal for sharing insights and findings with a broader audience. This enhancement will bridge the gap between data exploration and communication, making it easier for users to convey complex data stories in a compelling and accessible manner. This project can also be extended to be a medium (~175 hours) project.
Expected outcome: A new view for Polypheny Notebooks, entered by pressing a button in the UI, that presents a notebook as a presentation consisting of multiple ‘slides’.
Difficulty: medium
Size: small (~90 hours)
Skills: TypeScript, Angular
Mentor: Marco
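One way to approach the slide concept is to annotate cells and group them into slides before rendering. The following is a minimal sketch under stated assumptions: the `NotebookCell` shape, the `marker` field, and the function `groupIntoSlides` are hypothetical and not part of the existing Polypheny Notebooks code.

```typescript
// Hypothetical cell model; the real Polypheny Notebooks model may differ.
type SlideMarker = "slide" | "continue" | "skip";

interface NotebookCell {
  id: string;
  source: string;
  // Assumed per-cell annotation; treated as "continue" when absent.
  marker?: SlideMarker;
}

// Groups cells into slides:
//   "slide"    starts a new slide,
//   "continue" appends the cell to the current slide,
//   "skip"     leaves the cell out of the presentation.
function groupIntoSlides(cells: NotebookCell[]): NotebookCell[][] {
  const slides: NotebookCell[][] = [];
  for (const cell of cells) {
    const marker = cell.marker ?? "continue";
    if (marker === "skip") {
      continue;
    }
    if (marker === "slide" || slides.length === 0) {
      // The first visible cell always opens a slide, even without a marker.
      slides.push([cell]);
    } else {
      slides[slides.length - 1].push(cell);
    }
  }
  return slides;
}
```

A proposal could start from a grouping like this and then describe how users set the markers in the UI and how the resulting slides are navigated.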
LDAP Query Interface
LDAP (Lightweight Directory Access Protocol) is a widely used, open, vendor-neutral, industry-standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. It provides a common interface for accessing and manipulating directory information, such as usernames and passwords, email addresses, and other directory-based information. By adding support for querying Polypheny using LDAP, Polypheny can be seamlessly integrated into applications that use LDAP.
Expected outcome: A new query interface that allows retrieving data managed by Polypheny using basic LDAP queries.
Difficulty: medium-hard
Size: large (~350 hours)
Skills: Java
Mentor: Heiko
Better Visualization of Query Plans
Polypheny visualizes query plans in its user interface. Although this feature is very powerful and provides various insights, there is potential for visual improvements. This project idea is about visually improving the plan view. This might include adding the estimated number of rows to the edges or scaling the thickness of the edges according to it. A proposal for this project idea should include a concept for the planned changes.
Expected outcome: An improved version of the query plan view available in the existing user interface that is visually more appealing.
Difficulty: easy
Size: small (~90 hours)
Skills: Angular, TypeScript
Mentor: David, Marc
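As an illustration of scaling edge thickness by the estimated number of rows, the sketch below maps a row estimate to a stroke width. It is only one possible approach: the function name `edgeWidth`, its parameters, the pixel range, and the choice of a logarithmic scale are assumptions and not taken from the existing Polypheny UI.

```typescript
// Maps an estimated row count to an edge stroke width in pixels.
// Hypothetical helper; names and default values are illustrative only.
// A logarithmic scale is used so that a few very large estimates do not
// flatten the differences between all the smaller ones.
function edgeWidth(
  estimatedRows: number,
  maxRows: number,
  minPx: number = 1,
  maxPx: number = 12
): number {
  // Fall back to the thinnest edge for missing, invalid, or empty estimates.
  if (!Number.isFinite(estimatedRows) || estimatedRows <= 0 || maxRows <= 1) {
    return minPx;
  }
  // Never exceed the maximum width, even if the estimate is above maxRows.
  const clamped = Math.min(estimatedRows, maxRows);
  const ratio = Math.log10(clamped + 1) / Math.log10(maxRows + 1);
  return minPx + ratio * (maxPx - minPx);
}
```

Here, `maxRows` would be the largest estimate in the displayed plan, so the widest edge always uses `maxPx`. A linear or square-root scale would be a reasonable alternative if the estimates in typical plans span only a small range.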
CouchDB-like HTTP Query Interface
CouchDB is a popular document-oriented database system. It features an HTTP query interface that allows querying and manipulating data. The idea of this project is to build a query interface for Polypheny that adheres to the specification of the CouchDB query API. This would make it possible to seamlessly replace a CouchDB database with Polypheny or to use applications written for CouchDB with Polypheny.
Expected outcome: A new query interface that allows retrieving data managed by Polypheny using the CouchDB query syntax.
Difficulty: medium-hard
Size: large (~350 hours)
Skills: Java
Mentor: Isabel
RADIUS Query Interface
RADIUS (Remote Authentication Dial-In User Service) is a networking protocol that provides centralized Authentication, Authorization, and Accounting (AAA) management for users who connect to and use a network service. It is commonly used by ISPs and corporations to manage access to the Internet or internal networks, authenticate users, and keep track of their actions. Adding a query interface to Polypheny that adheres to the RADIUS protocol would allow integrating Polypheny (and thus the data managed by it) into such applications without an intermediate layer.
Expected outcome: A new query interface that allows retrieving data managed by Polypheny using RADIUS requests.
Difficulty: medium
Size: large (~350 hours)
Skills: Java
Mentor: Martin
Driver for C++, .NET, PHP, …
Currently, there is a JDBC driver and a Python connector for Polypheny. In this project, support for other languages or frameworks shall be added. This project is explicitly for developers who have experience interacting with databases in a specific language or framework. Feel free to include references to your experience with that language or framework in your proposal.
Expected outcome: A driver for a not yet supported programming language or framework that allows querying Polypheny from this language or framework.
Difficulty: medium
Size: medium (~175 hours)
Skills: Java
Mentor: Yiming, Martin
Server-side Query-to-File
For some applications, especially those making use of the multimedia and file storage capabilities of Polypheny, it is useful to represent and interact with a table (or the result of an arbitrary query) as a file system. With Query to File, we already have a prototype implementation of this using FUSE and running on the client computer. The idea of this project is to integrate this concept directly into Polypheny. Instead of an application running on the local machine, Polypheny should provide an FTP or WebDAV share that could then be mounted on other machines.
Expected outcome: A new query interface that exposes a mountable file system containing the result of a specifiable query.
Difficulty: medium-hard
Size: large (~350 hours)
Skills: Java
Mentor: Isabel
Cypher Quality Assurance
Testing is a crucial part of the software development process, as it helps to ensure the quality and functionality of the software. It helps to identify bugs and improve the user experience, and it ensures that the software meets the specified requirements and standards. Query languages have a well-defined syntax and require specific behavior of the database system. Instead of adding an entirely new feature, this project aims at improving the test coverage for the Cypher query language in Polypheny. By following the official documentation of the openCypher query language and by systematically adding test cases, existing bugs can be identified and future regressions can be avoided. It may be possible to use LLMs to support the generation of these test cases.
Expected outcome: A comprehensive set of integration tests that cover as many as possible of the features described in our Cypher documentation.
Difficulty: easy
Size: medium (~175 hours)
Skills: Java
Mentor: Marco, David
Queryable Multimodel Information Schema for Polypheny
This project aims to introduce a queryable information schema to Polypheny, akin to the information schema found in traditional relational databases. The information schema is a critical meta-database that provides access to metadata about the schema. Because Polypheny supports multiple data models (relational, document, and graph), the goal is to create a unified information schema that provides comprehensive metadata across these data models, reflecting the structure, relationships, and specifics of each model within a single schema. This schema will enable users to query metadata about tables, documents, columns, nodes, edges, data types, and their interrelations. Implementing this feature involves designing a flexible schema that accommodates the nuances of multimodel data, populating it with accurate metadata, and providing efficient querying capabilities. This feature will facilitate better database understanding, optimization, and integration with other tools and frameworks by providing another method for accessing database metadata.
Expected outcome: A virtual schema that can be queried to retrieve information on the database schema.
Difficulty: medium
Size: medium (~175 hours)
Skills: Java
Mentor: Marc