Deep Learning

AI, Acoustics & Ornithology for sound-based bird classification

Have you ever wondered about the name of the bird you just heard singing? A group of women from the local Polish chapter of the Women in Machine Learning & Data Science (WiMLDS) organization not only wondered about it but also decided to build their own solution for detecting bird species based on the sounds they make.

Introduction to Explainable AI: why should we understand AI decisions?

Is XAI really that important? Why should we try to explain our models' predictions? A short introduction to explainable AI.

Detecting and Reducing Bias in Data

Currently, in contrast to the shallow models used in the past, most deep learning systems extract features automatically, and to do that they tend to rely on a huge amount of labeled data. Although the quality of the dataset used to train neural networks has a huge impact on the models’ performance, those datasets are often noisy, biased, and sometimes even contain incorrectly labeled samples. Moreover, deep neural networks (DNNs) are black-box models that usually have tens of layers with millions of parameters and a very complex latent space, which makes their decisions very hard to interpret. Such fragile models are increasingly used to solve very sensitive and critical tasks. Therefore, the demand for clear reasoning and correct decisions is very high, especially when DNNs are used in transportation (autonomous cars), healthcare, legal systems, finance, and the military. To address those challenges, the project aims to develop methods of Explainable Artificial Intelligence (XAI) that might help to uncover and reduce the problem of bias in data. The project involves investigating explainability and integrating it into new and existing Artificial Intelligence systems, and mostly focuses on Deep Neural Networks in the field of Computer Vision. One way of categorizing XAI methods is to divide them into local and global explanations. A local analysis aims to explain a single prediction of a model, whereas a global one tries to explain how the whole model works in general. The project aims to develop novel methods of both local and global explainability to help explain deep neural network-based systems in order to justify them, to control their reasoning process, and to discover new knowledge.
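To make the local/global distinction concrete, here is a minimal sketch in plain Python. It is not the project's method: `toy_model` is a hypothetical stand-in for a trained network, occlusion (replacing one feature with a baseline value) is just one simple attribution technique, and averaging absolute local attributions over a dataset is only one possible way to summarize global behavior.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [2.0, 0.0, -1.0]  # assumed weights, for illustration only
    return sum(w * v for w, v in zip(weights, x))


def local_explanation(model: Model, x: Sequence[float],
                      baseline: float = 0.0) -> List[float]:
    """Explain ONE prediction: how much the score changes when each
    feature is occluded (replaced by the baseline value)."""
    full_score = model(x)
    attributions = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        attributions.append(full_score - model(occluded))
    return attributions


def global_explanation(model: Model, dataset: Sequence[Sequence[float]],
                       baseline: float = 0.0) -> List[float]:
    """Summarize the WHOLE model: mean absolute occlusion attribution
    of each feature across all samples in the dataset."""
    totals = [0.0] * len(dataset[0])
    for x in dataset:
        for i, a in enumerate(local_explanation(model, x, baseline)):
            totals[i] += abs(a)
    return [t / len(dataset) for t in totals]


sample = [1.0, 5.0, 3.0]
data = [sample, [2.0, 1.0, 1.0]]
print(local_explanation(toy_model, sample))  # per-feature effect for one input
print(global_explanation(toy_model, data))   # average feature importance
```

In this toy setting the second feature gets zero attribution both locally and globally, because the assumed model ignores it. For an image model the same idea would occlude patches of pixels rather than single features.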

DetectWaste and ClassifyWaste

The proposed classify-waste benchmark is a merged collection of publicly available datasets with eight classification labels. The proposed detect-waste benchmark is a merged collection of Extended TACO (dataset created by us) and publicly available datasets with detection annotations: Wade-AI, UAVVaste, TrashCan, TrashICRA, Drinking-Waste, and MJU-Waste.


Deaf people are affected by many forms of exclusion, especially now in the pandemic world. HearAI aims to build a deep learning solution to make the world more accessible for the Deaf community and increase the existing knowledge base in using AI for Polish Sign Language.

Punctuation Restoration

Speech transcripts generated by Automatic Speech Recognition (ASR) systems typically do not contain any punctuation or capitalization. In longer stretches of automatically recognized speech, the lack of punctuation affects the general clarity of the output text [1]. The primary purpose of punctuation restoration (PR) and capitalization restoration (CR) as a distinct natural language processing (NLP) task is to improve the legibility of ASR-generated text, and possibly of other types of text without punctuation. Aside from their intrinsic value, PR and CR may improve the performance of other NLP tasks such as Named Entity Recognition (NER), part-of-speech (POS) and semantic parsing, or spoken dialog segmentation [2, 3]. Useful as it seems, it is hard to systematically evaluate PR on transcripts of conversational language, mainly because punctuation rules can be ambiguous even for originally written texts, and the very nature of naturally occurring spoken language makes it difficult to identify clear phrase and sentence boundaries [4, 5]. Given these requirements and limitations, a PR task based on a redistributable corpus of read speech was suggested. The 1200 texts included in this collection (totaling over 240,000 words) were selected from two distinct sources: WikiNews and WikiTalks. Punctuation found in these sources should be approached with some reservation when used for evaluation: these are original texts and may contain some user-induced errors and bias. The texts were read out by over a hundred different speakers. The original texts with punctuation were force-aligned with the recordings and used as the ideal ASR output. The goal of the task is to provide a solution for restoring punctuation in the test set collated for this task. The test set consists of time-aligned ASR transcriptions of read texts from the two sources. Participants are encouraged to use both text-based and speech-derived features to identify punctuation symbols (e.g. a multimodal framework [6]). In addition, the train set is accompanied by reference text corpora of WikiNews and WikiTalks data that can be used in training and fine-tuning punctuation models.
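As an illustration of how punctuated reference text can be turned into training data, here is a minimal sketch that frames PR as per-word labeling: each word is lowercased, stripped of punctuation (imitating raw ASR output), and labeled with the mark that followed it. The label set in `PUNCT_LABELS` and the helper names `to_training_pairs` and `restore` are assumptions made for this example, not the format used by the task itself.

```python
import re
from typing import List, Tuple

# Assumed label set for illustration; the actual task may use other symbols.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
LABEL_TO_MARK = {label: mark for mark, label in PUNCT_LABELS.items()}
SENTENCE_END = {"PERIOD", "QUESTION"}


def to_training_pairs(text: str) -> List[Tuple[str, str]]:
    """Convert punctuated text into (word, label) pairs, where the label
    names the punctuation mark that follows the word ("O" for none)."""
    pairs = []
    for raw in text.split():
        label = PUNCT_LABELS.get(raw[-1], "O")
        # Drop everything except word characters and apostrophes.
        word = re.sub(r"[^\w']", "", raw).lower()
        if word:
            pairs.append((word, label))
    return pairs


def restore(pairs: List[Tuple[str, str]]) -> str:
    """Rebuild readable text from (word, label) pairs, capitalizing the
    first word and every word that follows a sentence-ending mark."""
    out = []
    capitalize_next = True
    for word, label in pairs:
        if capitalize_next:
            word = word[:1].upper() + word[1:]
        out.append(word + LABEL_TO_MARK.get(label, ""))
        capitalize_next = label in SENTENCE_END
    return " ".join(out)


pairs = to_training_pairs("Hello, world. How are you?")
print(pairs)
print(restore(pairs))
```

A model trained on such pairs would predict the label column from the word column (and, in a multimodal setup, from speech-derived features such as pause durations); `restore` then turns its predictions back into punctuated text.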

Tiny Hero - Generating pixel characters with GANs

The TinyHero dataset includes 64x64 retro-pixel characters. All characters were generated with the Universal LPC spritesheet by makrohn. Each character in the dataset was randomly generated from the LPC spritesheet, including sex, body type, skin color, and equipment, and is shown from 4 different view angles.

Skin Lesion Classification

In the last twenty years, interest in automated skin lesion classification has grown rapidly, partly because of the appearance of public datasets. Automated computer-aided skin cancer detection in dermatoscopic images is a very challenging task due to uneven dataset sizes, huge intra-class variation combined with small inter-class variation, and numerous artifacts. During my work on the project I approached the problem in two ways: with hand-crafted features based on the extended ABCD rule and a shallow neural network, and with Convolutional Neural Networks.

Bird Song Classification

Sound-Based Bird Classification using Convolutional Neural Networks and Mel-Cepstrum Spectrograms

Detect waste in Pomerania

Using detection models to localize and classify waste in images and video.