I am a data scientist from Germany. Besides machine learning on IoT data, I am gaining experience in time-series models and NLP. For more details, see my publications and projects.
Timeline
My background
Since 08/2021
Data Scientist
at DPS Innovations GmbH
11/2016 - 01/2021
Research Scientist
at the University of Würzburg, Germany, Chair for Data Science.
Can machine learning identify the appropriate reading level of a passage of text, and help inspire learning? Reading is an essential skill for academic success. When students have access to engaging passages offering the right level of challenge, they naturally develop reading skills.
Currently, most educational texts are matched to readers using traditional readability methods or commercially available formulas. However, each has its issues. Tools like Flesch-Kincaid Grade Level are based on weak proxies of text decoding (i.e., characters or syllables per word) and syntactic complexity (i.e., number of words per sentence). As a result, they lack construct and theoretical validity. At the same time, commercially available formulas, such as Lexile, can be cost-prohibitive, lack suitable validation studies, and suffer from transparency issues when the formula's features aren't publicly available.
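To make the "weak proxies" concrete, here is a minimal sketch of the standard Flesch-Kincaid Grade Level formula, which uses only words per sentence and syllables per word. The function names and the vowel-group syllable counter are illustrative assumptions, not part of any competition code; real tools typically use a pronunciation dictionary instead of this heuristic.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing "e" as silent ("make" -> 1), but keep "-le" endings ("table" -> 2).
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    print(flesch_kincaid_grade("The cat sat on the mat."))
    print(flesch_kincaid_grade("Institutional accountability necessitates comprehensive evaluation."))
```

Note that the score depends only on surface counts: two passages with the same word and sentence lengths receive the same grade regardless of cohesion or meaning, which is the limitation described above.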
CommonLit, Inc., is a nonprofit education technology organization serving over 20 million teachers and students with free digital reading and writing lessons for grades 3-12. Together with Georgia State University, an R1 public research university in Atlanta, they are challenging Kagglers to improve readability rating methods.
In this competition, you’ll build algorithms to rate the complexity of reading passages for grade 3-12 classroom use. To accomplish this, you'll pair your machine learning skills with a dataset that includes readers from a wide variety of age groups and a large collection of texts taken from various domains. Winning models will be sure to incorporate text cohesion and semantics.
If successful, you'll aid administrators, teachers, and students. Literacy curriculum developers and teachers who choose passages will be able to quickly and accurately evaluate works for their classrooms. Plus, these formulas will become more accessible for all. Perhaps most importantly, students will benefit from feedback on the complexity and readability of their work, making it far easier to improve essential reading skills.
While text embeddings using RoBERTa seem to be the most promising approach, I am applying GloVe embeddings in combination with sequence models to solve the task. A more in-depth explanation will follow after the final competition deadline.
P2Map
Learning Environmental Maps - Integrating Participatory Sensing and Human Perception
Personal sensors are increasingly popular and many communities are working on providing mobile, low-cost sensor solutions in order to measure their personal environment, to map their immediate surroundings, to validate official sources, and ultimately to impact policy making. Such public interest can be useful to build sensor networks with a much greater spatial coverage and allow for efficient large-scale case studies. However, at the same time low-cost sensors are mostly inaccurate and applied measuring protocols are usually not compatible with official regulations. Additionally, the temporal coverage for mobile sensors is not as high as for stationary ones and personalization often results in less regular measurements. Moreover, low-quality devices together with measurement biases of special interest groups can lead to misinterpretations of the data and in the end to an erroneous perception of reality.
Thus, this project works on three intertwined problems: 1) We are analyzing perceptions, subjective opinions, behavior and different motivations of user groups and individuals in the context of participatory sensing. 2) In the same context, we are investigating how to optimize the applicability of personalized, low-cost and mobile sensors. In particular, this means improving sensor measurements through different advanced calibration mechanisms and providing appropriate information to correctly interpret the measurements. 3) We aim to build maps with integrated views on sensor values, corresponding predictions, perceptions, and subjective data. This will facilitate an aggregated view on the collected data and provide meaningful information for interpreting the measurements.
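As a rough illustration of what sensor calibration means in its simplest form, the sketch below fits a linear correction that maps low-cost sensor readings onto co-located reference readings using ordinary least squares. This is only a baseline under stated assumptions: the function names and the sample values are hypothetical, and the project's own calibration mechanisms are described as more advanced than this.

```python
from statistics import mean


def fit_linear_calibration(raw, reference):
    """Fit reference ~ slope * raw + intercept by ordinary least squares."""
    if len(raw) != len(reference) or len(raw) < 2:
        raise ValueError("need at least two paired readings of equal length")
    raw_mean, ref_mean = mean(raw), mean(reference)
    variance = sum((x - raw_mean) ** 2 for x in raw)
    if variance == 0:
        raise ValueError("raw readings must not all be identical")
    covariance = sum((x - raw_mean) * (y - ref_mean) for x, y in zip(raw, reference))
    slope = covariance / variance
    intercept = ref_mean - slope * raw_mean
    return slope, intercept


def calibrate(value, slope, intercept):
    """Apply the fitted correction to a single raw reading."""
    return slope * value + intercept


if __name__ == "__main__":
    # Hypothetical paired readings: low-cost sensor vs. reference station.
    raw_readings = [10.0, 20.0, 30.0, 40.0]
    reference_readings = [12.0, 32.0, 52.0, 72.0]
    slope, intercept = fit_linear_calibration(raw_readings, reference_readings)
    print(slope, intercept, calibrate(25.0, slope, intercept))
```

A linear fit like this only corrects constant offset and gain errors; effects such as drift over time or dependence on temperature and humidity would need additional features or a different model.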
Overall, we aim to combine results from user analysis and sensor characteristics utilizing advanced machine learning methods in order to allow for emerging synergies in the area of more accurate maps and perceptual feedback. To this end, we will integrate data from official sources, different types of devices and user studies explicitly focusing on perceptions and other subjective data in the context of noise and air quality. The envisioned results of this project are 1) a deeper understanding of perception and subjective impressions in the context of participatory sensing, 2) methods to leverage the collected data and user information in order to extract usable statistics, and 3) visualizations, e.g., maps, to allow for an informed interpretation of the collected data and the environment.
Publications
My scientific publications
Evaluating the multi-task learning approach for land use regression modelling of air pollution.
Dulny, Andrzej; Steininger, Michael; Lautenschlager, Florian; Krause, Anna; Hotho, Andreas
in Journal of Physics: Conference Series (2021)
MapLUR: Exploring a New Paradigm for Estimating Air Pollution Using Deep Learning on Map Images.
Steininger, Michael; Kobs, Konstantin; Zehe, Albin; Lautenschlager, Florian; Becker, Martin; Hotho, Andreas
in ACM Trans. Spatial Algorithms Syst. (2020)
OpenLUR: Off-the-shelf air pollution modeling with open features and machine learning.
Lautenschlager, Florian; Becker, Martin; Kobs, Konstantin; Steininger, Michael; Davidson, Padraig; Krause, Anna; Hotho, Andreas
in Atmospheric Environment (2020)
SimLoss: Class Similarities in Cross Entropy.
Kobs, Konstantin; Steininger, Michael; Zehe, Albin; Lautenschlager, Florian; Hotho, Andreas
in International Symposium on Methodologies for Intelligent Systems (2020)
Anomaly Detection in Beehives using Deep Recurrent Autoencoders.
Davidson, Padraig; Steininger, Michael; Lautenschlager, Florian; Kobs, Konstantin; Krause, Anna; Hotho, Andreas
in arXiv preprint arXiv:2003.04576 (2020)
EveryAware Gears: A Tool to visualize and analyze all types of Citizen Science Data.
Lautenschlager, Florian; Becker, Martin; Steininger, Michael; Hotho, Andreas
in SemGeoSoc Workshop (2018)
Air Trails – Urban Air Quality Campaign Exploration Patterns.
Becker, Martin; Lautenschlager, Florian; Hotho, Andreas
in AGILE Workshop (2018)