William Spoth

Connoisseur of coffee and data

Employment

Data Engineer

February 2021 - Current

Improve OLTP and OLAP database performance and stability, and implement best practices, including monitoring, query rewriting, database layout modification, partitioning, index creation, and major version upgrades.
Create and improve partner integrations using state-of-the-art ETL tools such as Apache NiFi and AWS Glue.
Maintain and develop features for our real-time and historical analytics reporting product.

Software Development Engineer Intern

August 2020 - November 2020

Prototyped a large-scale system port to merge the Redshift and Spectrum execution engines.
Enabled Redshift queries directly over S3 data without requiring ingestion.
Prototyped an Apache Arrow scanner implementation.

Software Engineer Intern

May 2019 - August 2019

Implemented support for historical queries in PostgreSQL without degrading performance.
Migrated existing systems to Spring Boot microservices.
Created automated deployment scripts using Kubernetes and Docker.
Managed data pipelines and refactored RPC calls to external systems.

Education

Doctor of Philosophy - Computer Science

University at Buffalo February 2022

Master of Science - Computer Science

University at Buffalo December 2018

GPA: 3.95/4.0
Coursework: Applied Cryptography, Computer Security, Differential Privacy, Languages and Runtime for Big Data, Data-Oriented Computing, Algorithms for Modern Computing Systems, Software Engineering Concepts, Pattern Recognition, Computer Architecture, Data Intensive Computing.

Bachelor of Science - Computer Science

University at Buffalo May 2016

GPA: 3.64/4.0
Coursework: Database Concepts, Machine Learning, Operating Systems, Real-Time Embedded Systems, Theory of Computation, Programming Languages, Algorithms, Data Structures.

Bachelor of Arts - Psychology

University at Buffalo May 2016

GPA: 3.64/4.0

Awards / Accomplishments

Teaching Assistant of the Year (2018): Helped project teams implement and understand core database concepts, and created an autograder to assess correctness and performance and report meaningful feedback.
Undergraduate Research Assistant (2013-2016): Contributed research ideas and assisted with testing.
Magna Cum Laude
Eagle Scout Scholarship Award

Research

Schema Drill

Querying JSON data is especially difficult due to its lack of a global schema, multiple record versions, heterogeneous records, and optional fields. JSON data pipelines often require extensive cleaning and manipulation that resembles Python more than SQL, which shouldn't be the case. Schema Drill takes in a set of JSON records and, instead of outputting one single schema that poorly fits the dataset, outputs a small number of schemas that better match the data's relationships, partitioning dissimilar records in the process. These tighter schemas avoid many of the "IF JSON contains PATH" expressions that domain experts are often forced to write during preprocessing.

GitHub Paper
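
A minimal Scala sketch of the core partitioning idea (an illustration only, not the published algorithm): group records by the set of fields they carry and summarize each group separately, so rare record versions get their own tight schema instead of forcing optional fields onto every record.

  object SchemaSketch {
    // A record is modeled here as field name -> inferred type name; a real
    // tool would parse actual JSON and infer the types itself.
    type Record = Map[String, String]

    def schemas(records: Seq[Record]): Map[Set[String], Record] =
      records
        .groupBy(_.keySet)                       // partition dissimilar records
        .map { case (keys, group) =>
          keys -> group.reduce(_ ++ _)           // merge fields within a group
        }

    def main(args: Array[String]): Unit = {
      val data = Seq(
        Map("id" -> "int", "name" -> "string"),
        Map("id" -> "int", "name" -> "string"),
        Map("id" -> "int", "error" -> "string")  // a second record version
      )
      schemas(data).foreach { case (keys, s) => println(s"$keys -> $s") }
    }
  }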

json-schema-scala

A Scala-native JSON Schema draft-07 parser built on FastParse. This tool shreds JSON schemas into easy-to-manipulate Scala objects that can be quickly serialized and deserialized. It additionally supports JSON validation and calculates the number of possible accepted schemas for a subset of the specification.

GitHub
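
For a flavor of what shredding a schema into Scala objects can look like, here is a hypothetical model of a tiny slice of draft-07 as case classes; the names are illustrative and are not the library's actual API.

  sealed trait Schema
  case class StringSchema(minLength: Option[Int] = None) extends Schema
  case class NumberSchema(minimum: Option[Double] = None) extends Schema
  case class ObjectSchema(
    properties: Map[String, Schema],
    required: Set[String] = Set.empty
  ) extends Schema
  case class ArraySchema(items: Schema) extends Schema

  object SchemaDemo {
    def main(args: Array[String]): Unit = {
      // {"type": "object", "properties": {"id": {"type": "number"}}, "required": ["id"]}
      val parsed: Schema = ObjectSchema(
        properties = Map("id" -> NumberSchema()),
        required = Set("id")
      )
      println(parsed) // case classes are easy to traverse, serialize, and compare
    }
  }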

MESS

Full database deployments are overkill for most users' needs, as exhibited by Microsoft Excel's per-user market dominance. However, simple tasks such as importing, joining, and programmatically manipulating multiple CSV files often require more expertise than the average Excel user has. MESS aims to let non-experts easily manipulate, clean, and combine both CSV and JSON data without the learning curve of a database.
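
For a sense of the boilerplate involved, here is a hand-rolled inner join of two CSV files in plain Scala (the file names and columns are hypothetical); MESS aims to make exactly this kind of code unnecessary.

  import scala.io.Source

  object CsvJoin {
    // Naive CSV reader: header row plus data rows (no quoting or escaping).
    def readCsv(path: String): (Array[String], List[Array[String]]) = {
      val lines = Source.fromFile(path).getLines().toList
      (lines.head.split(","), lines.tail.map(_.split(",")))
    }

    def main(args: Array[String]): Unit = {
      val (headerA, rowsA) = readCsv("users.csv")  // id,name
      val (headerB, rowsB) = readCsv("orders.csv") // id,total
      val byId = rowsA.map(r => r(0) -> r).toMap   // index the left table

      println((headerA ++ headerB.tail).mkString(","))
      for (b <- rowsB; a <- byId.get(b(0)))        // inner join on column 0
        println((a ++ b.tail).mkString(","))
    }
  }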

Adaptive Databases

Relational databases are some of the most notoriously fickle pieces of user-facing software on the market. Beyond the common gripes of the uninitiated, performing innocuous tasks such as fusing relational and JSON data becomes seemingly impossible, often requiring inline IF statements, sanity/null checks, and a mishmash of type conversions. Adaptive databases attempt to fuse the structured, performant world of relational databases with the unstructured, user-friendly world of NoSQL. By removing strict typing, predicting joins, and handling index maintenance behind the scenes, we bridge both the performance and usability gaps between these two worlds.

Paper