Introducing Learning, Unsupervised: Explore ML with the Sisu team

By Vlad Feinberg - March 10, 2021

Today I’m thrilled to introduce Learning, Unsupervised, an inside look at the latest ML research we’re reading at Sisu. Since we first started in 2018, members of the Sisu team from across the organization have been meeting every other week to discuss papers and technical readings on the latest in ML to help inform our work. We want to share what we’re learning, what we’re discussing, and how it informs what we’re building with excerpts and takeaways from each session.

ML is a fast-paced field with advances every week. At the same time, ML builds on well-exercised classical techniques from statistics and optimization. We value intellectual humility, and as part of that, we like to keep our eyes open to what others have done. That’s why, as a team, we dedicate time during the workday to discuss papers our reading group members are interested in.

For instance, we’ve explored Fixed-X Knockoffs and glmnet to understand core concepts behind false discovery control and the procedures that can help realize such guarantees in a production system. More recent work, such as Prophet and R-Learner, represents viable approaches to future capabilities within Sisu, so in-depth discussion of these papers’ assumptions and implications in an applied setting is illuminating.

As quickly as ML is evolving, so too is our team. That’s why our Learning, Unsupervised discussions are open to everyone, not just our engineers. Technical understanding is positive-sum: the more people who understand the issues behind large-scale inference in business analytics, the more widely our contributions will be recognized across organizations.

We hope that everyone who stops by to check out our notes finds them motivating enough to read each paper on their own! That’s also why anyone at our company can join the reading group and present a paper or topic. Learning is positive-sum.

What you can expect to read:

  • Learning, Unsupervised will explore topics in causal inference, optimization, statistical inference, and ML systems.
  • Each post will contain links to the material, motivation for picking the paper, core innovative nuggets, and raw notes with detailed derivations.
  • We’ll post notes every other week after our discussion, modulo holidays.

We think you’ll find these takeaways as engaging as we have. If you have ideas for what we should read next, drop me a line at [email protected].

If you’re interested in working on these kinds of practical ML problems, we’re hiring across the stack in engineering. Come join us!
