Over the course of a few months in late 2017 and early 2018, I gave several talks around Cleveland introducing audiences to deep learning. This post describes my experiences along the way.
These talks were slight variations on a core thesis: that deep learning is essentially a collection of methods for learning hierarchical, distributed representations. This may not seem all that controversial, but I find the usual framings used to introduce deep learning, e.g. “neuroscience-inspired machine learning” or “non-linear modeling”, to be less theoretically grounded, and therefore less useful to technical audiences.
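To make that thesis concrete, here is a minimal NumPy sketch (my own illustration, not taken from any of the talks) of what “hierarchical, distributed representations” means mechanically: each layer computes a dense re-encoding of the layer below it, so higher layers represent the input in terms of lower-level representations rather than raw features.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A batch of 4 examples, each with 8 raw input features.
x = rng.normal(size=(4, 8))

# Randomly initialized (untrained) weights for two hidden layers;
# in a real network these would be learned by gradient descent.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 16))

# Each layer re-represents the output of the layer below it:
# h1 is a distributed code of the raw input, h2 a code built on h1.
h1 = relu(x @ W1)   # first-level representation, shape (4, 16)
h2 = relu(h1 @ W2)  # second-level (hierarchical) representation
```

Nothing here depends on neuroscience or on any particular non-linearity; the point is purely structural: representations are distributed across many units and stacked into a hierarchy.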
By doing this, I’d aligned myself with an educational school of thought that’s in direct opposition to the spirit of “practice now, theory later” demonstrated most successfully by fast.ai. That’s not at all why I chose this route, but in retrospect I’m standing by the choice.
I’ll first briefly describe and link to the talks I gave, although you can skip to my thoughts on this approach if you like.
At first, I just wanted to be as precise as possible when defining deep learning. I gave the first version of this talk1 at the inaugural meeting of the Cleveland AI Group, with the goal of establishing the group as a community where people can find trustworthy information on these topics. At the time of writing, this is still the guiding feature of the group’s mission. Billy Barbaro did an excellent job aligning his preceding talk with that goal.
One thing I learned from this talk and the Q&A that followed is just how varied misconceptions about AI and machine learning can be. They range from subtle confusion about how machine learning differs from statistical modeling to fundamental misunderstandings of the limitations of agents.2
The second version3 was at the Cleveland R User Group. I assumed the audience would be a mix of programmers and statisticians, so I covered some common practices and added a few examples of Keras code in R. I could have done this much better by motivating some of the theoretical slides with algorithms in pseudocode or actual tidyverse/base R code.
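As an example of the kind of bridge I mean (sketched here in Python rather than R, purely as an illustration): a theoretical slide on gradient descent could have been paired with a few lines that actually run the algorithm on a toy least-squares problem, so the update rule on the slide maps directly onto code.

```python
import numpy as np

# Toy least-squares problem: recover known weights with gradient descent.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w = np.zeros(3)   # start from zero
lr = 0.1          # learning rate
for _ in range(500):
    # Gradient of the mean squared error (1/n) * ||X w - y||^2
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad
```

After the loop, `w` lands close to `true_w`, which lets the audience see the convergence the theory promises rather than taking it on faith.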
The third and final version4 was presented to a group of postdocs and researchers at the Cleveland Clinic’s Genomic Medicine Institute. Billy and I gave a joint talk here, so as to give a complete introduction. The examples were modified to be genomics-focused, and I added a bit at the end about my experiences at the Machine Learning for Healthcare workshop at NIPS 2017 in order to stimulate ideas for applying this kind of technology to their own research. We met with postdocs afterward for some excellent discussion around their projects, diving a bit deeper into how machine learning can be applied. They’re doing a lot of cool work there, and if you’re interested in applying machine learning to genomics, I think the Genomic Medicine Institute would be an excellent place for it!
I think there are some strong arguments for what I’m calling the “practice now, theory later” approach. It’s similar to the “see one, do one, teach one” tradition from surgical medicine. There is some evidence suggesting experiential learning is more effective than traditional didactic approaches, although such literature reviews present complications of their own. There’s also something about experiential learning that just feels more modern.
My main criticism of teaching machine learning this way is that in trying to “blend” theory with application, educators frequently don’t give theory enough attention. In practice, this means students often end up skipping the “teach one” step. Rigorous theoretical material takes much longer to comprehend than simpler heuristics, so it’s often given cursory treatment, especially in self-paced courses. When this happens, you end up with a lot of people who (roughly) know which tools to use for problems resembling examples they’ve seen, but who lack the theoretical grounding to approach a more general class of problems.
Evidence for this imbalance can be found across the internet; one such place is the Artificial Intelligence & Deep Learning Facebook group. Many of the posts are newcomers asking questions, and the specific-enough ones can be split into application and theory questions. If I had to guess at the ratio, I’d say application questions outnumber theory questions around 3:1. That’s arguably suggestive on its own, but what’s more telling is that the number of comments on these posts is also imbalanced: on average, application questions receive several times more comments than theory questions.5 The quality of the responses doesn’t matter here; the point is that group members feel much more confident answering application questions than theory questions.
This isn’t meant to replace a true empirical analysis, since there’s likely some sampling bias at work. But I think it’s representative of what I’ve experienced while talking to data scientists and ML engineers in Cleveland, Denver, and the OpenMined Slack, many of whom are likely self-taught. Unfortunately, I can’t rigorously compare the two approaches without more data on the learning paths of self-taught people succeeding in industry.
All this is to say that for those who seem to have the capability, I tend to recommend courses that focus on theory as much as or more than application.
How is this related to my research interests?
In all likelihood, these learners are going to be the ones facilitating the spread of machine learning outside of the tech industry. I’m concerned with safe, secure, and private machine learning, and I’m focused on creating tools and methods that enable these kinds of systems. However, the ultimate social impact of machine learning depends on its implementation. Using these tools and algorithms safely and effectively requires not just an understanding of the algorithms themselves, but quite a bit of theoretical background from statistics and mathematics.
For most application-driven areas, the necessary theory isn’t all that complex: most people trying to break into data science and machine learning roles are fully capable of understanding the relevant statistics, linear algebra, and calculus. The problem is that they’re just not doing it, and that should never be because their learning materials omitted theory or failed to bridge theory and practice well enough.
I observed that these misconceptions were most often guided by anthropomorphism. This was unsurprising: journalists (with a few noteworthy exceptions) have mishandled presenting recent AI advances to the public, mostly piggybacking on Hollywood’s endeavors in science fiction. ↩
For example, the first application question I saw today had about 8 primary thread comments, whereas the first theoretical question had 0, even though they’d both been posted around the same time this morning. I’ve repeated this several times over the span of a week with similar results. ↩