Scala, Apache Spark and Deeplearning4j
Scala programmers seeking to build machine learning solutions can use Deeplearning4j’s Scala API, ScalNet, or work with the Java framework using the Builder pattern.
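To illustrate what working with a Java-style Builder looks like from Scala, here is a minimal sketch. The names LayerConf and LayerConfBuilder are hypothetical stand-ins written for this example; they are not Deeplearning4j classes, and the real configuration builders expose many more options.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Deeplearning4j classes.
object BuilderSketch {
  // Immutable configuration value produced by the builder.
  final case class LayerConf(nIn: Int, nOut: Int, activation: String)

  // Java-style fluent builder: each setter returns `this` so calls can be chained.
  final class LayerConfBuilder {
    private var in = 0
    private var out = 0
    private var act = "identity"

    def nIn(n: Int): LayerConfBuilder = { in = n; this }
    def nOut(n: Int): LayerConfBuilder = { out = n; this }
    def activation(name: String): LayerConfBuilder = { act = name; this }

    // Validate once, at the end of the chain, then freeze into a case class.
    def build(): LayerConf = {
      require(in > 0 && out > 0, "nIn and nOut must be positive")
      LayerConf(in, out, act)
    }
  }

  // Calling the builder from Scala reads the same as it would from Java.
  val dense: LayerConf = new LayerConfBuilder()
    .nIn(784)
    .nOut(100)
    .activation("relu")
    .build()
}
```

The chained calls keep a long configuration readable, and the build() step gives one place to check that required values were set.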
Skymind’s numerical computing library, ND4J (n-dimensional arrays for the JVM), comes with a Scala API, ND4S. A full walkthrough of Deeplearning4j’s Apache Spark integration is available in the Deeplearning4j on Spark docs. Our examples include a number of tutorials using Scala notebooks with Zeppelin.
Scala
Scala is one of the most exciting languages to be created in the 21st century. It is a multi-paradigm language that fully supports functional, object-oriented, imperative and concurrent programming. It also has a strong type system, and from our point of view, strong typing is a convenient form of self-documenting code.
Scala works on the JVM and has access to the riches of the Java ecosystem, but it is less verbose than Java. As we employ it for ND4J, its syntax is strikingly similar to Python, a language that many data scientists are comfortable with. Like Python, Scala makes programmers happy, but like Java, it is quite fast.
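As a rough illustration of that resemblance, the sketch below uses only the Scala standard library, not ND4J or ND4S, and notes a Python equivalent beside each line. The object name PythonLikeSketch and the sample values are made up for this example.

```scala
// Standard-library Scala only; ND4J/ND4S are deliberately not used here.
object PythonLikeSketch {
  val xs = Vector(1.0, 2.0, 3.0, 4.0)        // Python: xs = [1.0, 2.0, 3.0, 4.0]

  val squares = xs.map(x => x * x)           // Python: [x * x for x in xs]
  val evens   = xs.filter(_ % 2 == 0)        // Python: [x for x in xs if x % 2 == 0]

  // Python: sum(a * b for a, b in zip(xs, squares))
  val dot = xs.zip(squares).map { case (a, b) => a * b }.sum

  // Python: def mean(v): return sum(v) / len(v)
  def mean(v: Seq[Double]): Double = v.sum / v.size
}
```

Unlike Python, every value here has a statically checked type, even though almost none of the types had to be written out.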
Finally, Apache Spark is written in Scala, and any library that purports to work on distributed runtimes should at the very least be able to interface with Spark. Deeplearning4j and ND4J go a step further: they run inside a Spark cluster, and they offer Scala APIs called ScalNet and ND4S.
We believe Scala’s many strengths will lead it to dominate numerical computing, as well as deep learning. We think that will happen on Spark. And we have tried to build the tools to make it happen now.
Apache Spark
Deeplearning4j depends on Apache Spark for fast ETL. While many machine-learning tools rely on Spark for the computation itself, this is in fact quite inefficient and slows down neural net training. The trick to using Apache Spark is to push the computation down to a numerical computing library like ND4J and its underlying C++ code.
See also
- Artificial Intelligence (AI) for Scala
- Docs: Deeplearning4j on Spark
- Course: Atomic Scala - a recommended beginner’s course
- Martin Odersky’s Coursera course on Scala
- Book: Scala for Data Science
- Video Course: Problem-solving using Scala
- Learn: The Scala Programming Language
- A Scala Tutorial for Java programmers (PDF)
- Scala By Example, by Martin Odersky (PDF)
- An Intro to Scala on ND4J
- Our early-stage Scala API (one example on GitHub)
- SF Spark Talk: Deeplearning4j on Spark, and Data Science on the JVM, with ND4J
- Q&A with Adam Gibson about Spark with Alexy Khrabrov
- Our Spark integration
- ND4J: Scientific Computing for the JVM
- Scala Basics for Python Developers
- Why We Love Scala at Coursera
A non-exhaustive list of organizations using Scala:
- Airbnb
- Amazon
- Apple
- Ask.com
- AT&T
- Autodesk
- Bank of America
- Bloomberg
- Credit Suisse
- eBay
- Foursquare
- (The) Guardian
- IBM
- Klout
- NASA
- Netflix
- precog
- Siemens
- Sony
- Tumblr
- UBS
- (The) Weather Channel
- Xerox
- Yammer