Running Scala scripts in #Spark

The Spark shell serves us all well: you can quickly prototype a few simple lines of Scala (or Python with PySpark) and quit the program with a little more insight than you started with.


There are times when those scraps of code are handy enough to warrant keeping hold of them. Scala is nice in that respect: you can either run a script without compiling it, or compile your code into a full application.

WordCount From The Shell

Take the (classic) word count functionality. With Spark it’s a doddle…

scala> val text = sc.textFile("/Users/Jason/coffee.csv")
scala> val counts = text.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
scala> counts.collect

15/02/21 14:52:55 INFO DAGScheduler: Job 0 finished: collect at <console>:17, took 0.898995 s
res0: Array[(String, Int)] = Array((Tea,66461), (Latte,8324), (Capuccino,8391), (Flat_White,8499), (Americano,8325))

It’s no fun retyping all of that every time you want to do a quick word count though.

WordCount From The Command Line with a Script

Saving the lines you ran in the shell as a script is easy enough to do. Create a text file; let’s call this one wc.scala.

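The file is just the lines from the shell session, minus the scala> prompts:

// sc is the SparkContext the shell already provides
val text = sc.textFile("/Users/Jason/coffee.csv")
// split each line into words and tally each word's occurrences
val counts = text.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
counts.collect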

Running it from the command line is just a case of firing up the shell again, this time using the -i flag to specify an input file.

$SPARK_HOME/bin/spark-shell -i wc.scala

Note that the shell doesn’t exit when the script finishes, so edit your wc.scala file and add an exit call as the last line:

System.exit(0)
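The finished wc.scala then looks like this:

val text = sc.textFile("/Users/Jason/coffee.csv")
val counts = text.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
counts.collect
// quit the shell once the job has run
System.exit(0)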


Responses to “Running Scala scripts in #Spark”

  1. Good info. How would you (1) handle passing arguments into this file (say, the path to the file), and (2) handle passing args in when you’re using the main method of an object?

  2. At this point I would be looking at writing a proper Spark application. When I’m testing the same lines I’m using within the spark-shell, I’ll create a quick text file to run back in the shell; it saves typing everything out again and again. Once I’m happy with the way those lines are working, I’ll transfer them to a proper application; a rough sketch of one follows below.
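
    A minimal sketch of what that application might look like, with the input path read from args(0); the object name WordCount and the jar name below are illustrative rather than anything from the post:

    import org.apache.spark.{SparkConf, SparkContext}

    // a standalone version of the word count; the input path
    // arrives as the first command line argument
    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WordCount")
        val sc = new SparkContext(conf)
        // the same pipeline as the shell version
        val counts = sc.textFile(args(0))
          .flatMap(_.split(" "))
          .map((_, 1))
          .reduceByKey(_ + _)
        counts.collect().foreach(println)
        sc.stop()
      }
    }

    Once it’s compiled into a jar, any arguments go after the jar name on the spark-submit command line:

    $SPARK_HOME/bin/spark-submit --class WordCount wordcount.jar /Users/Jason/coffee.csv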

  3. Useful article!

    However, there is a typo:
    $SPARKHOME/bin/spark-shell -i wc.spark

    Earlier, you named the example file “wc.scala” not “wc.spark”

  4. When I run spark-shell -i filename.scala, no error comes up but I only get “defined object”. How do I see the output?

  5. Hi, when I load a file using spark-shell -i HelloWorld.scala, it starts the Spark shell and then shows:

    Loading HelloWorld.scala…
    defined module HelloWorld

    or

    scala> :load HelloWorld.scala
    Loading HelloWorld.scala…
    defined module HelloWorld

  6. Not sure what’s going on as I don’t know what your Scala script contains. I’ve just tested my old word count example and it worked; I’m going to do a very quick post to show it.
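
    One thing worth checking, assuming HelloWorld.scala wraps everything in an object: loading the file only defines that object, so you’d need to invoke it yourself to see any output, e.g.:

    scala> :load HelloWorld.scala
    scala> HelloWorld.main(Array())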

  7. Thanks, this is exactly what I was looking for!

