{"id":1252,"date":"2013-09-01T14:49:35","date_gmt":"2013-09-01T21:49:35","guid":{"rendered":"http:\/\/blog.light42.com\/wordpress\/?p=1252"},"modified":"2013-09-14T08:49:44","modified_gmt":"2013-09-14T15:49:44","slug":"ampcamp-3-the-stack","status":"publish","type":"post","link":"http:\/\/blog.light42.com\/wordpress\/?p=1252","title":{"rendered":"AmpCamp 3 &#8211; The Stack"},"content":{"rendered":"<p><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/09\/spark-project-header1.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/09\/spark-project-header1.png\" alt=\"spark-project-header1\" width=\"750\" height=\"160\" class=\"alignright size-medium\" \/><\/a><\/p>\n<p>The <a href=\"https:\/\/amplab.cs.berkeley.edu\/software\/\" title=\"BDAS\" target=\"_blank\">Berkeley Data Analytics Stack (BDAS)<\/a> was the central subject at <a href=\"http:\/\/ampcamp.berkeley.edu\/3\/\" title=\"AmpCamp 3\" target=\"_blank\">AmpCamp 3<\/a>.<\/p>\n<p><strong>Spark<\/strong> is the core of the stack. It was recently accepted for <a href=\"http:\/\/spark.incubator.apache.org\/\" title=\"spark at Apache.org\" target=\"_blank\">incubation as an Apache Project<\/a>. True to form for a fast-moving OSS project, we actually used the 0.8.0 version from the git repo, rather than the 0.7.3 that you will find on the Apache site.<\/p>\n<p><a href=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/09\/Scala_logo.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.light42.com\/wordpress\/wp-content\/uploads\/2013\/09\/Scala_logo-300x84.png\" alt=\"Scala_logo\" width=\"200\" height=\"62\" class=\"alignright size-medium wp-image-1253\" \/><\/a><\/p>\n<p>Spark is built on <strong><a href=\"http:\/\/www.scala-lang.org\/\" title=\"scala\" target=\"_blank\">Scala<\/a><\/strong>, which runs on the Java VM. 
All the lab exercises used <strong><a href=\"http:\/\/openjdk.java.net\/projects\/jdk7\/\" target=\"_blank\">openJDK7<\/a><\/strong>, and they ran very well. (We moved all of the <a href=\"http:\/\/live.osgeo.org\" title=\"osgeo-live\" target=\"_blank\">OSGeo Live<\/a> Java projects to openJDK7 last December or so, and have never looked back.)<\/p>\n<p>Among the components of <strong>BDAS<\/strong>, I got the sense that <strong>Spark<\/strong> and <strong>Mesos<\/strong> were the most stable, with <strong>Shark<\/strong> (the SQL interface) and <strong>pySpark<\/strong> (the Python interface) also strong. The in-memory filesystem <em>Tachyon<\/em> was presented as clearly in its early stages, as were the very interesting <em>GraphX<\/em>, <em>MLBase<\/em> and <em>BlinkDB<\/em>.<\/p>\n<p>All the lab exercises were executed in an <strong>Amazon Web Services<\/strong> (AWS) cluster. Thanks to excellent tech and teamwork, the labs flowed smoothly both days. However, I was interested in bringing up the BDAS stack on my own machines. Here is what I did:<\/p>\n<p>* make a working directory; I called mine <code>amplab3<\/code><\/p>\n<p>* I found that <code>java -version<\/code> showed 1.6 even though I had installed 1.7. 
So I used these two steps to change it:<br \/>\n<code><br \/>\ndbb@i7c:~\/amplab3$ update-java-alternatives -l<br \/>\njava-1.6.0-openjdk-amd64 1061 \/usr\/lib\/jvm\/java-1.6.0-openjdk-amd64<br \/>\njava-1.7.0-openjdk-amd64 1051 \/usr\/lib\/jvm\/java-1.7.0-openjdk-amd64<\/p>\n<p>sudo apt-get install icedtea-7-plugin<br \/>\nsudo update-java-alternatives -s java-1.7.0-openjdk-amd64<br \/>\n<\/code><\/p>\n<p>* Install <strong>Hadoop<\/strong> from <strong>Cloudera<\/strong> (CDH4) via .debs, following <a href=\"http:\/\/www.cloudera.com\/content\/cloudera-content\/cloudera-docs\/CDH4\/latest\/CDH4-Quick-Start\/cdh4qs_topic_3.html\" title=\"CDH4 install\" target=\"_blank\">these instructions<\/a><\/p>\n<p>* verify the Hadoop install by starting it<br \/>\n<code><br \/>\nsudo service hadoop-hdfs-namenode start<br \/>\nsudo service hadoop-hdfs-secondarynamenode start<br \/>\n<\/code><\/p>\n<p>* I did not use an Ubuntu package for Scala. I used openJDK7 from the repository, then got <a href=\"http:\/\/www.scala-lang.org\/download\/2.9.3.html\" title=\"scala-2.9.3\" target=\"_blank\">Scala here<\/a> and unpacked <code>scala-2.9.3.tgz<\/code> in the working directory.<\/p>\n<p>* <code>git clone https:\/\/github.com\/mesos\/spark.git<\/code><\/p>\n<p>* <code>cd spark; cp conf\/spark-env.sh.template conf\/spark-env.sh<\/code><\/p>\n<p>* add <code>SCALA_HOME=\/path\/to\/scala-2.9.3<\/code> to <code>conf\/spark-env.sh<\/code><\/p>\n<p>* <code>sbt\/sbt assembly<\/code><\/p>\n<p>You should be ready to go! Of course, I did a few extra things, and I took many detours along the way, but that was about it. In particular, I found that the Hadoop data directory in this setup is under <code>\/var\/lib\/hadoop-*<\/code>, so I created an alias for that and pointed it to a fast local disk with plenty of free space. 
You can get the same effect by editing the conf files under \/etc, but I did not want to change things at that level of detail yet.<\/p>\n<p>Once Spark or pySpark is running, you can hit port <del datetime=\"2013-09-14T15:49:19+00:00\">3030 <\/del> <strong>4040 <\/strong> with a web browser for an interface to the engine. Other useful docs are here:<\/p>\n<p>  http:\/\/spark.incubator.apache.org\/docs\/latest\/configuration.html<br \/>\n  http:\/\/ampcamp.berkeley.edu\/exercises-strata-conf-2013\/index.html<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Berkeley Data Analytics Stack (BDAS) was the central subject at AmpCamp 3. Spark is the core of the stack. It was recently accepted for incubation as an Apache Project. True to form for a fast-moving OSS project, we actually used the 0.8.0 version from the git repo, rather than the 0.7.3 that you will find [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"_links":{"self":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/1252"}],"collection":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1252"}],"version-history":[{"count":26,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/1252\/revisions"}],"predecessor-version":[{"id":1372,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/1252\/revisions\/1372"}],"wp:attachment":[{"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%
2Fv2%2Fmedia&parent=1252"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1252"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.light42.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1252"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}