AmpCamp 3 – HDFS

As described in a previous post, I used the Cloudera .debs to install Hadoop/HDFS on a single Ubuntu 12.04 ‘Precise’ node. The next step is to put some data into HDFS and use it. (Note: this did not work in a 32-bit VM.)

The Hadoop/HDFS install consisted of two steps: first obtain and install the cdh4-repository .deb, which enables a suite of other packages (perhaps auto-magically updated); then use those packages to install the features you want.

apt-get update shows the Cloudera repository in the list.

Hit http://archive.cloudera.com precise-cdh4 Release.gpg
Hit http://archive.cloudera.com precise-cdh4 Release
Hit http://archive.cloudera.com precise-cdh4/contrib Sources
Hit http://archive.cloudera.com precise-cdh4/contrib amd64 Packages
Ign http://archive.cloudera.com precise-cdh4/contrib TranslationIndex
Ign http://archive.cloudera.com precise-cdh4/contrib Translation-en_US
Ign http://archive.cloudera.com precise-cdh4/contrib Translation-en
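
If you would rather not eyeball the apt-get update output, grepping the apt source lists for the Cloudera host should tell you the same thing. This is only a rough sketch: find_cloudera_sources is a helper name made up for illustration, and searching /etc/apt assumes the repository package put its source list there, which is the usual place on Ubuntu.

```shell
#!/bin/sh
# List the files under directory $1 that mention the Cloudera archive host.
# (find_cloudera_sources is a hypothetical helper, not part of any package.)
find_cloudera_sources() {
    grep -rl 'archive.cloudera.com' "$1" 2>/dev/null
}

# Usage: look through the apt configuration directory.
find_cloudera_sources /etc/apt || echo "no Cloudera source list found under /etc/apt"
```

An empty result would suggest the repository .deb was not installed, or that it put its list somewhere else.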

Packages look like this:
[screenshot: CDH4-all-pkgs]

Some quick reading shows I have at least two easy ways to interface with HDFS. One is the Hue suite, and the other is an HDFS FUSE interface. More options are listed on this MountableHDFS page.

Cloudera supplies an HDFS FUSE mount with their system. Instructions on how to use the FUSE extension are here.

When HDFS is running with this system (with or without FUSE), you can view a web interface on port 50070. A list of ports is here.
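
A quick way to see whether that web interface is actually answering, without opening a browser, is to poke it with curl. This is a minimal sketch: namenode_ui_url and namenode_ui_up are made-up helper names, localhost is an assumed default host, and it needs curl installed.

```shell
#!/bin/sh
# Build the NameNode web UI URL for a host (port 50070, as noted above).
# Falls back to localhost when no host is given.
namenode_ui_url() {
    printf 'http://%s:50070/\n' "${1:-localhost}"
}

# Succeed if something answers HTTP on that URL within 5 seconds.
namenode_ui_up() {
    curl -s -o /dev/null --max-time 5 "$(namenode_ui_url "$1")"
}

# Usage:
#   namenode_ui_up localhost && echo "web UI is up" || echo "web UI is not answering"
```

A failure here only says that nothing answered on that port. It does not say why, so the NameNode log is still the place to look.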

After mucking with the environment a bit, I hit this:

Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS):

The error appeared when I tried to start the secondarynamenode: it complained that the namenode had an invalid address. So, investigating:

update-alternatives --get-selections | grep hadoop
hadoop-conf auto /etc/hadoop/conf.empty

less /etc/hadoop/conf.empty/core-site.xml

core-site.xml turned out to be missing the actual namenode address, which was not apparent in the doc I was reading. So, edit /etc/hadoop/conf.empty/core-site.xml and add a property with name and value XML tags (fs.default.name is the older, deprecated name for the fs.defaultFS key mentioned in the error). The system starts up after this change.

<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://127.0.1.1:50070</value>
 </property>
</configuration>
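
To double-check the edit, it can help to pull the value back out of the file and see whether it looks like an hdfs://host:port URI. This is a minimal sketch, not a real XML parser: get_hadoop_prop and is_hdfs_uri are made-up helper names, and it assumes each <name> and <value> sits on its own line, as in the snippet above.

```shell
#!/bin/sh
# Print the <value> of the first property in file $1 whose <name> is $2.
# Line-oriented: assumes <name> and <value> are on separate lines.
get_hadoop_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Succeed if $1 looks like hdfs://host:port (optional trailing slash).
is_hdfs_uri() {
    printf '%s\n' "$1" | grep -Eq '^hdfs://[^/:]+:[0-9]+/?$'
}

# Usage: check the file edited above, if it exists on this machine.
conf=/etc/hadoop/conf.empty/core-site.xml
if [ -r "$conf" ]; then
    value=$(get_hadoop_prop "$conf" fs.default.name)
    if is_hdfs_uri "$value"; then
        echo "fs.default.name looks OK: $value"
    else
        echo "fs.default.name is missing or malformed: '$value'"
    fi
fi
```

One thing worth a second look: 50070 is also the web interface port mentioned earlier, and the NameNode RPC address more commonly uses port 8020, so the port in the value above may need adjusting on other setups.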