Question Set G

How does Hadoop rack awareness affect streaming vs. random access?
Random access workloads in a GPFS-SNC cluster get an additional performance boost from client-side caching, which HDFS does not offer. Good random access performance matters for Hadoop workloads even though the underlying design favors sequential access.

Compare using Pig, Hive, Jaql, Java, and Python as query languages for Hadoop. What are the differences and advantages of each?

Pig lets users spend more time analyzing large data sets and less time writing mappers and reducers. It is designed to handle any kind of data, with or without a schema. At a high level, the user LOADs the data to manipulate from HDFS, runs transformations (which execute as mappers and reducers), and then DUMPs the results to the screen or STOREs them in a file. Hive was created to leverage the existing SQL knowledge base: users write SQL-like queries instead of learning a new language such as Pig Latin. Its query language is called Hive Query Language (HQL) and has syntax similar to standard SQL. Jaql is a query language for JSON that supports both structured and unstructured data. It allows joins, groups, and filters on data stored in HDFS, like a combination of Pig and Hive. Its advantage is JSON itself, given that web application developers are progressively moving away from XML toward JSON data structures. Since Hadoop is written in Java, native Java can be used to create custom tools that work with the existing toolset. Lastly, Python can run Hadoop mappers and reducers through Hadoop Streaming, which is language independent.
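
To make the Python option concrete, here is a minimal sketch of a mapper and reducer in the Hadoop Streaming style. It assumes a two-column CSV of country and amount, the same layout the Pig task further down loads. The function names map_lines and reduce_pairs are hypothetical, chosen for illustration only, and the sort between the two steps stands in for the shuffle that Hadoop performs between map and reduce.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: turn 'country,amount' CSV lines into (country, amount) pairs."""
    for line in lines:
        # Split on the last comma so a comma inside the country name survives.
        parts = line.strip().rsplit(",", 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        country, amount = parts
        try:
            yield country, int(amount)
        except ValueError:
            continue  # skip header rows or non-numeric amounts


def reduce_pairs(pairs):
    """Reducer: sum the amounts for each country; input must be sorted by key."""
    for country, group in groupby(pairs, key=lambda kv: kv[0]):
        yield country, sum(amount for _, amount in group)


def run_local(lines):
    """Local dry run: map, sort (standing in for the shuffle), then reduce."""
    return list(reduce_pairs(sorted(map_lines(lines))))
```

In an actual streaming job, each function would live in its own script that reads lines from standard input and prints tab-separated key/value lines to standard output. The result is the same grouping and summing that Pig expresses in two statements (GROUP ... BY and FOREACH ... GENERATE), which illustrates the trade-off: Python gives full control in a familiar language, while Pig needs far less code for common aggregations.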

Do the Pig command-line task in the part 2 video. Submit the URL to the transcript as the answer for this question.
hduser@benjamin-VirtualBox:/usr/local/hadoop$ bin/hadoop dfs -ls hdfs://localhost:54310/user/hduser/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 6 items
drwxr-xr-x   - hduser supergroup          0 2013-12-17 10:15 hdfs://localhost:54310/user/hduser/benjamin-amaunet-output
drwxr-xr-x   - hduser supergroup          0 2013-12-16 18:44 hdfs://localhost:54310/user/hduser/benjamin-gutenberg
drwxr-xr-x   - hduser supergroup          0 2013-12-16 19:00 hdfs://localhost:54310/user/hduser/benjamin-gutenberg-output
-rw-r--r--   1 hduser supergroup         53 2013-12-17 10:10 hdfs://localhost:54310/user/hduser/countries.dat
-rw-r--r--   1 hduser supergroup        197 2013-12-17 10:10 hdfs://localhost:54310/user/hduser/customers.dat
drwxr-xr-x   - hduser supergroup          0 2013-12-17 10:14 hdfs://localhost:54310/user/hduser/mayo
hduser@benjamin-VirtualBox:/usr/local/hadoop$ /usr/local/hadoop/bin/hadoop dfs -put ~/ForeignAssistanceData.csv 
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

hduser@benjamin-VirtualBox:/usr/local/hadoop$ 
hduser@benjamin-VirtualBox:/usr/local/hadoop$ 
hduser@benjamin-VirtualBox:/usr/local/hadoop$ /home/hduser/pig.sh 
2013-12-17 15:25:53,563 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
2013-12-17 15:25:53,569 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/local/hadoop/pig_1387250753561.log
2013-12-17 15:25:53,657 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2013-12-17 15:25:53,896 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2013-12-17 15:25:53,896 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2013-12-17 15:25:53,896 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:54310
2013-12-17 15:25:53,899 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2013-12-17 15:25:55,025 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2013-12-17 15:25:55,025 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:54311
2013-12-17 15:25:55,029 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> 
grunt> records = LOAD 'ForeignAssistanceData.csv' using PigStorage(',') AS (country:chararray, sum:long);
2013-12-17 15:26:02,073 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.maxtasks.per.job is deprecated. Instead, use mapreduce.jobtracker.maxtasks.perjob
2013-12-17 15:26:02,073 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.system.dir is deprecated. Instead, use mapreduce.jobtracker.system.dir
2013-12-17 15:26:02,073 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.max.attempts is deprecated. Instead, use mapreduce.reduce.maxattempts
2013-12-17 15:26:02,073 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.map.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.map.tasks.maximum
2013-12-17 15:26:02,074 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacekill is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacekill
2013-12-17 15:26:02,074 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.job.history.block.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.block.size
2013-12-17 15:26:02,074 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.backup.address is deprecated. Instead, use dfs.namenode.backup.address
2013-12-17 15:26:02,075 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.name.edits.dir is deprecated. Instead, use dfs.namenode.edits.dir
2013-12-17 15:26:02,075 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.timeout is deprecated. Instead, use mapreduce.task.timeout
2013-12-17 15:26:02,075 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.task-controller is deprecated. Instead, use mapreduce.tasktracker.taskcontroller
2013-12-17 15:26:02,075 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacestart is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacestart
2013-12-17 15:26:02,075 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - jobclient.progress.monitor.poll.interval is deprecated. Instead, use mapreduce.client.progressmonitor.pollinterval
2013-12-17 15:26:02,075 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.merge.percent is deprecated. Instead, use mapreduce.reduce.shuffle.merge.percent
2013-12-17 15:26:02,076 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.block.size is deprecated. Instead, use dfs.blocksize
2013-12-17 15:26:02,076 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.df.interval is deprecated. Instead, use fs.df.interval
2013-12-17 15:26:02,076 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.reduce.max.skip.groups is deprecated. Instead, use mapreduce.reduce.skip.maxgroups
2013-12-17 15:26:02,076 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
2013-12-17 15:26:02,076 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile is deprecated. Instead, use mapreduce.task.profile
2013-12-17 15:26:02,076 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.child.tmp is deprecated. Instead, use mapreduce.task.tmp.dir
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.safemode.threshold.pct is deprecated. Instead, use dfs.namenode.safemode.threshold-pct
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.https.client.keystore.resource is deprecated. Instead, use dfs.client.https.keystore.resource
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.reduce.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.reduce.tasks.maximum
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.limit.kb is deprecated. Instead, use mapreduce.task.userlog.limit.kb
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.parallel.copies is deprecated. Instead, use mapreduce.reduce.shuffle.parallelcopies
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.attempts.to.start.skipping is deprecated. Instead, use mapreduce.task.skip.start.attempts
2013-12-17 15:26:02,077 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - jobclient.completion.poll.interval is deprecated. Instead, use mapreduce.client.completion.pollinterval
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - jobclient.output.filter is deprecated. Instead, use mapreduce.client.output.filter
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.max.objects is deprecated. Instead, use dfs.namenode.max.objects
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.expiry.interval is deprecated. Instead, use mapreduce.jobtracker.expire.trackers.interval
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.datanode.max.xcievers is deprecated. Instead, use dfs.datanode.max.transfer.threads
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.taskScheduler is deprecated. Instead, use mapreduce.jobtracker.taskscheduler
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.temp.dir is deprecated. Instead, use mapreduce.cluster.temp.dir
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.taskmemorymanager.monitoring-interval is deprecated. Instead, use mapreduce.tasktracker.taskmemorymanager.monitoringinterval
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.retain.hours is deprecated. Instead, use mapreduce.job.userlog.retain.hours
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.interval is deprecated. Instead, use mapreduce.tasktracker.healthchecker.interval
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.retiredjobs.cache.size is deprecated. Instead, use mapreduce.jobtracker.retiredjobs.cache.size
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.failures is deprecated. Instead, use mapreduce.job.maxtaskfailures.per.tracker
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.considerLoad is deprecated. Instead, use dfs.namenode.replication.considerLoad
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.acls.enabled is deprecated. Instead, use mapreduce.cluster.acls.enabled
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.slowstart.completed.maps is deprecated. Instead, use mapreduce.job.reduce.slowstart.completedmaps
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.handler.count is deprecated. Instead, use mapreduce.jobtracker.handler.count
2013-12-17 15:26:02,078 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.attempts is deprecated. Instead, use mapreduce.job.end-notification.retry.attempts
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.dir is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.dir
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.name.dir.restore is deprecated. Instead, use dfs.namenode.name.dir.restore
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.script.timeout is deprecated. Instead, use mapreduce.tasktracker.healthchecker.script.timeout
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.connect.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.connect.timeout
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.backup.http.address is deprecated. Instead, use dfs.namenode.backup.http-address
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.secondary.http.address is deprecated. Instead, use dfs.namenode.secondary.http-address
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.interval is deprecated. Instead, use dfs.namenode.replication.interval
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.name.dir is deprecated. Instead, use dfs.namenode.name.dir
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.indexcache.mb is deprecated. Instead, use mapreduce.tasktracker.indexcache.mb
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - keep.failed.task.files is deprecated. Instead, use mapreduce.task.files.preserve.failedtasks
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.heartbeats.in.second is deprecated. Instead, use mapreduce.jobtracker.heartbeats.in.second
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.permissions is deprecated. Instead, use dfs.permissions.enabled
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowTaskThreshold is deprecated. Instead, use mapreduce.job.speculative.slowtaskthreshold
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.dir is deprecated. Instead, use dfs.namenode.checkpoint.dir
2013-12-17 15:26:02,079 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.interface is deprecated. Instead, use mapreduce.tasktracker.dns.interface
2013-12-17 15:26:02,080 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowNodeThreshold is deprecated. Instead, use mapreduce.job.speculative.slownodethreshold
2013-12-17 15:26:02,080 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2013-12-17 15:26:02,080 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.https.need.client.auth is deprecated. Instead, use dfs.client.https.need-auth
2013-12-17 15:26:02,080 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.edits.dir is deprecated. Instead, use dfs.namenode.checkpoint.edits.dir
2013-12-17 15:26:02,080 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.hours is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.hours
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reuse.jvm.num.tasks is deprecated. Instead, use mapreduce.job.jvm.numtasks
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.cache.levels is deprecated. Instead, use mapreduce.jobtracker.taskcache.levels
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.instrumentation is deprecated. Instead, use mapreduce.tasktracker.instrumentation
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.access.time.precision is deprecated. Instead, use dfs.namenode.accesstime.precision
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.queue.name is deprecated. Instead, use mapreduce.job.queuename
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.child.log.level is deprecated. Instead, use mapreduce.reduce.log.level
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.balance.bandwidthPerSec is deprecated. Instead, use dfs.datanode.balance.bandwidthPerSec
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.active is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.active
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.map.output.compression.codec is deprecated. Instead, use mapreduce.map.output.compress.codec
2013-12-17 15:26:02,086 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.http.address is deprecated. Instead, use mapreduce.tasktracker.http.address
2013-12-17 15:26:02,089 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapreduce.jobtracker.split.metainfo.maxsize is deprecated. Instead, use mapreduce.job.split.metainfo.maxsize
2013-12-17 15:26:02,089 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.reduces is deprecated. Instead, use mapreduce.task.profile.reduces
2013-12-17 15:26:02,089 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.inmem.merge.threshold is deprecated. Instead, use mapreduce.reduce.merge.inmem.threshold
2013-12-17 15:26:02,089 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.safemode.extension is deprecated. Instead, use dfs.namenode.safemode.extension
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.period is deprecated. Instead, use dfs.namenode.checkpoint.period
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.data.dir is deprecated. Instead, use dfs.datanode.data.dir
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.interval is deprecated. Instead, use mapreduce.job.end-notification.retry.interval
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - io.sort.spill.percent is deprecated. Instead, use mapreduce.map.sort.spill.percent
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.permissions.supergroup is deprecated. Instead, use dfs.permissions.superusergroup
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - tasktracker.http.threads is deprecated. Instead, use mapreduce.tasktracker.http.threads
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.input.buffer.percent
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.tasks.sleeptime-before-sigkill is deprecated. Instead, use mapreduce.tasktracker.tasks.sleeptimebeforesigkill
2013-12-17 15:26:02,090 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.nameserver is deprecated. Instead, use mapreduce.tasktracker.dns.nameserver
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.read.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.read.timeout
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.blacklists is deprecated. Instead, use mapreduce.jobtracker.tasktracker.maxblacklists
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.shuffle.input.buffer.percent
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.merge.recordsBeforeProgress is deprecated. Instead, use mapreduce.task.merge.progress.records
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - dfs.write.packet.size is deprecated. Instead, use dfs.client-write-packet-size
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.restart.recover is deprecated. Instead, use mapreduce.jobtracker.restart.recover
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.jobhistory.lru.cache.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.lru.cache.size
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.map.child.log.level is deprecated. Instead, use mapreduce.map.log.level
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.report.address is deprecated. Instead, use mapreduce.tasktracker.report.address
2013-12-17 15:26:02,091 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.speculativeCap is deprecated. Instead, use mapreduce.job.speculative.speculativecap
2013-12-17 15:26:02,092 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.map.max.skip.records is deprecated. Instead, use mapreduce.map.skip.maxrecords
2013-12-17 15:26:02,092 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.maps is deprecated. Instead, use mapreduce.task.profile.maps
2013-12-17 15:26:02,092 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.instrumentation is deprecated. Instead, use mapreduce.jobtracker.instrumentation
grunt> grouped = GROUP records BY country;                                                               
grunt> thesum = FOREACH grouped GENERATE group, SUM(records.sum);                                        
grunt> DUMP thesum;                                                                                      
2013-12-17 15:26:17,009 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2013-12-17 15:26:17,052 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2013-12-17 15:26:17,164 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-12-17 15:26:17,179 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2013-12-17 15:26:17,209 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-12-17 15:26:17,210 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-12-17 15:26:17,240 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2013-12-17 15:26:17,241 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2013-12-17 15:26:17,266 [main] WARN  org.apache.pig.backend.hadoop20.PigJobControl - falling back to default JobControl (not using hadoop 0.20 ?)
java.lang.NoSuchFieldException: runnerState
	at java.lang.Class.getDeclaredField(Class.java:1938)
	at org.apache.pig.backend.hadoop20.PigJobControl.<clinit>(PigJobControl.java:51)
	at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:98)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:287)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:190)
	at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
	at org.apache.pig.PigServer.storeEx(PigServer.java:978)
	at org.apache.pig.PigServer.store(PigServer.java:942)
	at org.apache.pig.PigServer.openIterator(PigServer.java:855)
	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:541)
	at org.apache.pig.Main.main(Main.java:156)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
2013-12-17 15:26:17,276 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-12-17 15:26:17,284 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-12-17 15:26:17,293 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2100377518151093059.jar
2013-12-17 15:26:19,809 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2100377518151093059.jar created
2013-12-17 15:26:19,810 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2013-12-17 15:26:19,832 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-12-17 15:26:19,847 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-12-17 15:26:19,847 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2013-12-17 15:26:19,848 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-12-17 15:26:19,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2013-12-17 15:26:19,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-12-17 15:26:19,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=-1
2013-12-17 15:26:19,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
2013-12-17 15:26:19,850 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-12-17 15:26:19,934 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-12-17 15:26:19,940 [JobControl] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-17 15:26:19,945 [JobControl] ERROR org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl - Error while trying to run jobs.
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
	at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
	at java.lang.Thread.run(Thread.java:724)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
2013-12-17 15:26:19,949 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-12-17 15:26:19,959 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-12-17 15:26:19,959 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2013-12-17 15:26:19,960 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-12-17 15:26:19,974 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
	at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
	at java.lang.Thread.run(Thread.java:724)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)

2013-12-17 15:26:19,977 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-12-17 15:26:19,978 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.2.0	0.12.0	hduser	2013-12-17 15:26:17	2013-12-17 15:26:19	GROUP_BY

Failed!

Failed Jobs:
JobId	Alias	Feature	Message	Outputs
N/A	grouped,records,thesum	GROUP_BY,COMBINER	Message: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
	at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
	at java.lang.Thread.run(Thread.java:724)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
	hdfs://localhost:54310/tmp/temp-437325604/tmp46626078,

Input(s):
Failed to read data from "hdfs://localhost:54310/user/hduser/ForeignAssistanceData.csv"

Output(s):
Failed to produce result in "hdfs://localhost:54310/tmp/temp-437325604/tmp46626078"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
null


2013-12-17 15:26:19,981 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-12-17 15:26:19,986 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias thesum
Details at logfile: /usr/local/hadoop/pig_1387250753561.log
grunt> 

Do the hive command-line task in the part3 video. Submit the url to the transcript as the answer for this question.
hduser@benjamin-VirtualBox:/usr/local/hadoop$ /home/hduser/hive/bin/hive
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

Logging initialized using configuration in jar:file:/home/hduser/hive/lib/hive-common-0.12.0.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive> C      

CAST         CLUSTER      CLUSTERED    COLLECTION   COLUMNS      COMMENT      CREATE
hive> CREATE TABLE

TABLE         TABLES        TABLESAMPLE
hive> CREATE TABLE foreign_aid
    > (country STRING, sum BI

BIGINT   BINARY
    > (country STRING, sum BIGINT )
    > ROW FORMAT DELIMITED 
    > FILEDS TERMINATED BY ','
    > STORED AS TE 

TEMPORARY    TERMINATED   TEXTFILE
    > STORED AS TEXTFILE ;
NoViableAltException(26@[1518:103: ( tableRowFormatMapKeysIdentifier )?])
	at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
	at org.antlr.runtime.DFA.predict(DFA.java:144)
	at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormatDelimited(HiveParser.java:22901)
	at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormat(HiveParser.java:23091)
	at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4388)
	at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2016)
	at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1298)
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 4:0 cannot recognize input near 'FILEDS' 'TERMINATED' 'BY' in serde properties specification

hive> 
    > 
    > 
    > 
    > STORED AS TEXTFILE ;
[7]+  Stopped                 /home/hduser/hive/bin/hive
hduser@benjamin-VirtualBox:/usr/local/hadoop$ 
hduser@benjamin-VirtualBox:/usr/local/hadoop$ 
hduser@benjamin-VirtualBox:/usr/local/hadoop$ 
hduser@benjamin-VirtualBox:/usr/local/hadoop$ /home/hduser/hive/bin/hive
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

Logging initialized using configuration in jar:file:/home/hduser/hive/lib/hive-common-0.12.0.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive> 
    > 
    > CREATE TABLE foreign_aid     
    > (country STRING, sum BIGINT )
    > ROW FORMAT DELIMITED         
    > FIELDS TERMINATED BY ','     
    > STORED AS TEXTFILE ;
OK
Time taken: 29.013 seconds
hive> 
    > SHOW TABLES;
OK
foreign_aid
Time taken: 0.32 seconds, Fetched: 1 row(s)
hive> DESCRIBE format

format           format_number(
hive> DESCRIBE foreign_aid;
OK
country             	string              	None                
sum                 	bigint              	None                
Time taken: 0.153 seconds, Fetched: 2 row(s)
hive> LOAD DAT

DATA       DATE       DATETIME
hive> LOAD DATA INP

INPATH        INPUTFORMAT
hive> LOAD DATA INPATH 'foe
    > re
    > ;
MismatchedTokenException(26!=286)
	at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
	at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
	at org.apache.hadoop.hive.ql.parse.HiveParser.loadStatement(HiveParser.java:1414)
	at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1253)
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:18 mismatched input 'foe' expecting StringLiteral near 'INPATH' in load statement

hive> LOAD DATA INPATH 'ForeignAssistanceData.csv'
    > OVERWRITE INTO TABLE foreign_aid;
Loading data to table default.foreign_aid
Table default.foreign_aid stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 100234, raw_data_size: 0]
OK
Time taken: 0.702 seconds
hive> SELECT * FROM foreign_aid LIMIT 10;
OK
Afghanistan	314552000
Afghanistan	1200000
Afghanistan	400000000
Afghanistan	1176000
Afghanistan	2400000
Afghanistan	650000
Afghanistan	1500000
Afghanistan	23000000
Afghanistan	1100000
Afghanistan	40000000
Time taken: 0.432 seconds, Fetched: 10 row(s)
hive> SELECT country, SUM(sum) FROM foreign_aid GROUP BY country;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
13/12/17 15:43:06 WARN conf.Configuration: file:/tmp/hduser/hive_2013-12-17_15-43-02_056_7905750711769655792-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
13/12/17 15:43:06 WARN conf.Configuration: file:/tmp/hduser/hive_2013-12-17_15-43-02_056_7905750711769655792-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
Execution log at: /tmp/hduser/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2013-12-17 15:43:11,508 null map = 0%,  reduce = 0%
2013-12-17 15:43:15,225 null map = 100%,  reduce = 100%
Ended Job = job_local1229925622_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
Afghanistan	5398420000
Africa Regional Office (DOS)	160550000
Africa Regional Office (USAID)	231685000
African Union (DOS)	1760000
Albania	45367000
Algeria	13744000
Ambassador-at-Large for Global Women’s Issues (DOS)	5000000
Angola	132593000
Argentina	2745000
Armenia	105965000
Asia Middle East Regional Office (USAID)	52980000
Azerbaijan	63004000
Bahrain	37254000
Bangladesh	470835000
Barbadas	841000
Barbados and Eastern Caribbean (DOS & USAID)	73522000
Belarus	22072000
Belize	6374000
Benin	58015000
Bolivia	61539000
Bosnia and Herzegovina	99999000
Botswana	138869000
Brazil	36845000
Bulgaria	22851000
Bureau for Democracy	NULL
Bureau for Economic Growth	NULL
Bureau for Energy Resources (DOS)	9000000
Bureau for Food Security (USAID)	594100000
Bureau for Global Health (USAID)	1625573000
Bureau for Policy	NULL
Bureau of Arms Control	NULL
Bureau of Democracy	NULL
Bureau of International Narcotics and Law Enforcement Affairs (DOS)	392741000
Bureau of International Security and Nonproliferation (DOS)	467140000
Bureau of Oceans and International Environmental and Scientific Affairs (DOS)	240616000
Bureau of Political-Military Affairs (DOS)	425648000
Bureau of Population	NULL
Burkina Faso	56367000
Burma	83800000
Burundi	79006000
Cambodia	170147000
Cameroon	27646000
Cape Verde	1257000
Central Africa Regional Office (USAID)	45798000
Central African Republic	10268000
Central America Regional Office (USAID)	61600000
Central Asia Regional Office (USAID)	35832000
Chad	91581000
Chile	2455000
China	27150000
Colombia	843218000
Comoros	252000
Costa Rica	5255000
Cote d'Ivoire	293168000
Croatia	9546000
Cuba	40000000
Cyprus	7000000
Czech Republic	14892000
Democratic Republic of Congo	516246000
Djibouti	15013000
Dominican Republic	64126000
East Africa Regional Office (USAID)	125709000
Ecuador	58891000
Educational and Cultural Affairs (DOS)	5000000
Egypt	3213739000
El Salvador	72263000
Estonia	7262000
Ethiopia	1315017000
Eurasia Regional Office (DOS & USAID)	116915000
Europe Regional Office (DOS & USAID)	37217000
Foreign Assistance Program Evaluation	600000
Gabon	612000
Georgia	236853000
Ghana	378928000
Greece	202000
Guatemala	196109000
Guinea	37257000
Guinea-Bissau	25000
Guyana	24909000
Haiti	757177000
Honduras	133242000
Hungary	3847000
India	250762000
Indonesia	432387000
International Fund for Ireland (USAID)	5000000
International Organizations (DOS) - ICAO International Civil Aviation Organization	1881000
International Organizations (DOS) - IDLO International Development Law Organization	1188000
International Organizations (DOS) - IMO International Maritime Organization	792000
International Organizations (DOS) - International Chemicals and Toxins Programs	7260000
International Organizations (DOS) - International Conservation Programs	15500000
International Organizations (DOS) - International Panel on Climate Change / UN Framework Convention on Climate Change	23500000
International Organizations (DOS) - Montreal Protocol Multilateral Fund	56232000
International Organizations (DOS) - Multilateral Action Initiatives	2000000
International Organizations (DOS) - OAS Development Assistance	8250000
International Organizations (DOS) - OAS Fund for Strengthening Democracy	7440000
International Organizations (DOS) - UN OCHA UN Office for the Coordination of Humanitarian Affairs	5940000
International Organizations (DOS) - UN Voluntary Funds for Technical Cooperation in the Field of Human Rights	2772000
International Organizations (DOS) - UN Women (formerly UNIFEM)	8000000
International Organizations (DOS) - UN Women Trust Fund (formerly UNIFEM Trust Fund)	7500000
International Organizations (DOS) - UN-HABITAT UN Human Settlements Program	3800000
International Organizations (DOS) - UNCDF UN Capital Development Fund	1905000
International Organizations (DOS) - UNDF UN Democracy Fund	9510000
International Organizations (DOS) - UNDP UN Development Program	153535000
International Organizations (DOS) - UNEP UN Environment Program	15400000
International Organizations (DOS) - UNESCO/ICSECA International Contributions for Scientific	NULL
International Organizations (DOS) - UNFPA UN Population Fund	77700000
International Organizations (DOS) - UNHCHR UN High Commissioner for Human Rights	5000000
International Organizations (DOS) - UNICEF UN Children's Fund	258355000
International Organizations (DOS) - UNVFVT UN Voluntary Fund for Victims of Torture	11700000
International Organizations (DOS) - WMO World Meteorological Organization	4180000
International Organizations (DOS) - WTO Technical Assistance	2290000
International Organizations and Development Institutions (US Treasury) - African Development Bank (AfDB)	32417720
International Organizations and Development Institutions (US Treasury) - African Development Fund (AfDF)	172500000
International Organizations and Development Institutions (US Treasury) - Asian Development Bank (AsDB)	106586000
International Organizations and Development Institutions (US Treasury) - Asian Development Fund (AsDF)	100000000
International Organizations and Development Institutions (US Treasury) - Clean Technology Fund (CTF)	184630000
International Organizations and Development Institutions (US Treasury) - Global Agriculture and Food Security Program (GAFSP) 	135000000
International Organizations and Development Institutions (US Treasury) - Global Environment Facility (GEF)	89820000
International Organizations and Development Institutions (US Treasury) - Inter-American Development Bank (IDB and FSO)	75000000
International Organizations and Development Institutions (US Treasury) - Inter-American Investment Corporation (IIC)	4670000
International Organizations and Development Institutions (US Treasury) - International Bank for Reconstruction and Development (IBRD)	117364344
International Organizations and Development Institutions (US Treasury) - International Development Association (IDA)	1325000000
International Organizations and Development Institutions (US Treasury) - International Fund for Agricultural Development (IFAD)	30000000
International Organizations and Development Institutions (US Treasury) - Multilateral Investment Fund (MIF)	25000000
International Organizations and Development Institutions (US Treasury) - Strategic Climate Funds (SCF)	49900000
Iraq	2630487000
Israel	6150000000
Jamaica	18611000
Jordan	1451600000
Kazakhstan	135491000
Kenya	1259768000
Kosovo	134182000
Kyrgyz Republic	120515000
Laos	16904000
Latin America and Caribbean Regional Office (USAID)	90688000
Latvia	7027000
Lebanon	429931000
Lesotho	44550000
Liberia	422469000
Libya	7046000
Lithuania	7525000
Macedonia	38977000
Madagascar	148273000
Malawi	385345000
Malaysia	4779000
Maldives	6633000
Mali	407401000
Malta	300000
Marshall Islands	2096000
Mauritania	18435000
Mauritius	265000
Mexico	747054000
Micronesia	1992000
Middle East Multilaterals (DOS)	1500000
Middle East Partnership Initiative (DOS)	70000000
Middle East Regional Office (USAID)	5000000
Middle East Regional Office Cooperation (DOS)	5000000
Middle East Response Fund	5000000
Moldova	60600000
Mongolia	17675000
Montenegro	12765000
Morocco	84891000
Mozambique	772019000
Multilateral Food Security Programs	14600000
Multinational Force and Observers (DOS)	28000000
Namibia	190595000
Near East Regional Democracy (DOS)	35000000
Near East Regional Office (DOS)	142000000
Nepal	189293000
Nicaragua	43067000
Niger	73954000
Nigeria	1310419000
Office of Development Partners (USAID)	44124000
Office of Innovation and Development Alliances (USAID)	86418000
Office of the Coordinator for Counterterrorism (DOS)	260291000
Office of the Global AIDS Coordinator (DOS)	3747032000
Office to Monitor and Combat Trafficking In Persons (DOS)	39528000
Oman	23788000
Pakistan	3688232000
Panama	10208000
Papua New Guinea	12500000
Paraguay	12633000
Peru	194510000
Philippines	321883000
Poland	66465000
Pooled Funding	73000000
Portugal	125000
Regional Office Development Mission-Asia (USAID)	124480000
Republic of Congo	191000
Romania	28604000
Russia	166485000
Rwanda	438475000
S/GPI - Special Representative for Global Partnerships	3000000
S/SRMC - Special Representative to Muslim Communities	3000000
Samoa	155000
Sao Tome and Principe	298000
Saudi Arabia	9000
Senegal	230702000
Serbia	77887000
Seychelles	235000
Sierra Leone	40375000
Singapore	500000
Slovak Republic	4153000
Slovenia	2319000
Somalia	385059000
South Africa	1104865000
South America Regional Office (USAID)	21530000
South Asia Regional Office (USAID)	6050000
South Sudan	619577000
South and Central Asia Regional Office (DOS)	17048000
Southern Africa Regional Office (USAID)	61200000
Sri Lanka	35599000
Sudan	196024000
Suriname	661000
Swaziland	77256000
Syrian Arab Republic	55500000
Taiwan	500000
Tajikistan	114399000
Tanzania	1054710000
Thailand	26509000
The Bahamas	733000
The Gambia	231000
Timor-Leste	32648000
Togo	1028000
Trans-Sahara Counter-Terrorism Partnership (DOS)	4500000
Trinidad and Tobago	649000
Tunisia	95912000
Turkey	11147000
Turkmenistan	19474000
U.S. Department of Defense - World-Wide	103830000
U.S. Department of the Treasury - Office of Technical Assistance	27000000
USAID Capital Investment Fund	318900000
USAID Development Credit Authority Admin	16600000
USAID Forward: Program Effectiveness Initiatives	71773000
USAID Inspector General Operating Expense	102500000
USAID Operating Expense	2850720000
Uganda	988628000
Ukraine	287225000
Unallocated Earmarks	64922000
Uruguay	1598000
Uzbekistan	56212000
Venezuela	11000000
Vietnam	233481000
West Africa Regional Office (USAID)	168925000
West Bank and Gaza	1023656000
Western Hemisphere Regional Office (DOS)	416700000
Worldwide	26707979
Yemen	255372000
Zambia	713595000
Zimbabwe	245074000
Time taken: 13.766 seconds, Fetched: 250 row(s)
hive> 

What is Hadoop Streaming? How is it relevant to our Python examples?
Hadoop Streaming is a framework that lets you write Hadoop map and reduce functions in languages other than Java. Any program that reads records from standard input and writes results to standard output can serve as the mapper or the reducer. Our Python scripts use this framework to run the map and reduce steps outside of Java.
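
As a rough sketch of what such a Python script could look like, the example below sums amounts per country, mirroring the foreign_aid query from the Hive task. It assumes input lines of the form country,amount; the function names, the map/reduce command-line argument, and the choice to split on the last comma are illustrative assumptions, not taken from the course videos.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: turn 'country,amount' lines into 'country<TAB>amount' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a comma inside the name does not break parsing.
        country, _, amount = line.rpartition(",")
        amount = amount.strip()
        # Rows without a plain integer amount are skipped in this sketch.
        if country and amount.isdigit():
            yield f"{country}\t{amount}"


def reduce_lines(lines):
    """Reducer: sum the amounts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for country, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{country}\t{sum(int(amount) for _, amount in group)}"


# Hypothetical convention: run as 'script.py map' or 'script.py reduce'.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    sys.stdout.writelines(out + "\n" for out in step(sys.stdin))
```

Locally, the pipeline can be imitated with sorted(map_lines(rows)) passed to reduce_lines. In a real job the same script would be handed to the Hadoop Streaming jar through its -mapper and -reducer options, and Hadoop would do the sort between the two steps.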

What is HBase and what is its relationship to HDFS? Describe when it would be used.
HBase is a distributed, scalable big data store: a column-oriented database that runs on top of HDFS.  HBase is used for random, real-time read/write access to very large tables.  Unlike most databases, HBase does not provide a SQL query language.  One can use HBase to build a REST application that requires the availability of sparse data.  In addition, one can use Avro, an Apache project, to help applications read and write data to files efficiently.  Avro also allows for versioning: if an application's schema changes, the file can change along with the application.
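
To make "sparse data" and "random read-write access" a little more concrete, here is a toy in-memory model in Python. It is only an illustration of the idea that a row stores just the cells it actually has; the class, its methods, and the column names are hypothetical and are not the HBase API.

```python
from collections import defaultdict


class SparseTable:
    """Toy stand-in for a column-oriented table: only cells that exist are stored."""

    def __init__(self):
        # row key -> {"family:qualifier" -> value}
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Random write: set one cell without touching the rest of the row."""
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        """Random read: fetch one cell by row key and column name."""
        return self._rows.get(row_key, {}).get(column, default)

    def cell_count(self):
        """Number of stored cells; missing cells cost nothing."""
        return sum(len(columns) for columns in self._rows.values())


table = SparseTable()
table.put("Afghanistan", "aid:total", 5398420000)
table.put("Albania", "aid:total", 45367000)
table.put("Albania", "meta:region", "Europe")  # hypothetical extra column
```

Here the Afghanistan row simply has no meta:region cell, so nothing is stored for it, which is the property that makes wide, mostly empty tables practical.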

Compare Cassandra and HBase. What are the differences, similarities, and what is the best use of each?
HBase relies on a broad infrastructure: ZooKeeper, the NameNode, and HDFS.  It is said that organizations willing to deploy a Hadoop cluster will be comfortable reusing that Hadoop knowledge with HBase.  Cassandra's infrastructure and operations differ from Hadoop's, so the general knowledge required is different as well.  For analytics, however, many Cassandra deployments pair Cassandra with Storm (which uses ZooKeeper) and/or with Hadoop.  HBase supports simple aggregations out of the box (sum, min, max, avg, etc.), and other aggregations can be built by defining Java classes to perform them.  Cassandra, on the other hand, does not support aggregations; the client must provide custom ones.  When an aggregation spans multiple rows, random partitioning makes it very difficult for the client, so the recommendation is to use Storm or Hadoop for aggregations.
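
The point about random partitioning can be sketched with a small Python simulation. This is a toy model under stated assumptions, not Cassandra's client API: the function names are hypothetical, MD5 is used only as a stand-in hash, and each "node" is just a dictionary.

```python
import hashlib


def node_for(row_key, num_nodes):
    """Pick a node by hashing the row key, so neighbouring keys scatter."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def scatter(rows, num_nodes):
    """Distribute {row_key: value} across num_nodes in-memory 'nodes'."""
    nodes = [{} for _ in range(num_nodes)]
    for key, value in rows.items():
        nodes[node_for(key, num_nodes)][key] = value
    return nodes


def client_side_aggregate(nodes):
    """Visit every node and combine the values on the client."""
    values = [value for node in nodes for value in node.values()]
    if not values:
        return None
    return {
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }


# Three rows taken from the Hive output above.
rows = {"Albania": 45367000, "Algeria": 13744000, "Angola": 132593000}
nodes = scatter(rows, num_nodes=4)
totals = client_side_aggregate(nodes)
```

Because the hash decides where each row lives, the client cannot know which nodes hold the rows it needs and has to read from all of them, which is why a multi-row aggregation is usually handed to Storm or Hadoop instead.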