QuestionSetG

How does Hadoop rack awareness affect streaming vs. random access?

Rack awareness lets HDFS spread block replicas across racks and lets the scheduler run map tasks on a node, or at least a rack, that already holds the data, which suits streaming workloads that read whole blocks sequentially. Random access workloads in a GPFS-SNC cluster get an additional performance boost because of client-side caching; there is no caching like this in HDFS. Good random access performance is important for Hadoop workloads, even though the underlying design favors sequential access.

Compare using Pig, Hive, Jaql, Java, and Python as query languages for Hadoop. What are the differences and advantages of each?

Pig lets users spend more time analyzing large data sets and less time writing mappers and reducers. It is designed to handle any kind of data, with or without a schema. At a high level, a Pig user LOADs the data to manipulate from HDFS, executes transformations (which Pig compiles into mappers and reducers), and then DUMPs the result to the screen or STOREs it in a file.

Hive, by contrast, was created to leverage existing SQL knowledge: users write SQL instead of learning a new language such as Pig Latin. Its query language, Hive Query Language (HQL, also known as HiveQL), has syntax similar to standard SQL.

Jaql is a query language for JSON that supports both structured and unstructured data. It supports joins, grouping, and filtering over data stored in HDFS, somewhat like a combination of Pig and Hive. Its advantage is JSON itself, given that web application developers are increasingly moving away from XML toward JSON.

Since Hadoop is written in Java, native Java can be used to write custom MapReduce programs and tools that work with the existing toolset.

Lastly, Python can run Hadoop mappers and reducers through Hadoop Streaming, which is language independent: any program that reads records from standard input and writes results to standard output can serve as a mapper or reducer.

Do the Pig command-line task in the part2 video. Submit the URL to the transcript as the answer for this question.

hduser@benjamin-VirtualBox:/usr/local/hadoop$ bin/hadoop dfs -ls hdfs://localhost:54310/user/hduser/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 6 items
drwxr-xr-x - hduser supergroup 0 2013-12-17 10:15 hdfs://localhost:54310/user/hduser/benjamin-amaunet-output
drwxr-xr-x - hduser supergroup 0 2013-12-16 18:44 hdfs://localhost:54310/user/hduser/benjamin-gutenberg
drwxr-xr-x - hduser supergroup 0 2013-12-16 19:00 hdfs://localhost:54310/user/hduser/benjamin-gutenberg-output
-rw-r--r-- 1 hduser supergroup 53 2013-12-17 10:10 hdfs://localhost:54310/user/hduser/countries.dat
-rw-r--r-- 1 hduser supergroup 197 2013-12-17 10:10 hdfs://localhost:54310/user/hduser/customers.dat
drwxr-xr-x - hduser supergroup 0 2013-12-17 10:14 hdfs://localhost:54310/user/hduser/mayo
hduser@benjamin-VirtualBox:/usr/local/hadoop$ /usr/local/hadoop/bin/hadoop dfs -put ~/ForeignAssistanceData.csv
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
hduser@benjamin-VirtualBox:/usr/local/hadoop$
hduser@benjamin-VirtualBox:/usr/local/hadoop$
hduser@benjamin-VirtualBox:/usr/local/hadoop$ /home/hduser/pig.sh
2013-12-17 15:25:53,563 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
2013-12-17 15:25:53,569 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/hadoop/pig_1387250753561.log
2013-12-17 15:25:53,657 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2013-12-17 15:25:53,896 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2013-12-17 15:25:53,896 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2013-12-17 15:25:53,896 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:54310
2013-12-17 15:25:53,899 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2013-12-17 15:25:55,025 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2013-12-17 15:25:55,025 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:54311
2013-12-17 15:25:55,029 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt>
grunt> records = LOAD 'ForeignAssistanceData.csv' using PigStorage(',') AS (country:chararray, sum:long);
2013-12-17 15:26:02,073 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.maxtasks.per.job is deprecated. Instead, use mapreduce.jobtracker.maxtasks.perjob
2013-12-17 15:26:02,073 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.system.dir is deprecated. Instead, use mapreduce.jobtracker.system.dir
2013-12-17 15:26:02,073 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.max.attempts is deprecated. Instead, use mapreduce.reduce.maxattempts
2013-12-17 15:26:02,073 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.map.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.map.tasks.maximum
2013-12-17 15:26:02,074 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacekill is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacekill
2013-12-17 15:26:02,074 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.job.history.block.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.block.size
2013-12-17 15:26:02,074 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.backup.address is deprecated. Instead, use dfs.namenode.backup.address
2013-12-17 15:26:02,075 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.name.edits.dir is deprecated. Instead, use dfs.namenode.edits.dir
2013-12-17 15:26:02,075 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.timeout is deprecated. Instead, use mapreduce.task.timeout
2013-12-17 15:26:02,075 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.task-controller is deprecated. Instead, use mapreduce.tasktracker.taskcontroller
2013-12-17 15:26:02,075 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacestart is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacestart
2013-12-17 15:26:02,075 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.progress.monitor.poll.interval is deprecated. Instead, use mapreduce.client.progressmonitor.pollinterval
2013-12-17 15:26:02,075 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.merge.percent is deprecated. Instead, use mapreduce.reduce.shuffle.merge.percent
2013-12-17 15:26:02,076 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.block.size is deprecated. Instead, use dfs.blocksize
2013-12-17 15:26:02,076 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.df.interval is deprecated. Instead, use fs.df.interval
2013-12-17 15:26:02,076 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.reduce.max.skip.groups is deprecated. Instead, use mapreduce.reduce.skip.maxgroups
2013-12-17 15:26:02,076 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
2013-12-17 15:26:02,076 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile is deprecated. Instead, use mapreduce.task.profile
2013-12-17 15:26:02,076 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.child.tmp is deprecated. Instead, use mapreduce.task.tmp.dir
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.safemode.threshold.pct is deprecated. Instead, use dfs.namenode.safemode.threshold-pct
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.https.client.keystore.resource is deprecated. Instead, use dfs.client.https.keystore.resource
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.reduce.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.reduce.tasks.maximum
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.limit.kb is deprecated. Instead, use mapreduce.task.userlog.limit.kb
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.parallel.copies is deprecated. Instead, use mapreduce.reduce.shuffle.parallelcopies
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.attempts.to.start.skipping is deprecated. Instead, use mapreduce.task.skip.start.attempts
2013-12-17 15:26:02,077 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.completion.poll.interval is deprecated. Instead, use mapreduce.client.completion.pollinterval
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.output.filter is deprecated. Instead, use mapreduce.client.output.filter
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.max.objects is deprecated. Instead, use dfs.namenode.max.objects
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.expiry.interval is deprecated. Instead, use mapreduce.jobtracker.expire.trackers.interval
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.datanode.max.xcievers is deprecated. Instead, use dfs.datanode.max.transfer.threads
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.taskScheduler is deprecated. Instead, use mapreduce.jobtracker.taskscheduler
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.temp.dir is deprecated. Instead, use mapreduce.cluster.temp.dir
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.taskmemorymanager.monitoring-interval is deprecated. Instead, use mapreduce.tasktracker.taskmemorymanager.monitoringinterval
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.retain.hours is deprecated. Instead, use mapreduce.job.userlog.retain.hours
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.interval is deprecated. Instead, use mapreduce.tasktracker.healthchecker.interval
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.retiredjobs.cache.size is deprecated. Instead, use mapreduce.jobtracker.retiredjobs.cache.size
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.failures is deprecated. Instead, use mapreduce.job.maxtaskfailures.per.tracker
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.considerLoad is deprecated. Instead, use dfs.namenode.replication.considerLoad
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.acls.enabled is deprecated. Instead, use mapreduce.cluster.acls.enabled
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.slowstart.completed.maps is deprecated. Instead, use mapreduce.job.reduce.slowstart.completedmaps
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.handler.count is deprecated. Instead, use mapreduce.jobtracker.handler.count
2013-12-17 15:26:02,078 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.attempts is deprecated. Instead, use mapreduce.job.end-notification.retry.attempts
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.dir is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.dir
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.name.dir.restore is deprecated. Instead, use dfs.namenode.name.dir.restore
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.script.timeout is deprecated. Instead, use mapreduce.tasktracker.healthchecker.script.timeout
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.connect.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.connect.timeout
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.backup.http.address is deprecated. Instead, use dfs.namenode.backup.http-address
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.secondary.http.address is deprecated. Instead, use dfs.namenode.secondary.http-address
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.interval is deprecated. Instead, use dfs.namenode.replication.interval
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.name.dir is deprecated. Instead, use dfs.namenode.name.dir
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.indexcache.mb is deprecated. Instead, use mapreduce.tasktracker.indexcache.mb
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - keep.failed.task.files is deprecated. Instead, use mapreduce.task.files.preserve.failedtasks
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.heartbeats.in.second is deprecated. Instead, use mapreduce.jobtracker.heartbeats.in.second
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.permissions is deprecated. Instead, use dfs.permissions.enabled
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowTaskThreshold is deprecated. Instead, use mapreduce.job.speculative.slowtaskthreshold
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.dir is deprecated. Instead, use dfs.namenode.checkpoint.dir
2013-12-17 15:26:02,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.interface is deprecated. Instead, use mapreduce.tasktracker.dns.interface
2013-12-17 15:26:02,080 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowNodeThreshold is deprecated. Instead, use mapreduce.job.speculative.slownodethreshold
2013-12-17 15:26:02,080 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2013-12-17 15:26:02,080 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.https.need.client.auth is deprecated. Instead, use dfs.client.https.need-auth
2013-12-17 15:26:02,080 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.edits.dir is deprecated. Instead, use dfs.namenode.checkpoint.edits.dir
2013-12-17 15:26:02,080 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.hours is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.hours
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reuse.jvm.num.tasks is deprecated. Instead, use mapreduce.job.jvm.numtasks
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.cache.levels is deprecated. Instead, use mapreduce.jobtracker.taskcache.levels
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.instrumentation is deprecated. Instead, use mapreduce.tasktracker.instrumentation
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.access.time.precision is deprecated. Instead, use dfs.namenode.accesstime.precision
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.queue.name is deprecated. Instead, use mapreduce.job.queuename
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.child.log.level is deprecated. Instead, use mapreduce.reduce.log.level
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.balance.bandwidthPerSec is deprecated. Instead, use dfs.datanode.balance.bandwidthPerSec
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.active is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.active
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.output.compression.codec is deprecated. Instead, use mapreduce.map.output.compress.codec
2013-12-17 15:26:02,086 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.http.address is deprecated. Instead, use mapreduce.tasktracker.http.address
2013-12-17 15:26:02,089 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.jobtracker.split.metainfo.maxsize is deprecated. Instead, use mapreduce.job.split.metainfo.maxsize
2013-12-17 15:26:02,089 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.reduces is deprecated. Instead, use mapreduce.task.profile.reduces
2013-12-17 15:26:02,089 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.inmem.merge.threshold is deprecated. Instead, use mapreduce.reduce.merge.inmem.threshold
2013-12-17 15:26:02,089 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.safemode.extension is deprecated. Instead, use dfs.namenode.safemode.extension
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.period is deprecated. Instead, use dfs.namenode.checkpoint.period
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.data.dir is deprecated. Instead, use dfs.datanode.data.dir
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.interval is deprecated. Instead, use mapreduce.job.end-notification.retry.interval
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.spill.percent is deprecated. Instead, use mapreduce.map.sort.spill.percent
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.permissions.supergroup is deprecated. Instead, use dfs.permissions.superusergroup
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - tasktracker.http.threads is deprecated. Instead, use mapreduce.tasktracker.http.threads
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.input.buffer.percent
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.tasks.sleeptime-before-sigkill is deprecated. Instead, use mapreduce.tasktracker.tasks.sleeptimebeforesigkill
2013-12-17 15:26:02,090 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.nameserver is deprecated. Instead, use mapreduce.tasktracker.dns.nameserver
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.read.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.read.timeout
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.blacklists is deprecated. Instead, use mapreduce.jobtracker.tasktracker.maxblacklists
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.shuffle.input.buffer.percent
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.merge.recordsBeforeProgress is deprecated. Instead, use mapreduce.task.merge.progress.records
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.write.packet.size is deprecated. Instead, use dfs.client-write-packet-size
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.restart.recover is deprecated. Instead, use mapreduce.jobtracker.restart.recover
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.jobhistory.lru.cache.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.lru.cache.size
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.child.log.level is deprecated. Instead, use mapreduce.map.log.level
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.report.address is deprecated. Instead, use mapreduce.tasktracker.report.address
2013-12-17 15:26:02,091 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.speculativeCap is deprecated. Instead, use mapreduce.job.speculative.speculativecap
2013-12-17 15:26:02,092 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.map.max.skip.records is deprecated. Instead, use mapreduce.map.skip.maxrecords
2013-12-17 15:26:02,092 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.maps is deprecated. Instead, use mapreduce.task.profile.maps
2013-12-17 15:26:02,092 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.instrumentation is deprecated. Instead, use mapreduce.jobtracker.instrumentation
grunt> grouped = GROUP records BY country;
grunt> thesum = FOREACH grouped GENERATE group, SUM(records.sum);
grunt> DUMP thesum;
2013-12-17 15:26:17,009 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2013-12-17 15:26:17,052 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2013-12-17 15:26:17,164 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-12-17 15:26:17,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2013-12-17 15:26:17,209 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-12-17 15:26:17,210 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-12-17 15:26:17,240 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2013-12-17 15:26:17,241 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2013-12-17 15:26:17,266 [main] WARN org.apache.pig.backend.hadoop20.PigJobControl - falling back to default JobControl (not using hadoop 0.20 ?)
java.lang.NoSuchFieldException: runnerState
	at java.lang.Class.getDeclaredField(Class.java:1938)
	at org.apache.pig.backend.hadoop20.PigJobControl.(PigJobControl.java:51)
	at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:98)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:287)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:190)
	at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
	at org.apache.pig.PigServer.storeEx(PigServer.java:978)
	at org.apache.pig.PigServer.store(PigServer.java:942)
	at org.apache.pig.PigServer.openIterator(PigServer.java:855)
	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:541)
	at org.apache.pig.Main.main(Main.java:156)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
2013-12-17 15:26:17,276 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-12-17 15:26:17,284 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-12-17 15:26:17,293 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2100377518151093059.jar
2013-12-17 15:26:19,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2100377518151093059.jar created
2013-12-17 15:26:19,810 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2013-12-17 15:26:19,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-12-17 15:26:19,847 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-12-17 15:26:19,847 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2013-12-17 15:26:19,848 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-12-17 15:26:19,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2013-12-17 15:26:19,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-12-17 15:26:19,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=-1
2013-12-17 15:26:19,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
2013-12-17 15:26:19,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-12-17 15:26:19,934 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-12-17 15:26:19,940 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-17 15:26:19,945 [JobControl] ERROR org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl - Error while trying to run jobs.
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
	at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
	at java.lang.Thread.run(Thread.java:724)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
2013-12-17 15:26:19,949 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-12-17 15:26:19,959 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-12-17 15:26:19,959 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2013-12-17 15:26:19,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-12-17 15:26:19,974 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
	at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
	at java.lang.Thread.run(Thread.java:724)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
2013-12-17 15:26:19,977 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-12-17 15:26:19,978 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.2.0 0.12.0 hduser 2013-12-17 15:26:17 2013-12-17 15:26:19 GROUP_BY

Failed!

Failed Jobs:
JobId Alias Feature Message Outputs
N/A grouped,records,thesum GROUP_BY,COMBINER Message: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
	at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
	at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
	at java.lang.Thread.run(Thread.java:724)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
hdfs://localhost:54310/tmp/temp-437325604/tmp46626078,

Input(s):
Failed to read data from "hdfs://localhost:54310/user/hduser/ForeignAssistanceData.csv"

Output(s):
Failed to produce result in "hdfs://localhost:54310/tmp/temp-437325604/tmp46626078"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
null

2013-12-17 15:26:19,981 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-12-17 15:26:19,986 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias thesum
Details at logfile: /usr/local/hadoop/pig_1387250753561.log
grunt>

Do the hive command-line task in the part3 video. Submit the url to the transcript as the answer for this question.

hduser@benjamin-VirtualBox:/usr/local/hadoop$ /home/hduser/hive/bin/hive
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/17 15:35:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

Logging initialized using configuration in jar:file:/home/hduser/hive/lib/hive-common-0.12.0.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive> C
CAST CLUSTER CLUSTERED COLLECTION COLUMNS COMMENT CREATE
hive> CREATE TABLE
TABLE TABLES TABLESAMPLE
hive> CREATE TABLE foreign_aid
    > (country STRING, sum BI
BIGINT BINARY
    > (country STRING, sum BIGINT )
    > ROW FORMAT DELIMITED
    > FILEDS TERMINATED BY ','
    > STORED AS TE
TEMPORARY TERMINATED TEXTFILE
    > STORED AS TEXTFILE ;
NoViableAltException(26@[1518:103: ( tableRowFormatMapKeysIdentifier )?])
	at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
	at org.antlr.runtime.DFA.predict(DFA.java:144)
	at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormatDelimited(HiveParser.java:22901)
	at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormat(HiveParser.java:23091)
	at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4388)
	at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2016)
	at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1298)
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 4:0 cannot recognize input near 'FILEDS' 'TERMINATED' 'BY' in serde properties specification
hive>
    >
    >
    >
    > STORED AS TEXTFILE ;
[7]+ Stopped /home/hduser/hive/bin/hive
hduser@benjamin-VirtualBox:/usr/local/hadoop$
hduser@benjamin-VirtualBox:/usr/local/hadoop$
hduser@benjamin-VirtualBox:/usr/local/hadoop$
hduser@benjamin-VirtualBox:/usr/local/hadoop$ /home/hduser/hive/bin/hive
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/17 15:37:29 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

Logging initialized using configuration in jar:file:/home/hduser/hive/lib/hive-common-0.12.0.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>
    >
    > CREATE TABLE foreign_aid
    > (country STRING, sum BIGINT )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE ;
OK
Time taken: 29.013 seconds
hive>
    > SHOW TABLES;
OK
foreign_aid
Time taken: 0.32 seconds, Fetched: 1 row(s)
hive> DESCRIBE format
format format_number(
hive> DESCRIBE foreign_aid;
OK
country string None
sum bigint None
Time taken: 0.153 seconds, Fetched: 2 row(s)
hive> LOAD DAT
DATA DATE DATETIME
hive> LOAD DATA INP
INPATH INPUTFORMAT
hive> LOAD DATA INPATH 'foe
    > re
    > ;
MismatchedTokenException(26!=286)
	at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
	at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
	at org.apache.hadoop.hive.ql.parse.HiveParser.loadStatement(HiveParser.java:1414)
	at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1253)
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:18 mismatched input 'foe' expecting StringLiteral near 'INPATH' in load statement
hive> LOAD DATA INPATH 'ForeignAssistanceData.csv'
    > OVERWRITE INTO TABLE foreign_aid;
Loading data to table default.foreign_aid
Table default.foreign_aid stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 100234, raw_data_size: 0]
OK
Time taken: 0.702 seconds
hive> SELECT * FROM foreign_aid LIMIT 10;
OK
Afghanistan 314552000
Afghanistan 1200000
Afghanistan 400000000
Afghanistan 1176000
Afghanistan 2400000
Afghanistan 650000
Afghanistan 1500000
Afghanistan 23000000
Afghanistan 1100000
Afghanistan 40000000
Time taken: 0.432 seconds, Fetched: 10 row(s)
hive> SELECT country, SUM(sum) FROM foreign_aid GROUP BY country;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
13/12/17 15:43:06 WARN conf.Configuration: file:/tmp/hduser/hive_2013-12-17_15-43-02_056_7905750711769655792-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
13/12/17 15:43:06 WARN conf.Configuration: file:/tmp/hduser/hive_2013-12-17_15-43-02_056_7905750711769655792-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/17 15:43:06 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
Execution log at: /tmp/hduser/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2013-12-17 15:43:11,508 null map = 0%, reduce = 0%
2013-12-17 15:43:15,225 null map = 100%, reduce = 100%
Ended Job = job_local1229925622_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
Afghanistan 5398420000
Africa Regional Office (DOS) 160550000
Africa Regional Office (USAID) 231685000
African Union (DOS) 1760000
Albania 45367000
Algeria 13744000
Ambassador-at-Large for Global Women’s Issues (DOS) 5000000
Angola 132593000
Argentina 2745000
Armenia 105965000
Asia Middle East Regional Office (USAID) 52980000
Azerbaijan 63004000
Bahrain 37254000
Bangladesh 470835000
Barbadas 841000
Barbados and Eastern Caribbean (DOS & USAID) 73522000
Belarus 22072000
Belize 6374000
Benin 58015000
Bolivia 61539000
Bosnia and Herzegovina 99999000
Botswana 138869000
Brazil 36845000
Bulgaria 22851000
Bureau for Democracy NULL
Bureau for Economic Growth NULL
Bureau for Energy Resources (DOS) 9000000
Bureau for Food Security (USAID) 594100000
Bureau for Global Health (USAID) 1625573000
Bureau for Policy NULL
Bureau of Arms Control NULL
Bureau of Democracy NULL
Bureau of International Narcotics and Law Enforcement Affairs (DOS) 392741000
Bureau of International Security and Nonproliferation (DOS) 467140000
Bureau of Oceans and International Environmental and Scientific Affairs (DOS) 240616000
Bureau of Political-Military Affairs (DOS) 425648000
Bureau of Population NULL
Burkina Faso 56367000
Burma 83800000
Burundi 79006000
Cambodia 170147000
Cameroon 27646000
Cape Verde 1257000
Central Africa Regional Office (USAID) 45798000
Central African Republic 10268000
Central America Regional Office (USAID) 61600000
Central Asia Regional Office (USAID) 35832000
Chad 91581000
Chile 2455000
China 27150000
Colombia 843218000
Comoros 252000
Costa Rica 5255000
Cote d'Ivoire 293168000
Croatia 9546000
Cuba 40000000
Cyprus 7000000
Czech Republic 14892000
Democratic Republic of Congo 516246000
Djibouti 15013000
Dominican Republic 64126000
East Africa Regional Office (USAID) 125709000
Ecuador 58891000
Educational and Cultural Affairs (DOS) 5000000
Egypt 3213739000
El Salvador 72263000
Estonia 7262000
Ethiopia 1315017000
Eurasia Regional Office (DOS & USAID) 116915000
Europe Regional Office (DOS & USAID) 37217000
Foreign Assistance Program Evaluation 600000
Gabon 612000
Georgia 236853000
Ghana 378928000
Greece 202000
Guatemala 196109000
Guinea 37257000
Guinea-Bissau 25000
Guyana 24909000
Haiti 757177000
Honduras 133242000
Hungary 3847000
India 250762000
Indonesia 432387000
International Fund for Ireland (USAID) 5000000
International Organizations (DOS) - ICAO International Civil Aviation Organization 1881000
International Organizations (DOS) - IDLO International Development Law Organization 1188000
International Organizations (DOS) - IMO International Maritime Organization 792000
International Organizations (DOS) - International Chemicals and Toxins Programs 7260000
International Organizations (DOS) - International Conservation Programs 15500000
International Organizations (DOS) - International Panel on Climate Change / UN Framework Convention on Climate Change 23500000
International Organizations (DOS) - Montreal Protocol Multilateral Fund 56232000
International Organizations (DOS) - Multilateral Action Initiatives 2000000
International Organizations (DOS) - OAS Development Assistance 8250000
International Organizations (DOS) - OAS Fund for Strengthening Democracy 7440000
International Organizations (DOS) - UN OCHA UN Office for the Coordination of Humanitarian Affairs 5940000
International Organizations (DOS) - UN Voluntary Funds for Technical Cooperation in the Field of Human Rights 2772000
International Organizations (DOS) - UN Women (formerly UNIFEM) 8000000
International Organizations (DOS) - UN Women Trust Fund (formerly UNIFEM Trust Fund) 7500000
International Organizations (DOS) - UN-HABITAT UN Human Settlements Program 3800000
International Organizations (DOS) - UNCDF UN Capital Development Fund 1905000
International Organizations (DOS) - UNDF UN Democracy Fund 9510000
International Organizations (DOS) - UNDP UN Development Program 153535000
International Organizations (DOS) - UNEP UN Environment Program 15400000
International Organizations (DOS) - UNESCO/ICSECA International Contributions for Scientific NULL
International Organizations (DOS) - UNFPA UN Population Fund 77700000
International Organizations (DOS) - UNHCHR UN High Commissioner for Human Rights 5000000
International Organizations (DOS) - UNICEF UN Children's Fund 258355000
International Organizations (DOS) - UNVFVT UN Voluntary Fund for Victims of Torture 11700000
International Organizations (DOS) - WMO World Meteorological Organization 4180000
International Organizations (DOS) - WTO Technical Assistance 2290000
International Organizations and Development Institutions (US Treasury) - African Development Bank (AfDB) 32417720
International Organizations and Development Institutions (US Treasury) - African Development Fund (AfDF) 172500000
International Organizations and Development Institutions (US Treasury) - Asian Development Bank (AsDB) 106586000
International Organizations and Development Institutions (US Treasury) - Asian Development Fund (AsDF) 100000000
International Organizations and Development Institutions (US Treasury) - Clean Technology Fund (CTF) 184630000
International Organizations and Development Institutions (US Treasury) - Global Agriculture and Food Security Program (GAFSP) 135000000
International Organizations and Development Institutions (US Treasury) - Global Environment Facility (GEF) 89820000
International Organizations and Development Institutions (US Treasury) - Inter-American Development Bank (IDB and FSO) 75000000
International Organizations and Development Institutions (US Treasury) - Inter-American Investment Corporation (IIC) 4670000
International Organizations and Development Institutions (US Treasury) - International Bank for Reconstruction and Development (IBRD) 117364344
International Organizations and Development Institutions (US Treasury) - International Development Association (IDA) 1325000000
International Organizations and Development Institutions (US Treasury) - International Fund for Agricultural Development (IFAD) 30000000
International Organizations and Development Institutions (US Treasury) - Multilateral Investment Fund (MIF) 25000000
International Organizations and Development Institutions (US Treasury) - Strategic Climate Funds (SCF) 49900000
Iraq 2630487000
Israel 6150000000
Jamaica 18611000
Jordan 1451600000
Kazakhstan 135491000
Kenya 1259768000
Kosovo 134182000
Kyrgyz Republic 120515000
Laos 16904000
Latin America and Caribbean Regional Office (USAID) 90688000
Latvia 7027000
Lebanon 429931000
Lesotho 44550000
Liberia 422469000
Libya 7046000
Lithuania 7525000
Macedonia 38977000
Madagascar 148273000
Malawi 385345000
Malaysia 4779000
Maldives 6633000
Mali 407401000
Malta 300000
Marshall Islands 2096000
Mauritania 18435000
Mauritius 265000
Mexico 747054000
Micronesia 1992000
Middle East Multilaterals (DOS) 1500000
Middle East Partnership Initiative (DOS) 70000000
Middle East Regional Office (USAID) 5000000
Middle East Regional Office Cooperation (DOS) 5000000
Middle East Response Fund 5000000
Moldova 60600000
Mongolia 17675000
Montenegro 12765000
Morocco 84891000
Mozambique 772019000
Multilateral Food Security Programs 14600000
Multinational Force and Observers (DOS) 28000000
Namibia 190595000
Near East Regional Democracy (DOS) 35000000
Near East Regional Office (DOS) 142000000
Nepal 189293000
Nicaragua 43067000
Niger 73954000
Nigeria 1310419000
Office of Development Partners (USAID) 44124000
Office of Innovation and Development Alliances (USAID) 86418000
Office of the Coordinator for Counterterrorism (DOS) 260291000
Office of the Global AIDS Coordinator (DOS) 3747032000
Office to Monitor and Combat Trafficking In Persons (DOS) 39528000
Oman 23788000
Pakistan 3688232000
Panama 10208000
Papua New Guinea 12500000
Paraguay 12633000
Peru 194510000
Philippines 321883000
Poland 66465000
Pooled Funding 73000000
Portugal 125000
Regional Office Development Mission-Asia (USAID) 124480000
Republic of Congo 191000
Romania 28604000
Russia 166485000
Rwanda 438475000
S/GPI - Special Representative for Global Partnerships 3000000
S/SRMC - Special Representative to Muslim Communities 3000000
Samoa 155000
Sao Tome and Principe 298000
Saudi Arabia 9000
Senegal 230702000
Serbia 77887000
Seychelles 235000
Sierra Leone 40375000
Singapore 500000
Slovak Republic 4153000
Slovenia 2319000
Somalia 385059000
South Africa 1104865000
South America Regional Office (USAID) 21530000
South Asia Regional Office (USAID) 6050000
South Sudan 619577000
South and Central Asia Regional Office (DOS) 17048000
Southern Africa Regional Office (USAID) 61200000
Sri Lanka 35599000
Sudan 196024000
Suriname 661000
Swaziland 77256000
Syrian Arab Republic 55500000
Taiwan 500000
Tajikistan 114399000
Tanzania 1054710000
Thailand 26509000
The Bahamas 733000
The Gambia 231000
Timor-Leste 32648000
Togo 1028000
Trans-Sahara Counter-Terrorism Partnership (DOS) 4500000
Trinidad and Tobago 649000
Tunisia 95912000
Turkey 11147000
Turkmenistan 19474000
U.S. Department of Defense - World-Wide 103830000
U.S. Department of the Treasury - Office of Technical Assistance 27000000
USAID Capital Investment Fund 318900000
USAID Development Credit Authority Admin 16600000
USAID Forward: Program Effectiveness Initiatives 71773000
USAID Inspector General Operating Expense 102500000
USAID Operating Expense 2850720000
Uganda 988628000
Ukraine 287225000
Unallocated Earmarks 64922000
Uruguay 1598000
Uzbekistan 56212000
Venezuela 11000000
Vietnam 233481000
West Africa Regional Office (USAID) 168925000
West Bank and Gaza 1023656000
Western Hemisphere Regional Office (DOS) 416700000
Worldwide 26707979
Yemen 255372000
Zambia 713595000
Zimbabwe 245074000
Time taken: 13.766 seconds, Fetched: 250 row(s)
hive>

What is hadoop streaming? How is it relevant to our python examples?

Hadoop Streaming is a utility that ships with Hadoop and lets you write the map and reduce functions of a Hadoop job in languages other than Java. Any program that reads input lines from standard input and writes tab-separated key/value lines to standard output can act as a mapper or a reducer. That is what makes our Python examples possible: Hadoop Streaming runs the Python scripts as the map and reduce steps, so the map and reduce logic runs outside of Java.

What is Hbase and what is the relationship to hdfs? Describe when it would be used.

HBase is a distributed, scalable big-data store: a column-oriented database that runs on top of HDFS and keeps its data there. HBase is used when you need random, real-time read/write access to very large tables. Unlike most databases, HBase does not provide a SQL query language. For example, one can use HBase as the store behind a REST application that needs access to sparse data. In addition, one can use Avro, an Apache data serialization project, to help applications read and write data files efficiently. Avro also supports schema versioning: if an application's schema changes, the data files can change along with the application.

Compare cassandra and Hbase. What are the differences, similarities, and what is the best use of each?

HBase relies on a broad infrastructure: ZooKeeper, the NameNode, and HDFS. It is often said that organizations that are willing to deploy a Hadoop cluster will be comfortable leveraging their Hadoop knowledge by using HBase.
Cassandra's infrastructure and operations are different from Hadoop's, so the general knowledge it requires is different too. However, for analytics, many Cassandra deployments pair Cassandra with Storm (which uses ZooKeeper), with Hadoop, or with both.

HBase supports simple aggregations (sum, min, max, average, etc.) out of the box. Other aggregations can be built by writing Java classes that perform them. Cassandra, on the other hand, does not support aggregations; the client must implement its own. When an aggregation spans multiple rows, random partitioning makes it very difficult for the client, because rows are placed on nodes by a hash of the row key rather than in key order. The recommendation is to use Storm or Hadoop for aggregations.
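To make the random-partitioning point concrete, here is a minimal Python sketch (not Cassandra's actual code). It assumes a hypothetical four-node cluster and places each row on a node by taking the MD5 hash of its row key modulo the node count, a simplification of how Cassandra's RandomPartitioner maps keys onto its token ring. The helper names (`node_for_key`, `load`, `client_side_range_sum`) and the sample amounts are illustrative only. Because neighbouring row keys end up on unrelated nodes, even a simple range sum forces the client to query every node and merge the partial results itself.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def node_for_key(row_key, num_nodes=NUM_NODES):
    """Pick a node from the MD5 hash of the row key.

    Simplification of Cassandra's RandomPartitioner: the real partitioner
    maps the MD5 token onto a token ring instead of using a modulo.
    """
    token = int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)
    return token % num_nodes


def load(rows):
    """Distribute (row_key, amount) pairs over the simulated nodes."""
    nodes = defaultdict(list)
    for key, amount in rows:
        nodes[node_for_key(key)].append((key, amount))
    return nodes


def client_side_range_sum(nodes, start_key, end_key):
    """Sum amounts for start_key <= key < end_key, done entirely by the client.

    Hashing scatters neighbouring keys, so no single node holds the range:
    the client must visit every node, filter locally, and merge the results.
    """
    total = 0
    for node_rows in nodes.values():
        total += sum(amount for key, amount in node_rows if start_key <= key < end_key)
    return total
```

For example, with `rows = [("Albania", 45367000), ("Algeria", 13744000), ("Angola", 132593000)]`, `client_side_range_sum(load(rows), "Al", "An")` returns 59111000 (Albania plus Algeria), but only after scanning every simulated node. HBase, by contrast, keeps row keys sorted, so the same range would be a contiguous scan.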
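Returning to the Hadoop Streaming question: the sketch below is a hypothetical Python mapper and reducer that compute the same per-country totals as the Hive `GROUP BY` query in the transcript. The script name `aid.py` and the `map`/`reduce` command-line arguments are my own choices for illustration, not something from the videos. It assumes each input line looks like `country,amount`, matching the two-column `foreign_aid` table; real CSV rows with quoted commas would need the `csv` module instead. Streaming sorts the mapper output by key before the reducer sees it, which is why the reducer can total each country in a single `groupby` pass; `run_local` only imitates that sort so the logic can be checked without a cluster.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: turn each 'country,amount' line into 'country<TAB>amount'."""
    for line in lines:
        parts = line.rstrip("\n").split(",")
        if len(parts) < 2:
            continue  # skip lines without at least two fields
        country, amount = parts[0].strip(), parts[1].strip()
        if amount.isdigit():  # skip headers and non-numeric amounts
            yield f"{country}\t{amount}"


def reducer(lines):
    """Reduce step: input is sorted by key, so one country's lines are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for country, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(amount) for _, amount in group)
        yield f"{country}\t{total}"


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce on one machine, without Hadoop."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Under Hadoop Streaming: `python aid.py map` or `python aid.py reduce`.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

On the single-node setup in the transcripts, the job could be launched with something like `bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -files aid.py -input ForeignAssistanceData.csv -output aid-totals -mapper "python aid.py map" -reducer "python aid.py reduce"`; the jar path and the output directory name are assumptions, so check them against your installation.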