hduser@benjamin-VirtualBox:~/data$
hduser@benjamin-VirtualBox:~/data$
hduser@benjamin-VirtualBox:~/data$ /usr/local/hadoop/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -D mapred.reduce.tasks=4 -file ~/data/smplMapper.py -mapper smplMapper.py -file ~/data/smplReducer2.py -reducer smplReducer2.py -input data/DataSet.txt -input data/FIPS_CountyName.txt -output benjamin-problem3-output-reducer2 -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner -jobconf stream.map.output.field.separator=^ -jobconf stream.num.map.output.key.fields=4 -jobconf map.output.key.field.separator=^ -jobconf num.key.fields.for.partition=1
13/12/17 17:03:54 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
13/12/17 17:03:56 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
13/12/17 17:03:56 INFO Configuration.deprecation: map.output.key.field.separator is deprecated. Instead, use mapreduce.map.output.key.field.separator
packageJobJar: [/home/hduser/data/smplMapper.py, /home/hduser/data/smplReducer2.py] [] /tmp/streamjob2203691415396959200.jar tmpDir=null
13/12/17 17:03:56 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
13/12/17 17:03:56 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
13/12/17 17:03:56 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
13/12/17 17:03:56 INFO mapred.FileInputFormat: Total input paths to process : 2
13/12/17 17:03:56 INFO mapreduce.JobSubmitter: number of splits:2
13/12/17 17:03:56 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
13/12/17 17:03:56 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/17 17:03:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1390222897_0001
13/12/17 17:03:57 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/staging/hduser1390222897/.staging/job_local1390222897_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
13/12/17 17:03:57 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/staging/hduser1390222897/.staging/job_local1390222897_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
13/12/17 17:03:57 INFO mapred.LocalDistributedCacheManager: Localized file:/home/hduser/data/smplMapper.py as file:/app/hadoop/tmp/mapred/local/1387256637092/smplMapper.py
13/12/17 17:03:57 INFO mapred.LocalDistributedCacheManager: Localized file:/home/hduser/data/smplReducer2.py as file:/app/hadoop/tmp/mapred/local/1387256637093/smplReducer2.py
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
13/12/17 17:03:57 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/local/localRunner/hduser/job_local1390222897_0001/job_local1390222897_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
13/12/17 17:03:57 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/local/localRunner/hduser/job_local1390222897_0001/job_local1390222897_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
13/12/17 17:03:57 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
13/12/17 17:03:57 INFO mapreduce.Job: Running job: job_local1390222897_0001
13/12/17 17:03:57 INFO mapred.LocalJobRunner: OutputCommitter set in config null
13/12/17 17:03:57 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
13/12/17 17:03:57 INFO mapred.LocalJobRunner: Waiting for map tasks
13/12/17 17:03:57 INFO mapred.LocalJobRunner: Starting task: attempt_local1390222897_0001_m_000000_0
13/12/17 17:03:57 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
13/12/17 17:03:57 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/data/DataSet.txt:0+819973
13/12/17 17:03:57 INFO mapred.MapTask: numReduceTasks: 1
13/12/17 17:03:57 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/12/17 17:03:57 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
13/12/17 17:03:57 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
13/12/17 17:03:57 INFO mapred.MapTask: soft limit at 83886080
13/12/17 17:03:57 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
13/12/17 17:03:57 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
13/12/17 17:03:57 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/data/./smplMapper.py]
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
13/12/17 17:03:57 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/17 17:03:57 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
13/12/17 17:03:57 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
13/12/17 17:03:57 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
13/12/17 17:03:57 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
13/12/17 17:03:57 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:57 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:57 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:57 INFO streaming.PipeMapRed: Records R/W=503/1
13/12/17 17:03:57 INFO streaming.PipeMapRed: R/W/S=1000/253/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:57 INFO streaming.PipeMapRed: MRErrorThread done
13/12/17 17:03:57 INFO streaming.PipeMapRed: mapRedFinished
13/12/17 17:03:57 INFO mapred.LocalJobRunner:
13/12/17 17:03:57 INFO mapred.MapTask: Starting flush of map output
13/12/17 17:03:57 INFO mapred.MapTask: Spilling map output
13/12/17 17:03:57 INFO mapred.MapTask: bufstart = 0; bufend = 53344; bufvoid = 104857600
13/12/17 17:03:57 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26201824(104807296); length = 12573/6553600
13/12/17 17:03:57 INFO mapred.MapTask: Finished spill 0
13/12/17 17:03:57 INFO mapred.Task: Task:attempt_local1390222897_0001_m_000000_0 is done. And is in the process of committing
13/12/17 17:03:57 INFO mapred.LocalJobRunner: Records R/W=503/1
13/12/17 17:03:57 INFO mapred.Task: Task 'attempt_local1390222897_0001_m_000000_0' done.
13/12/17 17:03:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local1390222897_0001_m_000000_0
13/12/17 17:03:57 INFO mapred.LocalJobRunner: Starting task: attempt_local1390222897_0001_m_000001_0
13/12/17 17:03:57 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
13/12/17 17:03:57 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/data/FIPS_CountyName.txt:0+79472
13/12/17 17:03:57 INFO mapred.MapTask: numReduceTasks: 1
13/12/17 17:03:57 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/12/17 17:03:57 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
13/12/17 17:03:57 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
13/12/17 17:03:57 INFO mapred.MapTask: soft limit at 83886080
13/12/17 17:03:57 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
13/12/17 17:03:57 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
13/12/17 17:03:57 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/data/./smplMapper.py]
13/12/17 17:03:57 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:57 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:57 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:57 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:57 INFO streaming.PipeMapRed: Records R/W=3195/1
13/12/17 17:03:57 INFO streaming.PipeMapRed: MRErrorThread done
13/12/17 17:03:57 INFO streaming.PipeMapRed: mapRedFinished
13/12/17 17:03:57 INFO mapred.LocalJobRunner:
13/12/17 17:03:57 INFO mapred.MapTask: Starting flush of map output
13/12/17 17:03:57 INFO mapred.MapTask: Spilling map output
13/12/17 17:03:57 INFO mapred.MapTask: bufstart = 0; bufend = 86018; bufvoid = 104857600
13/12/17 17:03:57 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26201620(104806480); length = 12777/6553600
13/12/17 17:03:57 INFO mapred.MapTask: Finished spill 0
13/12/17 17:03:57 INFO mapred.Task: Task:attempt_local1390222897_0001_m_000001_0 is done. And is in the process of committing
13/12/17 17:03:57 INFO mapred.LocalJobRunner: Records R/W=3195/1
13/12/17 17:03:57 INFO mapred.Task: Task 'attempt_local1390222897_0001_m_000001_0' done.
13/12/17 17:03:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local1390222897_0001_m_000001_0
13/12/17 17:03:57 INFO mapred.LocalJobRunner: Map task executor complete.
13/12/17 17:03:57 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
13/12/17 17:03:57 INFO mapred.Merger: Merging 2 sorted segments
13/12/17 17:03:58 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 152000 bytes
13/12/17 17:03:58 INFO mapred.LocalJobRunner:
13/12/17 17:03:58 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/data/./smplReducer2.py]
13/12/17 17:03:58 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
13/12/17 17:03:58 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/17 17:03:58 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:58 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:58 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:58 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/17 17:03:58 INFO streaming.PipeMapRed: Records R/W=6339/1
13/12/17 17:03:58 INFO streaming.PipeMapRed: MRErrorThread done
13/12/17 17:03:58 INFO streaming.PipeMapRed: mapRedFinished
13/12/17 17:03:58 INFO mapred.Task: Task:attempt_local1390222897_0001_r_000000_0 is done. And is in the process of committing
13/12/17 17:03:58 INFO mapred.LocalJobRunner:
13/12/17 17:03:58 INFO mapred.Task: Task attempt_local1390222897_0001_r_000000_0 is allowed to commit now
13/12/17 17:03:58 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1390222897_0001_r_000000_0' to hdfs://localhost:54310/user/hduser/benjamin-problem3-output-reducer2/_temporary/0/task_local1390222897_0001_r_000000
13/12/17 17:03:58 INFO mapred.LocalJobRunner: Records R/W=6339/1 > reduce
13/12/17 17:03:58 INFO mapred.Task: Task 'attempt_local1390222897_0001_r_000000_0' done.
13/12/17 17:03:58 INFO mapreduce.Job: Job job_local1390222897_0001 running in uber mode : false
13/12/17 17:03:58 INFO mapreduce.Job: map 100% reduce 100%
13/12/17 17:03:58 INFO mapreduce.Job: Job job_local1390222897_0001 completed successfully
13/12/17 17:03:58 INFO mapreduce.Job: Counters: 32
	File System Counters
		FILE: Number of bytes read=167311
		FILE: Number of bytes written=966665
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2618863
		HDFS: Number of bytes written=565
		HDFS: Number of read operations=28
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=5
	Map-Reduce Framework
		Map input records=6391
		Map output records=6339
		Map output bytes=139362
		Map output materialized bytes=152052
		Input split bytes=214
		Combine input records=0
		Combine output records=0
		Reduce input groups=6339
		Reduce shuffle bytes=0
		Reduce input records=6339
		Reduce output records=52
		Spilled Records=12678
		Shuffled Maps =0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=86
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=457912320
	File Input Format Counters
		Bytes Read=899445
	File Output Format Counters
		Bytes Written=565
13/12/17 17:03:58 INFO streaming.StreamJob: Output directory: benjamin-problem3-output-reducer2
hduser@benjamin-VirtualBox:~/data$
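
The command tells streaming to treat each mapper output line as `^`-separated fields, to use the first four fields as the key (stream.num.map.output.key.fields=4), and to let KeyFieldBasedPartitioner route records on the first field only (num.key.fields.for.partition=1). The Python sketch below models that routing under stated assumptions. The sample lines are hypothetical, because smplMapper.py and the input records are not shown. The hash only approximates the partitioner's per-byte `31 * h + b` scheme and is not Hadoop's own code. Also note that the job ran under the LocalJobRunner (job_local... IDs), which in this Hadoop release appears to run a single reduce task. The map tasks report numReduceTasks: 1 despite -D mapred.reduce.tasks=4, and only one reduce attempt (r_000000) runs, so in this run every key lands in partition 0.

```python
"""Rough model of how this streaming job splits and partitions map output.

Assumptions (the job's own scripts and data are not shown):
- each mapper output line is '^'-separated, e.g. 'f1^f2^f3^f4^value';
- stream.num.map.output.key.fields=4 makes the first four fields the key;
- num.key.fields.for.partition=1 makes the partitioner hash field 1 only.
The hash approximates KeyFieldBasedPartitioner (31 * h + signed byte,
masked to non-negative, modulo reducer count); it is not Hadoop's code.
"""
from typing import Tuple

SEP = "^"
NUM_KEY_FIELDS = 4
NUM_PARTITION_FIELDS = 1


def split_key_value(line: str, sep: str = SEP,
                    num_key_fields: int = NUM_KEY_FIELDS) -> Tuple[str, str]:
    """Split a map output line into (key, value): the key is everything
    before the num_key_fields-th separator."""
    parts = line.split(sep)
    if len(parts) <= num_key_fields:
        # Not enough separators: the whole line is the key, value is empty.
        return line, ""
    return sep.join(parts[:num_key_fields]), sep.join(parts[num_key_fields:])


def to_java_int(x: int) -> int:
    """Wrap an integer to a signed 32-bit value, mimicking Java int overflow."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def partition(key: str, num_reducers: int, sep: str = SEP,
              num_fields: int = NUM_PARTITION_FIELDS) -> int:
    """Choose a reducer index from the first num_fields fields of the key."""
    selected = sep.join(key.split(sep)[:num_fields]).encode("utf-8")
    h = 0
    for b in selected:
        signed_byte = b - 256 if b > 127 else b  # Java bytes are signed
        h = to_java_int(31 * h + signed_byte)
    return (h & 0x7FFFFFFF) % num_reducers


if __name__ == "__main__":
    # Hypothetical mapper output; the real record layout is not shown.
    lines = [
        "01^001^A^x^payload-1",
        "01^003^B^y^payload-2",
        "02^013^A^z^payload-3",
    ]
    for line in lines:
        key, value = split_key_value(line)
        print(key, "|", value,
              "-> reducer", partition(key, 4),
              "(local run:", partition(key, 1), ")")
```

With four reducers on a real cluster, the two `01^...` lines would reach the same reducer even though their full four-field keys differ, while the sort still orders records by the whole key. The deprecation warnings in the log also point to the newer spellings: -files instead of -file and -D instead of -jobconf. In the Hadoop streaming documentation, partitioning on the first field is written as -D mapreduce.partition.keypartitioner.options=-k1,1.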