hduser@benjamin-VirtualBox:/usr/local/hadoop$ bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
> -file /home/hduser/mapper.py -mapper /home/hduser/mapper.py \
> -file /home/hduser/reducer.py -reducer /home/hduser/reducer.py \
> -input /user/hduser/benjamin-gutenberg/* -output /user/hduser/benjamin-gutenberg-output
13/12/16 19:00:11 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/home/hduser/mapper.py, /home/hduser/reducer.py] [] /tmp/streamjob8321535884138731280.jar tmpDir=null
13/12/16 19:00:13 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
13/12/16 19:00:13 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
13/12/16 19:00:13 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
13/12/16 19:00:14 INFO mapred.FileInputFormat: Total input paths to process : 3
13/12/16 19:00:14 INFO mapreduce.JobSubmitter: number of splits:3
13/12/16 19:00:14 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/16 19:00:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1026659806_0001
13/12/16 19:00:14 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/staging/hduser1026659806/.staging/job_local1026659806_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
13/12/16 19:00:14 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/staging/hduser1026659806/.staging/job_local1026659806_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
13/12/16 19:00:14 INFO mapred.LocalDistributedCacheManager: Localized file:/home/hduser/mapper.py as file:/app/hadoop/tmp/mapred/local/1387177214471/mapper.py
13/12/16 19:00:14 INFO mapred.LocalDistributedCacheManager: Localized file:/home/hduser/reducer.py as file:/app/hadoop/tmp/mapred/local/1387177214472/reducer.py
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
13/12/16 19:00:14 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/local/localRunner/hduser/job_local1026659806_0001/job_local1026659806_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
13/12/16 19:00:14 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/local/localRunner/hduser/job_local1026659806_0001/job_local1026659806_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
13/12/16 19:00:14 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
13/12/16 19:00:14 INFO mapreduce.Job: Running job: job_local1026659806_0001
13/12/16 19:00:14 INFO mapred.LocalJobRunner: OutputCommitter set in config null
13/12/16 19:00:14 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
13/12/16 19:00:14 INFO mapred.LocalJobRunner: Waiting for map tasks
13/12/16 19:00:14 INFO mapred.LocalJobRunner: Starting task: attempt_local1026659806_0001_m_000000_0
13/12/16 19:00:14 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
13/12/16 19:00:14 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/benjamin-gutenberg/benjamin-4300.txt:0+1573150
13/12/16 19:00:14 INFO mapred.MapTask: numReduceTasks: 1
13/12/16 19:00:14 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/12/16 19:00:14 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
13/12/16 19:00:14 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
13/12/16 19:00:14 INFO mapred.MapTask: soft limit at 83886080
13/12/16 19:00:14 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
13/12/16 19:00:14 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
13/12/16 19:00:14 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/local/hadoop/./mapper.py]
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
13/12/16 19:00:14 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/16 19:00:14 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
13/12/16 19:00:14 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
13/12/16 19:00:14 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
13/12/16 19:00:14 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
13/12/16 19:00:15 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:15 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:15 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:15 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:15 INFO streaming.PipeMapRed: Records R/W=3166/1
13/12/16 19:00:15 INFO streaming.PipeMapRed: R/W/S=10000/49910/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:15 INFO streaming.PipeMapRed: MRErrorThread done
13/12/16 19:00:15 INFO streaming.PipeMapRed: mapRedFinished
13/12/16 19:00:15 INFO mapred.LocalJobRunner:
13/12/16 19:00:15 INFO mapred.MapTask: Starting flush of map output
13/12/16 19:00:15 INFO mapred.MapTask: Spilling map output
13/12/16 19:00:15 INFO mapred.MapTask: bufstart = 0; bufend = 2065931; bufvoid = 104857600
13/12/16 19:00:15 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25142496(100569984); length = 1071901/6553600
13/12/16 19:00:15 INFO mapreduce.Job: Job job_local1026659806_0001 running in uber mode : false
13/12/16 19:00:15 INFO mapreduce.Job: map 0% reduce 0%
13/12/16 19:00:16 INFO mapred.MapTask: Finished spill 0
13/12/16 19:00:16 INFO mapred.Task: Task:attempt_local1026659806_0001_m_000000_0 is done. And is in the process of committing
13/12/16 19:00:16 INFO mapred.LocalJobRunner: Records R/W=3166/1
13/12/16 19:00:16 INFO mapred.Task: Task 'attempt_local1026659806_0001_m_000000_0' done.
13/12/16 19:00:16 INFO mapred.LocalJobRunner: Finishing task: attempt_local1026659806_0001_m_000000_0
13/12/16 19:00:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1026659806_0001_m_000001_0
13/12/16 19:00:16 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
13/12/16 19:00:16 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/benjamin-gutenberg/benjamin-5000.txt:0+1423803
13/12/16 19:00:16 INFO mapred.MapTask: numReduceTasks: 1
13/12/16 19:00:16 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/12/16 19:00:16 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
13/12/16 19:00:16 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
13/12/16 19:00:16 INFO mapred.MapTask: soft limit at 83886080
13/12/16 19:00:16 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
13/12/16 19:00:16 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
13/12/16 19:00:16 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/local/hadoop/./mapper.py]
13/12/16 19:00:16 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:16 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:16 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:16 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:16 INFO streaming.PipeMapRed: Records R/W=2820/1
13/12/16 19:00:16 INFO streaming.PipeMapRed: R/W/S=10000/57210/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:16 INFO mapreduce.Job: map 100% reduce 0%
13/12/16 19:00:16 INFO streaming.PipeMapRed: MRErrorThread done
13/12/16 19:00:16 INFO streaming.PipeMapRed: mapRedFinished
13/12/16 19:00:16 INFO mapred.LocalJobRunner:
13/12/16 19:00:16 INFO mapred.MapTask: Starting flush of map output
13/12/16 19:00:16 INFO mapred.MapTask: Spilling map output
13/12/16 19:00:16 INFO mapred.MapTask: bufstart = 0; bufend = 1884967; bufvoid = 104857600
13/12/16 19:00:16 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25208992(100835968); length = 1005405/6553600
13/12/16 19:00:17 INFO mapred.MapTask: Finished spill 0
13/12/16 19:00:17 INFO mapred.Task: Task:attempt_local1026659806_0001_m_000001_0 is done. And is in the process of committing
13/12/16 19:00:17 INFO mapred.LocalJobRunner: Records R/W=2820/1
13/12/16 19:00:17 INFO mapred.Task: Task 'attempt_local1026659806_0001_m_000001_0' done.
13/12/16 19:00:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local1026659806_0001_m_000001_0
13/12/16 19:00:17 INFO mapred.LocalJobRunner: Starting task: attempt_local1026659806_0001_m_000002_0
13/12/16 19:00:17 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
13/12/16 19:00:17 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/benjamin-gutenberg/benjamin-20417.txt:0+674570
13/12/16 19:00:17 INFO mapred.MapTask: numReduceTasks: 1
13/12/16 19:00:17 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/12/16 19:00:17 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
13/12/16 19:00:17 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
13/12/16 19:00:17 INFO mapred.MapTask: soft limit at 83886080
13/12/16 19:00:17 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
13/12/16 19:00:17 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
13/12/16 19:00:17 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/local/hadoop/./mapper.py]
13/12/16 19:00:17 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:17 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:17 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:17 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:17 INFO streaming.PipeMapRed: Records R/W=2788/1
13/12/16 19:00:17 INFO streaming.PipeMapRed: R/W/S=10000/50512/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:17 INFO streaming.PipeMapRed: MRErrorThread done
13/12/16 19:00:17 INFO streaming.PipeMapRed: mapRedFinished
13/12/16 19:00:17 INFO mapred.LocalJobRunner:
13/12/16 19:00:17 INFO mapred.MapTask: Starting flush of map output
13/12/16 19:00:17 INFO mapred.MapTask: Spilling map output
13/12/16 19:00:17 INFO mapred.MapTask: bufstart = 0; bufend = 866859; bufvoid = 104857600
13/12/16 19:00:17 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25775024(103100096); length = 439373/6553600
13/12/16 19:00:17 INFO mapreduce.Job: map 67% reduce 0%
13/12/16 19:00:17 INFO mapred.MapTask: Finished spill 0
13/12/16 19:00:17 INFO mapred.Task: Task:attempt_local1026659806_0001_m_000002_0 is done. And is in the process of committing
13/12/16 19:00:17 INFO mapred.LocalJobRunner: Records R/W=2788/1
13/12/16 19:00:17 INFO mapred.Task: Task 'attempt_local1026659806_0001_m_000002_0' done.
13/12/16 19:00:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local1026659806_0001_m_000002_0
13/12/16 19:00:17 INFO mapred.LocalJobRunner: Map task executor complete.
13/12/16 19:00:17 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
13/12/16 19:00:17 INFO mapred.Merger: Merging 3 sorted segments
13/12/16 19:00:17 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 6076082 bytes
13/12/16 19:00:17 INFO mapred.LocalJobRunner:
13/12/16 19:00:17 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/local/hadoop/./reducer.py]
13/12/16 19:00:17 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
13/12/16 19:00:17 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/16 19:00:17 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:17 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:17 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:17 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:18 INFO streaming.PipeMapRed: R/W/S=10000/0/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:18 INFO streaming.PipeMapRed: Records R/W=17415/1
13/12/16 19:00:18 INFO streaming.PipeMapRed: R/W/S=100000/25634/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:18 INFO streaming.PipeMapRed: R/W/S=200000/35561/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:18 INFO mapreduce.Job: map 100% reduce 0%
13/12/16 19:00:18 INFO streaming.PipeMapRed: R/W/S=300000/50992/0 in:NA [rec/s] out:NA [rec/s]
13/12/16 19:00:19 INFO streaming.PipeMapRed: R/W/S=400000/61049/0 in:400000=400000/1 [rec/s] out:61049=61049/1 [rec/s]
13/12/16 19:00:19 INFO streaming.PipeMapRed: R/W/S=500000/74044/0 in:500000=500000/1 [rec/s] out:74044=74044/1 [rec/s]
13/12/16 19:00:19 INFO streaming.PipeMapRed: R/W/S=600000/79582/0 in:600000=600000/1 [rec/s] out:79582=79582/1 [rec/s]
13/12/16 19:00:19 INFO streaming.PipeMapRed: MRErrorThread done
13/12/16 19:00:19 INFO streaming.PipeMapRed: mapRedFinished
13/12/16 19:00:19 INFO mapred.Task: Task:attempt_local1026659806_0001_r_000000_0 is done. And is in the process of committing
13/12/16 19:00:19 INFO mapred.LocalJobRunner:
13/12/16 19:00:19 INFO mapred.Task: Task attempt_local1026659806_0001_r_000000_0 is allowed to commit now
13/12/16 19:00:19 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1026659806_0001_r_000000_0' to hdfs://localhost:54310/user/hduser/benjamin-gutenberg-output/_temporary/0/task_local1026659806_0001_r_000000
13/12/16 19:00:19 INFO mapred.LocalJobRunner: Records R/W=17415/1 > reduce
13/12/16 19:00:19 INFO mapred.Task: Task 'attempt_local1026659806_0001_r_000000_0' done.
13/12/16 19:00:20 INFO mapreduce.Job: map 100% reduce 100%
13/12/16 19:00:20 INFO mapreduce.Job: Job job_local1026659806_0001 completed successfully
13/12/16 19:00:20 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=6090440
        FILE: Number of bytes written=20530549
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=11913149
        HDFS: Number of bytes written=880838
        HDFS: Number of read operations=57
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=6
    Map-Reduce Framework
        Map input records=77931
        Map output records=629172
        Map output bytes=4817757
        Map output materialized bytes=6076119
        Input split bytes=370
        Combine input records=0
        Combine output records=0
        Reduce input groups=82335
        Reduce shuffle bytes=0
        Reduce input records=629172
        Reduce output records=82335
        Spilled Records=1258344
        Shuffled Maps =0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=128
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=805806080
    File Input Format Counters
        Bytes Read=3671523
    File Output Format Counters
        Bytes Written=880838
13/12/16 19:00:20 INFO streaming.StreamJob: Output directory: /user/hduser/benjamin-gutenberg-output
hduser@benjamin-VirtualBox:/usr/local/hadoop$