If the dataset is too large, converting the DataFrame straight into a list will exhaust driver memory, so instead you can iterate over the data with foreachPartition.
import java.util.Iterator;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

System.setProperty("hadoop.home.dir", "h:\\hadoop2.3.7");
String master = "local";
String name = "wordcount" + System.currentTimeMillis();
SparkSession spark = SparkSession.builder().appName(name).master(master).getOrCreate();

Dataset<Row> dataset = spark.read().json("src/j.json");
Dataset<String> jsons = dataset.toJSON();
JavaRDD<String> rdd = jsons.javaRDD();

// iterate each partition on the executors instead of collecting to the driver
rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
    @Override
    public void call(Iterator<String> iter) throws Exception {
        while (iter.hasNext()) {
            String next = iter.next();
            System.out.println("got " + next);
        }
    }
});
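The same pattern can also be written with a Java 8 lambda, since VoidFunction is a functional interface. The sketch below is illustrative only: the input path, the batch size of 100, and the idea of "flushing" each batch to an external sink are my assumptions, not part of the original post; replace them with whatever per-partition work (database inserts, file writes, etc.) the job actually needs.

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.SparkSession;

public class ForeachPartitionSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("foreachPartition-sketch")   // assumed app name
                .master("local")
                .getOrCreate();

        // Same source as above; the path is only an example.
        JavaRDD<String> rdd = spark.read().json("src/j.json").toJSON().javaRDD();

        rdd.foreachPartition(iter -> {
            // Only one partition's records pass through this iterator,
            // so the driver never has to hold the whole dataset in memory.
            List<String> batch = new ArrayList<>();
            while (iter.hasNext()) {
                batch.add(iter.next());
                if (batch.size() >= 100) {            // assumed batch size
                    System.out.println("flushing " + batch.size() + " records");
                    batch.clear();                    // e.g. replace with a DB/file write
                }
            }
            if (!batch.isEmpty()) {
                System.out.println("flushing " + batch.size() + " records");
            }
        });

        spark.stop();
    }
}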
Reference: https://blog.csdn.net/wyqwilliam/article/details/81142324