PySpark - SparkContext: Error initializing SparkContext File does not exist
I have a small piece of code in PySpark, but I keep getting errors. I'm new to this, so I'm not sure where to start.
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("Open json").setMaster("local[3]")
sc = SparkContext(conf = conf)
print("Done")
I ran this in cmd with the command :
spark-submit .\PySpark\Open.py
I then get the following error statement:
C:\Users\Abdullah\Documents\Master Thesis>spark-submit .\PySpark\Open.py
18/06/30 15:21:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-Java classes where applicable
18/06/30 15:22:01 ERROR SparkContext: Error initializing SparkContext. java.io.FileNotFoundException: File file:/C:/Users/Abdullah/Documents/Master%20Thesis/PySpark/Open.py does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1499)
at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
Traceback (most recent call last):
File "C:/Users/Abdullah/Documents/Master Thesis/./PySpark/Open.py", line 12, in <module>
sc = SparkContext(conf = conf)
File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 118, in __init__
File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 180, in _do_init
File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 282, in _initialize_context
File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1525, in __call__
File "C:\apache-spark\spark-2.2.0-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: File file:/C:/Users/Abdullah/Documents/Master%20Thesis/PySpark/Open.py does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1499)
at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
Can you show the output of running just pyspark command?
– cricket_007
Jun 30 at 13:41
The last part in the yellow box is the output of the pyspark command.
– TheNinjaKing
Jun 30 at 13:54
Trying to reformat your output... Anyways, all that says is it cannot find your file. Please show in your question the full path to it
– cricket_007
Jun 30 at 14:02
OMG!! That's it! It totally fixed it. THANK YOU SO MUCH!!!!! I was working on this for two days. One. Stupid. Space. I can't believe this.
– TheNinjaKing
Jun 30 at 15:27
1 Answer
According to your logs, you are trying to run Apache Spark on a Windows machine.
You need to install winutils and add its location to your environment variables.
Download the winutils executable from the Hortonworks repository, from the Amazon AWS platform, or from the winutils GitHub repository.
Create a directory in which to place winutils.exe, for example C:\SparkDev\x64. Add the environment variable %HADOOP_HOME% pointing to this directory, then add %HADOOP_HOME%\bin to PATH.
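If it helps, the setup above can be checked from Python rather than by eye. The following is only a rough sketch: the function name check_hadoop_home is made up for illustration, and it assumes winutils.exe sits in the bin folder under HADOOP_HOME (the layout the asker describes in the comments), which may not match every installation.

```python
import os


def check_hadoop_home(env=None):
    """Return a list of problems found; an empty list means the layout looks fine.

    Hypothetical helper, not part of Spark or Hadoop. Assumes winutils.exe
    is expected at <HADOOP_HOME>/bin/winutils.exe.
    """
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]

    problems = []
    bin_dir = os.path.join(hadoop_home, "bin")
    if not os.path.isfile(os.path.join(bin_dir, "winutils.exe")):
        problems.append("winutils.exe not found in " + bin_dir)

    def norm(path):
        # Normalize case and separators so equivalent spellings compare equal.
        return os.path.normcase(os.path.normpath(path))

    path_entries = [norm(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if norm(bin_dir) not in path_entries:
        problems.append(bin_dir + " is not on PATH")
    return problems


if __name__ == "__main__":
    for line in check_hadoop_home() or ["no problems found"]:
        print(line)
```

Run it from the same cmd window you use for spark-submit, so that it sees the same environment variables. Note that it only checks that the file exists, not whether the executable is corrupted.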
Thanks for answering, but I already set the environment variables in the way you described. I also put the winutils file in the right directory, but I keep getting the same error. Should I maybe delete everything and install Spark and Hadoop from scratch?
– TheNinjaKing
Jun 30 at 14:50
Check your path: go to cmd and type "path". Is winutils showing on the path? This is a common error for Windows users.
– vaquar khan
Jun 30 at 14:52
I have the directory C:\hadoop\bin, and inside the bin folder I have the winutils.exe file. Is this correct?
– TheNinjaKing
Jun 30 at 15:17
Yes, the path is correct. Check the HADOOP_HOME and PATH environment variables, and also check that winutils.exe is not corrupted.
– vaquar khan
Jun 30 at 15:26
Never mind. I had a space in the name of my directory that totally messed things up. Thanks anyway!
– TheNinjaKing
Jun 30 at 15:27
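For anyone landing here with the same trace: the space in "Master Thesis" shows up in the error as Master%20Thesis, which is what a space becomes when a path is percent-encoded into a file URI. The sketch below only illustrates that encoding and flags path components containing spaces; the helper names to_file_uri and parts_with_spaces are made up, and this is not how Spark itself builds the URI.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def to_file_uri(path):
    """Percent-encode a Windows path into a file:/ style URI (illustration only)."""
    forward = path.replace("\\", "/")
    # Keep "/" and ":" literal; a space is encoded as %20.
    return "file:/" + quote(forward, safe="/:")


def parts_with_spaces(path):
    """Return the path components that contain a space."""
    return [part for part in PureWindowsPath(path).parts if " " in part]


script = r"C:\Users\Abdullah\Documents\Master Thesis\PySpark\Open.py"
print(to_file_uri(script))
print(parts_with_spaces(script))
```

If parts_with_spaces returns anything for your script's location, moving the project to a directory without spaces (as the asker did) is the simplest thing to try.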
PySpark/Open.py does not exist
...– cricket_007
Jun 30 at 13:36