file not found while running pyspark program - python

I'm new to PySpark and I want to launch a PySpark program on a standalone cluster. I followed the steps in this tutorial and launched my program using this command:
bin\spark-submit examples\src\main\python\LSI_MapReduce\LSI.py
here is the part of my code where the error is happening:
# load the dataset
rows = np.loadtxt('first.txt')  # <-- the error happens here
rows = sc.parallelize(rows)
mat = RowMatrix(rows)
# compute SVD
svd = mat.computeSVD(20, computeU=True)
The first steps of my code ran fine, and then I got this error:
at line 200: FileNotFoundError: first.txt not found.
The LSI_MapReduce folder has a file named first.txt in the same place as LSI.py.
When I run my program in VS Code it works perfectly.
How can I fix this error?
I would highly appreciate any help.

Python, via NumPy (not Spark), is trying to read the file relative to the directory from which you run the Python interpreter.
The word count example in the link reads the README.md file next to the bin folder, so if that's where you start the command, that's where your file needs to be. Otherwise, cd down into the example folder where your file exists.
Also, Spark can read text files or CSV files itself, so you shouldn't need NumPy to do that.
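A minimal sketch of one way to make the lookup independent of the working directory, assuming the data file sits next to the script as described in the question. The helper name path_next_to_script is hypothetical; it builds the path from the script's own location instead of from wherever the command was started.

```python
from pathlib import Path


def path_next_to_script(script_file, name):
    """Return the absolute path of `name` in the same folder as `script_file`."""
    # resolve() makes the script path absolute; parent is its containing folder.
    return Path(script_file).resolve().parent / name
```

Inside LSI.py this could be used as np.loadtxt(path_next_to_script(__file__, 'first.txt')). This only helps when the driver runs on the machine that actually holds the file.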

Related

how does jenkins recognize paths inside a python script?

I have a Python script that points to some file names and log files, and I have Jenkins run the script. When run locally on my system, the code works fine.
The way I access my folders in python:
folder_artifacts_data = 'C:/Users/Rhea/OneDrive -Area/Rhea/Metrics_Configuration/Artifacts/'
path_to_log_file ='C:/Users/Rhea/OneDrive -Area/Rhea/Metrics_Configuration/Logfiles/Logfile.log'
But when I try to run the same script using Jenkins, I get the following error:
No such file or directory:
/opt/jenkins/workspace/confluencetest_scheduled/C:/Users/Rhea/OneDrive -Area/Rhea/Metrics_Configuration/Logfiles/Logfile.log
I then tried different file paths and used r-strings:
folder_artifacts_data = r'C:/Users/Rhea/OneDrive -Area/Rhea/Metrics_Configuration/Artifacts/'
path_to_log_file =r'C:/Users/Rhea/OneDrive -Area/Rhea/Metrics_Configuration/Logfiles/Logfile.log'
I can see that Jenkins has accepted the log file, because the logs are written, but the moment it reaches folder_artifacts_data it throws an error saying that the file path does not exist.
Could someone help?
Update
Now I have added relative paths, like:
folder_artifacts_data0 = 'C:/Users/Rhea/OneDrive -Area/Rhea/Metrics_Configuration/Artifacts/'
folder_artifacts_data = os.path.relpath(folder_artifacts_data0)
path_to_log_file0 ='C:/Users/Rhea/OneDrive -Area/Rhea/Metrics_Configuration/Logfiles/Logfile.log'
path_to_log_file = os.path.relpath(path_to_log_file0)
that outputs paths like:
..............\Rhea\OneDrive -Area\Rhea\Metrics_Configuration\Artifacts
and
..............\Rhea\OneDrive -Area\Rhea\Metrics_Configuration\Logfiles\Logfile.log
This works well locally, but when running from Jenkins I again get:
No such file or directory:
/opt/jenkins/workspace/confluencetest_scheduled/C:/Users/Rhea/OneDrive -Area/Rhea/Metrics_Configuration/Logfiles/Logfile.log
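The error message suggests that the Jenkins job runs on a Linux agent, where a path starting with C:/ is not absolute and is therefore appended to the workspace directory. A small sketch of that behaviour, using the standard posixpath and ntpath modules so that it gives the same result on any OS; the shortened path is for illustration only.

```python
import ntpath
import posixpath

workspace = "/opt/jenkins/workspace/confluencetest_scheduled"
windows_style = "C:/Users/Rhea/Logfiles/Logfile.log"  # shortened, illustrative

# Windows rules: the drive letter plus slash makes the path absolute.
print(ntpath.isabs(windows_style))
# POSIX rules: no leading slash, so the same string counts as relative.
print(posixpath.isabs(windows_style))
# A relative path ends up under the working directory (here, the workspace).
print(posixpath.join(workspace, windows_style))
```

If that reading is right, neither r-strings nor os.path.relpath can help, because the files on the local C: drive are simply not present on the agent. They would need to live somewhere the job can reach, for example under the workspace.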

Bash Bad Substitution Error: Trying to Execute Python Script

I'm following the tutorial here to convert MacJournal journal entries to DayOne journal entries: https://basilsalad.com/how-to/migrate-mac-journal-day-one/
And I am having issues on the very last part.
I moved my MacJournal exported text file to the same directory as MacJournalToDayOne.py, so when I run "ls" in the terminal it lists both the Python file and the MacJournal file, "dreams.txt", in the same directory.
Everything works fine up until this part here:
./MacJournalToDayOne.py "${export-file}" --journal="${journal-name}"
When I add in the file names and the journal I want to import into, I run it as:
./MacJournalToDayOne.py "${dreams.txt}" --journal="${MJImport}"
Which gives the error: -bash: ${dreams.txt}: bad substitution
What am I doing wrong here?

FileNotFound Error in python when running .py file from visual basic

I was trying to make an assistant that can perform simple tasks like shutting down the computer. For this I chose Python and Visual Basic: Visual Basic for the display (frontend application) and Python for performing the tasks (backend application). I created a file named main.py and a folder named query, and in that folder a file named query.jarvis, which can simply be opened as a text file. The VB program just writes text into query.jarvis and then runs main.py.
When I run main.py manually by double-clicking it, it works fine (for example, the query was "shutdown" and my computer shut down). But when I try to run it from VB, it shows the error: file not found query\query.jarvis. I even tried converting the .py file to an .exe with PyInstaller, but it showed the same error, again only when run from VB.
main.py:
def check(q):
    # the task was performed here according to the query
    pass

f = open("query\\query.jarvis")  # the error occurred here
x = f.readlines()
d = x[0]
d = d.strip()
q = d.lower()
check(q)
vb.net:
objWriter123.Close()
Dim objWriter As New System.IO.StreamWriter(moddir + "query\query.jarvis")
' moddir is the directory of the main.py file
objWriter.Write(UserQuery.Text)
objWriter.Close()
UserQuery.Text = ""
Process.Start(moddir + "main.py", AppWinStyle.MinimizedNoFocus)
Process.Start sometimes has weird outcomes. I usually manage to fix it by adding explorer.exe into the mix:
Process.Start("explorer.exe", moddir & "\main.py")
Also note the extra backslash in \main.py, which you might have missed.
Note: in VB, prefer the & symbol over the + symbol for concatenating strings.
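On the Python side, the relative path query\query.jarvis is looked up in the current working directory, which may differ when the process is started from VB rather than by double-clicking. A minimal sketch, assuming the query folder sits next to main.py as described in the question; read_query is a hypothetical helper that takes the base folder explicitly instead of relying on the working directory.

```python
from pathlib import Path


def read_query(base_dir):
    """Read the first line of query/query.jarvis under base_dir, normalized."""
    query_file = Path(base_dir) / "query" / "query.jarvis"
    with open(query_file) as f:
        first_line = f.readline()
    # Same normalization as in the question: strip whitespace, lower-case.
    return first_line.strip().lower()
```

In main.py this could be called as q = read_query(Path(__file__).resolve().parent), so the file is found no matter which directory the VB process starts from.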

ValueError: need more than 0 values to unpack (Python 2)

I am trying to replicate another researcher's findings by using the Python file that he added as a supplement to his paper. It is the first time I am diving into Python, so the error might be extremely simple to fix, yet after two days I still haven't managed it. For context, the Readme file contains the following instruction:
"To run the script, make sure Python2 is installed. Put all files into one folder designated as “cf_dir”."
In the script I get an error at the following lines:
if __name__ == '__main__':
    cf_dir, cf_file, cf_phys_file = sys.argv[1:4]
    os.chdir(cf_dir)
    cf = pd.read_csv(cf_file)
    cf_phys = pd.read_csv(cf_phys_file)
ValueError: need more than 0 values to unpack
The "cf_file" and "cf_phys_file" are the two major files among all the files in the one folder named "cf_dir". The "cf_phys_file" relates only to two survey questions (Q22 and Q23), and the "cf_file" includes all other questions, 1-21. It seems that the code is meant to retrieve those two files from the directory? Only for the "cf_phys_file" are the columns 1:4 needed. The current working directory is already set to the right location.
The path where I located "cf_dir" is as follows:
C:\Users\Marc-Marijn Ossel\Documents\RSM\Thesis\Data\Suitable for ML\Data en Artikelen\Per task Suitability for Machine Learning score readme\cf_dir
Alternative option in readme file,
In the readme file there's this option, but also here I cannot understand how to direct the path to the right location:
"Run the following command in an open terminal (substituting for file names
below): python cfProcessor_AEAPnP.py cf_dir cf_file cf_phys_file task_file jobTaskRatingFile
jobDataFile OESfile
This should generate the data and plots as necessary."
When I run that in "Command Prompt", I get the following error, and I am not sure how to set the working directory correctly.
- python: can't open file 'cfProcessor_AEAPnP.py': [Errno 2] No such file or directory
Thanks for reading, and I hope there's someone who can help me!
Best regards & stay safe out there during Corona!!
Marc
cf_dir, cf_file, cf_phys_file = sys.argv[1:4]
means the Python file expects a few arguments when it is called; with none, sys.argv[1:4] is empty and the unpacking fails.
In order to run
python cfProcessor_AEAPnP.py cf_dir cf_file cf_phys_file task_file jobTaskRatingFile jobDataFile OESfile
the command prompt should be in the folder that contains the script.
So, open Command Prompt and type
cd path_to_the_folder_where_ur_python_file_is_located
Now you are in the folder of the Python file.
Also, make sure you give the full path in double quotes for the arguments.
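To illustrate why the error appears: when the script is started without arguments, sys.argv[1:4] is an empty list, so there is nothing to unpack into the three names. A minimal sketch, not the original script; parse_paths is a hypothetical helper that performs the same slice but reports a usage message instead of failing on the unpacking.

```python
import sys


def parse_paths(argv):
    """Return (cf_dir, cf_file, cf_phys_file) from a sys.argv-style list."""
    args = argv[1:4]  # argv[0] is the script name itself
    if len(args) != 3:
        # Fewer than three arguments: exit with a hint rather than a ValueError.
        raise SystemExit(
            "usage: python cfProcessor_AEAPnP.py cf_dir cf_file cf_phys_file ..."
        )
    cf_dir, cf_file, cf_phys_file = args
    return cf_dir, cf_file, cf_phys_file
```

Called as parse_paths(sys.argv) at the top of the script, this makes a missing argument visible immediately.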

Error using numpy.loadtxt

I ran across this problem when trying to run code found in this answer to a question about loading Salome from a Python script (Salome is a 3D modeling program). The part of the code relevant to my problem was in creating and re-opening a .txt file. When attempting to open the file, I was getting an error that said there was no such file/directory as that file.
Then I tried using savetxt() on a random NumPy array (with the directory being my desktop, achieved using os.chdir()), and no file was saved to my desktop, as far as I could tell. Then, to test whether the file had been created somewhere without my noticing, I tried using loadtxt() to find it, and I got the same error saying there was no file or directory named MyFile.txt.
Here's my code:
import os
import numpy as np
os.chdir('C:\\Users\\Brahm\\Desktop')
np.savetxt('stuff',npa([7,8]))
np.loadtxt('stuff.txt')
I also tried without quotation marks around stuff in the savetxt line.
Is this a bug, or am I doing something incorrectly?
In your program, you are saving your array using:
np.savetxt('stuff',npa([7,8]))
The file name is 'stuff', not 'stuff.txt' (please note the difference); savetxt does not append an extension. You are then trying to load it with np.loadtxt('stuff.txt'). This will not work, because you created the file as stuff, not stuff.txt.
Either save to stuff.txt:
np.savetxt('stuff.txt',npa([7,8]))
Or load from stuff:
np.loadtxt('stuff')
