I am trying to modify a file after fetching it from HDFS using PySpark, and then save the result back to HDFS. I have written the code below for that.
Code:
import subprocess
from subprocess import Popen, PIPE
from pyspark import SparkContext
cat = sc.textFile("/user/root/parsed.txt")
hrk = "#"
for line in cat.collect():
    if (code == "ID"):
        line =line.strip() + "|"+hrk
    line.saveAsTextFile("/user/root/testsprk")
    print(line)
But when I run the code, I get the error below.
Error:
Traceback (most recent call last):
File "<stdin>", line 30, in <module>
AttributeError: 'unicode' object has no attribute 'saveAsTextFile'
I know there is some issue with my line variable, but I am not able to fix it.
This happens because you are collecting all the data: the result of collect() is not an RDD but a plain Python list, and line is just a single string, which has no saveAsTextFile method.
You shouldn't collect all the data on the driver. Instead, use RDD.map and then RDD.saveAsTextFile:
def add_hrk_on_id(line):
    # code and hrk are taken from the question; code is not defined there
    if (code == "ID"):
        return line.strip() + "|"+hrk
    else:
        return line
cat.map(add_hrk_on_id).saveAsTextFile(path)
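The mapping function above can be sketched in a form that runs without a cluster. This is only an illustration under stated assumptions: the question never defines `code`, so the sketch assumes it is the first `|`-delimited field of each line, and the `delimiter` and `marker` parameters are hypothetical names introduced here.

```python
HRK = "#"


def add_hrk_on_id(line, delimiter="|", marker=HRK):
    """Append the marker to lines whose first field is "ID"."""
    stripped = line.strip()
    # Assumption: the record code is the first delimited field.
    code = stripped.split(delimiter, 1)[0]
    if code == "ID":
        return stripped + delimiter + marker
    return line


# With a live SparkContext `sc`, the same function would be passed to map,
# so the work stays on the executors instead of the driver:
#   sc.textFile("/user/root/parsed.txt") \
#     .map(add_hrk_on_id) \
#     .saveAsTextFile("/user/root/testsprk")

# Local check on a plain list, standing in for the RDD's lines.
sample = ["ID|42\n", "NAME|spark\n"]
result = [add_hrk_on_id(item) for item in sample]
print(result)
```

Because the function only takes and returns a string, it can be tested on ordinary lists before being handed to Spark.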
Related
While the following code snippet of a python program generates a nice xml file:
from proc import create_SRXML
create_SRXML.create_xml()
The following does not generate the XML file (the line "import pumoni.visu.renders as visua" seems to break it):
from proc import create_SRXML
import pumoni.visu.renders as visua
create_SRXML.create_xml()
and it produces the following error log:
Traceback (most recent call last):
File "processor/analyze_all.py", line 26, in <module>
create_SRXML.create_xml()
File "/datas/repo/work/pul/process/create_SRXML.py", line 13, in create_xml
b1 = pulvii.SubElement(m1, "element",len="4", name="FileMetaInformationGroupLength", tag="0002,0000", vm="1", vr="UL")
TypeError: SubElement() got multiple values for argument 'tag'
May I know what the problem here is?
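No answer is recorded for this question, so the following is only one plausible reading. It assumes `pulvii` is an ElementTree-style module whose `SubElement` names its second positional parameter `tag`; in that case the keyword `tag="0002,0000"` collides with the positional `"element"`. Why the extra import changes the behaviour is not shown in the source. A minimal sketch with the standard library's `xml.etree.ElementTree`, passing the attributes as a dictionary so that no keyword can clash with a parameter name:

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")

# Attributes go in a dict (the third positional argument, attrib), so an
# attribute called "tag" cannot be mistaken for the element's tag parameter.
attributes = {
    "len": "4",
    "name": "FileMetaInformationGroupLength",
    "tag": "0002,0000",
    "vm": "1",
    "vr": "UL",
}
b1 = ET.SubElement(root, "element", attributes)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The element is still named `element`, and `tag` becomes an ordinary XML attribute on it.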
I am unable to import CsvFileSource, although beam_utils installs without problems.
I need this import to run a Cloud Dataflow program.
The code has:
from beam_utils.sources import CsvFileSource
Error message :
>>> from beam_utils.sources import CsvFileSource
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/vk/.local/lib/python2.7/site-packages/beam_utils/sources.py", line 14, in <module>
class JsonLinesFileSource(beam.io.filebasedsource.FileBasedSource):
File "/home/vk/.local/lib/python2.7/site-packages/beam_utils/sources.py", line 17, in JsonLinesFileSource
compression_type=fileio.CompressionTypes.AUTO,
AttributeError: 'module' object has no attribute 'CompressionTypes'
>>>
I even tried the import using Python 3.
Any idea how I can work around this?
Apache Beam was recently updated, and a few of its modules and attributes changed.
In particular, the CompressionTypes attribute that beam_utils looks up in the fileio module now lives in the filesystem module. As a quick fix, you can edit the beam_utils source (python_home\lib\site-packages\beam_utils\sources.py) and replace 'fileio' with 'filesystem'. It should work ;)
If you take a look at the GitHub repo (https://github.com/pabloem/beam_utils/blob/master/beam_utils/sources.py), the changes are already there. I guess it's only a matter of time until they're released on pip!
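Instead of hand-editing an installed package, your own code can look an attribute up in whichever module provides it. The helper below is a hypothetical sketch (`first_available` is not part of Beam or beam_utils); it is demonstrated with standard-library modules so it runs without Beam installed, and the Beam call is shown only as a comment.

```python
import importlib


def first_available(attr, module_names):
    """Return `attr` from the first listed module that can be imported
    and actually defines it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module missing in this environment, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError("%s not found in any of %s" % (attr, list(module_names)))


# Stand-in demonstration: the first module does not exist, the second does.
sqrt = first_available("sqrt", ["no_such_module_xyz", "math"])
print(sqrt(16.0))

# With Apache Beam installed, the same lookup would read (not executed here):
#   CompressionTypes = first_available(
#       "CompressionTypes",
#       ["apache_beam.io.filesystem", "apache_beam.io.fileio"],
#   )
```

This does not repair beam_utils itself, which fails at import time; it only shows the lookup pattern for code you control.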
I'm getting the same error. I'm attempting to load a CSV into a dictionary and then write it to a local file (eventually to BQ).
argv = [
    '--project={0}'.format(PROJECT),
    '--staging_location=gs://{0}/'.format(BUCKET),
    '--temp_location=gs://{0}/'.format(BUCKET),
    #'--runner=DataflowRunner'
    '--runner=DirectRunner'
]
p= beam.Pipeline(argv=argv)
rows = p | 'ReadCSV' >> beam.io.Read(CsvFileSource('gs://blahblah/file.csv')) | 'Write to file' >> beam.io.WriteToText('strings', file_name_suffix='.txt')
-[snip]-
Apache Beam will soon support Python 3 only.
'You are using Apache Beam with Python 2. '
Traceback (most recent call last):
File "avg-ecom-rating.py", line 5, in <module>
from beam_utils.sources import CsvFileSource
File "/home/dlemon/env/local/lib/python2.7/site-packages/beam_utils/sources.py", line 14, in <module>
class JsonLinesFileSource(beam.io.filebasedsource.FileBasedSource):
File "/home/dlemon/env/local/lib/python2.7/site-packages/beam_utils/sources.py", line 17, in JsonLinesFileSource
compression_type=fileio.CompressionTypes.AUTO,
AttributeError: 'module' object has no attribute 'CompressionTypes'
When I run the following code:
import shelve
input = open("input.txt",)
shelveFile = shelve.open("myData")
shelveFile["inputFile"] = input
input.close()
shelveFile.close()
I expect the shelve file myData to hold the file object input. Instead, running the code produces the following error:
Traceback (most recent call last):
File "/Users/ashutoshmishra/Documents/Sandbox/Sandbox3.py", line 5, in <module>
shelveFile["inputFile"] = input
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/shelve.py", line 124, in __setitem__
p.dump(value)
TypeError: cannot serialize '_io.TextIOWrapper' object
I was wondering why I could not save the file object input to the shelve file myData?
The following answer is taken from @DanD.'s comment above.
Read the file: shelveFile["inputFile"] = input.read()
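The reason is that shelve pickles every value, and an open file object cannot be pickled, whereas the string returned by `read()` can. A minimal sketch of storing the contents instead of the handle; the helper name `store_file_contents` and the temporary paths are made up for illustration:

```python
import os
import shelve
import tempfile


def store_file_contents(input_path, shelf_path, key="inputFile"):
    """Read the file and store its text (not the file object) in a shelf."""
    with open(input_path) as source:
        contents = source.read()
    with shelve.open(shelf_path) as shelf:
        shelf[key] = contents  # a str pickles fine; a file object does not
    return contents


# Self-contained demonstration in a temporary directory.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "input.txt")
with open(input_path, "w") as handle:
    handle.write("hello shelve\n")

shelf_path = os.path.join(workdir, "myData")
store_file_contents(input_path, shelf_path)

with shelve.open(shelf_path) as shelf:
    restored = shelf["inputFile"]
print(repr(restored))
```

If the file itself has to be reopened later, storing its path in the shelf is another option.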
I am trying to set up a client Python script on my Windows system and got stuck on the error below. I searched Google and tried on my own for about 4 hours, but could not find a solution, as I am very new to Python.
Please have a look at the code and its error below; you may have a solution for me.
Error:
C:\Python26>python C:\xampp\htdocs\cequel-dev\mbtools\main_inject.py
Traceback (most recent call last):
File "C:\xampp\htdocs\cequel-dev\mbtools\main_inject.py", line 12, in <module>
import injectdir
File "C:\xampp\htdocs\cequel-dev\mbtools\injectdir\__init__.py", line 10, in <module>
import action
File "C:\xampp\htdocs\cequel-dev\mbtools\injectdir\action\__init__.py", line 1, in <module>
from command import list
File "C:\xampp\htdocs\cequel-dev\mbtools\injectdir\action\command.py", line 28, in <module>
action_list[action.name]=action
AttributeError: 'module' object has no attribute 'name'
Code: (Line no: 19 to 28)
try:
    action_list={}
    for file in filenames:
        if file.endswith('.py') and file != '__init__.py' and file != 'command.py':
            #Import the file as a module action_imp
            exec "import {0} as action_imp".format(file[0:-3])
            #Get the action object from action_imp. Name is a required method for all actions
            action=action_imp
            #Put the file name in file_name
            action.file_name=file[0:-3]
            action_list[action.name]=action
except:
This looks like an attribute error on an array, so I tried guarding it with an if condition, but no luck so far.
I am stuck on the last line ("action_list[action.name]=action"). Please let me know if you have any suggestions or a quick way to suppress the error in the for loop.
Thanks.
As the comment after the exec statement says, every action has to have a name.
Ensure that every file listed in filenames (except __init__.py and command.py) defines a module-level variable called name.
Alternatively, you can suppress the error by replacing line 28 with:
try:
    action_list[action.name]=action
except AttributeError:
    print "Could not register action", action.file_name
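The same idea can be sketched in Python 3 without `exec`, using `getattr` with a default to detect modules that lack a `name`. This is an illustration rather than the original script: `register_actions` is a hypothetical helper, and the two modules are built in memory with `types.ModuleType` so the example runs on its own.

```python
import types


def register_actions(modules):
    """Map each module's `name` attribute to the module.

    Modules without a `name` are skipped and reported instead of raising
    AttributeError.
    """
    action_list = {}
    skipped = []
    for module in modules:
        name = getattr(module, "name", None)
        if name is None:
            skipped.append(module.__name__)
            continue
        action_list[name] = module
    return action_list, skipped


# Stand-ins for imported action files: one defines `name`, one does not.
good = types.ModuleType("copy_action")
good.name = "copy"
bad = types.ModuleType("broken_action")

actions, skipped = register_actions([good, bad])
print(sorted(actions), skipped)
```

In a real script the modules would come from `importlib.import_module(file[:-3])` rather than being built by hand.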
My python script reads JSON information from a website, stores it in a file for processing, and should clean it in the end.
This was working without issues in other scripts, but for some reason, os.remove fails to delete the file in the end:
import urllib2, json
import os, sys, argparse
ref_list_tmpfile = '/tmp/reference.%s.txt' % os.getpid()
ref_list_response=urllib2.urlopen('http://localhost:11111/api/reference').read()
with open(ref_list_tmpfile,'w') as outfile:
    outfile.write(ref_list_response)
ref_list_data=open(ref_list_tmpfile)
reference_list = json.load(ref_list_data)
ref_list_data.close()
.
.
.
.
os.remove(ref_list_tmpfile)
The main logic works well, but the error I'm getting refers to the last command (os.remove), and the file is not deleted:
Traceback (most recent call last):
File "./vm_creator.py", line 58, in <module>
os.remove(ref_list_tmpfile)
AttributeError: 'unicode' object has no attribute 'remove'
Any ideas?
You've redefined os to be a string, somewhere in the code you've snipped.
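The effect is easy to reproduce. In the sketch below the reassignment `os = "linux"` is a made-up stand-in for whatever the snipped code does (for example, reading an `os` field from the JSON); once the name `os` points at a string, `os.remove` is looked up on that string and fails.

```python
import os
import tempfile

# Create a temporary file to delete later.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)

os_module = os   # keep a reference to the real module
os = "linux"     # hypothetical reassignment that shadows the module

try:
    os.remove(tmp_path)  # now an attribute lookup on a str
except AttributeError as exc:
    message = str(exc)
    print(message)

os = os_module   # restore the name; os.remove works again
os.remove(tmp_path)
```

Renaming the variable that holds the string (for example to `os_name`) avoids the clash.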