I have parsed a web page and written all the links into a csv file; when I try to read these links from the csv I am getting this:
[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'],
['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]
A \t appears after each and every letter. I have tried the following to remove the \t from the result, but no luck.
Here is my code
out=open("categories.csv","rb")
data=csv.reader(out)
new_data=[[row[1]] for row in data]
new_data = new_data.strip('\t\n\r')
print new_data
This is giving an error
AttributeError: 'list' object has no attribute 'strip'
You can use the re.sub function for easy substitution in strings:
import re
string = "[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts \tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'], ['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]"
new_string = re.sub(r'\t', '', string)
print new_string
======= OUTPUT:
[['http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011'], ['http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011']]
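If you want to apply re.sub while reading the CSV rather than to a dumped string, here is a minimal sketch; an inline sample stands in for the contents of categories.csv:

```python
import csv
import io
import re

# Inline sample standing in for one row of categories.csv (hypothetical content)
sample = '\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\n'
reader = csv.reader(io.StringIO(sample))

# Strip every tab from each field as the rows are read
new_data = [[re.sub(r'\t', '', field) for field in row] for row in reader]
print(new_data)  # [['http://www.amazon.com']]
```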
Note that the strip method only removes whitespace characters from both ends of a string.
Try the below approach:
out=open("categories.csv","rb")
data=csv.reader(out)
new_data=[[''.join(row[0].strip().split('\t'))] for row in data]
print new_data
As a quick workaround:
x = [['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'],['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]
for y in x:
    for z in y:
        print("".join(z.split('\t')))
Returns:
> http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011
> http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011
You need to index into the nested list to reach each string, then do a simple replace:
string = [[...], [...], ...]
lst = []
for ylst in string:
    for ln in ylst:
        lst.append(ln.replace('\t', ''))
lst will contain each line without the '\t's.
I have some data
data = ['vancouver', 'paris', 'montreal']
string = 'montrealvancouverparis'
I want to find the words of data in the string variable one by one and remove each as it is found, so that in the end I am left with the empty string, meaning all the words of data were found in the string variable.
So far I have tried this, but no luck:
found_words = []
for i in range(len(data)):
    if data[i] in string:
        found_words.append(string.replace(data[i], ''))
found_words
Output: ['montrealparis', 'montrealvancouver', 'vancouverparis']
Desired Output: string = ''
Thanks
Your variables are a mess. What is var? It looks like it should be data[i] instead of var[i], and data_new is not defined either. This should work:
data = ['vancouver', 'paris', 'montreal']
string = 'montrealvancouverparis'
buffer = []
for i in range(len(data)):
    if data[i] in string:
        buffer.append(string.replace(data[i], ''))
print(buffer)
['montrealparis', 'montrealvancouver', 'vancouverparis']
This gives you the output you stated. If you instead want to end up with the empty string, you need to update the string you are searching each time, removing the data that was found previously:
for i in range(len(data)):
    if data[i] in string:
        string = string.replace(data[i], '')
print(f"Final String: '{string}'")
Final String: ''
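The same in-place update can also be written as a fold over data, if you prefer a compact form; a sketch using the same sample values:

```python
from functools import reduce

data = ['vancouver', 'paris', 'montreal']
string = 'montrealvancouverparis'

# Fold over data, removing each word from the running string as it is found
leftover = reduce(lambda s, word: s.replace(word, ''), data, string)
print(repr(leftover))  # ''
```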
IIUC, you can just use list comprehension:
>>> [word for word in data if word in string]
['vancouver', 'paris', 'montreal']
data = ['vancouver', 'paris', 'montreal']
string = 'montrealvancouverparis'
found_words = []
for i in range(len(data)):
    if data[i] in string:
        found_words.append(data[i])
found_words
I need to write a list to a text file named accounts.txt in the following format:
kieranc,conyers,asdsd,pop
ethand,day,sadads,dubstep
However, it ends up like the following with brackets:
['kieranc', 'conyers', 'asdsd', 'pop\n']['ethand', 'day', 'sadads', 'dubstep']
Here is my code (accreplace is a list):
accreplace = [['kieranc', 'conyers', 'asdsd', 'pop\n'],['ethand', 'day', 'sadads', 'dubstep']]
acc = open("accounts.txt", "w")
for x in accreplace:
    acc.write(str(x))
Since each element in accreplace is itself a list, str(x) doesn't help; it just renders the list with its brackets and quotes. To write the list in the proper format, use the code below:
for x in accreplace:
    acc.write(",".join([str(l) for l in x]))
This joins the items of each inner list into a comma-separated string.
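Alternatively, on Python 3 the csv module handles the comma-joining and line endings for you; a sketch, with the manual '\n' dropped from the data since csv.writer adds its own line terminators:

```python
import csv

# Same rows as above, without the hand-written newline
accreplace = [['kieranc', 'conyers', 'asdsd', 'pop'],
              ['ethand', 'day', 'sadads', 'dubstep']]

# newline="" lets csv.writer control the line endings itself
with open("accounts.txt", "w", newline="") as acc:
    writer = csv.writer(acc)
    writer.writerows(accreplace)
```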
I have .txt file which looks like:
[ -5.44339373e+00 -2.77404404e-01 1.26122094e-01 9.83589873e-01
1.95201179e-01 -4.49866890e-01 -2.06423297e-01 1.04780491e+00]
[ 4.34562117e-01 -1.04469577e-01 2.83633101e-01 1.00452355e-01 -7.12572469e-01 -4.99234705e-01 -1.93152897e-01 1.80787567e-02]
I need to extract all the floats from it and put them into a list/array.
What I've done is this:
A = []
for line in open("general.txt", "r").read().split(" "):
    for unit in line.split("]", 3):
        A.append(list(map(lambda x: str(x), unit.replace("[", "").replace("]", "").split(" "))))
but A contains elements like [''] or even worse ['3.20973096e-02\n']. These are all strings, but I need floats. How to do that?
Why not use a regular expression?
>>> import re
>>> e = r'(\d+\.\d+e?(?:\+|-)\d{2}?)'
>>> results = re.findall(e, your_string)
['5.44339373e+00',
'2.77404404e-01',
'1.26122094e-01',
'9.83589873e-01',
'1.95201179e-01',
'4.49866890e-01',
'2.06423297e-01',
'1.04780491e+00',
'4.34562117e-01',
'1.04469577e-01',
'2.83633101e-01',
'1.00452355e-01',
'7.12572469e-01',
'4.99234705e-01',
'1.93152897e-01',
'1.80787567e-02']
Now, these are the matched strings, but you can easily convert them to floats:
>>> map(float, re.findall(e, your_string))
[5.44339373,
0.277404404,
0.126122094,
0.983589873,
0.195201179,
0.44986689,
0.206423297,
1.04780491,
0.434562117,
0.104469577,
0.283633101,
0.100452355,
0.712572469,
0.499234705,
0.193152897,
0.0180787567]
Note, the regular expression might need some tweaking, but it's a good start.
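One such tweak, as a sketch: the pattern above drops the minus signs, so an optional sign can be allowed in front of the mantissa:

```python
import re

s = "[ -5.44339373e+00 -2.77404404e-01 1.04780491e+00]"

# [-+]? keeps the sign that the original pattern discarded
e = r'[-+]?\d+\.\d+e[+-]\d+'
floats = [float(x) for x in re.findall(e, s)]
print(floats)  # [-5.44339373, -0.277404404, 1.04780491]
```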
A more precise way is to use a regex to split the lines:
>>> s="""[ -5.44339373e+00 -2.77404404e-01 1.26122094e-01 9.83589873e-01
... 1.95201179e-01 -4.49866890e-01 -2.06423297e-01 1.04780491e+00]
... [ 4.34562117e-01 -1.04469577e-01 2.83633101e-01 1.00452355e-01 -7.12572469e-01 -4.99234705e-01 -1.93152897e-01 1.80787567e-02] """
>>> print re.split(r'[\s\[\]]+',s)
['', '-5.44339373e+00', '-2.77404404e-01', '1.26122094e-01', '9.83589873e-01', '1.95201179e-01', '-4.49866890e-01', '-2.06423297e-01', '1.04780491e+00', '4.34562117e-01', '-1.04469577e-01', '2.83633101e-01', '1.00452355e-01', '-7.12572469e-01', '-4.99234705e-01', '-1.93152897e-01', '1.80787567e-02', '']
And in this case that you have the data in file you can do :
import re
print re.split(r'[\s\[\]]+',open("general.txt", "r").read())
If you want to get rid of the leading and trailing empty strings, you can use a list comprehension:
>>> print [i for i in re.split(r'[\s\[\]]+',s) if i]
['-5.44339373e+00', '-2.77404404e-01', '1.26122094e-01', '9.83589873e-01', '1.95201179e-01', '-4.49866890e-01', '-2.06423297e-01', '1.04780491e+00', '4.34562117e-01', '-1.04469577e-01', '2.83633101e-01', '1.00452355e-01', '-7.12572469e-01', '-4.99234705e-01', '-1.93152897e-01', '1.80787567e-02']
Let's slurp the file:
content = open('data.txt').read()
Split on ']':
logical_lines = content.split(']')
Strip the '[' and the other leading characters:
logical_lines = [ll.lstrip(' \n[') for ll in logical_lines]
Convert to floats:
lol = [map(float,ll.split()) for ll in logical_lines]
Sticking it all in a one-liner:
lol=[map(float,l.lstrip(' \n[').split()) for l in open('data.txt').read().split(']')]
I've tested it on the example data we were given and it works.
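Note that on Python 3, map returns an iterator, so the one-liner would give you a list of map objects rather than lists of floats. A sketch of a Python 3 version, with an inline string standing in for data.txt:

```python
# Inline string standing in for the contents of data.txt
content = ("[ -5.44339373e+00 -2.77404404e-01]\n"
           "[ 4.34562117e-01 -1.04469577e-01] ")

# Build the inner lists explicitly; skip the empty trailing chunk after the last ']'
lol = [[float(tok) for tok in chunk.lstrip(' \n[').split()]
       for chunk in content.split(']') if chunk.strip()]
print(lol)  # [[-5.44339373, -0.277404404], [0.434562117, -0.104469577]]
```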
I need to read a text file of stock quotes and do some processing with each stock data (i.e. a line in the file).
The stock data looks like this :
[class,'STOCK'],[symbol,'AAII'],[open,2.60],[high,2.70],[low,2.53],[close,2.60],[volume,458500],[date,'21-Dec-04'],[openClosePDiff,0.0],[highLowPDiff,0.067],[closeEqualsLow,'false'],[closeEqualsHigh,'false']
How do I split the line into tokens where each token is what is enclosed in the square brackets, i.e. for the above line, the tokens should be "class, 'STOCK'", "symbol, 'AAII'" etc.
print(re.findall(r"\[(.*?)\]", inputline))
Or perhaps without regex:
print(inputline[1:-1].split("],["))
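For a quick check, both lines produce the same tokens on a shortened sample line:

```python
import re

inputline = "[class,'STOCK'],[symbol,'AAII'],[open,2.60]"

# Regex and plain slicing/splitting yield identical token lists here
by_regex = re.findall(r"\[(.*?)\]", inputline)
by_split = inputline[1:-1].split("],[")
print(by_regex)  # ["class,'STOCK'", "symbol,'AAII'", 'open,2.60']
```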
Try this code:
#!/usr/bin/env python3
import re
s = "[class,'STOCK'],[symbol,'AAII'],[open,2.60],[high,2.70],[low,2.53],[close,2.60],[volume,458500],[date,'21-Dec-04'],[openClosePDiff,0.0],[highLowPDiff,0.067],[closeEqualsLow,'false'],[closeEqualsHigh,'false']"
# use s rather than str, to avoid shadowing the built-in
s = re.sub(r'^\[', '', s)
s = re.sub(r'\]$', '', s)
array = s.split("],[")
for line in array:
    print(line)
Start with:
re.findall("[^,]+,[^,]+", a)
This would give you a list of:
[class,'STOCK'], [symbol,'AAII'] and such, then you could cut the brackets.
If you want a functional one liner, use:
map(lambda x: x[1:-1], re.findall("[^,]+,[^,]+", a))
The first part splits on every second comma; the map (apply the lambda to each item in the list) cuts the brackets.
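On Python 3, wrap the map in list() to materialize it; a quick check on a shortened sample:

```python
import re

a = "[class,'STOCK'],[symbol,'AAII']"

# Pair up "key,value" runs between commas, then slice off the surrounding brackets
tokens = list(map(lambda x: x[1:-1], re.findall("[^,]+,[^,]+", a)))
print(tokens)  # ["class,'STOCK'", "symbol,'AAII'"]
```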
import re
s = "[class,'STOCK'],[symbol,'AAII'],[open,2.60],[high,2.70],[low,2.53],[close,2.60],[volume,458500],[date,'21-Dec-04'],[openClosePDiff,0.0],[highLowPDiff,0.067],[closeEqualsLow,'false'],[closeEqualsHigh,'false']"
m = re.findall(r"([a-zA-Z0-9]+),([a-zA-Z0-9'.-]+)", s)  # '.' and '-' included so values like 2.60 and '21-Dec-04' match whole
d= { x[0]:x[1] for x in m }
print d
you can run the snippet here : http://liveworkspace.org/code/EZGav$35
I have a CSV file from which I want to parse the data into lists.
So I am using the python csv module to read that
so basically the following:
import csv
fin = csv.reader(open(path,'rb'),delimiter=' ',quotechar='|')
print fin[0]
#gives the following
['"1239","2249.00","1","3","2011-02-20"']
#lets say i do the following
ele = str(fin[0])
ele = ele.strip().split(',')
print ele
#gives me following
['[\'"1239"', '"2249.00"', '"1"', '"3"', '"2011-02-20"\']']
Now ele[0] gives me --> output ---> ['"1239"
How do I get rid of that ['?
In the end, I want to get 1239 and convert it to an integer.
Any clues as to why this is happening?
Thanks
Edit: Never mind, resolved thanks to the first comment.
Change your delimiter to ',' and you will get a list of those values from the csv reader.
It's because you are converting a list to a string; there is no need to do this. Grab the first element of the list (in this case it is a string) and parse that:
>>> a = ['"1239","2249.00","1","3","2011-02-20"']
>>> a
['"1239","2249.00","1","3","2011-02-20"']
>>> a[0]
'"1239","2249.00","1","3","2011-02-20"'
>>> b = a[0].replace('"', '').split(',')
>>> b[-1]
'2011-02-20'
Of course, before you call the replace and split string methods you should check that the value is a string, or handle the exception if it isn't.
Also, Blahdiblah is correct: your delimiter is probably wrong.
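A sketch of the corrected reader (Python 3, with an inline sample standing in for the file):

```python
import csv
import io

# Inline sample row standing in for the file contents
sample = '"1239","2249.00","1","3","2011-02-20"\n'

# Comma delimiter plus the default '"' quotechar unwraps the quoted fields
fin = csv.reader(io.StringIO(sample), delimiter=',', quotechar='"')
row = next(fin)
print(int(row[0]))  # 1239
```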