Remove string quotes from array in Python

I'm trying to get rid of some characters in my array so I'm just left with the x and y coordinates, separated by a comma as follows:
[[316705.77017187304,790526.7469308273]
[321731.20991025254,790958.3493565321]]
I have used zip() to create a tuple of the x and y values (as pairs from a list of strings), which I've then converted to an array using numpy. The array currently looks like this:
[['316705.77017187304,' '790526.7469308273,']
['321731.20991025254,' '790958.3493565321,']]
I need the output to be an array.
I'm pretty stumped about how to get rid of the single quotes and the second comma. I have read that map() can convert strings to numbers, but I can't get it to work.
Thanks in advance

Using the ast — Abstract Syntax Trees module:
import ast
xll = [['321731.20991025254,' '790958.3493565321,'], ['321731.20991025254,' '790958.3493565321,']]
>>> [ast.literal_eval(xl[0]) for xl in xll]
[(321731.20991025254, 790958.3493565321), (321731.20991025254, 790958.3493565321)]
The above gives a list of tuples for the list of lists; for a list of lists instead, type the following:
>>> [list(ast.literal_eval(xl[0])) for xl in xll]
[[321731.20991025254, 790958.3493565321], [321731.20991025254, 790958.3493565321]]
Old answer: if each coordinate were already its own clean string, a nested comprehension with float() would do:
>>> sll
[['316705.770172', '790526.746931'], ['321731.20991', '790958.349357']]
>>> fll = [[float(i) for i in l] for l in sll]
>>> fll
[[316705.770172, 790526.746931], [321731.20991, 790958.349357]]
>>>
Old edit: splitting each concatenated string on the commas instead:
>>> xll = [['321731.20991025254,' '790958.3493565321,'], ['321731.20991025254,' '790958.3493565321,']]
>>> [[float(s) for s in xl[0].split(',') if s.strip() != ''] for xl in xll]
[[321731.20991025254, 790958.3493565321], [321731.20991025254, 790958.3493565321]]
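Since the question ultimately asks for a numeric array, here is a minimal sketch of converting the string data directly into a NumPy float array, assuming the elements are strings with trailing commas as shown above:
import numpy as np

# The string array as printed in the question: each element still carries a trailing comma
arr = np.array([['316705.77017187304,', '790526.7469308273,'],
                ['321731.20991025254,', '790958.3493565321,']])

# Strip the trailing commas, then cast the whole array to float
coords = np.char.rstrip(arr, ',').astype(float)
print(coords.dtype, coords.shape)  # float64 (2, 2)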


List of strings to create a new list of strings without quotes?

Data
crop_list = ['Cotton','Ragi', 'Groundnut', 'Sugarcane', 'Redgram', 'Sunflower', 'Paddy', 'Maize','Jowar']
Now each element becomes a DataFrame:
for a in crop_list:
    vars()[a] = Data[Data['Crop'] == a]
For the next lines of code I might need to create a list manually, i.e. dfs:
from functools import reduce
dfs =[Cotton,Ragi,Groundnut,Sugarcane,Redgram,Sunflower,Paddy,Maize,Jowar]
df_merged = reduce(lambda a,b: pd.merge(a,b, on='Year'), dfs)
So I'm asking: is there any way to get a dynamic list?
Expected output:
Another list with the same strings, without quotes:
new_crop_list = [Cotton,Ragi, Groundnut, Sugarcane, Redgram, Sunflower, Paddy,Maize,Jowar]
I think this is basically what you meant
crop_list = ["'Cotton'","'Ragi'", "'Groundnut'", "'Sugarcane'", "'Redgram'", "'Sunflower'", "'Paddy'", "'Maize'","'Jowar'"]
Without the quotes a string would no longer be a string, so if this is the case, you can remove the embedded single quotes from the list using the following code:
new_list = [ x.replace("'","") for x in crop_list]
The above code will remove the single quotes from around the values in the list.
The output will look like this:
['Cotton', 'Ragi', 'Groundnut', 'Sugarcane', 'Redgram', 'Sunflower', 'Paddy', 'Maize', 'Jowar']
You will still see single quotes in the output, since it's a list of strings, and the quotes simply denote that.
Hope this answers your question
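An alternative worth noting: instead of creating variables with vars() and then typing their names into dfs by hand, the DataFrames can be kept in a dictionary keyed by crop name, which gives the dynamic list directly. A minimal sketch; the Data frame below is a tiny stand-in, the real one comes from the question:
import pandas as pd
from functools import reduce

# Tiny stand-in for the question's Data; the real frame has 'Crop' and 'Year' columns
Data = pd.DataFrame({'Crop': ['Cotton', 'Ragi', 'Cotton', 'Ragi'],
                     'Year': [2019, 2019, 2020, 2020],
                     'Yield': [10, 12, 11, 13]})
crop_list = ['Cotton', 'Ragi']

# One DataFrame per crop, keyed by name, instead of vars()[a] = ...
crop_frames = {crop: Data[Data['Crop'] == crop] for crop in crop_list}

# The dict's values are the dynamic list the question asks for
dfs = list(crop_frames.values())
df_merged = reduce(lambda a, b: pd.merge(a, b, on='Year'), dfs)
print(df_merged)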

How to turn a list containing strings into a list containing integers (Python)

I am optimizing PyRay (https://github.com/oscr/PyRay) to be a usable Python ray-casting engine, and I am working on a feature that takes a text file and turns it into a list (which PyRay uses as a map). But when I read the file into a list, the contents end up as strings, which PyRay can't use. So my question is: how do I convert a list of strings into a list of integers? Here is my code so far (I commented out the actual engine code so I can test this):
print("What map file to open?")
mapopen = input(">")
mapload = open(mapopen, "r")
worldMap = [line.split(',') for line in mapload.readlines()]
print(worldMap)
The map file:
1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,2,3,2,3,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,3,1,0,0,2,0,0,0,2,3,2,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,2,0,0,2,1,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,3,1,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,2,3,2,1,2,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,1,0,0,2,0,0,2,
2,3,1,0,0,2,0,0,2,1,3,2,0,2,0,0,3,0,3,1,
1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,2,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,0,0,2,3,0,1,2,0,1,
1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,3,0,2,
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,
2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,
Please help me; I have been searching all over and I can't find anything.
Try this. Did you want a list of lists, or just one big list?
with open(filename, "r") as txtr:
    data = txtr.read()
lines = data.split("\n")  # split into a list of strings, one per line
data = [[int(v) for v in line.split(",") if v] for line in lines if line.strip()]
The last line splits each line on the commas, drops the empty fields left by the trailing commas, applies int() to each value, and collects the results into a list of lists. It does this for every line. I hope it helps.
Here is a version for just one large list:
with open(filename, "r") as txtr:
    data = txtr.readlines()
data = ",".join(data)  # turns it into one large string
data = data.split(",")  # now you have a list of strings
data = [int(v) for v in data if v.strip()]  # applies int() to every non-empty field
Look into the map built-in function in Python.
L = ['1', '2', '3']
ints = map(int, L)  # naming the result "map" would shadow the built-in
for el in ints:
    print(el)
# prints:
# 1
# 2
# 3
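One caveat: in Python 3, map() returns a lazy iterator rather than a list, so if an actual list of ints is needed (as for the map grid here), wrapping the call in list() is the usual approach. A minimal sketch:
L = ['1', '2', '3']
ints = list(map(int, L))  # materialise the lazy map object into a real list
print(ints)  # [1, 2, 3]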
As per your question, please find below a way to change a list of strings into a list of integers (or a single integer if you use a list index to get the value). Hope this helps.
myStrList = ["1", "2", "\n", "3"]
myNewIntList = []
for x in myStrList:
    if x != "\n":
        y = int(x)
        myNewIntList.append(y)
print(myNewIntList)  # [1, 2, 3]
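Putting the pieces together for the map file shown in the question: each line ends with a trailing comma, so empty fields have to be skipped before calling int(). A minimal sketch (the filename is just an example):
def load_world_map(path):
    # Read a comma-separated map file into a list of lists of ints
    world_map = []
    with open(path, "r") as f:
        for line in f:
            fields = [field for field in line.strip().split(",") if field]
            if fields:  # skip blank lines
                world_map.append([int(field) for field in fields])
    return world_map

worldMap = load_world_map("map.txt")  # e.g. the map pasted above, saved as map.txt
print(worldMap[0])  # first row of the map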

Creating RDD from input data with repeated delimiters - Spark

I have input data as key-value pairs, pipe-delimited as below; some of the values contain delimiters within their fields.
key1:value1|key2:val:ue2|key3:valu||e3
key1:value4|key2:value5|key3:value6
Expected output is below.
value1|val:ue2|valu||e3
value4|value5|value6
I tried the following to create an RDD:
rdd=sc.textFile("path").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
The above mapping works when the value fields don't contain these delimiters, as below:
key1:value1|key2:value2|key3:value3
key1:value4|key2:value5|key3:value6
I also tried a regex, as below:
rdd=sc.textFile("path").map(lambda l: [x.split(":")[1] for x in l.split("((?<!\|)\|(?!\|))")]).map(tuple)
Input data without the extra delimiters:
key1:value1|key2:value2|key3:value3
key1:value4|key2:value5|key3:value6
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
>>> rdd.collect()
[(u'value1', u'value2', u'value3'), (u'value4', u'value5', u'value6')]
Input data with the extra delimiters:
key1:value1|key2:val:ue2|key3:valu||e3
key1:value4|key2:value5|key3:value6
Without the regex:
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
>>> rdd.collect()
Error: IndexError: list index out of range
With the regex:
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("((?<!\|)\|(?!\|))")]).map(tuple)
>>> rdd.collect()
[(u'value1|key2'), (u'value4|key2')]
How can I achieve the result below from the input?
[(u'value1', u'val:ue2', u'valu||e3'), (u'value4', u'value5', u'value6')]
From this I will create a DataFrame and do some processing.
Any suggestions using pure Python are also welcome. Thanks in advance!
Here is the solution:
The main issue is that str.split() only works with a fixed delimiter; for a pattern you need re.split(). Using a non-capturing look-around pattern, so the delimiters themselves are not kept:
>>> import re
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1:] for x in re.split(r"(?<!\|)\|(?!\|)", l)]).map(tuple)
>>> rdd.collect()
[([u'value1'], [u'val', u'ue2'], [u'valu||e3']), ([u'value4'], [u'value5'], [u'value6'])]
The following RDD joins the pieces of each field back together on ':':
>>> rdd2=rdd.map(lambda l: [':'.join(x) for x in l]).map(tuple)
>>> rdd2.collect()
[(u'value1', u'val:ue2', u'valu||e3'), (u'value4', u'value5', u'value6')]
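Since pure-Python suggestions are also welcome, here is a minimal sketch of the same idea outside Spark, using re.split() with the look-around pattern and splitting each field on its first colon only:
import re

lines = ["key1:value1|key2:val:ue2|key3:valu||e3",
         "key1:value4|key2:value5|key3:value6"]

# Split only on single pipes (the look-arounds leave '||' untouched),
# then drop everything up to the first ':' of each field
single_pipe = re.compile(r"(?<!\|)\|(?!\|)")
rows = [tuple(field.split(":", 1)[1] for field in single_pipe.split(line))
        for line in lines]
print(rows)
# [('value1', 'val:ue2', 'valu||e3'), ('value4', 'value5', 'value6')]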

Extracting float numbers from file using python

I have a .txt file which looks like this:
[ -5.44339373e+00 -2.77404404e-01 1.26122094e-01 9.83589873e-01
1.95201179e-01 -4.49866890e-01 -2.06423297e-01 1.04780491e+00]
[ 4.34562117e-01 -1.04469577e-01 2.83633101e-01 1.00452355e-01 -7.12572469e-01 -4.99234705e-01 -1.93152897e-01 1.80787567e-02]
I need to extract all the floats from it and put them into a list/array.
What I've done is this:
A = []
for line in open("general.txt", "r").read().split(" "):
    for unit in line.split("]", 3):
        A.append(list(map(lambda x: str(x), unit.replace("[", "").replace("]", "").split(" "))))
but A contains elements like [''] or, even worse, ['3.20973096e-02\n']. These are all strings, but I need floats. How can I do that?
Why not use a regular expression?
>>> import re
>>> e = r'(\d+\.\d+e?(?:\+|-)\d{2}?)'
>>> results = re.findall(e, your_string)
['5.44339373e+00',
'2.77404404e-01',
'1.26122094e-01',
'9.83589873e-01',
'1.95201179e-01',
'4.49866890e-01',
'2.06423297e-01',
'1.04780491e+00',
'4.34562117e-01',
'1.04469577e-01',
'2.83633101e-01',
'1.00452355e-01',
'7.12572469e-01',
'4.99234705e-01',
'1.93152897e-01',
'1.80787567e-02']
Now, these are the matched strings, but you can easily convert them to floats:
>>> map(float, re.findall(e, your_string))
[5.44339373,
0.277404404,
0.126122094,
0.983589873,
0.195201179,
0.44986689,
0.206423297,
1.04780491,
0.434562117,
0.104469577,
0.283633101,
0.100452355,
0.712572469,
0.499234705,
0.193152897,
0.0180787567]
Note: the regular expression might need some tweaking (as written it drops the minus sign of negative numbers), but it's a good start.
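One such tweak is to allow an optional leading sign so negative values keep their sign. A minimal sketch on a snippet of the sample data:
import re

text = "[ -5.44339373e+00 -2.77404404e-01 1.26122094e-01 9.83589873e-01]"

# Optional sign, mantissa, and a signed two-digit exponent
pattern = r'[-+]?\d+\.\d+e[-+]\d{2}'
print([float(m) for m in re.findall(pattern, text)])
# [-5.44339373, -0.277404404, 0.126122094, 0.983589873]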
As a more precise way, you can use a regex to split the lines:
>>> s="""[ -5.44339373e+00 -2.77404404e-01 1.26122094e-01 9.83589873e-01
... 1.95201179e-01 -4.49866890e-01 -2.06423297e-01 1.04780491e+00]
... [ 4.34562117e-01 -1.04469577e-01 2.83633101e-01 1.00452355e-01 -7.12572469e-01 -4.99234705e-01 -1.93152897e-01 1.80787567e-02] """
>>> print re.split(r'[\s\[\]]+',s)
['', '-5.44339373e+00', '-2.77404404e-01', '1.26122094e-01', '9.83589873e-01', '1.95201179e-01', '-4.49866890e-01', '-2.06423297e-01', '1.04780491e+00', '4.34562117e-01', '-1.04469577e-01', '2.83633101e-01', '1.00452355e-01', '-7.12572469e-01', '-4.99234705e-01', '-1.93152897e-01', '1.80787567e-02', '']
And in this case, since you have the data in a file, you can do:
import re
print re.split(r'[\s\[\]]+',open("general.txt", "r").read())
If you want to get rid of the leading and trailing empty strings, you can just use a list comprehension:
>>> print [i for i in re.split(r'[\s\[\]]*',s) if i]
['-5.44339373e+00', '-2.77404404e-01', '1.26122094e-01', '9.83589873e-01', '1.95201179e-01', '-4.49866890e-01', '-2.06423297e-01', '1.04780491e+00', '4.34562117e-01', '-1.04469577e-01', '2.83633101e-01', '1.00452355e-01', '-7.12572469e-01', '-4.99234705e-01', '-1.93152897e-01', '1.80787567e-02']
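To finish the job, the split pieces still need converting to float; a minimal sketch (Python 3 syntax) that reads the file, splits, and converts:
import re

with open("general.txt") as f:
    tokens = re.split(r'[\s\[\]]+', f.read())

floats = [float(t) for t in tokens if t]  # drop the empty strings at the edges
print(floats)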
Let's slurp the file:
content = open('data.txt').read()
Split on ']':
logical_lines = content.split(']')
Strip the '[' and the other stuff:
logical_lines = [ll.lstrip(' \n[') for ll in logical_lines]
Convert to floats:
lol = [map(float,ll.split()) for ll in logical_lines]
Sticking it all in a one-liner:
lol=[map(float,l.lstrip(' \n[').split()) for l in open('data.txt').read().split(']')]
I've tested it on the exemplar data we were given and it works...
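Since the file looks like a printed NumPy array, another option (assuming NumPy is available) is to strip the brackets and let NumPy do the parsing; a minimal sketch:
import numpy as np

with open("general.txt") as f:
    text = f.read().replace("[", " ").replace("]", " ")

values = np.array(text.split(), dtype=float)  # 1-D float64 array of all the numbers
print(values)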

python parse csv to lists

I have a CSV file from which I want to parse the data into lists.
So I am using the Python csv module to read it, basically as follows:
import csv
fin = csv.reader(open(path,'rb'),delimiter=' ',quotechar='|')
print fin[0]
#gives the following
['"1239","2249.00","1","3","2011-02-20"']
#lets say i do the following
ele = str(fin[0])
ele = ele.strip().split(',')
print ele
#gives me following
['[\'"1239"', '"2249.00"', '"1"', '"3"', '"2011-02-20"\']']
Now ele[0] gives me this output: ['"1239"
How do I get rid of that [' ?
In the end, what I want is to get 1239 and convert it to an integer.
Any clues why this is happening?
Thanks
Edit: Never mind, resolved thanks to the first comment.
Change your delimiter to ',' and you will get a list of those values from the csv reader.
It's because you are converting a list to a string; there is no need to do this. Grab the first element of the list (in this case it is a string) and parse that:
>>> a = ['"1239","2249.00","1","3","2011-02-20"']
>>> a
['"1239","2249.00","1","3","2011-02-20"']
>>> a[0]
'"1239","2249.00","1","3","2011-02-20"'
>>> b = a[0].replace('"', '').split(',')
>>> b[-1]
'2011-02-20'
Of course, before you use the replace and split string methods you should check that the type is a string, or handle the exception if it isn't.
Also, Blahdiblah is correct: your delimiter is probably wrong.
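For completeness, a minimal sketch (Python 3 syntax) of reading the file with the comma delimiter suggested above, so no manual quote-stripping is needed; the filename is just an example and the row layout follows the question's sample:
import csv

with open("data.csv", newline="") as f:  # rows like "1239","2249.00","1","3","2011-02-20"
    reader = csv.reader(f, delimiter=",", quotechar='"')
    for row in reader:
        record_id = int(row[0])    # '1239' -> 1239
        price = float(row[1])      # '2249.00' -> 2249.0
        print(record_id, price, row[4])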
