Remove string quotes from array in Python
I'm trying to get rid of some characters in my array so I'm just left with the x and y coordinates, separated by a comma as follows:
[[316705.77017187304,790526.7469308273]
[321731.20991025254,790958.3493565321]]
I have used zip() to create a tuple of the x and y values (as pairs from a list of strings), which I've then converted to an array using numpy. The array currently looks like this:
[['316705.77017187304,' '790526.7469308273,']
['321731.20991025254,' '790958.3493565321,']]
I need the output to be an array.
I'm pretty stumped about how to get rid of the single quotes and the second comma. I have read that map() can change string to numeric but I can't get it to work.
Thanks in advance
Using the ast module (Abstract Syntax Trees):
import ast
xll = [['321731.20991025254,' '790958.3493565321,'], ['321731.20991025254,' '790958.3493565321,']]
>>> [ast.literal_eval(xl[0]) for xl in xll]
[(321731.20991025254, 790958.3493565321), (321731.20991025254, 790958.3493565321)]
The above gives a list of tuples for the list of lists; to get lists instead, type the following:
>>> [list(ast.literal_eval(xl[0])) for xl in xll]
[[321731.20991025254, 790958.3493565321], [321731.20991025254, 790958.3493565321]]
Old answer: I think this works (for strings without the trailing commas):
>>> sll
[['316705.770172', '790526.746931'], ['321731.20991', '790958.349357']]
>>> fll = [[float(i) for i in l] for l in sll]
>>> fll
[[316705.770172, 790526.746931], [321731.20991, 790958.349357]]
>>>
Old edit (handling the trailing comma by splitting on ','):
>>> xll = [['321731.20991025254,' '790958.3493565321,'], ['321731.20991025254,' '790958.3493565321,']]
>>> [[float(s) for s in xl[0].split(',') if s.strip() != ''] for xl in xll]
[[321731.20991025254, 790958.3493565321], [321731.20991025254, 790958.3493565321]]
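Since the original question asks for a numpy array specifically, here is a minimal sketch of my own (not part of this answer) combining the split-and-float step above with np.array:
import numpy as np

xll = [['316705.77017187304,' '790526.7469308273,'],
       ['321731.20991025254,' '790958.3493565321,']]
# each pair of adjacent string literals concatenates into one string,
# so split on ',', drop the empty trailing piece, and convert to float
coords = np.array([[float(s) for s in xl[0].split(',') if s.strip()] for xl in xll])
print(coords.shape)  # (2, 2)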
Related
List of strings to create a new list of strings without quotes?
Data:
crop_list = ['Cotton', 'Ragi', 'Groundnut', 'Sugarcane', 'Redgram', 'Sunflower', 'Paddy', 'Maize', 'Jowar']
Now each element is a DataFrame:
for a in crop_list:
    vars()[a] = Data[Data['Crop'] == a]
For the next lines of code I might need to create a list manually, i.e. dfs:
from functools import reduce
dfs = [Cotton, Ragi, Groundnut, Sugarcane, Redgram, Sunflower, Paddy, Maize, Jowar]
df_merged = reduce(lambda a, b: pd.merge(a, b, on='Year'), dfs)
So I'm asking: is there any way to get a dynamic list? Expected output: another list with the same strings without quotes:
new_crop_list = [Cotton, Ragi, Groundnut, Sugarcane, Redgram, Sunflower, Paddy, Maize, Jowar]
I think this is basically what you meant:
crop_list = ["'Cotton'", "'Ragi'", "'Groundnut'", "'Sugarcane'", "'Redgram'", "'Sunflower'", "'Paddy'", "'Maize'", "'Jowar'"]
since without quotes a string is not a string, if this is the case. You can remove the single quotes from the list using the following code:
new_list = [x.replace("'", "") for x in crop_list]
The above code removes the single quotes from around the values in the list. The output will look like:
['Cotton', 'Ragi', 'Groundnut', 'Sugarcane', 'Redgram', 'Sunflower', 'Paddy', 'Maize', 'Jowar']
You will still see single quotes in the output, since it is a list of strings and the quotes denote that. Hope this answers your question.
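A runnable demo of the comprehension described above (my addition, just to show the replace step in action):
crop_list = ["'Cotton'", "'Ragi'", "'Groundnut'"]
new_list = [x.replace("'", "") for x in crop_list]
print(new_list)  # ['Cotton', 'Ragi', 'Groundnut']
If the underlying goal is a dynamic list of the DataFrames themselves (my reading of the question, not something this answer states), a list comprehension such as dfs = [Data[Data['Crop'] == a] for a in crop_list] builds it without needing unquoted names or vars() at all.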
How to turn a list containing strings into a list containing integers (Python)
I am optimizing PyRay (https://github.com/oscr/PyRay) to be a usable Python ray-casting engine, and I am working on a feature that takes a text file and turns it into a list (which PyRay uses as a map). But when I read the file into a list, the contents end up as strings, which PyRay cannot use. So my question is: how do I convert a list of strings into integers? Here is my code so far (I commented out the actual code so I can test this):
print("What map file to open?")
mapopen = input(">")
mapload = open(mapopen, "r")
worldMap = [line.split(',') for line in mapload.readlines()]
print(worldMap)
The map file:
1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,2,3,2,3,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,3,1,0,0,2,0,0,0,2,3,2,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,2,0,0,2,1,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,3,1,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,2,3,2,1,2,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,1,0,0,2,0,0,2,
2,3,1,0,0,2,0,0,2,1,3,2,0,2,0,0,3,0,3,1,
1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,2,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,0,0,2,3,0,1,2,0,1,
1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,3,0,2,
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,
2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,
Please help me, I have been searching all over and I can't find anything.
Try this. Did you want a list of lists, or just one big list?
with open(filename, "r") as txtr:
    data = txtr.read()
data = data.split("\n")  # split into a list of strings, one per line
data = [list(map(int, x.split(","))) for x in data]
The fourth line splits each string into a list on the commas, applies int() to each element, and turns the result into a list. It does this for every element in data. I hope it helps.
Here is the version for just one large list:
with open(filename, "r") as txtr:
    data = txtr.readlines()  # remove empty lines in your file!
data = ",".join(data)        # turns it into one large string
data = data.split(",")       # now you have a list of strings
data = list(map(int, data))  # applies int() to each element in data
Look into the map built-in function in Python:
L = ['1', '2', '3']
ints = map(int, L)  # renamed from `map` so the built-in is not shadowed
for el in ints:
    print(el)
which prints:
1
2
3
As per your question, please find below a way you can change a list of strings into a list of integers (or individual integers if you use a list index to get a value). Hope this helps.
myStrList = ["1", "2", "\n", "3"]
global myNewIntList
myNewIntList = []
for x in myStrList:
    if x != "\n":
        y = int(x)
        myNewIntList.append(y)
print(myNewIntList)
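One detail worth noting (my observation, not something the answers above address): every row in the map file ends with a trailing comma, so a plain split(",") leaves an empty string at the end of each row and int('') raises a ValueError. A minimal sketch that guards against that, assuming the map file is named map.txt:
with open("map.txt") as fh:
    world_map = [
        [int(cell) for cell in line.split(",") if cell.strip()]  # drop empty pieces before int()
        for line in fh
        if line.strip()  # skip blank lines
    ]
print(world_map[0])  # first row as a list of ints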
Creating RDD from input data with repeated delimiters - Spark
I have input data as key-value pairs with pipe delimiting as below; some of the values contain delimiters in their fields:
key1:value1|key2:val:ue2|key3:valu||e3
key1:value4|key2:value5|key3:value6
Expected output is below:
value1|val:ue2|valu||e3
value4|value5|value6
I tried the following to create the RDD:
rdd = sc.textFile("path").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
The above mapping works when we don't have these delimiters in the input value fields, as below:
key1:value1|key2:value2|key3:value3
key1:value4|key2:value5|key3:value6
And I also tried a regex as below:
rdd = sc.textFile("path").map(lambda l: [x.split(":")[1] for x in l.split("((?<!\|)\|(?!\|))")]).map(tuple)
Input data without delimiters:
key1:value1|key2:value2|key3:value3
key1:value4|key2:value5|key3:value6
>>> rdd = sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("|")])
>>> rdd.collect()
[(u'value1', u'value2', u'value3'), (u'value4', u'value5', u'value6')]
Input data with delimiters:
key1:value1|key2:val:ue2|key3:valu||e3
key1:value4|key2:value5|key3:value6
Without regex:
>>> rdd = sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
>>> rdd.collect()
Error: IndexError: list index out of range
With regex:
>>> rdd = sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("((?<!\|)\|(?!\|))")]).map(tuple)
>>> rdd.collect()
[(u'value1|key2'), (u'value4|key2')]
How can I achieve the result below from the input?
[(u'value1', u'val:ue2', u'valu||e3'), (u'value4', u'value5', u'value6')]
From this I will create a dataframe and do some processing. Any suggestions in pure Python are also welcome. Thanks in advance!
Here is the solution. The main issue is that l.split() works with a fixed delimiter only, so use re.split() instead:
rdd = sc.textFile("testcwp").map(lambda l: [x.split(":")[1:] for x in re.split("((?<!\|)\|(?!\|))", l)]).map(tuple)
>>> rdd.collect()
[([u'value1'], [u'val', u'ue2'], [u'val||ue3']), ([u'value4'], [u'value5'], [u'value6'])]
The following RDD concatenates the elements inside the inner lists:
>>> rdd2 = rdd.map(lambda l: ['|'.join(x) for x in l]).map(tuple)
>>> rdd2.collect()
[(u'value1', u'value2', u'val||ue3'), (u'value4', u'value5', u'value6')]
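Since the question also welcomes pure-Python suggestions, here is a hedged sketch of my own (not part of the answer above): split each line only on single pipes, then drop everything up to the first colon in each field.
import re

lines = [
    "key1:value1|key2:val:ue2|key3:valu||e3",
    "key1:value4|key2:value5|key3:value6",
]
single_pipe = re.compile(r"(?<!\|)\|(?!\|)")  # a '|' that is not part of '||'
rows = [tuple(field.split(":", 1)[1] for field in single_pipe.split(line)) for line in lines]
print(rows)
# [('value1', 'val:ue2', 'valu||e3'), ('value4', 'value5', 'value6')]
The same lambda body should drop into the sc.textFile(...).map(...) call as well, though I have not run it on a Spark cluster.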
Extracting float numbers from file using python
I have a .txt file which looks like this:
[ -5.44339373e+00 -2.77404404e-01 1.26122094e-01 9.83589873e-01
1.95201179e-01 -4.49866890e-01 -2.06423297e-01 1.04780491e+00]
[ 4.34562117e-01 -1.04469577e-01 2.83633101e-01 1.00452355e-01 -7.12572469e-01 -4.99234705e-01 -1.93152897e-01 1.80787567e-02]
I need to extract all the floats from it and put them into a list/array. What I've done is this:
A = []
for line in open("general.txt", "r").read().split(" "):
    for unit in line.split("]", 3):
        A.append(list(map(lambda x: str(x), unit.replace("[", "").replace("]", "").split(" "))))
but A contains elements like [''] or, even worse, ['3.20973096e-02\n']. These are all strings, but I need floats. How do I do that?
Why not use a regular expression?
>>> import re
>>> e = r'(\d+\.\d+e?(?:\+|-)\d{2}?)'
>>> results = re.findall(e, your_string)
['5.44339373e+00', '2.77404404e-01', '1.26122094e-01', '9.83589873e-01', '1.95201179e-01', '4.49866890e-01', '2.06423297e-01', '1.04780491e+00', '4.34562117e-01', '1.04469577e-01', '2.83633101e-01', '1.00452355e-01', '7.12572469e-01', '4.99234705e-01', '1.93152897e-01', '1.80787567e-02']
Now, these are the matched strings, but you can easily convert them to floats:
>>> map(float, re.findall(e, your_string))
[5.44339373, 0.277404404, 0.126122094, 0.983589873, 0.195201179, 0.44986689, 0.206423297, 1.04780491, 0.434562117, 0.104469577, 0.283633101, 0.100452355, 0.712572469, 0.499234705, 0.193152897, 0.0180787567]
Note, the regular expression might need some tweaking, but it's a good start.
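One possible tweak (my suggestion, not from the original answer): the pattern above drops the minus signs, so the negative values in the file come out positive. Allowing an optional sign and an optional exponent keeps them intact:
import re

text = "[ -5.44339373e+00 -2.77404404e-01 1.26122094e-01 9.83589873e-01]"
pattern = r'[-+]?\d+\.\d+(?:[eE][-+]?\d+)?'  # optional sign, optional exponent part
values = [float(tok) for tok in re.findall(pattern, text)]
print(values)  # [-5.44339373, -0.277404404, 0.126122094, 0.983589873]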
As a more precise way, you can use a regex to split the lines:
>>> s = """[ -5.44339373e+00 -2.77404404e-01 1.26122094e-01 9.83589873e-01
... 1.95201179e-01 -4.49866890e-01 -2.06423297e-01 1.04780491e+00]
... [ 4.34562117e-01 -1.04469577e-01 2.83633101e-01 1.00452355e-01 -7.12572469e-01 -4.99234705e-01 -1.93152897e-01 1.80787567e-02] """
>>> print re.split(r'[\s\[\]]+', s)
['', '-5.44339373e+00', '-2.77404404e-01', '1.26122094e-01', '9.83589873e-01', '1.95201179e-01', '-4.49866890e-01', '-2.06423297e-01', '1.04780491e+00', '4.34562117e-01', '-1.04469577e-01', '2.83633101e-01', '1.00452355e-01', '-7.12572469e-01', '-4.99234705e-01', '-1.93152897e-01', '1.80787567e-02', '']
And since in this case you have the data in a file, you can do:
import re
print re.split(r'[\s\[\]]+', open("general.txt", "r").read())
If you want to get rid of the leading and trailing empty strings, you can just use a list comprehension:
>>> print [i for i in re.split(r'[\s\[\]]*', s) if i]
['-5.44339373e+00', '-2.77404404e-01', '1.26122094e-01', '9.83589873e-01', '1.95201179e-01', '-4.49866890e-01', '-2.06423297e-01', '1.04780491e+00', '4.34562117e-01', '-1.04469577e-01', '2.83633101e-01', '1.00452355e-01', '-7.12572469e-01', '-4.99234705e-01', '-1.93152897e-01', '1.80787567e-02']
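Since the question ultimately needs floats rather than strings, a short follow-up sketch of my own (Python 3 syntax), filtering out the empty tokens before converting:
import re

with open("general.txt") as fh:
    tokens = re.split(r'[\s\[\]]+', fh.read())
floats = [float(tok) for tok in tokens if tok]  # drop empty strings, then convert
print(floats)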
Let's slurp the file:
content = open('data.txt').read()
Split on ']':
logical_lines = content.split(']')
Strip the '[' and the other stuff:
logical_lines = [ll.lstrip(' \n[') for ll in logical_lines]
Convert to floats:
lol = [map(float, ll.split()) for ll in logical_lines]
Sticking it all in a one-liner:
lol = [map(float, l.lstrip(' \n[').split()) for l in open('data.txt').read().split(']')]
I've tested it on the example data we were given and it works...
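A small caveat (my note, not part of the original answer): in Python 3, map() returns an iterator rather than a list, so you would wrap it in list() to get the nested lists of floats the answer describes, and filtering out the empty chunk after the final ']' keeps an empty list from appearing at the end:
lol = [list(map(float, l.lstrip(' \n[').split()))
       for l in open('data.txt').read().split(']')
       if l.strip()]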
python parse csv to lists
I have a csv file from which I want to parse the data into lists. I am using the Python csv module to read it, so basically the following:
import csv
fin = csv.reader(open(path, 'rb'), delimiter=' ', quotechar='|')
print fin[0]
# gives the following
['"1239","2249.00","1","3","2011-02-20"']
# let's say I do the following
ele = str(fin[0])
ele = ele.strip().split(',')
print ele
# gives me the following
['[\'"1239"', '"2249.00"', '"1"', '"3"', '"2011-02-20"\']']
Now ele[0] gives me the output ['"1239" — how do I get rid of that [' ? In the end, what I want to do is get 1239 and convert it to an integer. Any clues why this is happening?
Thanks
Edit: Never mind, resolved thanks to the first comment.
Change your delimiter to ',' and you will get a list of those values from the csv reader.
It's because you are converting a list to a string; there is no need to do this. Grab the first element of the list (in this case it is a string) and parse that:
>>> a = ['"1239","2249.00","1","3","2011-02-20"']
>>> a
['"1239","2249.00","1","3","2011-02-20"']
>>> a[0]
'"1239","2249.00","1","3","2011-02-20"'
>>> b = a[0].replace('"', '').split(',')
>>> b[-1]
'2011-02-20'
Of course, before you use the replace and split string methods you should check that the type is a string, or handle the exception if it isn't. Also, Blahdiblah is correct: your delimiter is probably wrong.
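Putting the two answers together, a minimal sketch of my own (Python 3 syntax; the file name sample.csv is assumed) that reads the file with the comma delimiter and converts the first field to an integer, which is what the question ultimately wants:
import csv

# sample.csv contains lines like: "1239","2249.00","1","3","2011-02-20"
with open('sample.csv', newline='') as fh:
    reader = csv.reader(fh, delimiter=',', quotechar='"')
    for row in reader:
        # csv strips the surrounding double quotes, so row[0] is '1239'
        first_id = int(row[0])
        print(first_id, row)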