Convert list to a string - Python
I am having problems keeping the data in string format. Each row is converted to a list once I perform a split on it (x.split). What do I need to do to keep the data as strings?
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
document = sc.textFile("/content/sample_data/dr_csv")
print(type(document))
print(document.count())
document.take(5)
document.takeSample(True, 5, 3)
record = document.map(lambda x: x.split(','))
record.take(3)
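You don't need to copy x. map() does not modify document; it returns a new RDD, so the original rows are still available as strings in document. The name of the lambda parameter is arbitrary:
record = document.map(lambda temp: temp.split(','))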
You can use the .join method if you want to get a string with all of the elements of the list. Suppose you have lst = ['cat', 'dog', 'pet']. Performing " ".join(lst) would return a string with all the elements of lst separated by a space: "cat dog pet".
''.join([str(i) for i in document.map(lambda x: x.split(',')).collect()])  # collect() returns the rows as a list that can be iterated
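To split a row, work with its fields, and still end up with a string, you can join the fields again inside the mapped function. Here is a minimal sketch in plain Python, assuming comma-separated rows; the helper name normalize_row and the sample rows are made up for illustration:

```python
def normalize_row(row, sep=","):
    """Split a row into fields, clean each field, and join them back into one string."""
    fields = row.split(sep)               # list of strings
    fields = [f.strip() for f in fields]  # example of per-field processing
    return sep.join(fields)               # back to a single string


rows = ["a, b ,c", "1,2, 3"]  # hypothetical sample rows
cleaned = [normalize_row(r) for r in rows]
print(cleaned)  # ['a,b,c', '1,2,3']
```

With an RDD, the same function could presumably be passed to map, e.g. document.map(normalize_row), so that each element of the resulting RDD stays a string.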
Related
How to turn a list containing strings into a list containing integers (Python)
I am optimizing PyRay (https://github.com/oscr/PyRay) to be a usable Python ray-casting engine, and I am working on a feature that takes a text file and turns it into a list (which PyRay uses as a map). But when I read the file into a list, the contents are strings and therefore not usable by PyRay. So my question is: how do I convert a list of strings into integers? Here is my code so far. (I commented out the actual code so I can test this.)
print("What map file to open?")
mapopen = input(">")
mapload = open(mapopen, "r")
worldMap = [line.split(',') for line in mapload.readlines()]
print(worldMap)
The map file:
1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,2,3,2,3,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,3,1,0,0,2,0,0,0,2,3,2,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,2,0,0,2,1,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,3,1,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,2,3,2,1,2,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,1,0,0,2,0,0,2,
2,3,1,0,0,2,0,0,2,1,3,2,0,2,0,0,3,0,3,1,
1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,2,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,0,0,2,3,0,1,2,0,1,
1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,3,0,2,
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,
2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,
Please help me, I have been searching all over and I can't find anything.
Try this. Did you want a list of lists, or just one big list?
with open(filename, "r") as txtr:
    data = txtr.read()
data = data.split("\n")  # split into a list of strings, one per line
data = [list(map(int, x.split(","))) for x in data]
The last line splits each string into a list at the commas, applies int() to each element, and turns the result into a list. It does this for every element in data. Note that int() raises ValueError on an empty string, so empty lines and trailing commas must be removed first. I hope it helps.
Here is the version for just one large list.
with open(filename, "r") as txtr:
    data = txtr.readlines()  # remove empty lines in your file!
data = ",".join(data)  # turns it into one large string
data = data.split(",")  # now you have a list of strings
data = list(map(int, data))  # applies int() to each element in data
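Since the map file in the question ends every row with a comma, a version that skips empty fields may be more forgiving. This is only a sketch: the function name parse_map is made up, and io.StringIO stands in for the opened map file.

```python
import io


def parse_map(lines):
    """Turn comma-separated text lines into a list of lists of ints.

    Empty fields (e.g. from a trailing comma) and blank lines are skipped.
    """
    world_map = []
    for line in lines:
        cells = [int(cell) for cell in line.split(",") if cell.strip()]
        if cells:  # ignore blank lines
            world_map.append(cells)
    return world_map


# Hypothetical stand-in for an opened map file; note the trailing commas.
sample = io.StringIO("1,2,1,\n1,0,2,\n\n2,1,2,\n")
world_map = parse_map(sample)
print(world_map)  # [[1, 2, 1], [1, 0, 2], [2, 1, 2]]
```

With a real file, the same call would look like parse_map(open(mapopen, "r")), since a file object also yields its lines when iterated.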
Look into the built-in map function in Python.
L = ['1', '2', '3']
ints = map(int, L)  # do not name the result "map", which would shadow the built-in
for el in ints:
    print(el)
Output:
1
2
3
As per your question, here is a way to change a list of strings into a list of integers (index into the list to get an individual integer). Hope this helps.
myStrList = ["1", "2", "\n", "3"]
myNewIntList = []
for x in myStrList:
    if x != "\n":  # skip newline entries
        myNewIntList.append(int(x))
print(myNewIntList)
Extract numeric values from a string for python
I have a string which contains numeric values inside quotes. I need to extract the numeric values and drop the quotes as well as the [ and ].
Sample string:
texts = ['13007807', '13007779']
What I tried:
texts = ['13007807', '13007779']
texts.replace("'", "")
texts.strip("'")
print texts  # this will return ['13007807', '13007779']
So what I need to extract from the string is:
13007807 13007779
If your texts variable is a string, as I understood from your reply, then you can use regular expressions:
import re
text = "['13007807', '13007779']"
regex = r"\['(\d+)', '(\d+)'\]"
values = re.search(regex, text)
if values:
    value1 = int(values.group(1))
    value2 = int(values.group(2))
Output:
value1=13007807
value2=13007779
You can use the * unpacking operator:
texts = ['13007807', '13007779']
print(*texts)
Output:
13007807 13007779
If you have a string instead:
data = "['13007807', '13007779']"
print(*eval(data))  # eval executes arbitrary code; only use it on trusted input
Output:
13007807 13007779
The easiest way is to use map and wrap the result in list:
list(map(int, texts))
Output:
[13007807, 13007779]
If your input data is a string of the form data = "['13007807', '13007779']", then:
import re
data = "['13007807', '13007779']"
list(map(int, re.findall(r'(\d+)', data)))
or
list(map(int, eval(data)))
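When the string comes from an untrusted source, ast.literal_eval from the standard library is a safer substitute for eval, because it only accepts Python literals. A small sketch, assuming the input has the same form as the sample string in the question:

```python
import ast

data = "['13007807', '13007779']"  # sample string from the question

# literal_eval parses Python literals only (lists, strings, numbers, ...)
# and raises an exception on anything else, unlike eval.
items = ast.literal_eval(data)           # ['13007807', '13007779']
numbers = [int(item) for item in items]  # convert each string to int

print(*numbers)  # 13007807 13007779
```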
Python: How to split a string in a dictionary
I have the following JSON data:
json_data = {"window_string": "X=-10 H=30 Y=20 W=40"}
How would I split the value into a list similar to this?
window_string = ["X = -10", "Y = 20", "W=40"]
json_data = {"window_string": "X=-10 H=30 Y=20 W=40"}
print(json_data["window_string"].split())  # use str.split()
Output:
['X=-10', 'H=30', 'Y=20', 'W=40']
Called without arguments, str.split() splits a string on whitespace and returns a list of strings, where each item is one word of the original string. Since json_data['window_string'] contains 4 words, each word becomes one item in the output list:
json_data = {'window_string': 'X=-10 H=30 Y=20 W=40'}
window_string = json_data['window_string'].split()
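If the individual values are needed as numbers rather than as 'X=-10' strings, each item can be split once more on '='. This goes a step beyond what the question asks, and the name window is made up for this sketch:

```python
json_data = {"window_string": "X=-10 H=30 Y=20 W=40"}

# Split on whitespace first, then split each "KEY=value" item once on "=".
pairs = (item.split("=", 1) for item in json_data["window_string"].split())
window = {key: int(value) for key, value in pairs}

print(window)  # {'X': -10, 'H': 30, 'Y': 20, 'W': 40}
```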
Creating RDD from input data with repeated delimiters - Spark
I have input data as key-value pairs with pipe delimiters, as below. Some of the values contain delimiter characters within the field.
key1:value1|key2:val:ue2|key3:valu||e3
key1:value4|key2:value5|key3:value6
The expected output is below.
value1|val:ue2|valu||e3
value4|value5|value6
I tried the following to create the RDD:
rdd=sc.textFile("path").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
This mapping works when the value fields do not contain the delimiters, as below.
key1:value1|key2:value2|key3:value3
key1:value4|key2:value5|key3:value6
I also tried a regex, as below:
rdd=sc.textFile("path").map(lambda l: [x.split(":")[1] for x in l.split("((?<!\|)\|(?!\|))")]).map(tuple)
Input data without delimiters:
key1:value1|key2:value2|key3:value3
key1:value4|key2:value5|key3:value6
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
>>> rdd.collect()
[(u'value1', u'value2', u'value3'), (u'value4', u'value5', u'value6')]
Input data with delimiters:
key1:value1|key2:val:ue2|key3:valu||e3
key1:value4|key2:value5|key3:value6
Without regex:
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
>>> rdd.collect()
Error: IndexError: list index out of range
With regex:
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("((?<!\|)\|(?!\|))")]).map(tuple)
>>> rdd.collect()
[(u'value1|key2'), (u'value4|key2')]
How can I achieve the result below from the input?
[(u'value1', u'val:ue2', u'valu||e3'), (u'value4', u'value5', u'value6')]
From this I will create a DataFrame and do some processing. Suggestions in pure Python are also welcome. Thanks in advance!
Here is the solution. The main issue is that l.split() only works with a fixed delimiter string, not a regex, so use re.split() instead. The pattern must not contain a capturing group, otherwise re.split() also returns the matched delimiters.
>>> import re
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1:] for x in re.split(r"(?<!\|)\|(?!\|)",l)]).map(tuple)
>>> rdd.collect()
[([u'value1'], [u'val', u'ue2'], [u'valu||e3']), ([u'value4'], [u'value5'], [u'value6'])]
The following RDD joins the pieces inside each list back together with ':':
>>> rdd2=rdd.map(lambda l: [':'.join(x) for x in l]).map(tuple)
>>> rdd2.collect()
[(u'value1', u'val:ue2', u'valu||e3'), (u'value4', u'value5', u'value6')]
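Since the question also welcomes pure Python, the same idea can be sketched without Spark. The names FIELD_SEP and parse_line are made up for illustration, and the sketch assumes a field separator is always a single '|' with no '|' directly next to it:

```python
import re

# A single "|" that is neither preceded nor followed by another "|".
FIELD_SEP = re.compile(r"(?<!\|)\|(?!\|)")


def parse_line(line):
    """Return the values of a 'key:value|key:value' line as a tuple."""
    fields = FIELD_SEP.split(line)
    # maxsplit=1 keeps any further ":" inside the value.
    return tuple(field.split(":", 1)[1] for field in fields)


lines = [
    "key1:value1|key2:val:ue2|key3:valu||e3",
    "key1:value4|key2:value5|key3:value6",
]
parsed = [parse_line(line) for line in lines]
print(parsed)
# [('value1', 'val:ue2', 'valu||e3'), ('value4', 'value5', 'value6')]
```

Splitting each field only once on ':' avoids the separate join step. With Spark, the function could presumably be used as sc.textFile("testcwp").map(parse_line). A value containing a single '|' cannot be told apart from a field separator with this approach.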
Remove string quotes from array in Python
I'm trying to get rid of some characters in my array so I'm just left with the x and y coordinates, separated by a comma, as follows:
[[316705.77017187304,790526.7469308273]
[321731.20991025254,790958.3493565321]]
I have used zip() to create tuples of the x and y values (as pairs from a list of strings), which I've then converted to an array using numpy. The array currently looks like this:
[['316705.77017187304,' '790526.7469308273,']
['321731.20991025254,' '790958.3493565321,']]
I need the output to be an array. I'm pretty stumped about how to get rid of the single quotes and the second comma. I have read that map() can change strings to numbers but I can't get it to work. Thanks in advance.
Using the ast module (Abstract Syntax Trees):
>>> import ast
>>> xll = [['321731.20991025254,' '790958.3493565321,'], ['321731.20991025254,' '790958.3493565321,']]
>>> [ast.literal_eval(xl[0]) for xl in xll]
[(321731.20991025254, 790958.3493565321), (321731.20991025254, 790958.3493565321)]
The above gives a list of tuples. For a list of lists, type the following:
>>> [list(ast.literal_eval(xl[0])) for xl in xll]
[[321731.20991025254, 790958.3493565321], [321731.20991025254, 790958.3493565321]]
Old answer: I think this:
>>> sll
[['316705.770172', '790526.746931'], ['321731.20991', '790958.349357']]
>>> fll = [[float(i) for i in l] for l in sll]
>>> fll
[[316705.770172, 790526.746931], [321731.20991, 790958.349357]]
Old edit:
>>> xll = [['321731.20991025254,' '790958.3493565321,'], ['321731.20991025254,' '790958.3493565321,']]
>>> [[float(s) for s in xl[0].split(',') if s.strip() != ''] for xl in xll]
[[321731.20991025254, 790958.3493565321], [321731.20991025254, 790958.3493565321]]
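The snippets above treat each row as one string, because adjacent string literals such as 'a,' 'b,' are concatenated by Python. numpy prints string arrays without commas between the elements, so the array in the question may instead hold two separate strings per row. A sketch under that assumption (the names rows and coords are made up):

```python
# Assumed shape: two separate strings per row, each with a trailing comma.
rows = [
    ["316705.77017187304,", "790526.7469308273,"],
    ["321731.20991025254,", "790958.3493565321,"],
]

# Strip the trailing comma from each string, then convert it to float.
coords = [[float(s.rstrip(",")) for s in row] for row in rows]
print(coords)
```

The resulting nested list can be passed to numpy.array(coords) to get a numeric array.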