I have a .txt file with floating point numbers inside. This file always contains an even number of values which need to be formatted as follows: [[[a,b],[c,d],[e,f]]]
The values always need to be grouped in pairs, even when there are fewer or more values: [[[a,b], ... [y,z]]]
So it needs to go from this:
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this:
[[[3.31497114423,50.803721015],[7.09205325687,50.803721015],[7.09205325687,53.5104033474],[3.31497114423,53.5104033474],[3.31497114423,50.803721015]]]
I have the feeling this can be done fairly easily and efficiently. The code I have so far works, but is far from efficient...
with open(filename) as f:
    for line in f:
        footprint = line.strip()
        splitted = footprint.split(' ')
        list_str = []
        for coordinate in splitted:
            list_str.append(coordinate.replace(',', ''))
        list_floats = [float(x) for x in list_str]
        footprint = [list_floats[x:x+2] for x in range(0, len(list_floats), 2)]
        return [footprint]
Any help is greatly appreciated!
The split function is very useful in scenarios such as these.
with open(filename) as f:
    # Split the string of numbers into a list of "x y" pairs at each ", "
    new_list = f.read().split(", ")
# Split every pair at the space
# and convert the resulting strings into floats
for i in range(len(new_list)):
    new_list[i] = list(map(float, new_list[i].split(" ")))
new_list = [new_list]
The first split converts the text from this
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this
['3.31497114423 50.803721015', '7.09205325687 50.803721015', '7.09205325687 53.5104033474', '3.31497114423 53.5104033474', '3.31497114423 50.803721015']
The second split converts that to this
[['3.31497114423', '50.803721015'], ['7.09205325687', '50.803721015'], ['7.09205325687', '53.5104033474'], ['3.31497114423', '53.5104033474'], ['3.31497114423', '50.803721015']]
Then the mapping of the float function converts it to this (the list converts the map object to a list object)
[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]
The last brackets place the whole thing into another list
[[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]]
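The same steps can also be collapsed into one nested comprehension. This is only a minimal sketch, assuming the input is a single string in the format shown in the question; `parse_footprint` is a hypothetical helper name, and it takes the text rather than a filename so it is easy to try out.

```python
def parse_footprint(text):
    """Turn 'a b, c d, ...' into [[[a, b], [c, d], ...]]."""
    pairs = [
        [float(value) for value in pair.split()]  # split() ignores extra whitespace
        for pair in text.strip().split(",")
        if pair.strip()  # skip empty pieces, e.g. after a trailing comma
    ]
    return [pairs]


sample = "3.31497114423 50.803721015, 7.09205325687 50.803721015"
print(parse_footprint(sample))
```

For a file, something like `parse_footprint(f.read())` inside the `with` block should do the same job.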
I am optimizing PyRay (https://github.com/oscr/PyRay) to be a usable Python ray-casting engine, and I am working on a feature that takes a text file and turns it into a list (which PyRay uses as a map). But when I read the file into a list, the contents come out as strings, which PyRay cannot use. So my question is: how do I convert a list of strings into integers? Here is my code so far. (I commented out the actual code so I can test this.)
print("What map file to open?")
mapopen = input(">")
mapload = open(mapopen, "r")
worldMap = [line.split(',') for line in mapload.readlines()]
print(worldMap)
The map file:
1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,2,3,2,3,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,3,1,0,0,2,0,0,0,2,3,2,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,2,0,0,2,1,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,3,1,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,2,3,2,1,2,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,1,0,0,2,0,0,2,
2,3,1,0,0,2,0,0,2,1,3,2,0,2,0,0,3,0,3,1,
1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,2,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,0,0,2,3,0,1,2,0,1,
1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,3,0,2,
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,
2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,
Please help me, I have been searching all about and I can't find anything.
Try this. Did you want a list of lists, or just one big list?
with open(filename, "r") as txtr:
    data = txtr.read()
data = data.split("\n")  # split into a list of strings, one per line
# skip empty lines and the empty field left by each trailing comma
data = [[int(v) for v in x.split(",") if v.strip()] for x in data if x.strip()]
The last line splits each string at the commas, applies int() to each element, and collects the results in a list. It does this for every line in data. I hope it helps.
Here is the version for just one large list.
with open(filename, "r") as txtr:
    data = txtr.read().split()  # one string per line; blank lines are dropped
data = ",".join(data)  # turns it into one large string
data = data.split(",")  # now you have a list of strings
data = [int(x) for x in data if x]  # applies int() to each non-empty element
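The list-of-lists variant can also be wrapped in a small helper that tolerates the trailing commas and blank lines seen in the map file. This is a sketch, not PyRay's own loader; `parse_map` is a hypothetical name, and it works on a string so it can be tested without a file.

```python
def parse_map(text):
    """Parse comma-separated rows of integers into a list of lists."""
    world_map = []
    for line in text.splitlines():
        # Drop the empty fields left by trailing commas or stray spaces
        row = [int(cell) for cell in line.split(",") if cell.strip()]
        if row:  # ignore blank lines
            world_map.append(row)
    return world_map


sample = "1,2,1,\n1,0,2,\n\n2,1,2,\n"
world_map = parse_map(sample)
print(world_map)
```

With a real file, `parse_map(open(mapopen).read())` would be the equivalent call, assuming the file uses the layout shown in the question.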
Look into the map built-in function in Python.
L = ['1', '2', '3']
numbers = map(int, L)  # a name other than "map" keeps the built-in usable
for el in numbers:
    print(el)
Output:
1
2
3
As per your question, below is a way to change a list of strings into a list of integers (index into the list to get a single integer value). Hope this helps.
myStrList = ["1", "2", "\n", "3"]
myNewIntList = []
for x in myStrList:
    if x != "\n":  # skip newline entries
        y = int(x)
        myNewIntList.append(y)
print(myNewIntList)
How can I do a search of a value of the first "latitude, longitude" coordinate in a "file.txt" list in Python and get 3 rows above and 3 rows below?
Value
37.0459
file.txt
37.04278,-95.58895
37.04369,-95.58592
37.04369,-95.58582
37.04376,-95.58557
37.04376,-95.58546
37.04415,-95.58429
37.0443,-95.5839
37.04446,-95.58346
37.04461,-95.58305
37.04502,-95.58204
37.04516,-95.58184
37.04572,-95.58139
37.04597,-95.58127
37.04565,-95.58073
37.04546,-95.58033
37.04516,-95.57948
37.04508,-95.57914
37.04494,-95.57842
37.04483,-95.5771
37.0448,-95.57674
37.04474,-95.57606
37.04467,-95.57534
37.04462,-95.57474
37.04458,-95.57396
37.04454,-95.57274
37.04452,-95.57233
37.04453,-95.5722
37.0445,-95.57164
37.04448,-95.57122
37.04444,-95.57054
37.04432,-95.56845
37.04432,-95.56834
37.04424,-95.5668
37.044,-95.56251
37.04396,-95.5618
Expected Result
37.04502,-95.58204
37.04516,-95.58184
37.04572,-95.58139
37.04597,-95.58127
37.04565,-95.58073
37.04546,-95.58033
37.04516,-95.57948
Additional information
In Linux I can get the closest line and do the processing I need using grep, sed, cut and others, but I'd like to do it in Python.
Any help will be greatly appreciated!
Thank you.
How can I do a search of a value of the first "latitude, longitude" coordinate in a "file.txt" list in Python and get 3 rows above and 3 rows below?
You can try:
with open("file.txt") as f:
    text = f.readlines()  # read text lines into a list
search = "37.0459"
match = [i for i, x in enumerate(text) if search in x]  # indexes of the lines containing the search string
if match:
    first = match[0]
    start = max(0, first - 3)  # clamp at the start of the file
    # up to 3 lines before, the matching line, and up to 3 lines after
    print("".join(text[start:first + 4]).strip())
Output:
37.04502,-95.58204
37.04516,-95.58184
37.04572,-95.58139
37.04597,-95.58127
37.04565,-95.58073
37.04546,-95.58033
37.04516,-95.57948
You can use pandas to import the data into a dataframe and then easily manipulate it. Since the value to check is not an exact match, I convert both values to strings and compare the leading characters.
import pandas as pd
data = pd.read_csv("file.txt", header=None, names=["latitude", "longitude"])  # imports text file as dataframe
value_to_check = 37.0459  # user defined
for i in range(len(data)):
    # compare the leading characters of the latitude with the search value
    if str(value_to_check) == str(data.iloc[i, 0])[:len(str(value_to_check))]:
        break
print(data.iloc[max(0, i-3):i+4, :])  # max() keeps the slice start from going negative
Output:
    latitude  longitude
9   37.04502  -95.58204
10  37.04516  -95.58184
11  37.04572  -95.58139
12  37.04597  -95.58127
13  37.04565  -95.58073
14  37.04546  -95.58033
15  37.04516  -95.57948
A solution with iterators, that only keeps in memory the necessary lines and doesn't load the unnecessary part of the file:
from collections import deque
from itertools import islice

def find_in_file(file, target, before=3, after=3):
    queue = deque(maxlen=before)  # keeps only the last `before` lines
    with open(file) as f:
        for line in f:
            if target in map(float, line.split(',')):
                # lines kept so far, the match, then the next `after` lines
                out = list(queue) + [line] + list(islice(f, after))
                return out
            queue.append(line)
        else:
            raise ValueError('target not found')
Some tests:
print(find_in_file('test.txt', 37.04597))
# ['37.04502,-95.58204\n', '37.04516,-95.58184\n', '37.04572,-95.58139\n', '37.04597,-95.58127\n',
# '37.04565,-95.58073\n', '37.04546,-95.58033\n', '37.04516,-95.57948\n']
print(find_in_file('test.txt', 37.044)) # Only one line after the match
# ['37.04432,-95.56845\n', '37.04432,-95.56834\n', '37.04424,-95.5668\n', '37.044,-95.56251\n',
# '37.04396,-95.5618\n']
Also, it works if there are fewer than the expected number of lines before or after the match. We match floats, not strings, as '37.04' would erroneously match '37.0444' otherwise.
This solution prints the lines before and after the match even if there are fewer than 3 of them.
I am also matching on strings, since the question implies that you want partial matches too, i.e. 37.0459 will match 37.04597.
search_term = '37.04462'
with open('file.txt') as f:
    lines = f.readlines()
lines = [line.strip().split(',') for line in lines]  # remove '\n'
for lat, lon in lines:
    if search_term in lat:
        index = lines.index([lat, lon])
        break
left = 0
right = 0
for k in range(1, 4):  # because the last one is not included
    if index - k >= 0:
        left += 1
    if index + k <= (len(lines) - 1):
        right += 1
for i in range(index - left, index + right + 1):  # because the last one is not included
    print(lines[i][0], lines[i][1])
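The counting of `left` and `right` can be replaced by a clamped slice. The following is a rough sketch under the assumption that a prefix match on the latitude field is what is wanted; `context_lines` and its parameters are hypothetical names, and it takes a list of lines so it can be tried without a file.

```python
def context_lines(lines, prefix, before=3, after=3):
    """Return the first line whose latitude starts with prefix, plus its neighbours."""
    for i, line in enumerate(lines):
        latitude = line.split(",")[0].strip()
        if latitude.startswith(prefix):
            start = max(0, i - before)  # clamp so a negative index never wraps around
            # slicing past the end of a list is safe, so no clamp is needed there
            return [row.strip() for row in lines[start:i + after + 1]]
    return []  # no match


sample = ["37.04572,-95.58139\n", "37.04597,-95.58127\n", "37.04565,-95.58073\n"]
print(context_lines(sample, "37.0459", before=1, after=1))
```

Matching only the latitude field avoids accidental hits inside the longitude, which a plain substring test on the whole line would allow.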
So I have a text file with around 400,000 lists that mostly look like this.
100005 127545 202036 257630 362970 376927 429080
10001 27638 51569 88226 116422 126227 159947 162938 184977 188045
191044 246142 265214 290507 296858 300258 341525 348922 359832 365744
382502 390538 410857 433453 479170 489980 540746
10001 27638 51569 88226 116422 126227 159947 162938 184977 188045
191044 246142 265214 290507 300258 341525 348922 359832 365744 382502
So far I have a for loop that goes line by line and turns the current line into a temp array list.
How would I create a top-ten list of the lists with the most elements in the whole file?
This is the code I have now.
file = open('node.txt', 'r')
adj = {}
top_ten = []
at_least_3 = 0
for line in file:
    data = line.split()
    adj[data[0]] = data[1:]
And this is what one of the lists looks like:
['99995', '110038', '330533', '333808', '344852', '376948', '470766', '499315']
# collect the lines
lines = []
with open("so.txt") as f:
    for line in f:
        # split each line into a list
        lines.append(line.split())
# sort the lines by length, descending
lines = sorted(lines, key=lambda x: -len(x))
# print the first 10 lines
print(lines[:10])
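With around 400,000 lines, sorting everything just to keep ten entries is more work than needed. As a sketch of an alternative (a different technique from the sort above), `heapq.nlargest` from the standard library keeps only the current top n while it scans; `longest_lists` is a hypothetical helper name, and `io.StringIO` stands in for the real file here.

```python
import heapq
import io


def longest_lists(lines, n=10):
    """Return the n longest whitespace-split rows from an iterable of lines."""
    rows = (line.split() for line in lines)  # generator: one row at a time
    return heapq.nlargest(n, rows, key=len)


sample = io.StringIO("1 2 3\n4\n5 6\n7 8 9 10\n")
print(longest_lists(sample, n=2))
```

An open file object can be passed in place of `sample`, since it also yields one line at a time.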
Why not use collections to display the top 10? i.e.:
import re
import collections
file = open('numbers.txt', 'r')
content = file.read()
numbers = re.findall(r"\d+", content)
counter = collections.Counter(numbers)
print(counter.most_common(10))
When wanting to count and then find the one(s) with the highest counts, collections.Counter comes to mind:
from collections import Counter
lists = Counter()
with open('node.txt', 'r') as file:
    for line in file:
        values = line.split()
        lists[tuple(values)] = len(values)
print('Length Data')
print('====== ====')
for values, length in lists.most_common(10):
    print('{:2d} {}'.format(length, list(values)))
Output (using sample file data):
Length Data
====== ====
10 ['191044', '246142', '265214', '290507', '300258', '341525', '348922', '359832', '365744', '382502']
10 ['191044', '246142', '265214', '290507', '296858', '300258', '341525', '348922', '359832', '365744']
10 ['10001', '27638', '51569', '88226', '116422', '126227', '159947', '162938', '184977', '188045']
7 ['382502', '390538', '410857', '433453', '479170', '489980', '540746']
7 ['100005', '127545', '202036', '257630', '362970', '376927', '429080']
Use a for loop and max() maybe? You say you've got a for loop that's placing the values into a temp array. From that you could use "max()" to pick out the largest value and put that into a list.
As an open for loop, something like appending max() to a new list:
newlist = []
for x in data:
    largest = max(x)
    newlist.append(largest)
Or as a list comprehension:
newlist = [max(x) for x in data]
Then from there you have to do the same process on the new list(s) until you get to the desired top 10 scenario.
EDIT: I've just realised that I've misread your question. You want to get the lists with the most elements, not the highest values. Ok.
len() is a good one for this.
longest = []
for templist in data:
    if len(templist) > len(longest):
        longest = templist
That would give you the current longest list, and from there you could create a top 10 list of lengths or of the temp lists themselves, or both.
If your data is really as shown with each number the same length, then I would make a dictionary with key = line, value = length, get the top value / key pairs in the dictionary and voila. Sounds easy enough.
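The dictionary idea might look roughly like this. It is a sketch that assumes the `adj` dictionary from the question (first number as key, remaining numbers as the list); `top_nodes` is a hypothetical helper name.

```python
def top_nodes(adj, n=10):
    """Return (node, neighbour_count) pairs for the n nodes with the most neighbours."""
    lengths = {node: len(neighbours) for node, neighbours in adj.items()}
    # sort by the stored length, largest first, and keep the first n pairs
    return sorted(lengths.items(), key=lambda item: item[1], reverse=True)[:n]


adj = {
    "100005": ["127545", "202036"],
    "10001": ["27638", "51569", "88226"],
    "99995": ["110038"],
}
print(top_nodes(adj, n=2))
```

The full lists remain available through `adj[node]` for each node in the result.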