Sum in a mixed list of data - python

This is an specific questions that will help me finalize an outgoing project. It's pretty simple, I'm very new to Python and my background is mostly art. I like it, but I find it very challenging too.
I have a Text file (Data.txt) that contains a list of numbers/names, like this (short example):
String 1
34.6
45.6
45.9
String 2
34.6
45.6
45.9
This is a mixed list. After every String....12 numbers follow and so on. Notice that the numbers are 'float'.
I designed this:
numberList = []
data = []
data = open("SalesData.txt").read().split()
for i in data:
numberList.append(i)
print numberList
This will append and print all the data in the external .txt list. How can I get all that data in the new list (numberList), but excluding all the 'Strings' found through the reading of the file. This way, I can perform a total Sum of only the numbers ---

First, you shouldn't do .read().split() on your file - that will split on any whitespace, not just on newlines. Fortunately, Python can iterate directly over files.
Then, you could try to convert each line into a float and only append it to the list if that works (and skip it otherwise). Also, you can convert it to floats right away - makes summing it up easier.
number_list = []
with open("SalesData.txt") as myfile:
for line in myfile:
try:
number_list.append(float(line))
except ValueError:
pass
print(sum(number_list))

just iterate over the lines and only add the numbers to the list ...
with open("somefile.txt") as f:
my_list = []
for line in f:
try:
my_list.append(float(line))
except ValueError:
pass
print sum(my_list)

If your data is structured (which seems to be the case), I would simply use a counter and drop the 1st element of each sequence of len 12.
Here's an example:
numberList = []
data = []
counter = 0
with open("SalesData.txt") as myfile:
for line in myfile:
if counter > 0:
number_list.append(float(line))
counter = (counter + 1) % 12
print numberList

You could do type checking, or if the numbers are actually digits in a string, you could do something like a "13".isdigit() check but I'd probably want to do something a little more clever:
numberList = []
for i in range(0, len(data)/13):
numberList.append(data[1*(i+1):13*(i+1)])
That (or something like it) should target the groups of numbers after the strings. This does depend on your input data not being junk, but it should work quicker on large datasets than doing .isdigit() or other type checks.

try this:
tsum = []
for j in numberList:
try: tsum.append(float(j))
except: pass
sum = sum(tsum)

import re
ss = '''String 1
34.6
45.6
45.9
String 2
34.6
45.6
45.9
'''
print re.findall('^(?!.*?[a-zA-Z])[0-9.+-]+',ss,re.MULTILINE)
But if there can be letters whose none is ASCII, it won't work

Related

Efficiently create list of list of list with varying amount of input

I have a .txt file with floating point numbers inside. This file always contains an even number of values which need to be formatted as follows: [[[a,b],[c,d],[e,f]]]
The values always need to be in pairs of two. Even when there are less or more values: [[[a,b], ... [y,z]]]
So it needs to go from this:
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this:
[[[3.31497114423,50.803721015],[7.09205325687,50.803721015],[7.09205325687,53.5104033474],[3.31497114423,53.5104033474],[3.31497114423,50.803721015]]]
I have the feeling this can be done fairly easy and efficiƫnt. The code I have so far works, but is far from efficient...
with open(filename) as f:
for line in f:
footprint = line.strip()
splitted = footprint.split(' ')
list_str = []
for coordinate in splitted:
list_str.append(coordinate.replace(',', ''))
list_floats = [float(x) for x in list_str]
footprint = [list_floats[x:x+2] for x in range(0, len(list_floats), 2)]
return [footprint]
Any help is greatly appreciated!
The split function is very useful in scenarios such as these.
with open(filename) as f:
# Format the string of numbers into a list seperated by commas
new_list = f.read().split(", ")
# For every element in this list, make it a list seperated by space
# Also convert the strings into floats
for i in range(len(new_list)):
new_list[i] = list(map(float, new_list[i].split(" ")))
new_list = [new_list]
The first split converts the code from this
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this
['3.31497114423 50.803721015', '7.09205325687 50.803721015', '7.09205325687 53.5104033474', '3.31497114423 53.5104033474', '3.31497114423 50.803721015']
The second split converts that to this
[['3.31497114423', '50.803721015'], ['7.09205325687', '50.803721015'], ['7.09205325687', 53.5104033474'], ['3.31497114423', '53.5104033474'], ['3.31497114423', '50.803721015']]
Then the mapping of the float function converts it to this (the list converts the map object to a list object)
[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]
The last brackets place the whole thing into another list
[[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]]

How to turn a list containing strings into a list containing integers (Python)

I am optimizing PyRay (https://github.com/oscr/PyRay) to be a usable Python ray-casting engine, and I am working on a feature that takes a text file and turns it into a list (PyRay uses as a map). But when I use the file as a list, it turns the contents into strings, therefore not usable by PyRay. So my question is: How do I convert a list of strings into integers? Here is my code so far. (I commented the actual code so I can test this)
print("What map file to open?")
mapopen = input(">")
mapload = open(mapopen, "r")
worldMap = [line.split(',') for line in mapload.readlines()]
print(worldMap)
The map file:
1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,2,3,2,3,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,3,1,0,0,2,0,0,0,2,3,2,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,2,0,0,2,1,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,3,1,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,2,3,2,1,2,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,1,0,0,2,0,0,2,
2,3,1,0,0,2,0,0,2,1,3,2,0,2,0,0,3,0,3,1,
1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,2,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,0,0,2,3,0,1,2,0,1,
1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,3,0,2,
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,
2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,
Please help me, I have been searching all about and I can't find anything.
try this: Did you want a list of lists? or just one big list?
with open(filename, "r") as txtr:
data = txtr.read()
data = txtr.split("/n") # split into list of strings
data = [ list(map(int, x.split(","))) for x in data]
fourth line splits string into list by removing comma, then appliea int() on each element then turns it into a list. It does this for every element in data. I hope it helps.
Here is for just one large list.
with open(filename, "r") as txtr:
data = txtr.readlines() # remove empty lines in your file!
data = ",".join(data) # turns it into a large string
data = data.split(",") # now you have a list of strings
data = list(map(int, data)) # applies int() to each element in data.
Look into the map built-in function in python.
L=['1', '2', '3']
map = map(int, L)
for el in map:
print(el)
>>> 1
... 2
... 3
As per you question, please find below a way you can change list of strings to list of integers (or integers if you use list index to get the integer value). Hope this helps.
myStrList = ["1","2","\n","3"]
global myNewIntList
myNewIntList = []
for x in myStrList:
if(x != "\n"):
y = int(x)
myNewIntList.append(y)
print(myNewIntList)

Find first line of text according to value in Python

How can I do a search of a value of the first "latitude, longitude" coordinate in a "file.txt" list in Python and get 3 rows above and 3 rows below?
Value
37.0459
file.txt
37.04278,-95.58895
37.04369,-95.58592
37.04369,-95.58582
37.04376,-95.58557
37.04376,-95.58546
37.04415,-95.58429
37.0443,-95.5839
37.04446,-95.58346
37.04461,-95.58305
37.04502,-95.58204
37.04516,-95.58184
37.04572,-95.58139
37.04597,-95.58127
37.04565,-95.58073
37.04546,-95.58033
37.04516,-95.57948
37.04508,-95.57914
37.04494,-95.57842
37.04483,-95.5771
37.0448,-95.57674
37.04474,-95.57606
37.04467,-95.57534
37.04462,-95.57474
37.04458,-95.57396
37.04454,-95.57274
37.04452,-95.57233
37.04453,-95.5722
37.0445,-95.57164
37.04448,-95.57122
37.04444,-95.57054
37.04432,-95.56845
37.04432,-95.56834
37.04424,-95.5668
37.044,-95.56251
37.04396,-95.5618
Expected Result
37.04502,-95.58204
37.04516,-95.58184
37.04572,-95.58139
37.04597,-95.58127
37.04565,-95.58073
37.04546,-95.58033
37.04516,-95.57948
Additional information
In linux I can get the closest line and do the treatment I need using grep, sed, cut and others, but I'd like in Python.
Any help will be greatly appreciated!
Thank you.
How can I do a search of a value of the first "latitude, longitude"
coordinate in a "file.txt" list in Python and get 3 rows above and 3
rows below?*
You can try:
with open("text_filter.txt") as f:
text = f.readlines() # read text lines to list
filter= "37.0459"
match = [i for i,x in enumerate(text) if filter in x] # get list index of item matching filter
if match:
if len(text) >= match[0]+3: # if list has 3 items after filter, print it
print("".join(text[match[0]:match[0]+3]).strip())
print(text[match[0]].strip())
if match[0] >= 3: # if list has 3 items before filter, print it
print("".join(text[match[0]-3:match[0]]).strip())
Output:
37.04597,-95.58127
37.04565,-95.58073
37.04546,-95.58033
37.04597,-95.58127
37.04502,-95.58204
37.04516,-95.58184
37.04572,-95.58139
You can use pandas to import the data in a dataframe and then easily manipulate it. As per your question the value to check is not the exact match and therefore I have converted it to string.
import pandas as pd
data = pd.read_csv("file.txt", header=None, names=["latitude","longitude"]) #imports text file as dataframe
value_to_check = 37.0459 # user defined
for i in range(len(data)):
if str(value_to_check) == str(data.iloc[i,0])[:len(str(value_to_check))]:
break
print(data.iloc[i-3:i+4,:])
output
latitude longitude
9 37.04502 -95.58204
10 37.04516 -95.58184
11 37.04572 -95.58139
12 37.04597 -95.58127
13 37.04565 -95.58073
14 37.04546 -95.58033
15 37.04516 -95.57948
A solution with iterators, that only keeps in memory the necessary lines and doesn't load the unnecessary part of the file:
from collections import deque
from itertools import islice
def find_in_file(file, target, before=3, after=3):
queue = deque(maxlen=before)
with open(file) as f:
for line in f:
if target in map(float, line.split(',')):
out = list(queue) + [line] + list(islice(f, 3))
return out
queue.append(line)
else:
raise ValueError('target not found')
Some tests:
print(find_in_file('test.txt', 37.04597))
# ['37.04502,-95.58204\n', '37.04516,-95.58184\n', '37.04572,-95.58139\n', '37.04597,-95.58127\n',
# '37.04565,-95.58073\n', '37.04565,-95.58073\n', '37.04565,-95.58073\n']
print(find_in_file('test.txt', 37.044)) # Only one line after the match
# ['37.04432,-95.56845\n', '37.04432,-95.56834\n', '37.04424,-95.5668\n', '37.044,-95.56251\n',
# '37.04396,-95.5618\n']
Also, it works if there is less than the expected number of lines before or after the match. We match floats, not strings, as '37.04' would erroneously match '37.0444' otherwise.
This solution will print the before and after elements even if they are less than 3.
Also I am using string as it is implied from the question that you want partial matches also. ie. 37.0459 will match 37.04597
search_term='37.04462'
with open('file.txt') as f:
lines = f.readlines()
lines = [line.strip().split(',') for line in lines] #remove '\n'
for lat,lon in lines:
if search_term in lat:
index=lines.index([lat,lon])
break
left=0
right=0
for k in range (1,4): #bcoz last one is not included
if index-k >=0:
left+=1
if index+k<=(len(lines)-1):
right+=1
for i in range(index-left,index+right+1): #bcoz last one is not included
print(lines[i][0],lines[i][1])

How to sort a large number of lists to get a top 10 of the longest lists

So I have a text file with around 400,000 lists that mostly look like this.
100005 127545 202036 257630 362970 376927 429080
10001 27638 51569 88226 116422 126227 159947 162938 184977 188045
191044 246142 265214 290507 296858 300258 341525 348922 359832 365744
382502 390538 410857 433453 479170 489980 540746
10001 27638 51569 88226 116422 126227 159947 162938 184977 188045
191044 246142 265214 290507 300258 341525 348922 359832 365744 382502
So far I have a for loop that goes line by line and turns the current line into a temp array list.
How would I create a top ten list that has the list with the most elements of the whole file.
This is the code I have now.
file = open('node.txt', 'r')
adj = {}
top_ten = []
at_least_3 = 0
for line in file:
data = line.split()
adj[data[0]] = data[1:]
And this is what one of the list look like
['99995', '110038', '330533', '333808', '344852', '376948', '470766', '499315']
# collect the lines
lines = []
with open("so.txt") as f:
for line in f:
# split each line into a list
lines.append(line.split())
# sort the lines by length, descending
lines = sorted(lines, key=lambda x: -len(x))
# print the first 10 lines
print(lines[:10])
Why not use collections to display the top 10? i.e.:
import re
import collections
file = open('numbers.txt', 'r')
content = file.read()
numbers = re.findall(r"\d+", content)
counter = collections.Counter(numbers)
print(counter.most_common(10))
Ideone Demo
When wanting to count and then find the one(s) with the highest counts, collections.Counter comes to mind:
from collections import Counter
lists = Counter()
with open('node.txt', 'r') as file:
for line in file:
values = line.split()
lists[tuple(values)] = len(values)
print('Length Data')
print('====== ====')
for values, length in lists.most_common(10):
print('{:2d} {}'.format(length, list(values)))
Output (using sample file data):
Length Data
====== ====
10 ['191044', '246142', '265214', '290507', '300258', '341525', '348922', '359832', '365744', '382502']
10 ['191044', '246142', '265214', '290507', '296858', '300258', '341525', '348922', '359832', '365744']
10 ['10001', '27638', '51569', '88226', '116422', '126227', '159947', '162938', '184977', '188045']
7 ['382502', '390538', '410857', '433453', '479170', '489980', '540746']
7 ['100005', '127545', '202036', '257630', '362970', '376927', '429080']
Use a for loop and max() maybe? You say you've got a for loop that's placing the values into a temp array. From that you could use "max()" to pick out the largest value and put that into a list.
As an open for loop, something like appending max() to a new list:
newlist = []
for x in data:
largest = max(x)
newlist.append(largest)
Or as a list comprehension:
newlist = [max(x) for x in data]
Then from there you have to do the same process on the new list(s) until you get to the desired top 10 scenario.
EDIT: I've just realised that i've misread your question. You want to get the lists with the most elements, not the highest values. Ok.
len() is a good one for this.
for x in data:
if len(templist) > x:
newlist.append(templist)
That would give you the current highest and from there you could create a top 10 list of lengths or of the temp lists themselves, or both.
If your data is really as shown with each number the same length, then I would make a dictionary with key = line, value = length, get the top value / key pairs in the dictionary and voila. Sounds easy enough.

Python: Counting a specific set of character occurrences in lines of a file

I am struggling with a small program in Python which aims at counting the occurrences of a specific set of characters in the lines of a text file.
As an example, if I want to count '!' and '#' from the following lines
hi!
hello#gmail.com
collection!
I'd expect the following output:
!;2
#;1
So far I got a functional code, but it's inefficient and does not use the potential that Python libraries have.
I have tried using collections.counter, with limited success. The efficiency blocker I found is that I couldn't select specific sets of characters on counter.update(), all the rest of the characters found were also counted. Then I would have to filter the characters I am not interested in, which adds another loop...
I also considered regular expressions, but I can't see an advantage in this case.
This is the functional code I have right now (the simplest idea I could imagine), which looks for special characters in file's lines. I'd like to see if someone can come up with a neater Python-specific idea:
def count_special_chars(filename):
special_chars = list('!"#$%&\'()*+,-./:;<=>?#[\\]^_`{|}~ ')
dict_count = dict(zip(special_chars, [0] * len(special_chars)))
with open(filename) as f:
for passw in f:
for c in passw:
if c in special_chars:
dict_count[c] += 1
return dict_count
thanks for checking
Why not count the whole file all together? You should avoid looping through string for each line of the file. Use string.count instead.
from pprint import pprint
# Better coding style: put constant out of the function
SPECIAL_CHARS = '!"#$%&\'()*+,-./:;<=>?#[\\]^_`{|}~ '
def count_special_chars(filename):
with open(filename) as f:
content = f.read()
return dict([(i, content.count(i)) for i in SPECIAL_CHARS])
pprint(count_special_chars('example.txt'))
example output:
{' ': 0,
'!': 2,
'.': 1,
'#': 1,
'[': 0,
'~': 0
# the remaining keys with a value of zero are ignored
...}
Eliminating the extra counts from collections.Counter is probably not significant either way, but if it bothers you, do it during the initial iteration:
from collections import Counter
special_chars = '''!"#$%&'()*+,-./:;<=>?#[\\]^_`{|}~ '''
found_chars = [c for c in open(yourfile).read() if c in special_chars]
counted_chars = Counter(found_chars)
need not to process file contents line-by-line
to avoid nested loops, which increase complexity of your program
If you want to count character occurrences in some string, first, you loop over the entire string to construct an occurrence dict. Then, you can find any occurrence of character from the dict. This reduce complexity of the program.
When constructing occurrence dict, defaultdict would help you to initialize count values.
A refactored version of the program is as below:
special_chars = list('!"#$%&\'()*+,-./:;<=>?#[\\]^_`{|}~ ')
dict_count = defaultdict(int)
with open(filename) as f:
for c in f.read():
dict_count[c] += 1
for c in special_chars:
print('{0};{1}'.format(c, dict_count[c]))
ref. defaultdict Examples: https://docs.python.org/3.4/library/collections.html#defaultdict-examples
I did something like this where you do not need to use the counter library. I used it to count all the special char but you can adapt to put the count in a dict.
import re
def countSpecial(passwd):
specialcount = 0
for special in special_chars:
lenght = 0
#print special
lenght = len(re.findall(r'(\%s)' %special , passwd))
if lenght > 0:
#print lenght,special
specialcount = lenght + specialcount
return specialcount

Categories