Pythonic and concise way to construct this list? - python

How can I write the following code more concisely?
scores = []
for f in glob.glob(path):
    score = read_score(f, Normalize=True)
    scores.append(score)
I know this can be written in one or two lines without using append, but I'm a Python newbie.

Oh, I got it while browsing a related question:
scores = [read_score(f, Normalize=True) for f in glob.glob(path)]
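A side note: if the scores only need to be iterated over once, a generator expression (parentheses instead of brackets) avoids building the whole list in memory; a minor variation on the same comprehension:
scores = (read_score(f, Normalize=True) for f in glob.glob(path))  # lazy; items are produced on demand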

Searching for the lowest value from a file

I've included a test sample of my golf.txt file and the code I've used to print the results.
golf.txt:
Andrew
53
Dougie
21
The code to open this file and print the results (to keep this short, I only have two players and two scores):
golfData = open('golf.txt', 'r')
whichLine = golfData.readlines()
for i in range(0, len(whichLine), 2):
    print('Name:' + whichLine[i])
    print('Score:' + whichLine[i+1])
golfData.close()
Can I modify the code I have to pull out the minimum score along with the player's name? I believe I can without writing to a list or dictionary, but have NO clue how.
Help/suggestions much appreciated.
Use the min() function for that:
with open('golf.txt') as f_in:
    min_player, min_score = min(zip(f_in, f_in), key=lambda k: int(k[1]))
print(min_player, min_score)
Prints:
Dougie
21
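The zip(f_in, f_in) part works because both arguments are the same file iterator, so zip pulls two consecutive lines at a time: a name, then a score. Written out as a plain loop over the same golf.txt layout:
with open('golf.txt') as f_in:
    for name, score in zip(f_in, f_in):  # consumes the file two lines at a time
        print(name.strip(), int(score))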
As @h0r53 indicated, you can try something like this:
golfData = open('golf.txt', 'r')
whichLine = golfData.readlines()
lowest = float('Inf')
Name = ''
for i in range(0, len(whichLine), 2):
    if float(whichLine[i+1]) < lowest:
        lowest = float(whichLine[i+1])
        Name = whichLine[i]
golfData.close()
print(Name)
print(lowest)

Python script takes too much time / List comprehensions

I have a small script which compares, from CSV input files, how many items of the first list are in the second list. However, it takes a long time to run when there are many entries.
data_1 = import_csv("test1.csv")
print(len(data_1))
data_2 = import_csv("test2.csv")
print(len(data_2))
data_to_keep = len([i for i in data_1 if i in data_2])
I just ran a test with 598756 items in the first list and 76612 in the second, and the script hasn't finished yet.
As I'm still relatively new to Python, I would like to know if there is a faster way to achieve what I'm trying to do. Thank you for your help :)
EDIT: import_csv looks like this:
def import_csv(csvfilename):
    data = []
    with open(csvfilename, "r", encoding="utf-8", errors="ignore") as scraped:
        reader = csv.reader(scraped, delimiter=',')
        for row in reader:
            if row:  # avoid blank lines
                data.append(row[0])
    return data
Make data_2 a set.
data_2 = set(import_csv("test2.csv"))
In Python, sets are much faster for checking if an object is present (using the in operator).
You might also see some improvement from switching the order of your inputs. Make the larger file the set, that way you do fewer lookups when you iterate over the elements of the smaller file.
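A short sketch of the first suggestion (assuming the import_csv function from the question):
data_1 = import_csv("test1.csv")
data_2 = set(import_csv("test2.csv"))  # hash-based membership tests
data_to_keep = sum(1 for i in data_1 if i in data_2)  # count matches without building a list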
You can use set and its intersection, if duplicates can be safely discarded:
data1 = [1,2,3,3,4]
data2 = [2,3,5,6,1,6]
print(len(set(data1).intersection(data2)))
# 3
This is a set operation and will be much faster than what you are doing.
Try it:
import csv
with open('test1.csv', newline='') as csvfile:
    list1 = list(csv.reader(csvfile, delimiter=','))  # materialize before the file closes
with open('test2.csv', newline='') as csvfile2:
    list2 = list(csv.reader(csvfile2, delimiter=','))
data_to_keep = len([i for i in list1 if i in list2])
I'm making a few assumptions here but here's an idea...
test1.csv and test2.csv hold something unique, like serial numbers. Like...
9210268126,4628032171,6691918168,1499888554,2024134986,
8826205840,5643225730,3174290295,1881330725,7192644763,
7210351670,7956881819,4897219228,4638431591,6444695480,
1949859915,8919131597,2176933146,3875411064,3546520925
Try...
with open("test1.csv") as f1, open("test2.csv") as f2:
data_1 = [line.split(",") for line in f1]
data_2 = [line.split(",") for line in f2]
Since they're unique, we can use set operations to see which entries also appear in the other file:
data_to_keep = set(data_1).intersection(set(data_2))
I'm not sure how to do it faster - at that point it might be a hardware bottleneck.
This one should also work. It converts the list to a dictionary, which avoids the sequential search that the in operator performs on a list. With large datasets you often want to avoid the in operator on lists.
data_1 = import_csv("test1.csv")
data_2 = dict([(i,i) for i in import_csv("test2.csv")])
data_to_keep = len([i for i in data_1 if data_2.get(i) is not None])
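Note that the .get(i) is not None dance isn't needed: the in operator on a dict (unlike on a list) is also a constant-time hash lookup, so you can write:
data_to_keep = len([i for i in data_1 if i in data_2])  # 'in' on a dict checks keys by hash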

How to iteratively bring in different layers from a folder and assign them each different variable names?

I have been working with some code that exports layers individually filled with important data into a folder. The next thing I want to do is bring each one of those layers into a different program so that I can combine them and do some different tests. The current way that I know how to do it is by importing them one by one (as seen below).
fn0 = 'layer0'
f0 = np.genfromtxt(fn0 + '.csv', delimiter=",")
fn1 = 'layer1'
f1 = np.genfromtxt(fn1 + '.csv', delimiter=",")
The issue with continuing this way is that I may have to deal with up to 100 layers at a time, and it would be very inconvenient to have to import each layer individually.
Is there a way I can change my code to do this iteratively, so that I have code similar to this:
N = 100
for i in range(N)
fn(i) = 'layer(i)'
f(i) = np.genfromtxt(fn(i) + '.csv', delimiter=",")
Please let me know if you know of any ways!
You can use string formatting, as follows:
import numpy as np

N = 100
f = []  # create an empty list
for i in range(N):
    fn_i = 'layer%d' % i  # plain name; no parentheses!
    f.append(np.genfromtxt(fn_i + '.csv', delimiter=","))  # add to f
What I mean by "no parentheses!" is that parentheses are special characters in Python: they indicate function calls and tuples, so you shouldn't use them in variable names (ever!). That's why fn(i) = ... from your sketch can't work.
Mohammad Athar's answer is correct. However, you should not use %-formatting any longer; according to PEP 3101 (https://www.python.org/dev/peps/pep-3101/), it is superseded by format(). Moreover, as you have more than 100 files, a zero-padded name like layer_007.csv is probably appreciated.
Try something like:
import numpy as np

dataDict = dict()
for counter in range(214):
    fileName = 'layer_{number:03d}.csv'.format(number=counter)
    dataDict[fileName] = np.genfromtxt(fileName, delimiter=",")
When using a dictionary, like here, you can directly access your data later by using the file name; a dictionary has no positional order, though, so you might prefer the list version from Mohammad Athar's answer.
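As a further variation (my own suggestion, not from the answers above): if the layer files already exist on disk, you can discover them with glob instead of hard-coding the count, assuming the zero-padded layer_NNN.csv naming:
import glob
import numpy as np

# sorted() keeps zero-padded names like layer_007.csv in numeric order
layers = [np.genfromtxt(name, delimiter=",") for name in sorted(glob.glob('layer_*.csv'))]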

Repeating code when sorting information from a txt-file - Python

I'm having some trouble stopping my code from repeating itself, like the title says, when I import data from a txt-file. My question is whether there is a smarter way to loop the function. I'm still very new to Python in general, so I don't have good knowledge in this area.
The code that I'm using is the following:
with open("fundamenta.txt") as fundamenta:
fundamenta_list = []
for row in fundamenta:
info_1 = row.strip()
fundamenta_list.append(info_1)
namerow_1 = fundamenta_list[1]
sol_1 = fundamenta_list[2]
pe_1 = fundamenta_list[3]
ps_1 = fundamenta_list[4]
namerow_2 = fundamenta_list[5]
sol_2 = fundamenta_list[6]
pe_2 = fundamenta_list[7]
ps_2 = fundamenta_list[8]
namerow_3 = fundamenta_list[9]
sol_3 = fundamenta_list[10]
pe_3 = fundamenta_list[11]
ps_3 = fundamenta_list[12]
So when the code reads from "fundamenta_list", how do I change it to prevent code repetition?
It looks to me like your input file has records as blocks of 4 rows each, in the order namerow, sol, pe, ps, and you'll be creating objects that take these 4 fields. Assuming your object is called MyObject, you can do something like:
with open("test.data") as f:
objects = []
while f:
try:
(namerow, sol, pe, ps) = next(f).strip(), next(f).strip(), next(f).strip(), next(f).strip()
objects.append(MyObject(namerow, sol, pe, ps))
except:
break
then you can access your objects as objects[0] etc.
You could even make it into a function returning the list of objects like in Moyote's answer.
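For what it's worth, the same grouping can be done without the try/except: passing the same file iterator to zip several times pulls the lines out in fixed-size blocks, the same trick as zip(f_in, f_in) in the golf answer above (a sketch, still assuming a MyObject class):
with open("test.data") as f:
    objects = [MyObject(namerow.strip(), sol.strip(), pe.strip(), ps.strip())
               for namerow, sol, pe, ps in zip(f, f, f, f)]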
If I understood your question correctly, you may want to make a function out of your code, so you can avoid repeating the same code.
You can do this:
def read_file_and_save_to_list(file_name):
    with open(file_name) as f:
        list_to_return = []
        for row in f:
            list_to_return.append(row.strip())
        return list_to_return
Then afterwards you can call the function like this:
fundamenta_list = read_file_and_save_to_list("fundamenta.txt")

How to compare two files (~50MB) fast?

I am using Python and want to compare two files and find the lines that are unique to each of them. I'm doing it as shown below, but it is too slow.
import difflib

f1 = open(file1)
text1Lines = f1.readlines()
f2 = open(file2)
text2Lines = f2.readlines()
diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))
How can I speed it up considerably?
You might read and compare the files simultaneously, instead of storing them in memory. The following snippet makes a lot of unrealistic assumptions (i.e., the files are of the same length and no line is present in the same file twice), but it illustrates the idea:
unique_1 = []
unique_2 = []
with open(file1) as handle_1, open(file2) as handle_2:
    for line_1 in handle_1:
        # Read a line from the 1st file and check whether we have already seen it in the 2nd
        if line_1 in unique_2:
            unique_2.remove(line_1)
        # If the line was unique, remember it
        else:
            unique_1.append(line_1)
        # The same, only with the files the other way around
        line_2 = handle_2.readline()
        if line_2 in unique_1:
            unique_1.remove(line_2)
        else:
            unique_2.append(line_2)
print('\n'.join(unique_1))
print('\n'.join(unique_2))
Sure, it smells of reinventing the wheel, but you might get better performance by using a simple algorithm instead of difflib's fancy diff-building and distance calculations. Alternatively, if you are absolutely sure your files will never be too big to fit in memory (not the safest assumption, to be honest), you can just use the set difference:
set1 = set()
set2 = set()
for line in handle_1:
    set1.add(line)
for line in handle_2:
    set2.add(line)
set1_uniques = set1.difference(set2)
set2_uniques = set2.difference(set1)
Your code may have a problem: a line-by-line diff only matches lines in order, so if the two files have different numbers of lines or the data is unsorted, you will have trouble. Here is my code:
f1 = open('a.txt')
text1Lines = f1.readlines()
f2 = open('b.txt')
text2Lines = f2.readlines()
set1 = set(text1Lines)
set2 = set(text2Lines)
diffList = (set1 | set2) - (set1 & set2)  # lines that appear in exactly one file
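Incidentally, (set1 | set2) - (set1 & set2) is the symmetric difference, which sets support directly, and the one-sided differences recover which file each unique line came from:
diffList = set1 ^ set2   # same as (set1 | set2) - (set1 & set2)
only_in_a = set1 - set2  # lines unique to a.txt
only_in_b = set2 - set1  # lines unique to b.txt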
