I have two lists structured something like this:
A = [[1,27],[2,27],[3,27],[4,28],[5,29]]
B = [[6,30],[7,31],[8,31]]
and I have a file that contains numbers:
1 5
2 3
3 1
4 2
5 5
6....
I want code that reads this file and maps each number through the lists, e.g. if the file has 1, it should look it up in list A and output 27; if it has 6, it should look it up in B and print 30, so that I get
27 29
27 27
27 27
28 27
29 29
30 31
The problem is that my code gives an IndexError. I read the file line by line and have an if condition that checks whether the number I read from the file is less than the maximum number in list A; if so, it outputs the second element of that pair, and otherwise moves on. The problem is that instead of moving on to list B, my program still reads A and gives an IndexError.
with open(filename) as myfile:
    for line in myfile.readlines():
        parts = line.split()
        if parts[0] < maxnumforA:
            print A[int(parts[0])-1]
        else:
            print B[int(parts[0])-1]
You should turn those lists into dictionaries. For example:
_A = dict(A)
_B = dict(B)

with open(filename) as myfile:
    for line in myfile:
        parts = line.split()
        for part in parts:
            part = int(part)
            if part in _A:
                print _A[part]
            elif part in _B:
                print _B[part]
If the action that will take place does not need to know whether the number comes from A or B, both can be turned into a single dictionary:
d = dict(A + B)  # Creating the dictionary

with open(filename) as myfile:
    for line in myfile:
        parts = line.split()
        for part in parts:
            part = int(part)
            if part in d:
                print d[part]
Creating the dictionary can be accomplished in many different ways; I will list some of them:
d = dict(A + B): First joins both lists into a single list (without modifying A or B) and then turns the result into a dictionary. It's the clearest way to do it.
d = {**dict(A), **dict(B)}: Turns both lists into two separate dictionaries (without modifying A or B), unpacks them and packs both into a single dictionary. Slightly (and I mean really slightly) faster than the previous method, and less clear. Proposed by @Nf4r.
d = dict(A) followed by d.update(B): Turns the first list into a dictionary and then updates that dictionary with the contents of the second list. Fastest method; it takes one statement per list instead of a single line for both, but it generates no temporary objects, so it is more memory-efficient.
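For that last method, a minimal sketch (two statements, reusing the A and B lists from the question):
d = dict(A)   # turn the first list into a dictionary
d.update(B)   # fold the second list's pairs into it, in place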
As everyone stated before, a dict would be much better. I don't know if the left-hand values in each list are unique, but if they are, you could just go for:
d = {**dict(A), **dict(B)}  # available in Python 3.x

with open(filename) as file:
    for line in file.readlines():
        for num in line.split():
            if int(num) in d:
                print(d[int(num)])
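With the question's A and B, the merged dict maps every input number straight to its output value:
>>> {**dict(A), **dict(B)}
{1: 27, 2: 27, 3: 27, 4: 28, 5: 29, 6: 30, 7: 31, 8: 31}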
I'm making Python read a file, turning each of its lines into a space-separated list. Currently I'm facing an issue where I want Python to read a list only if its content has more than 3 parts.
I'm trying if int(len(list[3])) == 3: and then reading the 3 parts of the list, but the program is giving the error
IndexError: list index out of range
which is usually given when I access something that doesn't exist, but the line of code that reads the 3rd part shouldn't run on a list without 3+ parts.
You are probably looking for:
if len(list) > 3:
    # read list
You don't need to convert len() to an int; it already returns an int.
list[3] gives you back the fourth element of the list; you need to pass the whole list object to the len function.
== 3 will only catch lengths equal to 3, while you wanted all lengths above 3.
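Putting those points together, a minimal sketch of the corrected check (assuming line holds the current line of the file):
parts = line.split()
if len(parts) > 3:                     # len() already returns an int
    first, second, third = parts[:3]   # safe: the list has at least 4 parts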
I think it's this:
def get_file_matrix(file: str):
    with open(file, 'r') as arq:
        lines = arq.readlines()
    # sanitizing
    lines_clear = list(map(lambda line: line.strip(), lines))
    lines_split = list(map(lambda line: line.split(' '), lines_clear))
    lines_filtered = list(filter(lambda line: len(line) <= 3, lines_split))
    return lines_filtered

r = get_file_matrix('test.txt')
print(r)
I am making something like a random sentence generator. I want to build a random sentence from words taken randomly from 10 .csv files, which change size frequently, so I have to count their size before I select a random line. I have already made it, but I'm using way too much code. It currently does something like this:
def file_len(fname):
    f = open(fname)
    try:
        for i, l in enumerate(f):
            pass
    finally:
        f.close()
    return i + 1
then, selecting random lines... for each of the files I do this:
file1_size = file_len('items/file1.csv')
file1_line = randrange(file1_size) + 1
file1_out = linecache.getline('items/file1.csv', file1_line)
and when it is done for all of them, I just print the output...
print file1_out + file2_out + file3_out ...
Also, sometimes I only want to use some files and not others, so I'd just print the ones I want... e.g. if I just want files number 3, 4 and 7 then I do:
print file3_out + file4_out + file7_out
Obviously that's 30 lines just for counting lines, selecting a random one, and assigning it to a variable: 3 lines of code for each file. But things are getting more complex, and I thought a dictionary variable might be able to do what I want more quickly and with less code.
I thought it would be good to generate a variable whereby we end up with
random_lines = {'file1.csv': 24, 'file2.csv': 13, 'file3.csv': 3, 'file4.csv': 22, 'file5.csv': 8, 'file6.csv': 97, 'file7.csv': 34, 'file8.csv': 67, 'file9.csv': 31, 'file10.csv': 86}
(The key is the filename and the integer is a random line within the file, re-assigned each time the code is run)
Then, some kind of process that picks the required items (let's say sometimes we only want to use lines from files 1, 6, 8, and 10) and outputs the random line
output = file1.csv random line + file6.csv random line + file8.csv random line + file10.csv random line
then
print output
If anyone can see the obvious quick way to do this (I don't think it's rocket science but I am a beginner at python!) then I'd be very grateful!
Any time you find yourself reusing the same code over and over in an object-oriented language, use a function instead.
import linecache
from random import randrange

def GetOutput(file_name):
    file_size = file_len(file_name)
    file_line = randrange(file_size) + 1
    return linecache.getline(file_name, file_line)

file_1 = GetOutput('file_1.txt')
file_2 = GetOutput('file_2.txt')
...
You can further simplify by storing everything in a dict as you suggest in your original question.
input_files = ['file_1.txt', 'file_2.txt', ...]
outputs = {}
for file_name in input_files:
    outputs[file_name] = GetOutput(file_name)
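Picking only some of the files, like files 3, 4 and 7 in the question, then becomes a dict lookup (a sketch; the names assume the input_files list above):
wanted = ['file_3.txt', 'file_4.txt', 'file_7.txt']
print ' '.join(outputs[name] for name in wanted)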
I have a text file with many tens of thousands of short sentences like this:
go to venice
come back from grece
new york here i come
from belgium to russia and back to spain
I run a tagging algorithm which produces a tagged output of this sentence file:
go to <place>venice</place>
come back from <place>grece</place>
<place>new york</place> here i come
from <place>belgium</place> to <place>russia</place> and back to <place>spain</place>
The algorithm runs over the input multiple times and produces slightly different tagging each time. My goal is to identify the lines where those differences occur; in other words, to print all utterances for which the tagging differs across the N result files.
For example, with N=10 I get 10 tagged files. Suppose line 1 is tagged the same way in all 10 tagged files: do not print it. Suppose line 2 is tagged one way once and another way 9 times: print it. And so on.
For N=2 it is easy, I just run diff. But what do I do if I have N=10 result files?
If you have the tagged files, just create a counter for each line recording how many times you've seen it:
# use defaultdict for convenience
from collections import defaultdict

# start counting at 0
counter_dict = defaultdict(lambda: 0)

tagged_file_names = ['tagged1.txt', 'tagged2.txt', ...]

# add all lines of each file to the dict
for file_name in tagged_file_names:
    with open(file_name) as f:
        # use enumerate to maintain order
        # produces (LINE_NUMBER, LINE_CONTENT) tuples (hashable)
        for line_with_number in enumerate(f.readlines()):
            counter_dict[line_with_number] += 1

# print all entries that do not repeat in all files (in the same location)
for key, value in counter_dict.iteritems():
    if value < len(tagged_file_names):
        print "line number %d: [%s] only repeated %d times" % (
            key[0], key[1].strip(), value
        )
Walkthrough:
First of all, we create a data structure to enable us to count our entries, which are numbered lines. This data structure is a collections.defaultdict with a default value of 0, which is the count of a newly added line (increased by 1 with each add).
Then, we create the actual entry using a tuple, which is hashable, so it can be used as a dictionary key, and which is by default deeply comparable to other tuples. This means (1, "lolz") is equal to (1, "lolz") but different from (1, "not lolz") or (2, "lolz"), so it fits our use of deep-comparing lines to account for content as well as position.
Now all that's left to do is add all entries using a straightforward for loop and report the keys (which correspond to numbered lines) that do not appear in all files (that is, their count is lower than the number of tagged files provided).
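A quick interactive check of that tuple behaviour:
>>> (1, "lolz") == (1, "lolz")
True
>>> (1, "lolz") == (1, "not lolz")
False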
Example:
reut@tHP-EliteBook-8470p:~/python/counter$ cat tagged1.txt
123
abc
def
reut@tHP-EliteBook-8470p:~/python/counter$ cat tagged2.txt
123
def
def
reut@tHP-EliteBook-8470p:~/python/counter$ ./difference_counter.py
line number 1: [abc] only repeated 1 times
line number 1: [def] only repeated 1 times
If you compare all of them to the first text, then you can get a list of all texts that are different. This might not be the quickest way, but it would work.
import difflib

n1 = '1 2 3 4 5 6'
n2 = '1 2 3 4 5 6'
n3 = '1 2 4 5 6 7'

l = [n1, n2, n3]
# keep only the texts that differ from the first one
m = [x for x in l if x != l[0]]
# diff each differing text against the first, word by word
for text in m:
    diff = difflib.unified_diff(l[0].split(), text.split())
    print '\n'.join(diff)
I am trying to use Python to parse a text file (whose name is stored in the variable trackList) with times and titles in it. It looks like this:
00:04:45 example text
00:08:53 more example text
12:59:59 the last bit of example text
My regular expression (rem) works, and I am also able to split the string (i) into two parts correctly (that is, I separate times and text), but I am unable to then add the arrays that the split returns (using .extend) to a large array I created earlier (sLines).
import re

f = open(trackList)
count = 0
sLines = [[0 for x in range(0)] for y in range(34)]
line = []
for i in f:
    count += 1
    line.append(i)
    rem = re.match("\A\d\d\:\d\d\:\d\d\W", line[count-1])
    if rem:
        sLines[count-1].extend(line[count-1].split(' ', 1))
    else:
        print("error on line: " + count)
That code should go through each line in the file trackList and test whether the line is as expected. If so, it separates the time from the text and saves the result as an array inside an array, at the index one less than the current line number; if not, it prints an error pointing me to the line.
I use array[count-1] because Python arrays are zero-indexed and file lines are not.
I use .extend() as I want both elements of the smaller array added to the larger array in the same iteration of the parent for loop.
So, you have some pretty confusing code there.
For instance doing:
[0 for x in range(0)]
is a really fancy way of initializing an empty list:
>>> [] == [0 for x in range(0)]
True
Also, how do you know to get a matrix that is 34 lines long? You're also confusing yourself by calling your line 'i' in your for loop; usually that would be reserved as shorthand for an index, which you'd expect to be a numerical value. Appending i to line and then re-referencing it as line[count-1] is redundant when you already have your line variable (i).
Your overall code can be simplified to something like this:
import re

# load the file and extract the lines
f = open(trackList)
lines = f.readlines()
f.close()

# create the expression (compiled once, more optimized for loops)
expr = re.compile('^(\d\d:\d\d:\d\d)\s*(.*)$')

sLines = []
# loop over the lines collecting both the index (i) and the line (line)
for i, line in enumerate(lines):
    result = expr.match(line)
    # validate the line
    if not result:
        print("error on line: " + str(i+1))
        # add an invalid list to the matrix
        sLines.append([])  # or whatever you want as your invalid line
        continue
    # add the list to the matrix
    sLines.append(result.groups())
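As a hypothetical quick check, using sample lines from the question, each valid line ends up as a (time, text) tuple in the matrix:
sample = ['00:04:45 example text\n', '00:08:53 more example text\n']
for line in sample:
    result = expr.match(line)
    if result:
        print(result.groups())  # e.g. ('00:04:45', 'example text')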
I have two files which I loaded into lists. The content of the first file is something like this:
d.complex.1
23
34
56
58
68
76
.
.
.
etc
d.complex.179
43
34
59
69
76
.
.
.
etc
The content of the second file is also the same, but with different numerical values. Please consider everything from one d.complex.* to the next d.complex.* as one set.
Now I am interested in comparing each numerical value from one set of the first file with each numerical value of the sets in the second file, and I would like to record the number of times each numerical value has appeared in the second file overall.
For example, the number 23 from d.complex.1 could have appeared 5 times in file 2 under different sets. All I want to do is record the number of occurrences of the number 23 in file 2, including all sets of file 2.
My initial approach was to load them into lists and compare, but I am not able to achieve this. I searched on Google and came across sets, but being a Python noob I need some guidance. Can anyone help me?
If you feel the question is not clear, please let me know. I have also pasted the complete file 1 and file 2 here:
http://pastebin.com/mwAWEcTa
http://pastebin.com/DuXDDRYT
Open the file using Python's open function, then iterate over all its lines. Check whether the line contains a number; if so, increase its count in a defaultdict instance as described here.
Repeat this for the other file and compare the resulting dicts.
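A minimal sketch of that approach (the filename and the digits-only line test are assumptions based on the sample data):
from collections import defaultdict

def count_numbers(filename):
    counts = defaultdict(int)
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line.isdigit():  # skip d.complex.* headers and blank lines
                counts[line] += 1
    return counts

counts2 = count_numbers('file2.txt')  # hypothetical filename
print counts2['23']  # how often 23 appears anywhere in file 2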
First create a function which can load a given file. Since you may want to maintain the individual sets and also count the occurrences of each number, it is best to have a dict for the whole file where the keys are the set names, e.g. complex.1 etc., and where each such set keeps another dict for the numbers in that set. The code below explains it better:
def file_loader(f):
    file_dict = {}
    current_set = None
    for line in f:
        if line.startswith('d.complex'):
            file_dict[line] = current_set = {}
            continue
        if current_set is not None:
            current_set[line] = current_set.get(line, 0) + 1
    return file_dict
Now you can easily write a function which counts a number in a given file_dict:
def count_number(file_dict, num):
    count = 0
    for set_name, number_set in file_dict.iteritems():
        count += number_set.get(num, 0)
    return count
E.g. here is a usage example:
s = """d.complex.1
10
11
12
10
11
12"""
file_dict = file_loader(s.split("\n"))
print file_dict
print count_number(file_dict, '10')
output is:
{'d.complex.1': {'11': 2, '10': 2, '12': 2}}
2
You may have to improve the file loader, e.g. skip empty lines, convert the numbers to int, etc.
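For instance, a hedged sketch of those improvements (stripping whitespace, skipping blanks, and keying on ints instead of raw strings):
def file_loader(f):
    file_dict = {}
    current_set = None
    for line in f:
        line = line.strip()  # drop trailing newlines and spaces
        if not line:  # skip empty lines
            continue
        if line.startswith('d.complex'):
            file_dict[line] = current_set = {}
        elif current_set is not None:
            num = int(line)  # key on ints instead of strings
            current_set[num] = current_set.get(num, 0) + 1
    return file_dict
With int keys, the lookup then becomes count_number(file_dict, 10) rather than count_number(file_dict, '10').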