Formatting output when writing a list to textfile - python

I have a list of lists that looks like this:
dupe = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'], ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'], ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'], ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt']]
I write it to a file using a very basic function():
try:
    file_name = open("dupe.txt", "w")
except IOError:
    pass
for a in range(len(dupe)):
    file_name.write(dupe[a][0] + " " + dupe[a][1] + " " + dupe[a][2] + "\n")
file_name.close()
With the output in the file looking like this:
95d1543adea47e88923c3d4ad56e9f65c2b40c76 ron\c apa.txt
95d1543adea47e88923c3d4ad56e9f65c2b40c76 ron\c knark.txt
b5cc17d3a35877ca8b76f0b2e07497039c250696 ron\a apa2.txt
b5cc17d3a35877ca8b76f0b2e07497039c250696 ron\a jude.txt
However, how can I make the output in the dupe.txt file look like this instead:
95d1543adea47e88923c3d4ad56e9f65c2b40c76 ron\c apa.txt, knark.txt
b5cc17d3a35877ca8b76f0b2e07497039c250696 ron\a apa2.txt, jude.txt

First, group the rows by their "key" (the first two elements of each inner list):
dupedict = {}
for a, b, c in dupe:
    dupedict.setdefault((a, b), []).append(c)
Then print it out:
for key, values in dupedict.iteritems():
    print ' '.join(key), ', '.join(values)
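For readers on Python 3, where dict.iteritems() and the print statement no longer exist, the same grouping could be sketched roughly as below. The helper name group_rows is made up for this illustration, the hashes are shortened for readability, and the sketch assumes every row has exactly three elements.

```python
def group_rows(rows):
    """Group file names by their (hash, directory) key, keeping first-seen order."""
    grouped = {}
    for file_hash, directory, file_name in rows:
        grouped.setdefault((file_hash, directory), []).append(file_name)
    # Build one output line per key: "hash dir file1, file2"
    return [" ".join(key) + " " + ", ".join(names) for key, names in grouped.items()]


# Shortened, hypothetical hashes standing in for the real ones
dupe = [
    ["95d1", "ron\\c", "apa.txt"],
    ["95d1", "ron\\c", "knark.txt"],
    ["b5cc", "ron\\a", "apa2.txt"],
    ["b5cc", "ron\\a", "jude.txt"],
]

lines = group_rows(dupe)
for line in lines:
    print(line)
```

Writing to dupe.txt is then a matter of joining the lines with "\n" inside a with open(...) block. The output order relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 on.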

I take it your last question didn't solve your problem?
Instead of putting each entry with a repeating ID and directory in a separate list, why not make the file element a sub-list that contains all the files sharing the same ID and directory?
dupe would then look like this:
dupe = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', ['apa.txt','knark.txt']],
        ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', ['apa2.txt','jude.txt']]]
Then your print loop could be similar to:
for i in dupe:
    print i[0], i[1],
    for j in i[2]:
        print j,
    print
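If the flat list already exists, one possible way to build that nested structure is itertools.groupby. This is only a sketch: nest_rows is a hypothetical helper name, the hashes are shortened, and it assumes rows with the same hash and directory sit next to each other (as they do in the question's data).

```python
from itertools import groupby
from operator import itemgetter


def nest_rows(rows):
    """Turn [hash, dir, file] rows into [hash, dir, [files...]] entries.

    Assumes rows sharing a hash and directory are adjacent in `rows`.
    """
    nested = []
    for (file_hash, directory), group in groupby(rows, key=itemgetter(0, 1)):
        nested.append([file_hash, directory, [row[2] for row in group]])
    return nested


# Shortened, hypothetical hashes standing in for the real ones
flat = [
    ["95d1", "ron\\c", "apa.txt"],
    ["95d1", "ron\\c", "knark.txt"],
    ["b5cc", "ron\\a", "apa2.txt"],
    ["b5cc", "ron\\a", "jude.txt"],
]

nested = nest_rows(flat)
for file_hash, directory, files in nested:
    print(file_hash, directory, ", ".join(files))
```

If the rows are not guaranteed to be adjacent, sorting them by the first two elements before calling groupby would be needed.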

from collections import defaultdict
dupe = [
['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'],
['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'],
['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'],
['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt'],
]
with open("dupe.txt", "w") as f:
    data = defaultdict(list)
    for hash, dir, fn in dupe:
        data[(hash, dir)].append(fn)
    for hash_dir, fns in data.items():
        f.write("{0[0]} {0[1]} {1}\n".format(hash_dir, ', '.join(fns)))

Use a dict to group them:
data = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'], \
['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'], \
['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'], \
['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt']]
dupes = {}
for row in data:
    if dupes.has_key(row[0]):
        dupes[row[0]].append(row)
    else:
        dupes[row[0]] = [row]
for dupe in dupes.itervalues():
    print "%s\t%s\t%s" % (dupe[0][0], dupe[0][1], ",".join([x[2] for x in dupe]))

If this is your actual data, you can either:
Output one line for every two elements in dupe. This is easier. Or,
If your data isn't as structured, make a dictionary where the long hash is the key and the rest of the output line is the value. Make sense?
For idea one, I mean that you can do something like this:
tmp_string = ""
for a in range(len(dupe)):
    if a % 2 == 0:
        # first row of a pair: start a new output line
        tmp_string = dupe[a][0] + " " + dupe[a][1] + " " + dupe[a][2]
    else:
        # second row of the pair: append its file name and write the line
        tmp_string += ", " + dupe[a][2]
        file_name.write(tmp_string + "\n")
For idea two, you may have something like this:
x = dict()
for a in range(len(dupe)):
    key = dupe[a][0]
    # check whether the hash already exists in x
    if key in x:
        x[key] += ", " + dupe[a][2]
    else:
        x[key] = dupe[a][0] + " " + dupe[a][1] + " " + dupe[a][2]
for b in x:  # for every key in dictionary x
    file_name.write(x[b] + "\n")

Related

Append the string in python for forloop

I need to collect the results for all values into one list and print it, but I am not able to get that working. Can anyone help me with this? I am comparing data from one system with data from another system and identifying the mismatches in each field.
For example, a table with kid_id, name, age, gender and address exists in two different systems. I need to ensure that all kids' data was moved correctly from system 1 to system 2.
Emp_id like 1,2,3,4,5,6
# names cannot start with a digit in Python, so 1_data/2_data are written as data1/data2
data2 = self.get2Data(kid_id)
data1 = self.get1Data(kid_id)
for i in range(len(data1)):
    for key, value in data1[i].items():
        if data1[i][key] == data2[i][key]:
            result = str("LKG") + "," + str(kid_id) + "," + str("PASS") + "," + str(key)
        else:
            result = str("LKG") + "," + str(kid_id) + "," + str("FAIL") + "," + str(key)
        MatchResult = result.split()
        print MatchResult
print "***It is Done*****"
Currently my output looks like this:
['LKG,100,PASS,address']
['LKG,102,FAIL,dob']
['LKG,105,FAIL,gender']
but I need it in this form:
(['LKG,100,PASS,address'],['LKG,102,FAIL,dob'],['LKG,105,FAIL,gender'])
or
[('LKG,100,PASS,address'),('LKG,102,FAIL,dob'),('LKG,105,FAIL,gender')]
Code details: the code above compares the data of the two systems and reports pass and fail cases in the format shown. In the result above, address is reported as PASS while dob and gender are reported as FAIL, which means there is still a data mismatch in the dob and gender fields for the kids with IDs 102 and 105.
Move the list declaration before the loop, initialize it to an empty list, and append each result to it:
# names cannot start with a digit in Python, so 1_data/2_data are written as data1/data2
data2 = self.get2Data(kid_id)
data1 = self.get1Data(kid_id)
MatchResult = []
for i in range(len(data1)):
    for key, value in data1[i].items():
        if data1[i][key] == data2[i][key]:
            result = str("LKG") + "," + str(kid_id) + "," + str("PASS") + "," + str(key)
        else:
            result = str("LKG") + "," + str(kid_id) + "," + str("FAIL") + "," + str(key)
        MatchResult.append(result.split())
print MatchResult
print "***It is Done*****"
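As a self-contained illustration of the "create the list once, append inside the loop" idea, here is a Python 3 sketch. The function name compare_records, the grade parameter and the sample records are all hypothetical; the real data would come from get1Data and get2Data, and the sketch assumes both systems return the same fields in the same order.

```python
def compare_records(kid_id, records_1, records_2, grade="LKG"):
    """Return one 'grade,kid_id,PASS|FAIL,field' string per compared field."""
    results = []  # created once, before any loop
    for rec_1, rec_2 in zip(records_1, records_2):
        for key in rec_1:
            status = "PASS" if rec_1[key] == rec_2[key] else "FAIL"
            results.append("{},{},{},{}".format(grade, kid_id, status, key))
    return results


# Hypothetical records for one kid in the two systems
system_1 = [{"address": "12 Main St", "gender": "F"}]
system_2 = [{"address": "12 Main St", "gender": "M"}]

match_result = compare_records(100, system_1, system_2)
print(match_result)
```

This produces a flat list of strings; wrapping each string in its own list, as in the answer above, only needs results.append([...]) instead.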

Loop not display all lines when adding a sum function

I am reading an external txt file and displaying all the lines which have field 6 as Y. I then need to count these lines. However, when I add the sum function, only one of the lines is printed; if I remove the sum function, all the lines display as expected. I presume it has something to do with the for loop, but I can't figure out how to get all lines to display and keep my sum. Can anyone help me spot where this is going wrong?
noLines = 0
fileOpen = open("file.txt", "r")
print("Name: " + "\tDate: " + "\tAge: " + "\tColour: " + "\tPet")
for line in fileOpen:
    line = line[:-1]
    field = line.split(',')
    if field[6] == "Y":
        print()
        print(field[0] + "\t\t" + field[1] + "\t" + field[2] + "\t\t" + field[3] + "\t\t" + field[4])
        noLines = sum(1 for line in fileOpen)
print()
print(noLines)
You are using sum incorrectly. To get the count on its own, you can replace your current sum line with:
noLines = sum(1 for l in open("file.txt","r") if l[:-1].split(',')[6]=='Y')
The issue with your current code: fileOpen is an iterator, and the generator expression inside sum consumes it completely, so the outer for loop has nothing left to iterate over. Instead of using sum, initialize noLines before the for loop:
noLines = 0
for line in fileOpen:
    # your stuff ...
And inside the if block, instead of sum, do:
noLines += 1
# instead of: noLines = sum(1 for line in fileOpen)
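To see the exhaustion effect in isolation, here is a small Python 3 sketch that uses io.StringIO as a stand-in for the opened file. The sample rows and variable names are made up for illustration.

```python
import io

# Hypothetical stand-in for file.txt: seven comma-separated fields per line
sample = io.StringIO(
    "Ann,2016-01-01,30,red,cat,x,Y\n"
    "Bob,2016-01-02,25,blue,dog,x,N\n"
    "Cy,2016-01-03,41,green,fish,x,Y\n"
)

matches = []
for line in sample:
    fields = line.rstrip("\n").split(",")
    if fields[6] == "Y":
        matches.append(fields)  # keep the row; counting happens afterwards

no_lines = len(matches)
print(no_lines)

# The iterator is now exhausted: a second pass over it yields nothing.
leftover = sum(1 for _ in sample)
print(leftover)
```

Collecting the matching rows in a list gives both the rows and their count from a single pass over the file.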

Python: Calculating difference of values in a nested list by using a While Loop

I have a list that is composed of nested lists, each nested list contains two values - a float value (file creation date), and a string (a name of the file).
For example:
n_List = [[201609070736L, 'GOPR5478.MP4'], [201609070753L, 'GP015478.MP4'],[201609070811L, 'GP025478.MP4']]
The nested list is already sorted in order of ascending values (creation dates). I am trying to use a While loop to calculate the difference between each sequential float value.
For Example: 201609070753 - 201609070736 = 17
The goal is to use the time difference values as the basis for grouping the files.
The problem I am having is that when the count reaches the last value for len(n_List) it throws an IndexError because count+1 is out of range.
IndexError: list index out of range
I can't figure out how to work around this error. No matter what I try, the count is always out of range when it reaches the last value in the list.
Here is the While loop I've been using.
count = 0
while count <= len(n_List):
    full_path = source_folder + "/" + n_List[count][1]
    time_dif = n_List[count+1][0] - n_List[count][0]
    if time_dif < 100:
        f_List.write(full_path + "\n")
        count = count + 1
    else:
        f_List.write(full_path + "\n")
        f_List.close()
        f_List = open(source_folder + 'GoPro' + '_' + str(count) + '.txt', 'w')
        f_List.write(full_path + "\n")
        count = count + 1
PS. The only workaround I can think of is to assume that the last value will always be appended to the final group of files. So when the count reaches len(n_List) - 1, I skip the time_dif calculation and automatically add that final value to the last group. While this will probably work most of the time, I can see edge cases where the final value in the list may need to go in a separate group.
I think using zip could be an easier way to get the differences:
res1, res2 = [], []
for i, j in zip(n_List, n_List[1:]):
    target = res1 if j[0] - i[0] < 100 else res2
    target.append(i[1])
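n_List[len(n_List)] will always raise an IndexError, because the valid indices run from 0 to len(n_List) - 1. Since the loop body also reads n_List[count+1], the loop has to stop one element earlier:
while count < len(n_List) - 1:
The last entry is then not visited by the loop and has to be written after it.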
FYI, here is the solution I used, thanks to @galaxyman for the help. I handled the issue of the last value in the nested list by simply adding that value after the loop completes. I don't know if that's the most elegant way to do it, but it works.
(Note: I'm only posting the function related to the zip method suggested in the previous posts.)
def list_zip(gp_List):
    ffmpeg_list = open(output_path + '\\' + gp_List[0][1][0:8] + '.txt', 'a')
    for a, b in zip(gp_List, gp_List[1:]):
        full_path = gopro_folder + '\\' + a[1]
        time_dif = b[0] - a[0]
        if time_dif < 100:
            ffmpeg_list.write("file " + full_path + "\n")
        else:
            # gap too large: finish this group and start a new file for the next one
            ffmpeg_list.write("file " + full_path + "\n")
            ffmpeg_list.close()
            ffmpeg_list = open(output_path + '\\' + b[1][0:8] + '.txt', 'a')
    # zip() stops one short, so the last entry is written after the loop
    last_val = gp_List[-1][1]
    ffmpeg_list.write("file " + gopro_folder + '\\' + last_val + "\n")
    ffmpeg_list.close()
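An alternative that avoids special-casing the last entry is to compare each item with the previous one instead of the next one. The sketch below only builds the groups in memory and leaves the file writing out; group_by_gap and max_gap are hypothetical names, and the fourth sample entry is invented to show a group break.

```python
def group_by_gap(entries, max_gap=100):
    """Split time-sorted [timestamp, name] entries into groups of names.

    A new group starts whenever the gap to the previous timestamp
    is max_gap or more.
    """
    groups = []
    previous = None
    for stamp, name in entries:
        if previous is not None and stamp - previous < max_gap:
            groups[-1].append(name)  # close enough: same group
        else:
            groups.append([name])    # first entry or large gap: new group
        previous = stamp
    return groups


n_list = [
    [201609070736, "GOPR5478.MP4"],
    [201609070753, "GP015478.MP4"],
    [201609070811, "GP025478.MP4"],
    [201609071200, "GOPR5479.MP4"],  # invented entry, far from the others
]

groups = group_by_gap(n_list)
print(groups)
```

Because every entry is handled inside the loop, a last file that belongs in its own group ends up there without extra code. Note that subtracting timestamps of the form YYYYMMDDHHMM is not a true difference in minutes: 201609070811 - 201609070753 gives 58 although the clock times are 18 minutes apart.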

Write a Python formatted generator

To generate a Tecplot file I use:
import numpy as np
x, y = np.genfromtxt('./files.dat', unpack=True)
nb_value = x.size
x_splitted = np.split(x, nb_value // 1000 + 1)
y_splitted = np.split(y, nb_value // 1000 + 1)
with open('./test.dat', 'w') as f:
    f.write('TITLE = \" YOUPI \" \n')
    f.write('VARIABLES = \"x\" \"Y\" \n')
    f.write('ZONE T = \"zone1 \" , I=' + str(nb_value) + ', F=BLOCK \n')
    for idx in range(len(x_splitted)):
        string_list = ["%.7E" % val for val in x_splitted[idx]]
        f.write('\t'.join(string_list)+'\n')
    for idx in range(len(y_splitted)):
        string_list = ["%.7E" % val for val in y_splitted[idx]]
        f.write('\t'.join(string_list)+'\n')
Here is an example of file.dat:
-6.491083147394967334e-02 6.917197804459292456e+02
-6.489978349202699115e-02 6.871829941905543819e+02
-6.481115367048655151e-02 6.707292800160890920e+02
-6.479991205404790622e-02 6.756112033303363660e+02
-6.471117816968344205e-02 7.666798999627604871e+02
-6.469995628177811764e-02 7.819675271405360490e+02
This code is working but I have seen that I should use .format() instead of %. This is running: string_list = ["{}".format(list(val for val in y_splitted[idx]))] but won't work with Tecplot because we need 7E.
If I try: string_list = ["{.7E}".format(list(val for val in y_splitted[idx]))] it doesn't work at all. I got: AttributeError: 'list' object has no attribute '7E'
What would be the best way to do what I am trying to do?
Formatting specifiers come after a : colon:
["{:.7E}".format(val) for val in y_splitted[idx]]
Note that I had to adjust your list comprehension syntax as well; you only want to apply each val to str.format(), not the whole loop. In essence, you only needed to replace the "%.7E" % val part here.
See the Format String Syntax documentation:
replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}"
Demo:
>>> ["%.7E" % val for val in (2.8, 4.2e5)]
['2.8000000E+00', '4.2000000E+05']
>>> ["{:.7E}".format(val) for val in (2.8, 4.2e5)]
['2.8000000E+00', '4.2000000E+05']
Not that you really need to use str.format() here, since there are no other parts to the string; if all you have is "{:<formatspec>}", just use the format() function and pass in the <formatspec> as the second argument:
[format(val, ".7E") for val in y_splitted[idx]]
Note that in Python, you generally don't loop over a range() then use the index to get a list value. Just loop over the list directly:
for xsplit in x_splitted:
    string_list = [format(val, ".7E") for val in xsplit]
    f.write('\t'.join(string_list) + '\n')
for ysplit in y_splitted:
    string_list = [format(val, ".7E") for val in ysplit]
    f.write('\t'.join(string_list) + '\n')
You also don't have to escape the " characters in your strings; you only need to do that when the string delimiters are also " characters; you are using ' instead. You can use str.format() to insert the nb_value there too:
f.write('TITLE = " YOUPI " \n')
f.write('VARIABLES = "x" "Y" \n')
f.write('ZONE T = "zone1 " , I={}, F=BLOCK \n'.format(nb_value))
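Putting the pieces together without NumPy, a rough sketch of the formatting step could look like this. format_rows, per_line and spec are hypothetical names chosen for the example, and plain list slicing stands in for the splitting done in the question.

```python
def format_rows(values, per_line=1000, spec=".7E"):
    """Format numbers with `spec` and join them into tab-separated lines."""
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append("\t".join(format(val, spec) for val in chunk))
    return lines


rows = format_rows([2.8, 4.2e5, -6.491083147394967e-02], per_line=2)
for row in rows:
    print(row)

# An f-string accepts the same format spec after the colon
same = f"{2.8:.7E}"
print(same)
```

Each returned string can be written with f.write(row + '\n'), exactly as in the loops above.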

Organize dictionary by frequency

I create a dictionary of the most used words and get the top ten. I need to sort this so the list comes out in order. I can't do that without making a list, which I can't use. Here is my code. I am aware dictionaries cannot be sorted, but I still need help.
most_used_words = Counter()
zewDict = Counter(most_used_words).most_common(10)
newDict = dict(zewDict)
keys = newDict.keys()
values = newDict.values()
msg = ('Here is your breakdown of your most used words: \n\n'
       'Word | Times Used'
       '\n:--:|:--:'
       '\n' + str(keys[0]).capitalize() + '|' + str(values[0]) +
       '\n' + str(keys[1]).capitalize() + '|' + str(values[1]) +
       '\n' + str(keys[2]).capitalize() + '|' + str(values[2]) +
       '\n' + str(keys[3]).capitalize() + '|' + str(values[3]) +
       '\n' + str(keys[4]).capitalize() + '|' + str(values[4]) +
       '\n' + str(keys[5]).capitalize() + '|' + str(values[5]) +
       '\n' + str(keys[6]).capitalize() + '|' + str(values[6]) +
       '\n' + str(keys[7]).capitalize() + '|' + str(values[7]) +
       '\n' + str(keys[8]).capitalize() + '|' + str(values[8]) +
       '\n' + str(keys[9]).capitalize() + '|' + str(values[9]))
r.send_message(user, 'Most Used Words', msg)
How would I do it so the msg prints the words in order from most used word on the top to least on the bottom with the correct values for the word?
Edit: I know dictionaries cannot be sorted on their own, so can I work around this somehow?
Once you have the values it's as simple as:
import collections
print('Word | Times Used')
for e, t in collections.Counter(values).most_common(10):
    print("%s|%d" % (e, t))
This prints something like:
Word | Times Used
e|4
d|3
a|2
c|2
From the Docs: most_common([n])
Return a list of the n most common elements and their counts from the
most common to the least. If n is not specified, most_common() returns
all elements in the counter. Elements with equal counts are ordered
arbitrarily:
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
Your code can be:
from collections import Counter
c = Counter(most_used_words)
msg = "Here is your breakdown of your most used words:\n\nWords | Times Used\n:--:|:--:\n"
msg += '\n'.join('%s|%s' % (k.capitalize(), v) for (k, v) in c.most_common(10))
r.send_message(user, 'Most Used Words', msg)
import operator
newDict = dict(zewDict)
sorted_newDict = sorted(newDict.iteritems(), key=operator.itemgetter(1))
msg = ''
for key, value in sorted_newDict:
    # strings have no append(); build the message with +=
    msg += '\n' + str(key).capitalize() + '|' + str(value)
This will sort by the dictionary values in ascending order. If you want the most used word first, add reverse=True to sorted().
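For Python 3, where iteritems() is gone, the reverse=True variant might look like the sketch below. The word_counts dictionary is made-up sample data; in the question it would come from the Counter.

```python
from operator import itemgetter

word_counts = {"apple": 3, "kiwi": 7, "pear": 5}  # hypothetical counts

# Sort (word, count) pairs by count, largest first
ranked = sorted(word_counts.items(), key=itemgetter(1), reverse=True)

msg = "Word | Times Used\n:--:|:--:"
for word, count in ranked:
    msg += "\n" + word.capitalize() + "|" + str(count)
print(msg)
```

The dictionary itself stays unsorted; only the list of pairs returned by sorted() is in order, which is enough for building the message.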
