txt file parsing in python, repetition of a line

I have the following code:
for line in contents:
    line_fields = line.strip().split()
    f2.write("ID: " + line_fields[0] + '\n')
    f2.write("Name:" + line_fields[1] + '\n')
What I am trying to do is write ID only once for each number contained in line_fields[0]. The output should look like the left column instead of the right one:
ID: 1        ID: 1
Name1        Name1
Name1        ID: 1
ID: 2        Name1
Name2        ID: 2
Name2        Name2
...
I am quite confused and would be very grateful for some advice.

If you don't want to deal with dicts you can do this:
current = ""
for line in contents:
    line_fields = line.strip().split()
    if current != line_fields[0]:
        f2.write("ID: " + line_fields[0] + '\n')
        current = line_fields[0]
    f2.write("Name:" + line_fields[1] + '\n')
It will only write the ID if it is different from the previous one, so it relies on lines with the same ID being next to each other.
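For anyone who wants to try this without the original files, here is a minimal self-contained sketch of the same idea. The sample `contents` list, the `write_grouped` helper and the `io.StringIO` object standing in for `f2` are all made up for illustration, and it assumes that lines sharing an ID are adjacent.

```python
import io

# Hypothetical sample input: each line holds an ID and a name.
contents = ["1 Name1", "1 Name1", "2 Name2", "2 Name2"]

def write_grouped(lines, out):
    """Write each ID once, followed by every name that belongs to it."""
    current = None  # ID of the group currently being written
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] != current:
            out.write("ID: " + fields[0] + "\n")
            current = fields[0]
        out.write("Name: " + fields[1] + "\n")

f2 = io.StringIO()  # stands in for the real output file
write_grouped(contents, f2)
print(f2.getvalue())
```

Passing the output object as a parameter keeps the function easy to test; a real file opened with `open(..., "w")` can be passed in the same way.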

You can use a dictionary as a check mechanism to detect IDs that have already been written:
d = {}
for line in contents:
    line_fields = line.strip().split()
    if line_fields[0] not in d:
        # first time this ID is seen, so write it
        f2.write("ID: " + line_fields[0] + '\n')
    f2.write("Name:" + line_fields[1] + '\n')
    d[line_fields[0]] = line_fields[1]
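If lines with the same ID are not guaranteed to be adjacent, a dictionary can also collect all names per ID before anything is written. This is only a sketch with made-up sample data; it relies on dictionaries preserving insertion order (Python 3.7+).

```python
# Hypothetical input in which the lines for ID 1 are not adjacent.
contents = ["1 Anna", "2 Ben", "1 Carl"]

names_by_id = {}
for line in contents:
    fields = line.split()
    # setdefault creates the list the first time an ID is seen
    names_by_id.setdefault(fields[0], []).append(fields[1])

output_lines = []
for id_, names in names_by_id.items():
    output_lines.append("ID: " + id_)
    for name in names:
        output_lines.append("Name: " + name)

print("\n".join(output_lines))
```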

Related

Python replace str in list with new value

I'm writing a program that turns music albums into files you can search for. For that I need a string in the file whose value is only known after the list is complete. Can you go back into the list and replace a blank string with a new value?
I searched online and found something called words.replace, but it doesn't work; I get an AttributeError.
def create_album():
    global idnumber, current_information
    file_information = []
    if current_information[0] != 'N/A':
        save()
    file_information.append(idnumber)
    idnumber += 1
    print('Type c at any point to abort creation')
    for i in creation_list:
        value = input('\t' + i)
        if value.upper == 'C':
            menu()
        else:
            file_information.append('')  # -1
            file_information.append(value)
    file_information.append('Album created - ' + file_information[2] +'\nSongs:')
    file_information = [w.replace(file_information[1], str(file_information[0]) + '-' + file_information[2]) for w in file_information]  # -2
    current_information = file_information
    save_name = open(save_path + str(file_information[0]) + '-' + str(file_information[2]) + '.txt', 'w')
    for i in file_information:
        save_name.write(str(i) + '\n')
    current_files_ = open(information_file + 'files.txt', 'w')
    filenames.append(file_information[0])
    for i in filenames:
        current_files_.write(str(i) + '\n')
    id_file = open(information_file + 'albumid.txt', 'w')
    id_file.write(str(idnumber))
The line marked -1 is where I set aside a blank entry.
The line marked -2 is where I try to replace entry 1 in the list with a value built from entries 0 and 2.
The error message I receive is: 'int' object has no attribute 'replace'

Did you try this?
file_information = [w.replace(str(file_information[1]), str(file_information[0]) + '-' + file_information[2]) for w in file_information]
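Note that a list comprehension of this shape calls replace on every element, including the integer ID at index 0, which is what raises the AttributeError. Since lists are mutable, a simpler alternative may be to assign to the placeholder's index directly. The sketch below uses made-up values in place of the list built in create_album().

```python
# Hypothetical values standing in for the list built in create_album().
file_information = [7, '', 'Abbey Road', 'The Beatles']

# Lists are mutable, so the placeholder at index 1 can simply be reassigned.
file_information[1] = str(file_information[0]) + '-' + file_information[2]

print(file_information)
```

This touches only the one entry that needs to change and never calls a string method on the integer.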

Python nested loop not working as intended

I'm working on an assignment for school where I have a text file, data.txt, which looks like this (the actual names are replaced with 'name' here):
10001-31021 'name' 2015.12.30. 524432
10001-31121 'name' 2016.03.21. 765432
10012-34321 'name' 2016.02.20. 231231
10201-11021 'name' 2016.01.10. 2310456
And I have an update.txt which looks like this:
2016.03.22.
10001-31021 'name' +20000
10012-34321 'name' +35432
10012-34321 'name' -10000
10120-00123 'name' +120334
10001-31021 'name' +5000
10210-41011 'name' -6000
10201-11021 'name' +100210
12345-32100 'name' +123456
And I have to make a newdata.txt file according to the changes to the last column that update.txt includes.
This is my code so far:
import re

adat = open("data.txt", "r")  # adat = data
newdata = open("newdata.txt", "w")
update = open("update", "r")
date = update.readline().decode("utf-8-sig").encode("utf-8").splitlines()
num_lines = sum(1 for line in open('update'))
elsociklus = 0  # counter for the first (outer) loop
masodikciklus = 0  # counter for the second (inner) loop
for num_lines in update:
    updateData = re.search("(.{11}\t)(\D+\t)([+-]\d+)", num_lines)
    elsociklus = elsociklus + 1
    print("elsociklus: " + str(elsociklus))
    for j in adat:
        data = re.search("(.{11}\t)(\D+\t)(\d{4}\.\d{2}\.\d{2}\.\t)(\d+)", j)
        masodikciklus = masodikciklus + 1
        print("masodikciklus: " + str(masodikciklus))
        if data != None:
            if updateData.group(1) == data.group(1):
                print("regi: " + data.group(0))  # regi = old
                print("update: " + updateData.group(0))
                print("uj: " + data.group(1) + data.group(2) + date[0] + "\t" + str(int(data.group(4)) + int(updateData.group(3))))  # uj = new
                newdata.write(data.group(1) + data.group(2) + date[0] + "\t" + str(int(data.group(4)) + int(updateData.group(3))))
                newdata.write("\n")
            else:
                print("nincs valtozas: " + data.group(0))  # nincs valtozas = no change
adat.close()
newdata.close()
update.close()
My problem is with the nested loop. I can't figure out why it doesn't enter the inner loop a second time. It works perfectly on the first iteration, but on the second iteration of the outer loop the inner loop is simply skipped.
Thank you in advance for your help.

Thanks to codingCat for the answer. I fixed the problem by returning my file pointer to the beginning of the file before each run of the inner loop.
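The reason the inner loop is skipped is that a file object is consumed as it is iterated: after the first full pass it sits at the end of the file and yields nothing more. Here is a minimal sketch of the rewind fix, using `io.StringIO` in place of the real files and a simplified two-column format (both are assumptions made so the example runs on its own).

```python
import io

# io.StringIO stands in for the real data file so the sketch runs anywhere.
data = io.StringIO("10001-31021\t524432\n10012-34321\t231231\n")
updates = ["10001-31021\t+20000", "10012-34321\t-10000"]

results = []
for update_line in updates:
    account, change = update_line.split("\t")
    data.seek(0)  # rewind; without this the second pass reads nothing
    for data_line in data:
        data_account, balance = data_line.strip().split("\t")
        if data_account == account:
            results.append((account, int(balance) + int(change)))

print(results)
```

An alternative that avoids re-reading the file is to load the data lines into a list (or a dictionary keyed by account number) once, before the outer loop starts.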

How can I improve this messy function or its reiteration?

This is a very long and annoying question; I'll do my best to explain it.
I have a file containing data such as
Sarah;Brown;s.brown#gmail.com;0715123451;1;24;0;0
Joe;Blogg;j.bloggs#gmail.com;0749814574;1;60;0;0
Andrew;Smith;a.smith#gmail.com;0718451658;1;45;0;0
Ray;Charles;r.charles#gmail.com;0715451589;1;40;0;0
Kevin;White;k.white#gmail.com;0749858748;1;20;0;0
Samantha;Collins;s.collins#gmail.com;0715243568;1;10;0;0
Frank;Jones;f.jones#gmail.com;0719487516;2;10;0;0
Liam;Blair;l.blair#gmail.com;0729857614;2;4;0;0
Pat;Phillips;p.phillips#gmail.com;071574216;2;17;0;0
John;Brown;j.brown#gmail.com;0798452648;2;11;0;0
Peter;Bond;p.bond#gmail.com;0798415758;6;4;0;0
Edward;Costello;e.costello#gmail.com;0712474588;2;45;0;0
Iain;Wilkins;i.wilkins#gmail.com;0715497211;2;23;0;0
Time;Pratchett;t.pratchett#gmail.com;0784975135;3;48;0;0
Eleanor;House;e.house#gmail.com;0799871542;3;9;0;0
Gergory;Davies;g.davies#gmail.com;0719475847;3;22;0;0
Tina;Turner;t.turner#gmail.com;0749857123;3;17;0;0
Sally;Stevens;s.stevens#gmail.com;077154198;3;30;0;0
John;Lennon;j.lennon#gmail.com;0704910874;3;29;0;0
The first element is the name, the second is the surname, the third is the email address, the fourth is the phone number, the fifth is the division number and the sixth is the points scored. The remaining two don't matter for this issue.
What I have been asked to do is find the top two scorers (sixth element) and the bottom two scorers in a division (fifth element). Once they have been identified, the top two should be promoted (go up a division) and the bottom two should be demoted (go down a division). I have written this mess:
import os

def rollDivision2():
    lst = [line.strip().split(';') for line in open('players.txt','r').readlines()] # creates nested list
    totalPoints = []
    for i in range(len(lst)):
        if lst[i][4] == "2": # checking divisions this is why i need 6 different function
            totalPoints.append(int(lst[i][5])) # creates lists from all scores for the division chosen
    #----------------------------------------------- figuring out best two scores and writing
    maxPoints = max(totalPoints)
    for person in lst:
        if person[4] == "2" and person[5] == str(maxPoints): #this is why i need 6 different function
            biggest = person # creating variable with name of person that has the biggest score
            biggestStr = biggest[0] + ";" + biggest[1] + ";" + biggest[2] + ";" + biggest[3] + ";" + biggest[4] + ";" + biggest[5] + ";" + biggest[6] + ";" + biggest[7] + "\n" #puts that lists into a string
            break #personWithMostPoints is the whole line of player with most points
    secondMaxPoints = secondLargest(totalPoints) #this is why i need 6 different function
    for person in lst:
        if person[4] == "2" and person[5] == str(secondMaxPoints): #checking for most points in the division
            secondBiggest = person
            secondBiggestStr = secondBiggest[0] + ";" + secondBiggest[1] + ";" + secondBiggest[2] + ";" + secondBiggest[3] + ";" + secondBiggest[4] + ";" + secondBiggest[5] + ";" + secondBiggest[6] + ";" + secondBiggest[7] + "\n"
            break
    lineToWriteBiggest = biggest[0] + ";" + biggest[1] + ";" + biggest[2] + ";" + biggest[3] + ";" + "1" + ";" + biggest[5] + ";" + biggest[6] + ";" + biggest[7] + "\n"
    lineToWriteSecondBiggest = secondBiggest[0] + ";" + secondBiggest[1] + ";" + secondBiggest[2] + ";" + secondBiggest[3] + ";" + "1" + ";" + secondBiggest[5] + ";" + secondBiggest[6] + ";" + secondBiggest[7] + "\n"
    #----------------------------------------------- figuring out best two scores
    #----------------------------------------------- figuring out worst two scores
    minPoints = min(totalPoints)
    for person in lst:
        if person[4] == "2" and person[5] == str(minPoints): #this is why i need 6 different function
            least = person
            leastStr = least[0] + ";" + least[1] + ";" + least[2] + ";" + least[3] + ";" + least[4] + ";" + least[5] + ";" + least[6] + ";" + least[7] + "\n"
            break #personWithMostPoints is the whole line of player with most points
    secondLeastPoints = secondSmallest(totalPoints) #method defined in utility functions
    for person in lst:
        if person[4] == "2" and person[5] == str(secondLeastPoints): #this is why i need 6 different function
            secondLeast = person
            secondLeastStr = secondLeast[0] + ";" + secondLeast[1] + ";" + secondLeast[2] + ";" + secondLeast[3] + ";" + secondLeast[4] + ";" + secondLeast[5] + ";" + secondLeast[6] + ";" + secondLeast[7] + "\n"
            break
    lineToWriteLeast = least[0] + ";" + least[1] + ";" + least[2] + ";" + least[3] + ";" + "3" + ";" + least[5] + ";" + least[6] + ";" + least[7] + "\n"
    lineToWriteSecondLeast = secondLeast[0] + ";" + secondLeast[1] + ";" + secondLeast[2] + ";" + secondLeast[3] + ";" + "3" + ";" + secondLeast[5] + ";" + secondLeast[6] + ";" + secondLeast[7] + "\n"
    f = open("players.txt","a")
    f.write(lineToWriteBiggest)
    f.write(lineToWriteSecondBiggest)
    f.write(lineToWriteLeast)
    f.write(lineToWriteSecondLeast)
    f.close()
    f = open("players.txt",'r') # Input file
    t = open("temp.txt", 'w') #Temp output file
    for line in f:
        if line != biggestStr and line != secondBiggestStr and line != leastStr and line != secondLeastStr and line != "\n":
            t.write(line) #writes all lines apart from the original line (one that needs to be deleted)
    f.close()
    t.close()
    os.remove("players.txt") #deletes players
    os.rename('temp.txt', 'players.txt') #new file with modified info is renamed to players
It is very ugly and impractical. Moreover, I have to have 6 of these functions (as there are 6 divisions), which makes the program ridiculously bloated.
Could anyone show me how to use this function just once to check all 6 divisions, instead of having to write 6 individual ones?
I'm sorry you had to see this; I just don't know what else to do at this point.
Any help would be very much appreciated.
Many thanks

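import itertools

def division(line):
    return int(line.split(";")[4])

# groupby only groups adjacent lines, so sort by division first
lines = sorted(open("some.txt"), key=division)
for div, items in itertools.groupby(lines, key=division):
    sorted_people = sorted(items, key=lambda item: int(item.split(";")[5]))  # field 5 = points
    print("DIVISION:", div)
    print("TOP TWO:", sorted_people[-2:])
    print("BOTTOM TWO:", sorted_people[:2])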

Check this out -
def convert(arr):
    return [int(x) if x.isnumeric() else x for x in arr]
def find_top_two(division):
    cur_div = [x for x in arr if x[4] == division]
    cur_div.sort(key = lambda T : T[5], reverse = True)
    return cur_div[:2]
def find_bottom_two(division):
    cur_div = [x for x in arr if x[4] == division]
    cur_div.sort(key = lambda T : T[5])
    return cur_div[:2]
with open('players.txt', 'r') as f:
    arr = f.readlines()
arr = [convert(s.strip('\n').split(';')) for s in arr]
print('Top 2 of division 2 :\n', find_top_two(2))
print('Bottom 2 of division 2 :\n', find_bottom_two(2))
Output -
Top 2 of division 2 :
[['Edward', 'Costello', 'e.costello#gmail.com', 712474588, 2, 45, 0, 0], ['Iain', 'Wilkins', 'i.wilkins#gmail.com', 715497211, 2, 23, 0, 0]]
Bottom 2 of division 2 :
[['Liam', 'Blair', 'l.blair#gmail.com', 729857614, 2, 4, 0, 0], ['Frank', 'Jones', 'f.jones#gmail.com', 719487516, 2, 10, 0, 0]]

Not the best or most efficient code, but this should work. I'll leave you to fill in the blanks.
def getPlayersOfDivision(playerInfo, division):
    result = []
    for player in playerInfo:
        if int(player[4]) == division:
            result.append(player)
    return result
def findPlayer(playerInfo, firstName, lastName):
    for i, player in enumerate(playerInfo):
        if player[0] == firstName and player[1] == lastName:
            return i
def get2BestPlayers(playerInfo):
    pass
def get2WorstPlayers(playerInfo):
    pass
def changePlayerDivision(playerInfo, pos, upOrDown):
    pass
with open('players.txt', 'r') as f:
    playerInfo = f.readlines()
playerInfo = [line.strip().split(';') for line in playerInfo]
numDivisions = ???
for i in range(1, numDivisions + 1):  # divisions in the file are numbered from 1
    divisionPlayers = getPlayersOfDivision(playerInfo, i)
    bestPlayers = get2BestPlayers(divisionPlayers)
    worstPlayers = get2WorstPlayers(divisionPlayers)
    newPlayerInfo = changePlayerDivision(playerInfo, findPlayer(playerInfo, divisionPlayers[bestPlayers[0]][0], divisionPlayers[bestPlayers[0]][1]), +1)
    newPlayerInfo = changePlayerDivision(playerInfo, findPlayer(playerInfo, divisionPlayers[bestPlayers[1]][0], divisionPlayers[bestPlayers[1]][1]), +1)
    newPlayerInfo = changePlayerDivision(playerInfo, findPlayer(playerInfo, divisionPlayers[worstPlayers[0]][0], divisionPlayers[worstPlayers[0]][1]), -1)
    newPlayerInfo = changePlayerDivision(playerInfo, findPlayer(playerInfo, divisionPlayers[worstPlayers[1]][0], divisionPlayers[worstPlayers[1]][1]), -1)
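To show how a single function could cover every division, here is one possible sketch. The function name `roll_divisions` and the `lowest_division` parameter are made up for illustration. It assumes that division 1 is the highest (the question's code moves the top scorers of division 2 to "1"), that there are 6 divisions, that ties are broken by file order, and that divisions with fewer than four players are left untouched. It works on the parsed rows and leaves writing the file to the caller.

```python
from collections import defaultdict

def roll_divisions(players, lowest_division=6):
    """Return a new player list with the top two of each division promoted
    and the bottom two demoted. Division 1 is assumed to be the highest."""
    by_division = defaultdict(list)
    for index, player in enumerate(players):
        by_division[int(player[4])].append(index)

    new_division = {}
    for division, indexes in by_division.items():
        ranked = sorted(indexes, key=lambda i: int(players[i][5]))
        if len(ranked) < 4:
            continue  # too few players to pick two of each; leave them alone
        for i in ranked[-2:]:  # two highest scores
            new_division[i] = max(1, division - 1)
        for i in ranked[:2]:  # two lowest scores
            new_division[i] = min(lowest_division, division + 1)

    result = []
    for index, player in enumerate(players):
        updated = list(player)
        updated[4] = str(new_division.get(index, int(player[4])))
        result.append(updated)
    return result

# Division 2 rows taken from the sample data in the question.
sample = [line.split(";") for line in [
    "Frank;Jones;f.jones#gmail.com;0719487516;2;10;0;0",
    "Liam;Blair;l.blair#gmail.com;0729857614;2;4;0;0",
    "Pat;Phillips;p.phillips#gmail.com;071574216;2;17;0;0",
    "John;Brown;j.brown#gmail.com;0798452648;2;11;0;0",
    "Edward;Costello;e.costello#gmail.com;0712474588;2;45;0;0",
    "Iain;Wilkins;i.wilkins#gmail.com;0715497211;2;23;0;0",
]]
for player in roll_divisions(sample):
    print(";".join(player))
```

All moves are computed from the original divisions before any row is changed, so a player promoted out of one division is not counted again in the division they land in.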

remove similar lines in text file

I don't normally use Python, but I have a script written in it.
Part of the script:
elif line.find("CONECT") > -1:
    con = line.split()
    line_value = line_value + 1
    #print line_value
    #print con[2]
    try:
        line_j = "e" + ', ' + str(line_value) + ', ' + con[2] + "\n"
        output_file.write(line_j)
        print(line_j)
        line_i = "e" + ', ' + str(line_value) + ', ' + con[3] + "\n"
        output_file.write(line_i)
        print(line_i)
        line_k = "e"+ ', ' + str(line_value) + ', ' + con[4] + "\n"
        print(line_k)
        output_file.write(line_k)
    except IndexError:
        continue
which gives .txt output in this format:
e, 1, 2
e, 1, 3
e, 1, 4
e, 2, 1
e, 2, 3
etc.
I need to remove lines that contain the same numbers, regardless of the order of the numbers,
e.g. the line e, 2, 1, which duplicates e, 1, 2.
Is it possible?

Of course, it is better to modify your code so that those lines are removed BEFORE they are written to the file. You can use a list to store the values already saved and, on each iteration, check whether the value you want to add already exists in that list. The code below isn't tested or optimized, but it illustrates the idea:
# 'added = []' should be placed somewhere before the 'if'
added = []
# your part of the code
elif line.find("CONECT") > -1:
    con = line.split()
    line_value = line_value + 1
    try:
        line_j = "e, %s, %s\n" % (str(line_value),con[2])
        tmp = sorted((str(line_value),con[2]))
        if tmp not in added:
            added.append(tmp)
            output_file.write(line_j)
            print(line_j)
        line_i = "e, %s, %s\n" % (str(line_value),con[3])
        tmp = sorted((str(line_value),con[3]))
        if tmp not in added:
            added.append(tmp)
            output_file.write(line_i)
            print(line_i)
        line_k = "e, %s, %s\n" % (str(line_value),con[4])
        tmp = sorted((str(line_value),con[4]))
        if tmp not in added:
            added.append(tmp)
            print(line_k)
            output_file.write(line_k)
    except IndexError:
        continue

Here is a comparison method for two lines of your file:
from collections import Counter

def compare(line1, line2):
    els1 = line1.strip().split(', ')
    els2 = line2.strip().split(', ')
    return Counter(els1) == Counter(els2)
See the documentation for the Counter class in the collections module.
If the count of elements doesn't matter, you can use set instead of Counter.
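A short illustration of the difference between the two choices, using made-up lines in the question's format. The helper name `same_elements` is hypothetical.

```python
from collections import Counter

def same_elements(line1, line2):
    """True if both lines contain the same fields, in any order."""
    return Counter(line1.strip().split(', ')) == Counter(line2.strip().split(', '))

swapped = same_elements("e, 1, 2\n", "e, 2, 1\n")     # same fields, different order
different = same_elements("e, 1, 2\n", "e, 1, 3\n")   # different fields
repeated = same_elements("e, 1, 1\n", "e, 1\n")       # Counter notices the extra 1
as_sets = set("e, 1, 1".split(', ')) == set("e, 1".split(', '))  # set ignores it

print(swapped, different, repeated, as_sets)
```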

The following approach should work. First add the following line further up in your code, before the loop that reads the lines:
seen = set()
Then replace everything inside the try with the following code:
for con_value in con[2:5]:
    # convert both values to str so that (1, '2') and (2, '1') compare equal
    entry = frozenset((str(line_value), con_value))
    if entry not in seen:
        seen.add(entry)
        line_j = "e" + ', ' + str(line_value) + ', ' + con_value + "\n"
        output_file.write(line_j)
        print(line_j)
Make sure this code is indented to the same level as the code it replaces.
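A self-contained sketch of the same technique that can be run without the original script. The sample records are made up, and unlike the original script (which uses a running counter, line_value) it assumes the first number after CONECT is the atom the line describes.

```python
# Hypothetical CONECT records: atom number followed by its bonded atoms.
records = ["CONECT 1 2 3 4", "CONECT 2 1 3", "CONECT 3 1 2"]

seen = set()
edges = []
for record in records:
    fields = record.split()
    atom = int(fields[1])
    for neighbour in fields[2:]:
        # A frozenset ignores order, so {1, 2} and {2, 1} are the same entry.
        entry = frozenset((atom, int(neighbour)))
        if entry not in seen:
            seen.add(entry)
            edges.append("e, %d, %d" % (atom, int(neighbour)))

print("\n".join(edges))
```

Membership tests on a set take roughly constant time, so this scales better than searching a list of already written pairs.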

Working with dictionaries

I have a dictionary that takes data from a file and puts it in a list. I want to make a search engine so that when I type the name, quantity or price of a component, it finds all matching components and prints the info each one holds (price, quantity, category).
I just can't make my script read the info from the lines in the file. The file's text looks like this:
AMD A4-3300 2.5GHz 2-Core Fusion APU Box|5.179,00 din|58|opis|Procesor
AMD Athlon II X2 340 3.2GHz Box|4.299,00 din|8|opis|Procesor
INTEL Celeron G465 1.9GHz Box|3.339,00 din|46|opis|Procesor
INTEL Celeron Dual Core G550 2.6GHz Box|1.439,00 din|13|opis|Procesor
Here is my code, which should be a search engine for my components. I don't know how to take the data from the list and look up the full info for a match. For example, if I type a keyword like AMD, the search engine should print all components that have AMD in their name; if I enter a price range, I should get all components with prices in that range. I tried some things but they don't work. Sorry for taking a long time to respond. I translated my code; some lines may have been left out, but I hope you get the picture.
def option_p_components():
    option = 0
    #component = []
    components = []
    while option == 0 :
        option_comp = option_p_components_str()
        option_k = int(raw_input("Choose option : "))
        print ""
        if option_k != 1 and option_k != 2 :
            error = "!!!Error!!!"
            error_p = " you typed wrong command please try again ."
            print "-" * 80
            print error.center(80)
            print error_p.center(80)
            print "-" * 80
            option = 0
        if option_k == 1 :
            option_p_d = 0
            print "Components search "
            print"-" * 80
            cu = temp_comp(components)
            print cu
            print "X)Working with components(editing, deleting )"
            print"-" * 80
            print "1)Change components "
            print "2)Editing components"
            print "3)Deleting components"
            print "4)Components search "
            print "5)Back"
            print"-" * 80
            option_p_d = int(raw_input("Choose option :"))
            if option_p_d == 2 :
                option_d = 0
                for I in range(5):
                    u_component_name = raw_input("Enter component name :")
                    u_component_price= raw_input("Enter component price:")
                    u_component_quantity = raw_input("Enter component quantity :")
                    u_component_opis = raw_input("Enter component description :")
                    u_component_category = raw_input("Enter component category:")
                    component = {"name_compo":u_component_name,
                                 "price":u_component_price,
                                 "quantity":u_component_quantity,
                                 "opis":u_component_opis,
                                 "category":u_component_category}
                    upis_komponente = saving_components(component)
                    components.append(saving_components)
                    print"-" * 80
                    print "1)New component"
                    print "2)Back"
                    print"-" * 80
                    option_d = int(raw_input("Choose option :"))
                    if option_d == 1 :
                        option_k = 0
                    elif option_d == 2 :
                        option_p_components()
        elif option_k == 2 :
            print "Back"
def saving_components(component):
    final_component = component["name_compo"] + "|" + component["price"] + "|" + component["quantity"] + "|"\
                      + component["opis"] + "|" + component["category"]
    file = open("Data/component.txt", "a")
    file.write(final_component)
    file.close()
def reading_component(components):
    file = open("Data/component.txt", "r")
    for line in file :
        name_comp, price, quantity, opis, category = line.split("|")
        component = {"name_compo": name_comp,
                     "price": price,
                     "quantity": quantity,
                     "opis" : opis,
                     "category": category}
        # this only pulls the individual values out of the dictionary
        compon_info = "Name: " + component["name_compo"] + "\n" + "price: " + component["price"]+"\n" +\
                      "Quantity:" + component["quantity"] + "\n" + "Opis: " + component["opis"] + \
                      "\n" + "category: " + component["category"] + "\n"
        #print compon_info
        components.append(component)
        #print sortiranje(kompon_info)
        #print sorted([compon_info])
        #print compon_info.sort()
        # the for loop runs once per line in the file, which is 7
    file.close()
    return components
def temp_comp(components):
    pretraga_po_opisu(components)
# pretraga_po_opisu = "search by description"
def pretraga_po_opisu(komponente):
    kolicina = str(raw_input("Enter quantity:"))  # kolicina = quantity
    for komponenta in komponente:
        if komponenta["kolicina"] == kolicina:
            print komponenta["kolicina"]
    return None
def pera(komponente, cena):  # cena = price
    ulaz = input("Enter")
    list = komponente.pera("cena",cena)

All you need is csv.DictReader() together with a sequence of key names, one for each column:
import csv

with open(inputfilename, 'rb') as fileobj:
    reader = csv.DictReader(fileobj,
                            ('name_compon', 'price', 'quantity', 'something_else', 'category'),
                            delimiter='|')
    for row in reader:
        print row
Each row is the dictionary you wanted.
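Building on the dictionaries that csv.DictReader produces, the keyword and price-range searches from the question could be sketched roughly as follows in Python 3. The function names, the 'description' key and the in-memory sample are assumptions for illustration, and the price parser assumes '.' groups thousands and ',' marks decimals, as in the sample data.

```python
import csv
import io

FIELDS = ('name_compon', 'price', 'quantity', 'description', 'category')

# Sample lines copied from the question; io.StringIO stands in for the file.
SAMPLE = """AMD A4-3300 2.5GHz 2-Core Fusion APU Box|5.179,00 din|58|opis|Procesor
AMD Athlon II X2 340 3.2GHz Box|4.299,00 din|8|opis|Procesor
INTEL Celeron G465 1.9GHz Box|3.339,00 din|46|opis|Procesor
INTEL Celeron Dual Core G550 2.6GHz Box|1.439,00 din|13|opis|Procesor
"""

def parse_price(text):
    """Turn '5.179,00 din' into 5179.0 (assumes '.' groups thousands)."""
    number = text.replace('din', '').strip()
    return float(number.replace('.', '').replace(',', '.'))

def load_components(fileobj):
    """Read every line into a dictionary keyed by FIELDS."""
    reader = csv.DictReader(fileobj, FIELDS, delimiter='|')
    return list(reader)

def search_by_name(components, keyword):
    """Case-insensitive substring match on the component name."""
    keyword = keyword.lower()
    return [c for c in components if keyword in c['name_compon'].lower()]

def search_by_price(components, low, high):
    """Components whose price lies in the inclusive range [low, high]."""
    return [c for c in components if low <= parse_price(c['price']) <= high]

components = load_components(io.StringIO(SAMPLE))
for c in search_by_name(components, 'amd'):
    print(c['name_compon'], c['price'], c['quantity'], c['category'])
```

With a real file, `load_components(open('Data/component.txt', newline=''))` would take the place of the `io.StringIO` call.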

If you want to look into using zip, you could always use it here:
components_dicts = []
components = ("name_compon", "price", "quantity", "category")
with open('/path/to/data') as f:
    for line in f.readlines():
        # slicing the first four elements because you never say which 4 out of 5 you wanted
        components_dicts.append(dict(zip(components, line.split("|")[:4])))
for c in components_dicts:
    print c
Here the line.split("|") method creates a list of strings, dividing the line being read wherever the "|" character is found.
Then zip returns a list of tuples, which you then feed into dict:
# This is what it would look like after you zip the components tuple and the line.split("|") data
[('name_compon', 'AMD A4-3300 2.5GHz 2-Core Fusion APU Box'), ('price', '5.179,00 din'), ('quantity', '58'), ('category', 'opis')]
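For reference, a small Python 3 sketch of the same zip idea that maps all five fields. The key names are assumptions, and because zip returns a lazy iterator in Python 3, it is wrapped in list() to inspect the pairs.

```python
keys = ("name_compon", "price", "quantity", "description", "category")
line = "AMD Athlon II X2 340 3.2GHz Box|4.299,00 din|8|opis|Procesor\n"

# strip() removes the trailing newline so it does not end up in the last field
pairs = list(zip(keys, line.strip().split("|")))
component = dict(pairs)

print(pairs[1])
print(component["category"])
```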

