Write map to a csv in python

I am not sure the title of this question is right. I know the result is not a list, and I am trying to put the results into a dictionary, but only the last value of my loop gets added.
I have pasted all of my code, but my question is specifically about the candidates loop, where I compute the percentage of votes per candidate. When I print the information, the third section of the output shows each candidate followed by the percentage and the total votes. I am not sure what kind of object that result is (it does not seem to be a list or a dictionary).
I am trying to write this to my output CSV, but whatever I try, only the last result, O'Tooley, gets written.
I am new to this, so I do not understand why, even when I save the percentage in a list on each pass through the loop, I still end up with only O'Tooley's percentage. That is why I decided to print inside the loop: it was the only way I could confirm that all the results are computed correctly.
import os
import csv
electiondatapath = os.path.join('../..','gt-atl-data-pt-03-2020-u-c', '03-Python', 'Homework', 'PyPoll', 'Resources', 'election_data.csv')
with open (electiondatapath) as csvelectionfile:
    csvreader = csv.reader(csvelectionfile, delimiter=',')
    # Read the header row first
    csv_header = next(csvelectionfile)
    #hold number of rows which will be the total votes
    num_rows = 0
    #total votes per candidate
    totalvotesDic = {}
    #list to zip and write to csv
    results = []
    for row in csvreader:
        #total number of votes cast
        num_rows += 1
        # Check if candidate in the dictionary keys, if is not then add the candidate to the dictionary and count it as one, else sum 1 to the votes
        if row[2] not in totalvotesDic.keys():
            totalvotesDic[row[2]] = 1
        else:
            totalvotesDic[row[2]] += 1
print("Election Results")
print("-----------------------")
print(f"Total Votes: {(num_rows)}")
print("-----------------------")
#get the percentage of votes and print result next to candidate and total votes
for candidates in totalvotesDic.keys():
    #totalvotesDic[candidates].append("{:.2%}".format(totalvotesDic[candidates] / num_rows))
    candidates_info = candidates, "{:.2%}".format(totalvotesDic[candidates] / num_rows), "(", totalvotesDic[candidates], ")"
    print(candidates, "{:.2%}".format(totalvotesDic[candidates] / num_rows), "(", totalvotesDic[candidates], ")")
#get the winner out of the candidates
winner = max(totalvotesDic, key=totalvotesDic.get)
print("-----------------------")
print(f"Winner: {(winner)}")
print("-----------------------")
#append to the list to zip
results.append("Election Results")
results.append(f"Total Votes: {(num_rows)}")
results.append(candidates_info)
results.append(f"Winner: {(winner)}")
# zip list together
cleaned_csv = zip(results)
# Set variable for output file
output_file = os.path.join("output_Pypoll.csv")
# Open the output file
with open(output_file, "w") as datafile:
    writer = csv.writer(datafile)
    # Write in zipped rows
    writer.writerows(cleaned_csv)

On each iteration you overwrite candidates_info with the data for a single candidate, so after the loop only the last candidate remains. Build the string up across iterations instead, for example:
candidates_info = ""
for candidates in totalvotesDic.keys():
    candidates_info = '\n'.join([candidates_info, candidates + " " + "{:.2%}".format(totalvotesDic[candidates] / num_rows) + "(" + str(totalvotesDic[candidates]) + ")"])
print(candidates_info)
# prints something like
# O'Tooley 0.00%(2)
# Someone 30.00%(..)
Also, you don't need keys(). Iterating over items() gives you the candidate and the vote count together:
candidates_info = ""
for candidates, votes in totalvotesDic.items():
    candidates_info = '\n'.join([candidates_info, str(candidates) + " " + "{:.2%}".format(votes / num_rows) + "(" + str(votes) + ")"])
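If the goal is one CSV row per candidate rather than a single joined string, collecting a list of rows may be simpler. The following is only a sketch: the vote counts and the names other than O'Tooley are hypothetical stand-ins for totalvotesDic, the helper build_rows and the header labels are my own, and the output goes to an in-memory buffer so the example runs without touching the real files.

```python
import csv
import io

# Hypothetical vote counts standing in for totalvotesDic from the question.
totalvotesDic = {"Alice": 5, "Bob": 3, "O'Tooley": 2}
num_rows = sum(totalvotesDic.values())


def build_rows(votes, total):
    """Return one [name, percentage, count] row per candidate."""
    return [[name, "{:.2%}".format(count / total), count]
            for name, count in votes.items()]


rows = build_rows(totalvotesDic, num_rows)
winner = max(totalvotesDic, key=totalvotesDic.get)

# An in-memory buffer is used here; a real file opened with
# open("output_Pypoll.csv", "w", newline="") can be passed to csv.writer instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Candidate", "Percentage", "Votes"])  # assumed header labels
writer.writerows(rows)  # every candidate, not just the last one
writer.writerow(["Winner", winner])
print(buffer.getvalue())
```

Because each candidate becomes its own list, writer.writerows emits all of them and nothing is overwritten between iterations.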

Related

How to print a certain cell under a specific condition

import csv
from statistics import mean
def analyze(entries):
    print(f'first entry: {entries[0]}')
with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as input:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(input)]
avgScore = analyze(entries)
# highest score / title for that post
highScore = max(score) # Find the highest score
print("\nHighest score:")
print(highScore)
title = []
title = [i[3] for i in entries]
print("\nTitle for the Highest Scores:")
selected_column = [title, score]
highTitle = title(score == highScore)
print(highTitle)
This is part of my Python code.
I need to find the title of the post with the highest score and print it.
That title is 'I would rage if this was handed to me...'.
So the expected output should be like this:
Title for the Highest Scores:
I would rage if this was handed to me...
This is the excel file: https://drive.google.com/file/d/1glhNHkzKwHVqwuWbS8ajiNJR3H94Xw4W/view?usp=sharing

I've downloaded your CSV and corrected your code.
I've added comments to explain the errors:
import csv
from statistics import mean
def analyze(entries):
    print(f'first entry: {entries[0]}')
with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as input:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(input)]
avgScore = analyze(entries)
# highest score / title for that post
# highScore = max(score) # Find the highest score
# first error: `score` is never defined; a variable must be assigned before it is referenced
highScore = max([i[1] for i in entries])
print(f'highest score: {highScore}')
# title = []
# this line is not needed, because `title` is reassigned on the next line
title = [i[3] for i in entries]
# here you've created the list `title` with all titles
# print("\nTitle for the Highest Scores:")
# selected_column = [title, score]
# here you create a list with two elements:
# the first element is the `title` list with all titles, so it's a 2D list
# the second element is the non-existent `score` variable
# highTitle = title(score == highScore)
# this line tries to call `title` as if it were a function, but `title` is a list, so the call fails
# and the argument `score == highScore` would only be a bool, even if `score` existed
# to get this:
# Title for the Highest Scores:
# I would rage if this was handed to me...
# you could use something like this:
entries.sort(key=lambda x:x[1])
# this sorts `entries` so that the item with the highest score becomes the last element of the list; you can use `entries[-1]` to get it
print('Title for the Highest Scores:')
print(entries[-1][3])
This prints the title you wanted.
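As an alternative to sorting, max() with a key function picks the highest-scoring entry without reordering the list. This is a minimal sketch: the tuples below are hypothetical data shaped like entries in the answer, and the scores are made up for illustration.

```python
# Hypothetical (id, score, comms_num, title) tuples shaped like `entries` above.
entries = [
    ("a1", 12, 3, "First post"),
    ("b2", 97, 40, "I would rage if this was handed to me..."),
    ("c3", 45, 8, "Another post"),
]

# max() compares the entries by their score field (index 1) only.
best = max(entries, key=lambda entry: entry[1])
high_score, high_title = best[1], best[3]

print("Highest score:", high_score)
print("Title for the Highest Scores:")
print(high_title)
```

If several posts share the top score, max() returns the first one it encounters.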

Is there a foolproof way of matching two similar string sequences?

This is the CSV in question. I got this data from Extra History, a video series that covers many historical topics through mini-series such as 'Rome: The Punic Wars' or 'Europe: The First Crusades' (both in the CSV). Episodes in these mini-series are numbered #1 up to #6 (though Justinian & Theodora has two series, the first numbered #1-#6, the second #7-#12).
I would like to do some statistical analysis on these mini-series, which entails sorting the numbered episodes (e.g. episodes #2-#6) into their appropriate series. Once each episode is matched to its series, I can easily automate sorting into the appropriate Python lists.
My python code matches the #2-#6 episodes to the #1 episode correctly 99% of the time, with only 1 big error in red, and 1 slight error in yellow (because the first episode of that series is #7, not #1). However, I get the nagging feeling that there is an easier and foolproof way since the strings are well organized, with regular patterns in their names. Is that possible? And can I achieve that with my current code, or should I change and approach it from a different angle?
import csv
eh_csv = '/Users/Work/Desktop/Extra History Playlist Video Data.csv'
with open(eh_csv, newline='', encoding='UTF-8') as f:
    reader = csv.reader(f)
    data = list(reader)
import re
series_first = []
series_rest = []
singles_music_lies = []
all_episodes = [] #list of name of all episodes
#separates all videos into 3 non-overlapping lists: first episodes of a series,
#the other numbered episodes of the series, and singles/music/lies videos
for video in data:
    all_episodes.append(video[0])
    #need regex b/c a normal string search for #1 also matched (Justinian &) Theodora #10
    if len(re.findall('\\b1\\b', video[0])) == 1:
        series_first.append(video[0])
    elif '#' not in video[0]:
        singles_music_lies.append(video[0])
    else:
        series_rest.append(video[0])
#Dice's Coefficient
#got from here; John Rutledge's answer with NinjaMeTimbers modification
#https://stackoverflow.com/questions/653157/a-better-similarity-ranking-algorithm-for-variable-length-strings
#------------------------------------------------------------------------------------------
def get_bigrams(string):
    """
    Take a string and return a list of bigrams.
    """
    s = string.lower()
    return [s[i:i+2] for i in list(range(len(s) - 1))]
def string_similarity(str1, str2):
    """
    Perform bigram comparison between two strings
    and return a percentage match in decimal form.
    """
    pairs1 = get_bigrams(str1)
    pairs2 = get_bigrams(str2)
    union = len(pairs1) + len(pairs2)
    hit_count = 0
    for x in pairs1:
        for y in pairs2:
            if x == y:
                hit_count += 1
                pairs2.remove(y)
                break
    return (2.0 * hit_count) / union
#-------------------------------------------------------------------------------------------
#only take the first few words of each episode's name for comparison, b/c those words are 99% of the
#time the name of the series; can't make it too short or words like 'the, of' etc will get matched (now or
#in future), or too long because that increases the chance of a superfluous match; does much better than w/o
#limiting to the first few words
def first_three_words(name_string):
    #eg ''.join vs ' '.join
    first_three = ' '.join(name_string.split()[:5]) #-->'The Haitian Revolution' slightly worse
    #first_three = ''.join(name_string.split()[:5]) #--> 'TheHaitianRevolution', slightly better
    return first_three
#compare the given episode with all first videos, and return a list of comparison scores
def compared_with_first(episode, series_name = series_first):
    episode_scores = []
    for i in series_name:
        x = first_three_words(episode)
        y = first_three_words(i)
        #comparison_score = round(string_similarity(episode, i),4)
        comparison_score = round(string_similarity(x,y),4)
        episode_scores.append((comparison_score, i))
    return episode_scores
matches = []
#go through video number 2,3,4 etc in a series and compare them with the first episode
#of all series, then get a comparison score
for episode in series_rest:
    scores_list = compared_with_first(episode)
    similarity_score = 0
    most_likely_match = []
    #go thru list of comparison scores returned from compared_with_first,
    #then append the currentepisode/highest score/first episode to
    #most_likely_match; repeat for all non-first episodes
    for score in scores_list:
        if score[0] > similarity_score:
            similarity_score = score[0]
            most_likely_match.clear() #MIGHT HAVE BEEN THE CRUCIAL KEY
            most_likely_match.append((episode,score))
    matches.append(most_likely_match)
final_match = []
for i in matches:
    final_match.append((i[0][0], i[0][1][1], i[0][1][0]))
#just to get output in desired presentation
path = '/Users/Work/Desktop/'
with open('EH Sorting Episodes.csv', 'w', newline='',encoding='UTF-8') as csvfile:
    csvwriter = csv.writer(csvfile)
    for currentRow in final_match:
        csvwriter.writerow(currentRow)
        #print(currentRow)
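The nested loop in string_similarity counts each bigram match once and then removes it, which amounts to a multiset intersection. As a sketch under that reading, the same Dice coefficient can be written with collections.Counter. The guard for inputs with no bigrams is my addition and is not in the code above, which would raise ZeroDivisionError in that case.

```python
from collections import Counter


def get_bigrams(text):
    """Return the list of adjacent character pairs in `text`, lowercased."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def string_similarity(str1, str2):
    """Dice coefficient over bigram multisets, as a value between 0 and 1."""
    pairs1 = Counter(get_bigrams(str1))
    pairs2 = Counter(get_bigrams(str2))
    total = sum(pairs1.values()) + sum(pairs2.values())
    if total == 0:
        # Added guard: both strings are too short to contain any bigram.
        return 0.0
    # `&` keeps, for each bigram, the smaller of the two counts.
    overlap = sum((pairs1 & pairs2).values())
    return 2.0 * overlap / total


print(string_similarity("night", "nacht"))
```

This changes only how the overlap is counted. It does not by itself address the mismatched series, which depend on which part of each title is compared.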

Is there any way to put a single result from a CSV into a variable?

I'm making a program in school where users are quizzed on certain topics and their results are saved into a csv file. I've managed to print off the row with the highest score, but this doesn't look very neat.
with open ('reportForFergusTwo.csv', 'r') as highScore:
    highScoreFinder=highScore
    valid3=False
    for row in highScoreFinder:
        if subjectInput in row:
            if difficultyInput in row:
                if ('10' or '9' or '8' or '7' or '6' or '5' or '4' or '3' or '2' or '1') in row:
                    valid3=True
                    print("The highest score for this quiz is:",row)
For example: it says, "The highest score for this quiz is: chemistry,easy,10,Luc16" but I would prefer it to say something like "The highest score for this quiz is: 10" and "This score was achieved by: Luc16", rather than just printing the whole row off, with unnecessary details like what the quiz was on.
My CSV file looks like this:
Subject,Difficulty,Score,Username
language,easy,10,Luc16
chemistry,easy,10,Luc16
maths,easy,9,Luc16
chemistry,easy,5,Eri15
chemistry,easy,6,Waf1
chemistry,easy,0,Eri15
I thought that maybe if I could find a way to take the individual results (the score and username) and put them into their own individual variables, then it would be much easier to present it the way I want, and be able to reference them later on in the function if I need them to be displayed again.
I'm just fairly new to coding and curious if this can be done, so I can improve the appearance of my code.
Edit: To solve the issue, I used str.split() to break up the individual fields in the rows of my CSV, so that each one could be selected and held in a variable. The accepted answer shows the solution I used, but this is my final code in case that wasn't clear:
with open ('details.csv', 'r') as stalking:
    stalkingReader=csv.reader(stalking)
    valid4=False
    for column in stalkingReader:
        if user in column[3]:
            valid4=True
            print("Here are the details for user {}... ".format(user))
            splitter=row.split(',')
            name=splitter[0]
            age=splitter[1]
            year=splitter[2]
            print("Name: {}".format(name))
            print("Age: {}".format(age))
            print("Year Group: {}".format(year))
            postReport()
    if valid4==False:
        print("Sorry Fergus, this user doesn't seem to be in our records.")

with open("reportForFergusTwo.csv", "r") as highScore:
    subject = []
    difficulty = []
    score = []
    name = []
    next(highScore)  # skip the header row
    for line in highScore:
        fields = line.strip().split(',')
        subject.append(fields[0])
        difficulty.append(fields[1])
        score.append(int(fields[2]))  # compare scores as numbers, so 10 beats 9
        name.append(fields[3])
ind = score.index(max(score))
print("The highest score for this quiz is: ", max(score))
print("This was achieved by ", name[ind])
The with statement opens (and later closes) the .csv file.
Then four empty lists are created, and the header row is skipped.
Next, I loop through every line in the file and split it on commas. This produces a list of four elements, each of which is appended to its own list; the score is converted to int so that max() compares numbers rather than strings.

You can use str.split() to break up the rows of your CSV so that you can individually reference the fields:
split_row = row.split(',')
score = split_row[2]
user = split_row[3]
print("The highest score for this quiz is: " + score)
print("This score was achieved by: " + user)

You can use the csv library:
import csv
with open("data", "r") as f:
    reader = csv.reader(f)
    # skip header
    next(reader)
    # organize data in 2D array
    data = [ [ sub, dif, int(score), name ] for sub, dif, score, name in reader ]
# sort by score
data.sort(key=lambda x: x[2], reverse=True)
# pretty print
print("The highest score for this quiz is:", data[0][2])
print("This score was achieved by:", data[0][3])
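The answers above take the highest score in the whole file, while the question filters by subject and difficulty first. A possible sketch that combines both uses csv.DictReader so the fields can be referenced by the column names from the sample file. The function name highest_score is hypothetical, and the sample data is inlined so the example runs on its own.

```python
import csv
import io

# The sample file from the question, inlined so the sketch is self-contained.
SAMPLE = """Subject,Difficulty,Score,Username
language,easy,10,Luc16
chemistry,easy,10,Luc16
maths,easy,9,Luc16
chemistry,easy,5,Eri15
chemistry,easy,6,Waf1
chemistry,easy,0,Eri15
"""


def highest_score(csv_file, subject, difficulty):
    """Return (score, username) for the best matching row, or None if no row matches."""
    reader = csv.DictReader(csv_file)
    matching = [row for row in reader
                if row["Subject"] == subject and row["Difficulty"] == difficulty]
    if not matching:
        return None
    # Convert to int so that 10 compares greater than 9.
    best = max(matching, key=lambda row: int(row["Score"]))
    return int(best["Score"]), best["Username"]


result = highest_score(io.StringIO(SAMPLE), "chemistry", "easy")
if result is not None:
    score, username = result
    print("The highest score for this quiz is:", score)
    print("This score was achieved by:", username)
```

With a real file, pass the object returned by open("reportForFergusTwo.csv", newline="") in place of the StringIO buffer.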

Python: leaderboard

I have been asked to design a leaderboard.
This is what I tried:
def leader():
    file = open ("res.txt","r")
    reader = csv.reader(file)
    print (("{:20s}{:20s}{:20s}{:20s}".format("\nPlayer","Matches played","Won","Lost")))
    won = 100
    for r in reader:
        won = won-1
        if r[2] == str(won):
            print (("{:20s}{:20s}{:20s}{:20s}".format(r[0],r[1],r[2],r[3])))
    file.close()
My csv file looks like this
Leeroy,19,7,12
Jenkins,19,8,11
Tyler,19,0,19
Napoleon Wilson,19,7,12
Big Boss,19,7,12
Game Dude,19,5,14
Macho Man,19,3,16
Space Pirate,19,6,13
Billy Casper,19,7,12
Otacon,19,7,12
Big Brother,19,7,12
Ingsoc,19,5,14
Ripley,19,5,14
M'lady,19,4,15
Einstein100,19,8,11
Dennis,19,5,14
Esports,19,8,11
RNGesus,19,7,12
Kes,19,9,10
Magnitude,19,6,13
I wish for it to display the person with the most wins first, can you help?

Try slurping the file into a list:
all_rows = list(reader)
Then sort all the rows on the number of wins, converted to int so that the values compare numerically rather than as strings:
sorted_rows = sorted(all_rows, key=(lambda row: int(row[2])), reverse=True)
You might want to stack several sorts, or define a more complex key, so that you can enforce ordering within equal numbers of wins (for example, 7 wins of 8 plays is better than 7 wins of 100 plays).
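One way to express such a "more complex key" is a tuple: most wins first, then fewest matches played among players with equal wins. This is a minimal sketch. The rows are a subset of the question's file plus one hypothetical player (Newcomer) added to show the tie-break, and the function name leaderboard is my own.

```python
import csv
import io

# A few rows from the question's file, plus one hypothetical row ("Newcomer")
# with fewer matches played, to demonstrate the tie-break.
SAMPLE = """Leeroy,19,7,12
Jenkins,19,8,11
Tyler,19,0,19
Kes,19,9,10
Newcomer,8,7,1
"""


def leaderboard(csv_file):
    """Return rows sorted by wins (descending), then matches played (ascending)."""
    rows = list(csv.reader(csv_file))
    return sorted(rows, key=lambda row: (-int(row[2]), int(row[1])))


print("{:20s}{:20s}{:20s}{:20s}".format("Player", "Matches played", "Won", "Lost"))
for player, played, won, lost in leaderboard(io.StringIO(SAMPLE)):
    print("{:20s}{:20s}{:20s}{:20s}".format(player, played, won, lost))
```

Negating the win count inside the key avoids needing reverse=True, which would also reverse the tie-break order.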

keys = ["Player","Matches played","Won","Lost"]
# read file
with open("res.txt") as f:
    content = f.readlines()
# strip the trailing newline from each line, split it on ','
# and build a dictionary from the fields
content = [dict(zip(keys,x.strip().split(','))) for x in content]
# sort the list of dicts in decreasing order of wins (compared as integers)
final_result = sorted(content, key=lambda k: int(k['Won']), reverse=True)
print(final_result)  # sorted by wins
# first element is the one with highest Won value
print(final_result[0]) # highest Won

Files in python

My program has to do two things with this file.
It needs to print the following information:

def getlines(somefile):
    f = open(somefile).readlines()
    lines = [line for line in f if not line.startswith("#") and not line.strip() == ""]
    return lines
entries = getlines(input("Name of input file: "))
animal_visits = {}
month_visits = [0] * 13
for entry in entries:
    # count visits for each animal
    animal = entry[:3]
    animal_visits[animal] = animal_visits.get(animal, 0) + 1
    # count visits for each month
    month = int(entry[4:6])
    month_visits[month] += 1
print("Total Number of visits for each animal")
for x in sorted(animal_visits):
    print(x, "\t", animal_visits[x])
print("====================================================")
print("Month with highest number of visits to the stations")
print(month_visits.index(max(month_visits)))
Outputs:
Name of input file: log
Total Number of visits for each animal
a01 3
a02 3
a03 8
====================================================
Month with highest number of visits to the stations
1

I prepared the following script:
from datetime import datetime # to parse your string as a date
from collections import defaultdict # to accumulate frequencies
import calendar # to get the names of the months
# Store the names of the months
MONTHS = [item for item in calendar.month_name]
def entries(filename):
    """Yields triplets (animal, date, station) contained in
    `filename`.
    """
    with open(filename, "r") as fp:
        for line in (_line.strip() for _line in fp):
            # skip comments
            if line.startswith("#"):
                continue
            try:
                # obtain the entry or try next line
                animal, datestr, station = line.split(":")
            except ValueError:
                continue
            # convert date string to actual datetime object
            date = datetime.strptime(datestr, "%m-%d-%Y")
            # yield the value
            yield animal, date, station
def visits_per_animal(data):
    """Count the visits per animal and print them sorted by animal."""
    # create a dictionary whose values are implicitly initialized to the
    # integer 0
    counter = defaultdict(int)
    for animal, date, station in data:
        counter[animal] += 1
    # print the outcome
    print("Visits Per Animal")
    for animal in sorted(counter.keys()):
        print("{0}: {1}".format(animal, counter[animal]))
def month_of_highest_frequency(data):
    """Calculates the month with the highest frequency."""
    # same as above: a dictionary that implicitly creates the integer 0 for a
    # new key
    counter = defaultdict(int)
    for animal, date, station in data:
        counter[date.month] += 1
    # select the (key, value) where value is maximum
    month_max, visits_max = max(counter.items(), key=lambda t: t[1])
    # pretty-print
    print("{0} has the most visits ({1})".format(MONTHS[month_max], visits_max))
def main(filename):
    """main program: get data, and apply functions"""
    data = [entry for entry in entries(filename)]
    visits_per_animal(data)
    month_of_highest_frequency(data)
if __name__ == "__main__":
    import sys
    main(sys.argv[1])
Use as:
$ python animalvisits.py animalvisits.txt
Visits Per Animal
a01: 3
a02: 3
a03: 8
January has the most visits (3)
Having done that, I must advise you against this approach. Querying data like this is very inefficient, difficult, and error-prone. I recommend you store your data in an actual database (Python offers an excellent binding for SQLite) and use SQL to make your reductions.
If you adopt the SQLite philosophy, you will simply store your queries as plain text files and run them on demand (via Python, a GUI, or the command line).
Visit http://docs.python.org/2/library/sqlite3.html for more details.
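To give a rough idea of what the database approach could look like, here is a sketch using the standard sqlite3 module with an in-memory database. The log lines are hypothetical and only assume the animal:MM-DD-YYYY:station layout that the script above parses. The table and column names are my own choices.

```python
import sqlite3

# Hypothetical log lines in the animal:MM-DD-YYYY:station layout assumed above.
LINES = [
    "a01:01-24-2011:s1",
    "a03:01-24-2011:s2",
    "a03:09-24-2011:s1",
    "a02:02-11-2011:s2",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (animal TEXT, month INTEGER, station TEXT)")
for line in LINES:
    animal, datestr, station = line.split(":")
    # The first two characters of the date string are the month.
    conn.execute("INSERT INTO visits VALUES (?, ?, ?)",
                 (animal, int(datestr[:2]), station))

# Number of visits for each animal, sorted by animal.
per_animal = conn.execute(
    "SELECT animal, COUNT(*) FROM visits GROUP BY animal ORDER BY animal"
).fetchall()

# Month with the most visits; ties are broken by the lower month number.
busiest = conn.execute(
    "SELECT month, COUNT(*) AS n FROM visits "
    "GROUP BY month ORDER BY n DESC, month LIMIT 1"
).fetchone()

print(per_animal)
print(busiest)
conn.close()
```

Each report is then a single SQL query, which can be kept in a text file and rerun as the log grows.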

Have you tried using regex?
I guess your code would reduce to a few lines if you did.
Use re.findall() with a separate regular expression for each thing you want to count and store the matches in a list. The length of each list then gives you the count.
