How to have multiple arrays in python variable - python

Right now I have a small script that writes and reads data to and from a CSV file.
Brief snippet of the write function:
with open(filename, 'w') as f1:
    writer = csv.writer(f1, delimiter=';', lineterminator='\n')
    for a, b in my_function:
        do_things_to_get_data
        writer.writerow([tech_link, str(total), str(avg), str(unique_count)])
Then brief snippet of reading the file:
infile = open(filename, "r")
for line in infile:
    row = line.split(";")
    tech = row[0]
    total = row[1]
    average = row[2]
    days_worked = row[3]
    do_things_with_each_row_of_data
I'd like to skip the CSV part altogether and see if I can just hold all that data in a variable, but I'm not sure what that looks like. Any help is appreciated.
Thank you.

...no point in me saving data to a csv file just to read it later in the script
Just keep it in a list of lists
data = []
for a, b in my_function:
    do_things_to_get_data
    data.append([tech_link, str(total), str(avg), str(unique_count)])
...
for tech, total, average, days_worked in data:
    do_things_with_each_row_of_data
It might be worth saving each row as a namedtuple or a dictionary.
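For example, a minimal, self-contained sketch of the namedtuple variant (the field names and sample values here are only illustrative):

from collections import namedtuple

# Field names assumed from the question's variables.
Row = namedtuple('Row', ['tech', 'total', 'average', 'days_worked'])

data = []
# In the real script, one Row would be appended per iteration of my_function.
data.append(Row('tech_a', '10', '2.5', '4'))
data.append(Row('tech_b', '7', '1.4', '5'))

for row in data:
    # Fields are accessible by name instead of by index.
    print(row.tech, row.total, row.average, row.days_worked)

Accessing fields by name keeps the reading loop readable even if the number of columns grows later.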

Related

How can I modify my code to fit the function in order for it to run correctly?

So I'm tasked with finding the most popular bid from a list of csv files like this
1,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
2,668d39,aeqok,furniture,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,vehicle,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f
4,6444t43,rrdwk,vehicle,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f
and I have to determine the most popular bid, which is found in the 4th element of every row in the csv file...
so the values I'm looking for are the ones with ** ** around them, as indicated below in the same example:
1,8dac2b,ewmzr,**jewelry**,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
2,668d39,aeqok,**furniture**,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,**vehicle**,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f
4,6444t43,rrdwk,**vehicle**,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f
So from my example, the output should be vehicle
Look at my previous question here where I asked the same thing:
How can I return the most reoccuring value in the given csv list?
Now, with the help of a kind StackOverflow user (https://stackoverflow.com/users/5387738/nikita-almakov), I was able to get this code:
import csv

value_to_count = {}
with open("tmp.csv", "r") as f:
    reader = csv.reader(f)
    for row in reader:
        category = row[3]
        if category in value_to_count:
            value_to_count[category] += 1
        else:
            value_to_count[category] = 1
count_to_value = sorted((v, k) for k, v in value_to_count.items())
However, when I tried to put this code inside a function popvote(filename), I ran into an issue: I don't know where to use the filename in the code in order to get the right output.
Here is what I have so far:
import csv

def popvote(filename):
    value_to_count = {}
    with open("tmp.csv", "r") as f:
        reader = csv.reader(f)
        for row in reader:
            category = row[3]
            if category in value_to_count:
                value_to_count[category] += 1
            else:
                value_to_count[category] = 1
    count_to_value = sorted((v, k) for k, v in value_to_count.items())
    return count_to_value
However, the output is not what I expect. I know that I haven't used filename inside the function, and the reason is that I don't know where to do so. Can someone please help me figure this out?
Change
with open("tmp.csv", "r") as f:
to
with open(filename, "r") as f:
If filename is the CSV file with the values above, then you should replace "tmp.csv" with filename. The function then returns a list of (count, category) tuples sorted by count, so the most popular bid is the last element.
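Putting that together, a sketch of the corrected function (same logic as the question's code, with the counting condensed via dict.get):

import csv

def popvote(filename):
    value_to_count = {}
    with open(filename, "r") as f:   # use the parameter, not the hard-coded "tmp.csv"
        reader = csv.reader(f)
        for row in reader:
            category = row[3]
            value_to_count[category] = value_to_count.get(category, 0) + 1
    # Ascending sort by count: the most popular category ends up last.
    return sorted((v, k) for k, v in value_to_count.items())

# e.g. popvote("bids.csv")[-1][1] should give 'vehicle' for the sample rows above.

Here "bids.csv" is just a placeholder filename for the sample data shown in the question.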

More efficient way to go through .csv file?

I'm trying to parse through a dictionary stored in a .CSV file, using two lists in separate .txt files so that the script knows what it is looking for. The idea is to find a line in the .CSV file that matches both a Word and an IDNumber, and then pull out a third variable if there is a match. However, the code is running really slowly. Any ideas how I could make it more efficient?
import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')

for CurrentIDNumber in open(IDNumberList_filename).readlines():
    for CurrentWord in open(WordsOfInterest_filename).readlines():
        FoundCurrent = 0
        with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                if (row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord):
                    FoundCurrent = 1
                    CurrentProportion = row['CurrentProportion']
        if FoundCurrent == 0:
            CurrentProportion = 0
        else:
            CurrentProportion = 1
            print('found')
First of all, consider loading the file dictionary_individualwords.csv into memory. A Python dictionary is probably the proper data structure for this case.
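A sketch of that idea, keyed on the (IDNumber, Word) pair; the filename and column names are taken from the question's code:

import csv

# Read the CSV file once and build an in-memory lookup table.
proportions = {}
with open('dictionary_individualwords.csv', newline='', encoding='utf-8') as csvfile:
    for row in csv.DictReader(csvfile):
        proportions[(row['IDNumber'], row['Word'])] = row['CurrentProportion']

# Later, every (id, word) pair becomes a constant-time dictionary lookup.
CurrentProportion = proportions.get((CurrentIDNumber, CurrentWord), 0)

The last line assumes CurrentIDNumber and CurrentWord come from the question's loops over the two .txt files.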
You are opening the CSV file N times, where N = (# lines in IDs.txt) * (# lines in dictionary_WordsOfInterest.txt). If the file is not too large, you can avoid that by saving its content to a dictionary or a list of lists.
In the same way, you open dictionary_WordsOfInterest.txt every time you read a new line from IDs.txt.
It also seems that you are looking for any possible combination of the pair (CurrentIDNumber, CurrentWord) from the txt files. So, for example, you can store the IDs in one set and the words in another, and for each row in the csv file check whether both the ID and the word are in their respective sets.
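A minimal sketch of that set-based approach, again reusing the filenames and column names from the question:

import csv

# Read each .txt file once, stripping newlines, and keep the values in sets.
with open('IDs.txt') as f:
    ids = {line.strip() for line in f if line.strip()}
with open('dictionary_WordsOfInterest.txt') as f:
    words = {line.strip() for line in f if line.strip()}

# A single pass over the CSV file; set membership tests are O(1) on average.
with open('dictionary_individualwords.csv', newline='', encoding='utf-8') as csvfile:
    for row in csv.DictReader(csvfile):
        if row['IDNumber'] in ids and row['Word'] in words:
            print('found', row['IDNumber'], row['Word'], row['CurrentProportion'])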
Since you use readlines for the .txt files, you already build an in-memory list from them. You should build those lists first and then parse the csv file only once. Something like:
import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')

numberlist = open(IDNumberList_filename).readlines()
wordlist = open(WordsOfInterest_filename).readlines()

FoundCurrent = 0
with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for CurrentIDNumber in numberlist:
            for CurrentWord in wordlist:
                if (row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord):
                    FoundCurrent = 1
                    CurrentProportion = row['CurrentProportion']
if FoundCurrent == 0:
    CurrentProportion = 0
else:
    CurrentProportion = 1
    print('found')
Beware: untested

Problems with handling files in Python

Good evening everyone,
I am fairly new to Python and at the moment I'm struggling with the problem of how to properly edit a file (.txt or .csv) in Python. I am trying to write a little program that will take each line of a text file, encrypt it, and then overwrite the file line by line and save it. The relevant part of my code looks like this so far:
with open('/home/path/file.csv', 'r+') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        y = []
        for i in range(0, len(row)):
            x = encrypt(row[i], password)
            y.append(x)
        csvfile.write(''.join(y))
When executed, it does nothing. I've played with the code a little; sometimes it runs into a
TypeError: expected a character buffer object
The encryption function returns a string and my file consists of 3 strings per row, separated by a tab, like this:
key1 value1 value1'
key2 value2 value2'
key3 value3 value3'
...
The csv.reader seems to read the file properly and returns one list per row; y then ends up as a list of the encrypted phrases. However, I can't seem to get the csvfile.write() call to actually overwrite the file. Does anyone know how to get around this?
Any help would be greatly appreciated.
Thanks,
Andy
You've opened the file as read-only. You need to open a second file for writing.
with open('/home/path/file.csv', 'r+') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        y = []
        for i in range(0, len(row)):
            x = encrypt(row[i], password)
            y.append(x)

with open('/home/path/file.csv', 'w') as csvfile:
    csvfile.write(''.join(y))
I never like to overwrite my files, disk space is cheap.
with open('/home/path/file.csv', 'r+') as csvfile:
    with open('/home/path/file.enc', 'w') as csvencryptedfile:
        for row in csv.reader(csvfile, delimiter='\t'):
            y = []
            for i in range(0, len(row)):
                x = encrypt(row[i], password)
                y.append(x)
            csvencryptedfile.write('\t'.join(y))
            csvencryptedfile.write('\n')

How do I add new items to a dictionary while in a loop?

I'm writing a program that reads names and statistics related to those names from a file. Each line of the file is another person and their stats. For each person, I'd like to make their last name a key and everything else linked to that key in the dictionary. The program first stores data from the file in an array and then I'm trying to get those array elements into the dictionary, but I'm not sure how to do that. Plus I'm not sure if each time the for loop iterates, it will overwrite the previous contents of the dictionary. Here's the code I'm using to attempt this:
f = open("people.in", "r")
tmp = None
people
l = f.readline()
while l:
tmp = l.split(',')
print tmp
people = {tmp[2] : tmp[0])
l = f.readline()
people['Smith']
The error I'm currently getting is that the syntax is incorrect; however, I have no idea how to transfer the array elements into the dictionary other than like this.
Use key assignment:
people = {}
for line in f:
    tmp = line.rstrip('\n').split(',')
    people[tmp[2]] = tmp[0]
This loops over the file object directly, no need for .readline() calls here, and removes the newline.
You appear to have CSV data; you could also use the csv module here:
import csv

people = {}
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        people[row[2]] = row[0]
or even a dict comprehension:
import csv

with open("people.in", "rb") as f:
    reader = csv.reader(f)
    people = {r[2]: r[0] for r in reader}
Here the csv module takes care of the splitting and removing newlines.
The syntax error stems from trying to close the opening { with a ) instead of }:
people = {tmp[2] : tmp[0]) # should be }
If you need to collect multiple entries per row[2] value, collect these in a list; a collections.defaultdict instance makes that easier:
import csv
from collections import defaultdict

people = defaultdict(list)
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        people[row[2]].append(row[0])
In response to Generalkidd's comment above (multiple people with the same last name), here is an addition to Martijn Pieters' solution, posted as an answer for better formatting:
import csv

people = {}
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        if not row[2] in people:
            people[row[2]] = list()
        people[row[2]].append(row[0])

Using CSV module to append multiple files while removing appended headers

I would like to use the Python CSV module to open a CSV file for appending. Then, from a list of CSV files, I would like to read each csv file and write it to the appended CSV file. My script works great - except that I cannot find a way to remove the headers from all but the first CSV file being read. I am certain that my else block of code is not executing properly. Perhaps my syntax for my if else code is the problem? Any thoughts would be appreciated.
writeFile = open(append_file, 'a+b')
writer = csv.writer(writeFile, dialect='excel')
for files in lstFiles:
    readFile = open(input_file, 'rU')
    reader = csv.reader(readFile, dialect='excel')
    for i in range(0, len(lstFiles)):
        if i == 0:
            oldHeader = readFile.readline()
            newHeader = writeFile.write(oldHeader)
            for row in reader:
                writer.writerow(row)
        else:
            reader.next()
            for row in reader:
                row = readFile.readlines()
                writer.writerow(row)
    readFile.close()
writeFile.close()
You're effectively iterating over lstFiles twice. For each file in your list, you're running your inner for loop up from 0. You want something like:
writeFile = open(append_file, 'a+b')
writer = csv.writer(writeFile, dialect='excel')
headers_needed = True
for input_file in lstFiles:
    readFile = open(input_file, 'rU')
    reader = csv.reader(readFile, dialect='excel')
    oldHeader = reader.next()
    if headers_needed:
        newHeader = writer.writerow(oldHeader)
        headers_needed = False
    for row in reader:
        writer.writerow(row)
    readFile.close()
writeFile.close()
You could also use enumerate over the lstFiles to iterate over tuples containing the iteration count and the filename, but I think the boolean shows the logic more clearly.
You probably do not want to mix iterating over the csv reader and directly calling readline on the underlying file.
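For reference, a sketch of the enumerate variant mentioned above, in the same Python 2 style as the answer's code (append_file and lstFiles are assumed to be defined as in the question):

writeFile = open(append_file, 'a+b')
writer = csv.writer(writeFile, dialect='excel')
for i, input_file in enumerate(lstFiles):
    readFile = open(input_file, 'rU')
    reader = csv.reader(readFile, dialect='excel')
    oldHeader = reader.next()
    if i == 0:
        # Only the first file's header row is copied to the output.
        writer.writerow(oldHeader)
    for row in reader:
        writer.writerow(row)
    readFile.close()
writeFile.close()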
I think you're iterating too many times (over various things: both your list of files and the files themselves). You've definitely got some consistency problems; it's a little hard to be sure since we can't see your variable initializations. This is what I think you want:
with open(append_file, 'a+b') as writeFile:
    need_headers = True
    for input_file in lstFiles:
        with open(input_file, 'rU') as readFile:
            headers = readFile.readline()
            if need_headers:
                # Write the headers only if we need them
                writeFile.write(headers)
                need_headers = False
            # Now write the rest of the input file.
            for line in readFile:
                writeFile.write(line)
I took out all the csv-specific stuff since there's no reason to use it for this operation. I also cleaned the code up considerably to make it easier to follow, using the files as context managers and a well-named boolean instead of the "magic" i == 0 check. The result is a much nicer block of code that (hopefully) won't have you jumping through hoops to understand what's going on.
