I am just beginning to work with Python and am a little confused. I understand the basic idea of a dictionary as (key, value). I am writing a program and want to read in a file, store it in a dictionary, and then complete different functions by referencing the values. I am not sure if I should use a dictionary or lists. The basic layout of the file is:
Name followed by 12 different years, for example:
A 12 12 01 11 0 0 2 3 4 9 12 9
I am not sure what the best way to read in this information would be. I was thinking that a dictionary may be helpful if I had Name followed by Years, but I am not sure if I can map 12 years to one key name. I am really confused on how to do this. I can read in the file line by line, but not within the dictionary.
def readInFile():
    fileDict ={"Name ": "Years"}
    with open("names.txt", "r") as f:
        _ = next(f)
        for line in f:
            if line[1] in fileDict:
                fileDict[line[0]].append(line[1])
            else:
                fileDict[line[0]] = [line[1]]
My thinking with this code was to append each year to the value.
Please let me know if you have any recommendations.
Thank you!
You can do it in one line :)
print({line[0]:line[1:].split() for line in open('file.txt','r') if line[0]!='\n'})
output:
{'A': ['12', '12', '01', '11', '0', '0', '2', '3', '4', '9', '12', '9']}
The dict comprehension above is the same as:
dict_1={}
for line in open('file.txt', 'r'):
    if line[0]!='\n':
        dict_1[line[0]]=line[1:].split()
print(dict_1)
You can map 12 years to one key name. You seem to think that you need to choose between a dictionary and a list ("I am not sure if I should use a dictionary or lists.") But those are not alternatives. Your 12 years can usefully be represented as a list. Your names can be dictionary keys. So you need (as PM 2Ring suggests) a dictionary where the key is a name and the value is a list of years.
def readInFile():
    fileDict = {}
    with open(r"names.txt", "r") as f:
        for line in f:
            # split once: the name, then the rest of the line
            name, years = line.split(" ",1)
            fileDict[name] = years.split()
    return fileDict
There are two calls to the string method split(). The first splits the name from the years at the first space. (You can get the name using line[0], but only if the name is one character long, and that is unlikely to be useful with real data.) The second call to split() picks the years apart and puts them in a list.
The result from the one-line sample file will be the same as running this:
fileDict = {'A': ['12', '12', '01', '11', '0', '0', '2', '3', '4', '9', '12', '9']}
As you can see, these years are strings not integers: you may want to convert them.
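If the years are needed as numbers, the conversion can be folded into the reading loop. The following is only a minimal sketch, assuming every field after the name is a whole number; `read_years` is a hypothetical helper name, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
from io import StringIO

def read_years(lines):
    """Map each name to a list of integer years (hypothetical helper)."""
    result = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, *years = line.split()
        result[name] = [int(y) for y in years]
    return result

# StringIO stands in for open("names.txt"); any iterable of lines works
sample = StringIO("A 12 12 01 11 0 0 2 3 4 9 12 9\n")
file_dict = read_years(sample)
print(file_dict["A"])
```

Note that int("01") gives 1, so leading zeros are lost; keep the strings if they matter.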
Rather than doing:
_ = next(f)
to throw away your record count, consider doing
for line in f:
    if line.strip().isdigit():
        continue
instead. If you are using file's built-in iteration (for line in f) then it's generally best not to call next() on f yourself.
It's also not clear to me why your code is doing this:
fileDict ={"Name ": "Years"}
This is a description of what you plan to put in the dictionary, but that is not how dictionaries work. They are not database tables with named columns. If you use a dictionary with key:name and value:list of years, that structure is implicit. The best you can do is describe it in a comment or a type annotation. Performing the assignment will result in this:
fileDict = {
    'A': ['12', '12', '01', '11', '0', '0', '2', '3', '4', '9', '12', '9'],
    'Name ': 'Years'
}
which mixes up description and data, and is probably not what you want, because your subsequent code is likely to expect a 12-list of years in the dictionary value, and if so it will choke on the string "Years".
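One way to document the intended structure without putting a description into the data is a type annotation. This is only a sketch: `YearsByName` is a hypothetical alias name, and the annotation is a hint for readers and tools, not something Python enforces at run time.

```python
from typing import Dict, List

# Hypothetical alias: documents the shape (name -> list of years) without adding data
YearsByName = Dict[str, List[str]]

file_dict: YearsByName = {}  # starts empty, no placeholder entry needed
file_dict["A"] = "12 12 01 11".split()
print(file_dict)
```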
Values in a dict can be anything, including a new dict, but in this case a list sounds good. Maybe something like this.
from io import StringIO  # only needed to make this run without an actual file
the_file_content = 'A 12 12 01 11\nB 13 13 02'
fake_file = StringIO(the_file_content)
# with a real file, use this instead:
# with open('names.txt', 'rt') as f:
#     lines = f.readlines()
lines = fake_file.readlines()  # this line goes away once you read a real file
lines = [l.strip().split(' ') for l in lines]
fileDict = {row[0]: row[1:] for row in lines}
# if you want the values to be actual numbers rather than strings
for k, v in fileDict.items():
    fileDict[k] = [int(i) for i in v]
Python has constructs, such as list and dict comprehensions, that let you do many simple and even complex things in one go rather than looping over indexes.
Related
I was trying to come up with a function that would read an .csv archive and from there I could get for example, grades for students tests, example below:
NOME,G1,G2
Paulo,5.0,7.2
Pedro,6,4.1
Ana,3.3,2.3
Thereza,5,6.5
Roberto,7,5.2
Matheus,6.3,6.1
I managed to split the lines on the , part but I end up with somewhat a matrix:
[['NOME', 'G1', 'G2'], ['Paulo', '5.0', '7.2'], ['Pedro', '6', '4.1'], ['Ana', '3.3', '2.3'], ['Thereza', '5', '6.5'], ['Roberto', '7', '5.2'], ['Matheus', '6.3', '6.1']]
How do I go from one list to the other and manage to get the grades within them?
This is the code I got so far:
def leArquivo(arquivo):
    arq = open(arquivo, 'r')
    conteudo = arq.read()
    arq.close()
    return conteudo
def separaLinhas(conteudo):
    conteudo = conteudo.split('\n')
    conteudo1 = []
    for i in conteudo:
        conteudo1.append(i.split(','))
    return conteudo1
Where do I go from here?
A simple for loop will do it:
notas = [['NOME', 'G1', 'G2'], ['Paulo', '5.0', '7.2'], ['Pedro', '6', '4.1'], ['Ana', '3.3', '2.3'], ['Thereza', '5', '6.5'], ['Roberto', '7', '5.2'], ['Matheus', '6.3', '6.1']]
for nota in notas[1:]:  # [1:] skips the first item (the header row)
    nome = nota[0]
    g1 = nota[1]
    g2 = nota[2]
    print ("NOME:{} | G1: {} | G2: {}".format(nome, g1, g2))
PS: You may want to convert g1 and g2 to float, e.g. float(nota[1]), if you need to perform math operations.
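As a sketch of that conversion, the loop below unpacks each row and stores the mean of the two grades per student. The dictionary name `medias` is an assumption made for illustration, and the data is a shortened copy of the list from the question.

```python
notas = [['NOME', 'G1', 'G2'], ['Paulo', '5.0', '7.2'], ['Pedro', '6', '4.1']]

medias = {}
for nome, g1, g2 in notas[1:]:  # [1:] skips the header row
    # the grades are strings, so convert before doing arithmetic
    medias[nome] = (float(g1) + float(g2)) / 2

for nome, media in medias.items():
    print("{}: {:.2f}".format(nome, media))
```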
Since you're working with a csv file, you may want to look at the csv module in Python. That module offers many convenient options and forms in which the data can be read. The following is an example of reading data with csv.DictReader and using it:
# Note: the print statements below use Python 2 syntax
import csv
# Read the data
with open('data.csv') as f:
    reader = csv.DictReader(f)
    data = [row for row in reader]
# Print it
for row in data:
    print (' ').join(['Nome:',row['NOME'],'G1:',row['G1'],'G2:',row['G2']])
# Print only names and G2 grades as a table
print '- '*10
print 'NOME\t' + 'G2'
for row in data:
    print row['NOME'] + '\t' + row['G2']
# Average of G1 and G2 for each student
print '- '*10
print 'NOME\t' + 'Average'
for row in data:
    gpa = (float(row['G1']) + float(row['G2']))/2.0
    print row['NOME'] + '\t' + str(gpa)
Here the data is read as a list of dictionaries - each element in the list is a dictionary representing a single row of your dataset. The dictionary keys are names of your headers (NOME, G1) and values are the corresponding values for that row.
That particular form can be useful in some situations. Here in the first part of the program the data is printed with keys and values, one row per line. The thing to note is that plain dictionaries are unordered in Python 2.7 (and in Python 3 before 3.7), so to ensure printing in some specific order we need to traverse the dictionary "manually". I used join simply to demonstrate an alternative to format (which is actually more powerful) or to just typing everything with spaces in between. The second usage example prints names and the second grade as a table with proper headers. The third calculates the average and prints it as a table.
For me this approach proved very useful when dealing with datasets with several thousand entries that have many columns (headers) that I want to study separately (thus I don't mind them not being in order). To get an ordered dictionary you can use OrderedDict or consider other available data structures. I also use Python 2.7, but since you tagged the question as 3.X, the links point to 3.X documentation.
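Since the question is tagged 3.X, here is a rough Python 3 sketch of the same csv.DictReader idea. It assumes the header names NOME, G1 and G2 from the question; `io.StringIO` stands in for the real file, and the `averages` dictionary is a name introduced only for this illustration.

```python
import csv
from io import StringIO

# Stand-in for open('data.csv', newline=''); the rows are copied from the question
csv_text = "NOME,G1,G2\nPaulo,5.0,7.2\nPedro,6,4.1\nAna,3.3,2.3\n"

with StringIO(csv_text) as f:
    data = list(csv.DictReader(f))  # one dictionary per row, keyed by header

# Average of G1 and G2 for each student
averages = {row['NOME']: (float(row['G1']) + float(row['G2'])) / 2 for row in data}

print('NOME\tAverage')
for nome, avg in averages.items():
    print('{}\t{:.2f}'.format(nome, avg))
```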
I've been working on a problem which I realise I am probably approaching the wrong way but am now confused and out of ideas. Any research that I have done has left me more confused, and thus I have come for help.
I have a nested list:
[['# Name Surname', 'Age', 'Class', 'Score', '\n'], ['name', '9', 'B',
'N/A', '\n'], ['name1', '9', 'B', 'N/A', '\n'], ['name2', '8', 'B',
'N/A', '\n'], ['name3', '9', 'B', 'N/A', '\n'], ['name4', '8', 'B',
'N/A', '']]
I am trying to write this list to a text file in the correct layout. For this I flattened the list and then joined it together with ','.
The problem with this is that because the '\n' is being stored in the list itself, it adds a comma after this, which ends up turning this:
Name Surname,Age,Class,Score,
Name,9,B,N/A,
Name1,9,B,N/A,
Name2,8,B,N/A,
Name3,9,B,N/A,
Name4,8,B,N/A,
into:
Name Surname,Age,Class,Score,
,
,Name,9,B,N/A,
,Name1,9,B,N/A,
,Name2,8,B,N/A,
,Name3,9,B,N/A,
,Name4,8,B,N/A,
If I remove the \n from the code the formatting in the text file is all wrong due to no new lines.
Is there a better way to approach this or is there a quick fix to all my problems that I cannot see?
Thanks!
My code for reference:
def scorestore(score):
    user[accountLocation][3] = score
    file = ("classdata",schclass,".txt")
    file = "".join(file)
    flattened = [val for sublist in user for val in sublist]
    flatstring = ','.join(str(v) for v in flattened)
    accountlist = open(file,"w")
    accountlist.write(flatstring)
    accountlist.close()
I'm not sure which list is the one in your post (sublist?), but when you flatten it, just discard the "\n" strings:
flattened = [x for x in sublist if x != "\n"]
The easiest way would probably be to remove the newlines from the sublists as you get them, then print each sublist one at a time. This would look something like:
for sublist in users:
    print(",".join(val for val in sublist if not val.isspace()), file=accountlist)
This will fail on the 0 in your list, however. I'm not sure if you intend to handle that, or if it's extraneous. If you do need to handle it, then you'll have to change the generator expression to str(val) for val in sublist if not str(val).isspace().
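A small self-contained sketch of the str(val) variant, assuming a shortened copy of the data from the question. Here `io.StringIO` stands in for the opened output file so the result can be inspected directly.

```python
from io import StringIO

user = [
    ['# Name Surname', 'Age', 'Class', 'Score', '\n'],
    ['name', '9', 'B', 'N/A', '\n'],
    ['name4', '8', 'B', 0],  # mixed types: the score is an int here
]

out = StringIO()  # stands in for open(file, "w")
for sublist in user:
    # str() makes the int safe to test and join; whitespace-only items are dropped
    fields = [str(val) for val in sublist if not str(val).isspace()]
    print(",".join(fields), file=out)  # print adds the line break

text = out.getvalue()
print(text, end="")
```

Empty strings are kept by this filter, because ''.isspace() is False; filter on val.strip() instead if those should go too.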
Instead of building one string, how about writing lines? Use something like this:
list_of_list = [[...]]
lines = [','.join(line).strip() for line in list_of_list]
lines = [line for line in lines if line]
# writelines() does not add line breaks, so join the lines with '\n' explicitly
with open(file, 'w') as f:
    f.write('\n'.join(lines) + '\n')
Use the csv module to make it easier:
import csv
data = [
['# Name Surname', 'Age', 'Class', 'Score','\n'],
['\n'],
['Name', '9', 'B', 'N/A','\n'],
['Name1', '9', 'B', 'N/A','\n'],
['Name2', '8', 'B', 'N/A','\n'],
['Name3', '9', 'B', 'N/A','\n'],
['Name4', '8', 'B', 0]
]
# Remove all the ending new lines
data = [row[:-1] if row[-1] == '\n' else row for row in data]
# Write to file (Python 3: text mode with newline=''; Python 2 would use 'wb')
with open('write_sublists.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)
Discussion
Your data is irregular: some rows contain the ending newline and some don't. Also, some rows contain only strings while others contain mixed data types. The first step is to normalize them by removing all ending newlines. The csv module can take care of mixed data types just fine.
I used the csv module to create lists from a data file. It looks something like this now:
['unitig_5\t.\tregion\t401\t500\t0.00\t+\t.\tcov2=3.000', '0.000;gaps=0',
'0;cov=3', '3', '3;cQv=20', '20', '20;del=0;ins=0;sub=0']
['unitig_5\t.\tregion\t2201\t2300\t0.00\t+\t.\tcov2=10.860',
'1.217;gaps=0', '0;cov=8', '11', '13;cQv=20', '20', '20;del=0;ins=0;sub=0']
I need to pull out lists and write them to a new file if the value of cov2= (part of the first column above) is greater than some specified number (say 140). In that case, the two lists above wouldn't be accepted.
How would I set it up to check which lists meet this qualification and put those lists to a new file?
You can use regex :
>>> l=['unitig_5\t.\tregion\t401\t500\t0.00\t+\t.\tcov2=3.000', '0.000;gaps=0',
... '0;cov=3', '3', '3;cQv=20', '20', '20;del=0;ins=0;sub=0']
>>> import re
>>> float(re.search(r'cov2=([\d.]+)',l[0]).group(1))
3.0
The pattern r'cov2=([\d.]+)' matches cov2= followed by any combination of digits (\d) and dots of length 1 or more. You can then convert the result to float and compare:
>>> var=float(re.search(r'cov2=([\d.]+)',l[0]).group(1))
>>> var>140
False
Also, since the pattern may not match at all (in which case re.search returns None), you can use try-except to handle the resulting exception:
try:
    var = float(re.search(r'cov2=([\d.]+)', l[0]).group(1))
    print(var > 140)
except AttributeError:
    # re.search returned None, so .group failed
    print('the_error_message')
I would first split the first string by tabs "\t", which seems to separate the fields.
Then, if cov2 is always the last field, further parsing would be easy (cut off "cov2=", then convert the remainder to float and compare).
If it is not necessarily the last field, a simple search for the start should be sufficient.
Of course, complexity could increase indefinitely if error checking or a more tolerant search is required.
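import re
lst = [ ['unitig_5\t.\tregion\t401\t500\t0.00\t+\t.\tcov2=3.000', '0.000;gaps=0',
'0;cov=3', '3', '3;cQv=20', '20', '20;del=0;ins=0;sub=0'],
['unitig_5\t.\tregion\t2201\t2300\t0.00\t+\t.\tcov2=10.860',
'1.217;gaps=0', '0;cov=8', '11', '13;cQv=20', '20', '20;del=0;ins=0;sub=0'], ]
def cov2(row):
    # cov2 value at the end of the first element, or None if it is missing
    m = re.match(r'.*cov2=([\d.]+)$', row[0])
    return float(m.group(1)) if m else None
filtered_list = [row for row in lst if cov2(row) is not None and cov2(row) > 140]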
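The tab-splitting idea described above can also be sketched without a regular expression. This assumes, as in the sample rows, that cov2 is the last tab-separated field of the first element; the variable names are chosen only for illustration.

```python
row = ['unitig_5\t.\tregion\t401\t500\t0.00\t+\t.\tcov2=3.000', '0.000;gaps=0']

fields = row[0].split('\t')  # tab-separated fields of the first element
last = fields[-1]            # assumed to look like 'cov2=<number>'

# Cut off the 'cov2=' prefix and convert the remainder; None if the prefix is absent
cov2 = float(last[len('cov2='):]) if last.startswith('cov2=') else None

keep = cov2 is not None and cov2 > 140
print(cov2, keep)
```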
You could extract the float value using rsplit if all the first elements contain the substring:
for row in list_of_rows:
    if float(row[0].rsplit("=",1)[1]) > 140:
        pass  # write the row here
If you don't actually need every row, you should filter when you first read the file, writing as you go:
import csv
with open("input.csv") as f, open("output.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    for row in r:
        if float(row[0].rsplit("=", 1)[1]) > 140:
            wr.writerow(row)  # writerow, not writerows: row is a single record
I tried to read some integers from a text file and put them into a list in Python, using the code below:
nums=list()
txt=open('integers.txt')
for i in txt:
    nums.append(i)
print(nums)
But I got the output like:
['5 34 33 45 6 4 23 76 434']
It looks ok, but actually there's only one element of this list, which is
'5 34 33 45 6 4 23 76 434', not a series of them like ['5', '34', '33', '45', '6', '4', '23', '76', '434'], I don't know how to solve the problem...
Thanks for your help
The issue here is that your loop for i in txt is actually iterating over the lines in the file. So it looks like you have a 1-line file and you are just appending the [last] line into your list.
Instead, you probably want to split up the elements (split by whitespace):
with open('integers.txt') as f:
    nums = f.read().split()
txt=open('integers.txt', 'r')
nums = [int(number) for number in txt.read().split()]
print(nums)
txt.close()
You may use the join() method.
nums=list()
txt=open('integers.txt')
for i in txt:
    nums.append(i)
print(' '.join(nums))
You can split each line to get each individual entry:
nums=[]
txt=open('integers.txt')
for i in txt:
    nums.append([item for item in i.split()])
print(nums)
I am writing a program in Python to read 5 lines of input from a file 'var_input' into a list, and then put each separate number into one of two lists, first or second.
I am wondering what the best way would be to split each line at the space between the numbers and then append the numbers to the lists first or second. I am thinking about using Python's split method, but I am not sure how to do this.
Data in input file would look like this
18 24
10 5
101 567
234 90
107 4567
first should contain ['18', '10', '101', '234', '107']
second should contain ['24', '5', '567', '90', '4567']
Here's what I have so far:
first = []
second = []
file_input = open('var_input')
input_list = file_input.readlines()
Thank You So Much, any help would be greatly appreciated
You can do this with zip and split:
with open('var_input') as file_input:
    input_list = file_input.readlines()
first, second = zip(*[l.split() for l in input_list])
How it works: [l.split() for l in input_list] is a list comprehension, which applies the split method to each line (split with no arguments also discards the trailing newline) to make it look like:
[["18", "24"], ["10", "5"], ["101", "567"], ["234", "90"], ["107", "4567"]]
zip is a function that then zips multiple lists together (when given with *, it treats each item in the input list as a separate argument). It "zips" it by taking the first item of each list, then the second item of each list (if you had three or more items on each line you'd end up with three or more output lists). The result will look like
[('18', '10', '101', '234', '107'), ('24', '5', '567', '90', '4567')]
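A self-contained sketch of the same transposition, assuming the five lines from the question are already in memory (the `lines` list below stands in for file_input.readlines()). It also converts the tuples that zip produces into lists, since the question asks for lists.

```python
# Stand-in for file_input.readlines(); the last line has no trailing newline
lines = ["18 24\n", "10 5\n", "101 567\n", "234 90\n", "107 4567"]

pairs = [line.split() for line in lines]  # one [first, second] pair per line
# zip(*pairs) regroups the pairs by column; list() turns each tuple into a list
first, second = (list(column) for column in zip(*pairs))

print(first)
print(second)
```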
first = []
second = []
with open('var_input') as fp:
    for line in fp:
        temp = line.split()
        first.append(temp[0])
        second.append(temp[1])
This may look naive, but it is simple and it works.