So I have a script that produces files named for each key in a dictionary. (script below)
for key, values in sequencelist.items():
    with open(key, 'w') as out:
        for value in values:
            out.write('\n'.join(value.split()) + '\n')
Can someone help me modify the code above to do more? I would like to append some plain text to the filename and also add a counter that runs from 1 to len(dict.keys()), using range(). See my script below, which doesn't work! :)
for key, values in sequencelist.items():
    for i in range(len(sequencelist.keys())):
        j = i+1
        with open('OTU(%j)' +'_' + key +'.txt' %j, 'w') as out:
            for value in values:
                out.write('\n'.join(value.split()) + '\n')
So the first file created would be OTU(1)_key.txt.
I am sure the with open() line is 100% wrong.
Could someone also link me to something to read on how %j works to call the variable j from the line before? I was trying to use code from this Stack Overflow answer (Input a text file and write multiple output files in Python), which came with no explanation.
Try the following:
for count, (key, values) in enumerate(sequencelist.items()):
    with open('OTU(%d)_%s.txt' % (count+1, str(key)), 'w') as out:
        for value in values:
            out.write('\n'.join(value.split()) + '\n')
I swapped the ordering of your open call with your value iteration so you don't get len(sequencelist) files for each value. It seemed like your j argument was not required after this change. The enumerate call makes the count part of the for loop increment each time the loop repeats (it doesn't have to be called count).
The %d asks for an integer, the %s for a string, which depending on the key name will convert nicely with str(). If your key is some custom class you'll want to convert it to a nicer string format, as otherwise you'll get something like <class __main__.Test at 0x00000....>.
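As for what % does: it is Python's printf-style string formatting operator. In the original attempt, % binds more tightly than +, so only the '.txt' piece was being formatted, and %j is not a valid conversion type. The sketch below uses made-up data (sequencelist_demo is a hypothetical stand-in for the asker's dictionary) and only builds the filenames rather than writing any files.

```python
# Hypothetical sample data standing in for the asker's sequencelist.
sequencelist_demo = {"alpha": ["a b"], "beta": ["c d"]}

filenames = []
for count, key in enumerate(sequencelist_demo, start=1):
    # %d is replaced by the integer and %s by the string; both values
    # come from the tuple on the right-hand side of the % operator.
    filenames.append('OTU(%d)_%s.txt' % (count, key))

# The same name built with str.format, for comparison.
alt_name = 'OTU({})_{}.txt'.format(1, "alpha")

print(filenames)
print(alt_name)
```

Passing start=1 to enumerate avoids the count+1 arithmetic; either form gives the same names.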
This is what I am supposed to do in my assignment:
This function is used to create a bank dictionary. The given argument
is the filename to load. Every line in the file will look like key:
value. Key is a user's name and value is an amount to update the user's
bank account with. The value should be a number, however, it is
possible that there is no value or that the value is an invalid
number.
What you will do:
Try to make a dictionary from the contents of the file.
If the key doesn't exist, create a new key:value pair.
If the key does exist, increment its value with the amount.
You should also handle cases when the value is invalid. If so, ignore that line and don't update the dictionary.
Finally, return the dictionary.
Note: All of the users in the bank file are in the user account file.
Example of the contents of 'filename' file:
Brandon: 115.5
James: 128.87
Sarah: 827.43
Patrick:'18.9
This is my code:
bank = {}
with open(filename) as f:
    for line in f:
        line1 = line
        list1 = line1.split(": ")
        if (len(list1) == 2):
            key = list1[0]
            value = list1[1]
            is_valid = value.isnumeric()
            if is_valid == True
                value1 = float(value)
                bank[(key)] = value1
return bank
My code returns a NoneType object which causes an error but I don't know where the code is wrong. Also, there are many other errors. How can I improve/fix the code?
Try this code. I'll explain each step, since how familiar it looks depends on how well you know Python's data structures:
Code Syntax
adict = {}
with open("text_data.txt") as data:
    """
    adict (dict): a dictionary that stores the data produced by the iteration
    below, which separates each line of the file into a 'key' and a 'value'.
    We do that by iterating over the lines of the file.
    `line` is each line returned by `readlines()`. Now the magic happens here:
    slicing lets you pick a character position in the line and work from it. BUT
    you still have to get rid of the '\n' that appears at the end of each line.
    You can use `strip` to remove this character from the end of the line.
    """
    adict = {line[:line.index(':')]: line[line.index(':')+1: ].strip('\n') for line in data.readlines()}
print(adict)
Output
{' Brandon': '115.5', ' James': '128.87', ' Sarah': '827.43', ' Patrick': "'18.9"}
As for validating the value, a little searching shows that you can check whether a value is a number or not.
According to "Detect whether a Python string is a number or a letter":
a = 5
def is_number(a):
    try:
        float (a)
    except ValueError:
        return False
    else:
        return True
By Calling the function
print(is_number(a))
print(is_number(1.4))
print(is_number('hello'))
OUTPUT
True
True
False
Now, let's go back to our code and edit it.
All you need to do is add a condition to the dict comprehension:
adict = {line[:line.index(':')]: line[line.index(':')+1: ].strip(' \n') for line in data.readlines() if is_number(line[line.index(':')+1: ].strip('\n')) == True}
OUTPUT
{'Brandon': '115.5', 'James': '128.87', 'Sarah': '827.43'}
You can check the value of the dict by passing it to the function that we created
Code Syntax
print(is_number(adict['Brandon']))
OUTPUT
True
You can add more extensions to the is_number() function if you want.
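One possible extension, as a sketch only: float() also accepts strings such as 'nan' and 'inf', which are probably not meaningful bank amounts. The helper name is_finite_number is hypothetical and not part of the answer above.

```python
import math

def is_finite_number(text):
    """Return True if text parses as a float that is neither NaN nor infinite."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        # TypeError covers inputs such as None; ValueError covers bad strings.
        return False
    return math.isfinite(value)

print(is_finite_number("115.5"))
print(is_finite_number("'18.9"))
print(is_finite_number("nan"))
```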
You're likely hitting the return in the else statement, which doesn't return anything (hence None). So as soon as there is one line in your file that does not contain 2 white-space separated values, you're returning nothing.
Also note that your code only assigns a value to a key in the dictionary. It does not add the value to an existing key if the key already exists, as the assignment requires.
This should effectively do the job:
bank = {}
with open(filename) as file:
    for line in file:
        # Trial and error: a line without ': ' or with a non-numeric value raises ValueError
        try:
            key, val = line.rsplit(": ", 1)  # split on the last ': ' to avoid ambiguity from colons in the middle
            bank[key] = bank.get(key, 0) + float(val)  # add to the existing amount, if any
        except ValueError as e:
            print(e)
return bank
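Putting the assignment's rules together, a self-contained version might look like the sketch below. The function name load_bank and the temporary demo file are assumptions made for illustration, and the second Brandon line is added here only to show the increment rule.

```python
import os
import tempfile

def load_bank(filename):
    """Build {name: total} from lines shaped like 'name: amount'."""
    bank = {}
    with open(filename) as f:
        for line in f:
            name, sep, amount = line.partition(":")
            if not sep:
                continue  # no colon on this line, so there is nothing to parse
            try:
                value = float(amount)
            except ValueError:
                continue  # missing or invalid amount: ignore the line
            name = name.strip()
            # Create the key on first sight, otherwise add to the running total.
            bank[name] = bank.get(name, 0.0) + value
    return bank

# Demo data: the example from the question plus one repeated name (assumed).
sample = "Brandon: 115.5\nJames: 128.87\nSarah: 827.43\nPatrick:'18.9\nBrandon: 4.5\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
result = load_bank(tmp.name)
os.remove(tmp.name)
print(result)
```

Invalid lines are skipped silently here; printing or logging them instead is an equally reasonable choice.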
dict ={"Rahul":"male",
"sahana":"female"
"pavan":"male" }
in a text file we have
rahul|sharma
sahana|jacob
Pavan|bhat
In a Python program we have to open the text file, read all the lines, match each "Name" against the dict we have, and write a new text file with the gender added.
OUTPUT SHOULD BE LIKE
rahul|sharma|male
sahana|jacob|female
Pavan|bhat|male
It would seem to me that this is roughly what you want. Note that your formatting for input and output was slightly off, but I'm pretty sure I've got it.
genders = {"rahul":"male",
           "sahana":"female",
           "pavan":"male" }
with open("input.txt") as in_file:
    for line in in_file:
        a, b = line.strip().split("|")
        gen = genders[a]
        print("{}|{}|{}".format(a, b, gen))
where input.txt contains
rahul|sharma
sahana|jacob
pavan|bhat
will correctly (I think) produce the output
rahul|sharma|male
sahana|jacob|female
pavan|bhat|male
I have changed all of your data to be lowercase, as with your casing, it would have been ambiguous as to how to look up in the dictionary, and how to end up providing output (only one key was capital-cased, so I couldn't use any kind of reasonable string function to accommodate the keys as they were). I've also had to add a comma to your dictionary.
I've also renamed your dictionary - it's no longer dict, because dict is a Python builtin. It seems a bit strange to me that you will have available in your code a dictionary that can anticipate your input file, but this is what I got from the question.
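If the original casing in the input has to be preserved and the result written to a file rather than printed, one option is to lowercase the name only for the lookup. This is a sketch under assumptions: the function name add_gender and the "unknown" fallback are hypothetical, and in-memory io.StringIO objects stand in for the real input and output files.

```python
import io

genders = {"rahul": "male", "sahana": "female", "pavan": "male"}

def add_gender(in_file, out_file, lookup):
    """Copy each 'first|last' line to out_file with '|gender' appended."""
    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        first_name = line.split("|", 1)[0]
        # Lowercase only for the lookup, so the output keeps the input's casing.
        gender = lookup.get(first_name.lower(), "unknown")
        out_file.write("{}|{}\n".format(line, gender))

source = io.StringIO("rahul|sharma\nsahana|jacob\nPavan|bhat\n")
target = io.StringIO()
add_gender(source, target, genders)
print(target.getvalue())
```

With real files, the two StringIO objects would be replaced by open("input.txt") and open("output.txt", "w").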
To get the value for the key in a dict, the syntax is simply:
b = "Rahul"
dict = {"Rahul":"male", "Mahima":"female"}
dict[b]
I'm extremely new to Python and am having some trouble removing duplicate values from an attribute of a class (I think this is the correct terminology).
Specifically, I want to remove every value that has the same year. I should note that I'm printing only the first four characters and searching on the first four characters. The data within the attribute is actually in year-month-day format (example: 19070101 is the first of January, 1907).
Anyways, here is my code:
import csv
import os
class Datatype:
    'Data from the weather station'
    def __init__ (self, inputline):
        [ self.DATE,
          self.PRCP] = inputline.split(',')
filename ='LAWe.txt'
LAWd = open(filename, 'r')
LAWefile = LAWd.read()
LAWd.close()
'Recognize the line endings for MS-DOS, UNIX, and Mac and apply the .split() method to the string wholeFile'
if '\r\n' in LAWefile:
    filedat = LAWefile.split('\r\n') # the split method, applied to a string, produces a list
elif '\r' in LAWefile:
    filedat = LAWefile.split('\r')
else:
    filedat = LAWefile.split('\n')
collection = dict()
date= dict()
for thisline in filedat:
    thispcp = Datatype(thisline) # here is where the Datatype object is created (running the __init__ function)
    collection[thispcp.DATE] = thispcp # the dictionary will be keyed by the ID attribute
for thisID in collection.keys():
    studyPRP = collection[thisID]
    if studyPRP.DATE.isdigit():
        list(studyPRP.DATE)
        if len(date[studyPRP.DATE][0:4]):
            pass #if year is seen once, then skip and go to next value in attribute
        else:
            print studyPRP.DATE[0:4] #print value in this case the year)
            date[studyPRP.DATE]=studyPRP.DATE[0:4]
I get this error:
Traceback (most recent call last):
File "project.py", line 61, in <module>
if len(date[studyPRP.DATE][0:4]):
KeyError: '19770509'
A key error (which means a value isn't in a list? but it is for my data) can be fixed by using a set function (or so I've read), but I have 30,000 pieces of information I'm dealing with and it seems like you have to manually type in that info so that's not an option for me.
Any help at all would be appreciated
Sorry if this is confusing or nonsensical as I'm extremely new to python.
Replace this:
if len(date[studyPRP.DATE][0:4]):
with this:
if len(date[studyPRP.DATE[0:4]]):
Explanation:
In the first line, the whole date (for example '19770509', the key named in the KeyError) is used as the key into date, and only then are the first four characters of the result taken.
In the correction, the first four characters of the date (the year) are used as the key into the dictionary.
I'm not sure exactly what you want here, so I'll answer the parts I can help with.
Your error occurs because you are looking up the year in date before you have added it.
Also, what you are adding to your collection is like
{
<object>.DATE: <object>
}
I don't know what you need here. Your lower for loop can be written as follows:
for thisID in collection:
    if thisID.isdigit():
        if thisID[0:4] in date and len(date[thisID[0:4]]):
            #if year is seen once, then skip and go to next
            # value in attribute
            pass
        else:
            print thisID[0:4] #print value in this case the year)
            date[thisID[0:4]]=thisID[0:4]
Note your studyPRP.DATE is same as thisID.
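Since the question mentions sets: a set does not have to be typed in by hand, it can be filled as the data is read. The sketch below is written for Python 3 and uses a hypothetical helper unique_years with a few made-up date strings in place of the 30,000 real ones.

```python
def unique_years(dates):
    """Return each year (first four characters) once, in first-seen order."""
    seen = set()
    years = []
    for date in dates:
        if not date.isdigit():
            continue  # skip header rows or malformed entries
        year = date[:4]
        if year not in seen:
            # The membership test never raises KeyError, unlike date[year].
            seen.add(year)
            years.append(year)
    return years

# Made-up sample standing in for the DATE column of the weather file.
sample_dates = ["19070101", "19070102", "DATE", "19770509", "19770510"]
print(unique_years(sample_dates))
```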
I am using a dictionary to add key and values in it. I am checking if the key is already present, and if yes, I am appending the value; if not I add a key and the corresponding value.
I am getting the error message:
AttributeError: 'str' object has no attribute 'append'
Here is the code. I am reading a CSV file:
metastore_dir = collections.defaultdict(list)
with open(local_registry_file_path + data_ext_dt + "_metastore_metadata.csv",'rb') as metastore_metadata:
    for line in metastore_metadata:
        key = line[2]
        key = key.lower().strip()
        if (key in metastore_dir):
            metastore_dir[key].append(line[0])
        else:
            metastore_dir[key] = line[0]
I found an answer on Stack Overflow that says to use defaultdict to resolve the issue, but I am still getting the error message even after applying the suggested answer.
I have pasted my code for reference.
The str type has no append() method.
Replace your call to append with the + operator:
sentry_dir[key] += line[1]
As written, it ends up as a dictionary of strings. To store a list for each key, use:
if (key not in metastore_dir): ## add key first if not in dict
    metastore_dir[key] = [] ## empty list
metastore_dir[key].append(line[0])
""" with defaultdict you don't have to add the key
i.e. "if key in" not necessary
"""
metastore_dir[key].append(line[0])
When you insert a new item into the dictionary, you want to insert it as a list:
...
if (key in metastore_dir):
    metastore_dir[key].append(line[0])
else:
    metastore_dir[key] = [line[0]] # wrapping it in brackets creates a singleton list
On an unrelated note, it looks like you are not correctly parsing the CSV. Try splitting each line by commas (e.g. line.split(',')[2] refers to the third column of a CSV file). Otherwise line[0] refers to the first character of the line and line[2] refers to the third character of the line, which I suspect is not what you want.
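Combining the two points above (let defaultdict(list) create the list, and parse rows rather than characters), a Python 3 sketch could use the standard csv module. The three-column sample data is invented for illustration, and an in-memory io.StringIO stands in for the real metadata file.

```python
import collections
import csv
import io

# Invented sample rows: the first column is the value, the third is the key.
sample_csv = io.StringIO("t1,x,Sales\nt2,y,sales \nt3,z,HR\n")

metastore_dir = collections.defaultdict(list)
for row in csv.reader(sample_csv):
    if len(row) < 3:
        continue  # skip short or empty rows
    key = row[2].lower().strip()
    # No "if key in" check is needed: a missing key starts as an empty list.
    metastore_dir[key].append(row[0])

print(dict(metastore_dir))
```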
I am new to Python and would like some help with a small problem. I have a file in which each line has an ID plus an associated number. More than one number can be associated with the same ID. How can I get only each ID plus the largest number associated with it in Python?
Example:
Input: ID_file.txt
ENSG00000133246 2013
ENSG00000133246 540
ENSG00000133246 2010
ENSG00000253626 465
ENSG00000211829 464
ENSG00000158458 2577
ENSG00000158458 2553
What I want is the following:
ENSG00000133246 2013
ENSG00000253626 465
ENSG00000211829 464
ENSG00000158458 2577
Thanks in advance for any help!
I would think there are many ways to do this, but I would use a dictionary:
id_value_dict = {}
for line in open('idfile.txt'):
    id, value = line.strip().split()
    if id not in id_value_dict:
        id_value_dict[id] = int(value)
    else:
        if id_value_dict[id] < int(value):
            id_value_dict[id] = int(value)
The next step is to write the dictionary out:
out_ref = open('outputfile.txt', 'w')
for key, value in id_value_dict.items():
    out_ref.write(key + '\t' + str(value) + '\n')
out_ref.close()
There are slicker ways to do this. I think the dictionary could be built in a one-liner using a lambda or a list comprehension, but I like to start simple.
Just in case you need the results sorted: there are lots of ways to do it. I think it is critical to understand working with lists and dictionaries in Python; I have found that learning to think about the right data container is usually the key to solving many of my problems, though I am still new to this. Anyway, if you need sorted results, a straightforward way is to loop over
sorted(id_value_dict)
This is one of the slick things about Python: sorted(id_value_dict) returns a new list of the dictionary's keys in sorted order.
out_ref = open('outputfile.txt', 'w')
for key in sorted(id_value_dict):
    out_ref.write(key + '\t' + str(id_value_dict[key]) + '\n')
out_ref.close()
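It's really tricky, because you might want (I know I always want) to code
my_sorted_list = id_value_dict.keys().sort()
However, you will find that my_sorted_list is None: list.sort() sorts in place and returns None. (In Python 3, keys() returns a view that has no sort() method at all.) Use sorted(id_value_dict) when you want the sorted list as a value.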
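The difference between the two spellings can be checked directly. This is a small Python 3 sketch with a made-up two-entry dictionary:

```python
# Made-up sample dictionary with two of the IDs from the question.
id_value_dict = {"ENSG00000253626": 465, "ENSG00000133246": 2013}

keys = list(id_value_dict)
result = keys.sort()                 # sorts keys in place and returns None
sorted_keys = sorted(id_value_dict)  # returns a new sorted list of the keys

print(result)
print(sorted_keys)
```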
Given that your input consists of nothing but contiguous runs for each ID—that is, as soon as you see another ID, you never see the previous ID again—you can just do this:
import itertools
import operator
with open('ID_file.txt') as idfile, open('max_ID_file.txt', 'w') as maxidfile:
    keyvalpairs = (line.strip().split(None, 1) for line in idfile)
    for key, group in itertools.groupby(keyvalpairs, operator.itemgetter(0)):
        maxval = max(int(keyval[1]) for keyval in group)
        maxidfile.write('{} {}\n'.format(key, maxval))
To see what this does, let's go over it line by line.
A file is just an iterable full of lines, so for line in idfile means exactly what you'd expect. For each line, we're calling strip to get rid of extraneous whitespace, then split(None, 1) to split it on the first space, so we end up with an iterable full of pairs of strings.
Next, we use groupby to change that into an iterable full of (key, group) pairs. Try printing out list(keyvalpairs) to see what it looks like.
Then we iterate over that, and just use max to get the largest value in each group.
And finally, we write the key and the max value for each group to the output file.
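To try the grouping without touching any files, the same steps can be run on an in-memory list. The sketch below uses some of the sample lines from the question. It also includes a plain-dictionary variant (an alternative technique, not groupby) that still works when the lines for one ID are not contiguous.

```python
import itertools
import operator

lines = [
    "ENSG00000133246 2013",
    "ENSG00000133246 540",
    "ENSG00000133246 2010",
    "ENSG00000253626 465",
    "ENSG00000158458 2577",
    "ENSG00000158458 2553",
]

# groupby version: relies on each ID appearing in one contiguous run.
pairs = (line.split(None, 1) for line in lines)
maxima = []
for key, group in itertools.groupby(pairs, key=operator.itemgetter(0)):
    maxima.append((key, max(int(value) for _, value in group)))

# Dictionary version: does not care about the order of the lines.
best = {}
for line in reversed(lines):
    key, value = line.split()
    best[key] = max(int(value), best.get(key, int(value)))

print(maxima)
print(best)
```

groupby only merges adjacent items, so on unsorted input it would yield the same ID more than once; sorting first or using the dictionary avoids that.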