Python: Data Getting - python

I'm trying to find the easiest way to get a specific value from a file. I have searched all over the internet but can't find anything. I want to be able to do this:
File.txt:
data1 = 1
data2 = 2
but I want to get only data1, like so:
p = open('file.txt')
f = p.get(data1)
print(f)
Any ideas? Thanks in advance.

You can do:
with open("file.txt", "r") as f:
    for line in f:
        key, val = line.split('=')
        key = key.strip()
        val = val.strip()
        if key == 'data1':  # works even if data1 is not the first line
            print(val)  # do something with the value
Using map:
from operator import methodcaller
with open("file.txt", "r") as f:
    for line in f:
        # methodcaller("strip") removes all surrounding whitespace, including the newline
        key, val = map(methodcaller("strip"), line.split('='))
        if key == "data1":
            print(val)  # do something with the value

with open("file.txt", "r") as f:
    key, val = f.readline().split('=')
    if key.strip() == 'data1':  # only the first line is checked
        print(val.strip())  # do something with the value

If you know you only want data1, which is on the first line, you can do:
with open('file.txt', 'r') as f:
    key, val = tuple(x.strip() for x in f.readline().split('='))
The generator expression strips the whitespace from each string.
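The lookup the question hoped for, p.get(data1), can be approximated by reading the whole file into a dictionary first. This is a minimal sketch, not a standard API: the helper name read_settings is hypothetical, and io.StringIO stands in for open('file.txt') so the example runs on its own.

```python
import io

def read_settings(lines):
    """Parse 'key = value' lines into a dict of strings (hypothetical helper)."""
    settings = {}
    for line in lines:
        # partition never raises; sep is empty when the line has no '='
        key, sep, val = line.partition('=')
        if sep:
            settings[key.strip()] = val.strip()
    return settings

# In-memory stand-in for open('file.txt')
sample = io.StringIO("data1 = 1\ndata2 = 2\n")
settings = read_settings(sample)
print(settings.get('data1'))  # prints 1
```

Values stay strings here; convert with int() or float() if you need numbers.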

Related

Making python dictionary from a text file with multiple keys

I have a text file named file.txt with some numbers like the following :
1 79 8.106E-08 2.052E-08 3.837E-08
1 80 -4.766E-09 9.003E-08 4.812E-07
1 90 4.914E-08 1.563E-07 5.193E-07
2 2 9.254E-07 5.166E-06 9.723E-06
2 3 1.366E-06 -5.184E-06 7.580E-06
2 4 2.966E-06 5.979E-07 9.702E-08
2 5 5.254E-07 0.166E-02 9.723E-06
3 23 1.366E-06 -5.184E-03 7.580E-06
3 24 3.244E-03 5.239E-04 9.002E-08
I want to build a python dictionary, where the first number in each row is the key, the second number is always ignored, and the last three numbers are put as values. But in a dictionary, a key can not be repeated, so when I write my code (attached at the end of the question), what I get is
'1' : [ '90' '4.914E-08' '1.563E-07' '5.193E-07' ]
'2' : [ '5' '5.254E-07' '0.166E-02' '9.723E-06' ]
'3' : [ '24' '3.244E-03' '5.239E-04' '9.002E-08' ]
All the other numbers are removed, and only the last row is kept as the values. What I need is to have all the numbers against a key, say 1, to be appended in the dictionary. For example, what I need is :
'1' : ['8.106E-08' '2.052E-08' '3.837E-08' '-4.766E-09' '9.003E-08' '4.812E-07' '4.914E-08' '1.563E-07' '5.193E-07']
Is it possible to do it elegantly in python? The code I have right now is the following :
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        diction[pa[0]] = pa[1:]
with open('file.txt') as f:
    diction = {pa[0]: pa[1:] for pa in map(str.split, f)}
You can use a defaultdict.
from collections import defaultdict
data = defaultdict(list)
with open("file.txt", "r") as f:
    for line in f:
        line = line.split()
        data[line[0]].extend(line[2:])
Try this:
from collections import defaultdict
diction = defaultdict(list)
with open("file.txt") as f:
    for line in f:
        key, _, *values = line.strip().split()
        diction[key].extend(values)
print(diction)
This is a solution for Python 3, because the statement a, *b = tuple1 is invalid in Python 2. See @cha0site's solution if you are using Python 2.
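To make the Python 3 unpacking concrete, here is a small sketch that compares it with plain indexing and slicing, which also runs on Python 2. The sample line is the first row from the question; the names key_sliced and values_sliced are made up for the comparison.

```python
line = "1 79 8.106E-08 2.052E-08 3.837E-08"

# Python 3 extended unpacking: first field, ignored field, then the rest
key, _, *values = line.split()

# Equivalent using indexing and slicing
parts = line.split()
key_sliced, values_sliced = parts[0], parts[2:]

print(key, values)  # 1 ['8.106E-08', '2.052E-08', '3.837E-08']
```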
Make the value of each key in diction a list, and extend that list on each iteration. With your code as written, diction[pa[0]] = pa[1:] overwrites the value in diction[pa[0]] every time the key appears, which explains the behavior you're seeing.
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        try:
            diction[pa[0]].extend(pa[2:])  # pa[1] is the second column, which is ignored
        except KeyError:
            diction[pa[0]] = pa[2:]
In this code each value of diction will be a list. In each iteration if the key exists that list will be extended with new values from pa giving you a list of all the values for each key.
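The difference between overwriting and extending can be seen side by side in a short sketch. It uses three rows from the question held in a list instead of a file, and dict.setdefault as one possible alternative to the try/except above; the names overwritten and accumulated are illustrative only.

```python
rows = [
    "1 79 8.106E-08 2.052E-08 3.837E-08",
    "1 80 -4.766E-09 9.003E-08 4.812E-07",
    "2 2 9.254E-07 5.166E-06 9.723E-06",
]

overwritten = {}
accumulated = {}
for row in rows:
    pa = row.split()
    # Plain assignment replaces whatever the key held before
    overwritten[pa[0]] = pa[2:]
    # setdefault creates the list once, then extend appends to it
    accumulated.setdefault(pa[0], []).extend(pa[2:])

print(len(overwritten['1']))  # 3
print(len(accumulated['1']))  # 6
```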
To do this in a very simple for loop:
def build_dict(path='file.txt'):  # wrapped in a function so that return is valid
    with open(path) as f:
        return_dict = {}
        for item_list in map(str.split, f):
            if item_list[0] not in return_dict:
                return_dict[item_list[0]] = []
            return_dict[item_list[0]].extend(item_list[2:])  # skip the ignored second column
        return return_dict
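Or, if you wanted to use defaultdict in a one-liner-ish way:
from collections import defaultdict
def build_dict(path='file.txt'):  # wrapped in a function so that return is valid
    with open(path) as f:
        return_dict = defaultdict(list)
        # the comprehension is used only for its side effect on return_dict
        [return_dict[item_list[0]].extend(item_list[2:]) for item_list in map(str.split, f)]
        return return_dict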

editing a text file in python and making a new one

I have a text file like this:
>ENST00000511961.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370661.3|RNF14-003|RNF14|278
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLSPTQLSALCKHLDNLWEEHRGSVVLFAWMQFLKEETLAYLNIVSPFELKIGSQKKVQRRTAQASPNTELDFGGAAGSDVDQEEIVDERAVQDVESLSNLIQEILDFDQAQQIKCFNSKLFLCSICFCEKLGSECMYFLECRHVYCKACLKDYFEIQIRDGQVQCLNCPEPKCPSVATPGQ
>ENST00000506822.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370662.1|RNF14-004|GAPDH|132
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLSPTQLSALCKHLDNLWEEHRGSVVLFAWMQFLKE
>ENST00000513019.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370663.1|RNF14-005|ACTB|99
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLS
>ENST00000356143.1|ENSG00000013561.13|OTTHUMG00000129660.5|-|RNF14-202|HELLE|474
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLSPTQLSALCKHLDNLWEEHRGSVVLFAWMQFLKEETLAYLNIVSPFELKIGSQKKVQRRTAQASPNTELDFGGAAGSDVDQEEIVDERAVQDVESLSNLIQEILDFDQAQQIKCFNSKLFLCSICFCEKLGSECMYFLECRHVYCKACLKDYFEIQIRDGQVQCLNCPEPKCPSVATPGQVKELVEAELFARYDRLLLQSSLDLMADVVYCPRPCCQLPVMQEPGCTMGICSSCNFAFCTLCRLTYHGVSPCKVTAEKLMDLRNEYLQADEANKRLLDQRYGKRVIQKAL
I want to make a Python list of the 6th element of each line that starts with ">".
To do so, I first build a dictionary whose keys should give me the list I want, like this:
from itertools import groupby
with open('infile.txt') as f:
    groups = groupby(f, key=lambda x: not x.startswith(">"))
    d = {}
    for k,v in groups:
        if not k:
            key, val = list(v)[0].rstrip(), "".join(map(str.rstrip,next(groups)[1],""))
            d[key] = val
k = d.keys()
res = [el[5:] for s in k for el in s.split("|")]
But it returns all the elements of the lines that start with ">".
Do you know how to fix it?
Here is the expected output:
["RNF14", "GAPDH", "ACTB", "HELLE"]
This should help, using simple iteration, str.startswith and str.split.
Demo:
res = []
with open("infile.txt", "r") as infile:
    for line in infile:
        if line.startswith(">"):
            val = line.split("|")
            res.append(val[5])
print(res)
Output:
['RNF14', 'GAPDH', 'ACTB', 'HELLE']
In your code, replace
res = [el[5:] for s in k for el in s.split("|")]
with
res = [s.split("|")[5] for s in k ] #Should work.
A solution close to yours, using filter instead of groupby, plus map:
with open('infile.txt') as f:
    lines = f.readlines()
groups = filter(lambda x: x.startswith(">"), lines)
res = list(map(lambda x: x.split('|')[5],groups))
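Why el[5:] in the question returns too much can be checked on a single header line. The sketch below uses the first header from the question; the variable names are illustrative.

```python
header = ">ENST00000511961.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370661.3|RNF14-003|RNF14|278"

fields = header.split("|")

# el[5:] slices characters off every field, giving one result per field
sliced = [el[5:] for el in fields]

# fields[5] indexes the list of fields, giving only the 6th field
gene = fields[5]

print(len(sliced))  # 7
print(gene)         # RNF14
```

Slicing a string drops its first five characters, whereas indexing the split list selects one whole field, which is what the expected output needs.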

python: read dna sequence text file to dictionary

I'm trying to convert a text file containing DNA sequences to a dictionary in Python. The file is set up in columns.
TTT F
TCT S
TAT Y
TGT C
TTC F
import os.path
if os.path.isfile("GeneticCode_2.txt"):
    f = open('GeneticCode_2.txt', 'r')
    my_dict = eval(f.read())
Trying to get it to:
my_dict = {TTT: F, TCT: S, TAT: Y}
You can use the dict constructor using an iterable of pairs (2-tuples) and pass it the split lines of your file:
with open('GeneticCode_2.txt', 'r') as f:
    my_dict = dict(line.split() for line in f)
    # works only if every line in the file splits into exactly 2 tokens
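If the file may contain blank or malformed lines, one possible variation filters the split lines before handing them to dict. This is a sketch under that assumption; io.StringIO stands in for the real file, and the blank line in the sample is deliberate.

```python
import io

# In-memory stand-in for open('GeneticCode_2.txt')
sample = io.StringIO("TTT F\nTCT S\n\nTAT Y\n")

# Split each line once, then keep only lines with exactly two tokens
pairs = (line.split() for line in sample)
my_dict = dict(p for p in pairs if len(p) == 2)

print(my_dict)  # {'TTT': 'F', 'TCT': 'S', 'TAT': 'Y'}
```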
d = {}
with open("GeneticCode_2.txt") as infile:
    for line in infile:
        k,v = line.strip().split()
        d[k] = v
This isn't the most compact way of doing it, but it is very readable.
my_dict = dict()
with open('GeneticCode_2.txt') as f:
    for line in f:
        parts = line.strip().split()
        if len(parts) >= 2:  # skip blank or malformed lines
            my_dict[parts[0]] = parts[1]

Sort word frequencies by descending order of frequencies

I have a text file that has word frequencies in the format:
word<space>freq
where freq is a number. I want to sort the file so that the frequencies are in descending order. For that, I have tried the following:
Read the file into a dictionary:
kvp = {}
d = {}
with open("/home/melvyn/word_freq.txt") as myfile:
    for line in myfile:
        word, freq = line.partition(" ")[::2]
        kvp[word.strip()] = int(freq)
Sort the dictionary by values:
d = sorted(kvp.items(), key=lambda x:x[1])
Write the sorted dictionary into another text file:
with open('/home/melvyn/word_freq_sorted.txt', 'w') as f:
    json.dump(d, f)
I have the following questions:
1. Sorting is not happening. Why?
2. How can I add new line between every key-value pair while doing a json.dump? Is there a cleaner way to write the dictionary contents into the text file?
Instead of json.dump, try writing to the file with file.write, formatting the strings as needed.
kvp = {}
with open("a.txt", "r") as f:
    for line in f:
        word, freq = line.partition(" ")[::2]
        kvp[word.strip()] = int(freq)
# sorted is ascending by default; reverse=True puts the highest frequency first
d = sorted(kvp.items(), key=lambda x:x[1], reverse=True)
with open("b.txt", "w") as f:
    for i, v in d:
        f.write(str(i) + " " + str(v) + "\n")
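For a self-contained check of the descending sort and the one-pair-per-line output, here is a sketch that uses io.StringIO objects in place of the input and output files. The sample words and counts are made up, and breaking ties alphabetically is an added assumption, not something the question asks for.

```python
import io

source = io.StringIO("the 120\ncat 7\nsat 15\n")  # stand-in for the input file
target = io.StringIO()                             # stand-in for the output file

pairs = []
for line in source:
    word, _, freq = line.partition(" ")
    pairs.append((word, int(freq)))

# Negating the count sorts descending; the word breaks ties alphabetically
pairs.sort(key=lambda kv: (-kv[1], kv[0]))

target.writelines(f"{word} {freq}\n" for word, freq in pairs)
print(target.getvalue(), end="")
```

Keeping the pairs in a list rather than a dictionary also preserves duplicate words if the input happens to contain any.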

Python dictionary created from CSV file should merge the value (integer) whenever the key repeats

I have a file named report_data.csv that contains the following:
user,score
a,10
b,15
c,10
a,10
a,5
b,10
I am creating a dictionary from this file using this code:
with open('report_data.csv') as f:
    f.readline() # Skip over the column titles
    mydict = dict(csv.reader(f, delimiter=','))
After running this code mydict is:
mydict = {'a':5,'b':10,'c':10}
But I want it to be:
mydict = {'a':25,'b':25,'c':10}
In other words, whenever a key that already exists in mydict is encountered while reading a line of the file, the new value in mydict associated with that key should be the sum of the old value and the integer that appears on that line of the file. How can I do this?
The most straightforward way is to use defaultdict or Counter from the collections module.
from collections import Counter
summary = Counter()
with open('report_data.csv') as f:
    f.readline()
    for line in f:
        lbl, n = line.split(",")
        n = int(n)
        summary[lbl] = summary[lbl] + n
One of the most useful features of the Counter class is the most_common() method, which plain dictionaries and defaultdict lack.
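To illustrate most_common(), here is a minimal sketch that feeds the rows from the question into a Counter from an in-memory list rather than the CSV file. The variable names are illustrative.

```python
from collections import Counter

# Rows from report_data.csv, already split and converted
rows = [("a", 10), ("b", 15), ("c", 10), ("a", 10), ("a", 5), ("b", 10)]

summary = Counter()
for user, score in rows:
    summary[user] += score  # missing keys start at 0

# Pairs ordered from highest total to lowest; 'c' comes last with 10
print(summary.most_common())
```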
This should work for you:
import csv
with open('report_data.csv') as f:
    f.readline()
    mydict = {}
    for line in csv.reader(f, delimiter=','):
        mydict[line[0]] = mydict.get(line[0], 0) + int(line[1])
Try this:
import csv
mydict = {}
with open('report_data.csv') as f:
    f.readline()
    x = csv.reader(f, delimiter=',')
    for x1 in x:
        if x1[0] in mydict:
            mydict[x1[0]] += int(x1[1])
        else:
            mydict[x1[0]] = int(x1[1])
print(mydict)
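As a variation on the answers above, csv.DictReader can consume the header row itself, so the f.readline() call is not needed and columns are addressed by name. This is a sketch that assumes the header is exactly user,score as in the question; io.StringIO stands in for the real file.

```python
import csv
import io

# In-memory stand-in for open('report_data.csv')
sample = io.StringIO("user,score\na,10\nb,15\nc,10\na,10\na,5\nb,10\n")

totals = {}
# DictReader reads the header row and keys each row by column name
for row in csv.DictReader(sample):
    totals[row["user"]] = totals.get(row["user"], 0) + int(row["score"])

print(totals)  # {'a': 25, 'b': 25, 'c': 10}
```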
