Constructing peculiar dictionary out of file (python)

I'd like to automatically build a dictionary from files that have the following structure.
str11 str12 str13
str21 str22
str31 str32 str33 str34
...
that is, two, three or four strings on each line, separated by spaces. The dictionary I'd like to construct out of this list must have the following structure:
{str11: (str12, str13), str21: (str22,), str31: (str32, str33, str34), ... }
(that is, all entries str*1 are the keys -- all of them different -- and the remaining strings on each line are the values). What can I use?

>>> with open('abc') as f:
...     dic = {}
...     for line in f:
...         key, val = line.split(None, 1)
...         dic[key] = tuple(val.split())
...
>>> dic
{'str31': ('str32', 'str33', 'str34'),
 'str21': ('str22',),
 'str11': ('str12', 'str13')}
If you want the order of items to be preserved then consider using OrderedDict (on Python 3.7+ a plain dict already preserves insertion order):
>>> from collections import OrderedDict
>>> with open('abc') as f:
...     dic = OrderedDict()
...     for line in f:
...         key, val = line.split(None, 1)
...         dic[key] = tuple(val.split())
...
>>> dic
OrderedDict([('str11', ('str12', 'str13')),
             ('str21', ('str22',)),
             ('str31', ('str32', 'str33', 'str34'))])

Using a StringIO instance for simplicity:
import io
fobj = io.StringIO("""str11 str12 str13
str21 str22
str31 str32 str33 str34""")
One line does the trick:
>>> {line.split(None, 1)[0]: tuple(line.split()[1:]) for line in fobj}
{'str11': ('str12', 'str13'),
'str21': ('str22',),
'str31': ('str32', 'str33', 'str34')}
Note the line.split(None, 1). This limits the splitting to one piece because we have to call .split() twice inside the dict comprehension; unlike in a loop, we cannot store an intermediate result for reuse. The None means split on any run of whitespace.
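A quick demo of the two split calls on one sample line:
>>> 'str11 str12 str13'.split(None, 1)
['str11', 'str12 str13']
>>> 'str11 str12 str13'.split()
['str11', 'str12', 'str13']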
For an OrderedDict you can also get away with one line using a generator expression (if you already consumed fobj above, rewind it first with fobj.seek(0)):
>>> from collections import OrderedDict
>>> OrderedDict((line.split(None, 1)[0], tuple(line.split()[1:]))
...             for line in fobj)
OrderedDict([('str11', ('str12', 'str13')), ('str21', ('str22',)),
             ('str31', ('str32', 'str33', 'str34'))])

Related

How to replace the values of a dict in a txt file in python

I have a text file something.txt that holds data like:
sql_memory: 300
sql_hostname: server_name
sql_datadir: DEFAULT
I have a dict parameter = {"sql_memory": "900", "sql_hostname": "1234"}.
I need to write the values from the parameter dict into the txt file; if a key in the txt file does not match any key in parameter, its value in the txt file should be left as is.
For example, sql_datadir is not in the parameter dict, so its value in the txt file does not change.
Here is what I have tried:
import json

def create_json_file():
    with open(something_txt_path, 'r') as meta_data:
        lines = meta_data.read().splitlines()
    lines_key_value = [line.split(':') for line in lines]
    final_dict = {}
    for lines in lines_key_value:
        final_dict[lines[0]] = lines[1]
    with open(json_file_path, 'w') as foo:
        json.dump(final_dict, foo, indent=4)  # json.dump (not dumps) writes to a file object

def generate_server_file(parameters):
    create_json_file()
    with open(json_file_path, 'r') as foo:
        server_json_data = json.load(foo)
    for keys in parameters:
        if keys not in server_json_data:
            raise KeyError("Cannot find keys")
    # Need to update the parameter in the json file
    # and convert the json file into txt again

x = {"sql_memory": "900", "sql_hostname": "1234"}
generate_server_file(x)
Is there a way I can do this without converting the txt file into JSON?
Expected output file (something.txt):
sql_memory: 900
sql_hostname: 1234
sql_datadir: DEFAULT
Using Python 3.6
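For reference, a minimal sketch of one direct approach (not taken from the answers below): rewrite the file line by line, substituting a value only when its key appears in parameters, so unmatched keys such as sql_datadir keep their original values.
def generate_server_file(parameters, path='something.txt'):
    '''Rewrite path in place, replacing values for keys found in parameters.'''
    with open(path) as f:
        lines = f.read().splitlines()
    out = []
    for line in lines:
        key, sep, value = line.partition(':')
        if sep and key.strip() in parameters:
            out.append('{}: {}'.format(key.strip(), parameters[key.strip()]))
        else:
            out.append(line)  # key missing from parameters: leave the line as is
    with open(path, 'w') as f:
        f.write('\n'.join(out) + '\n')

generate_server_file({"sql_memory": "900", "sql_hostname": "1234"})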
If you want to import data from a text file, use numpy.genfromtxt.
My Code:
import numpy
data = numpy.genfromtxt("something.txt", dtype='str', delimiter=';')
print(data)
something.txt:
Name;Jeff
Age;12
My Output:
[['Name' 'Jeff']
['Age' '12']]
It's very useful and I use it all the time.
If your full example uses Python dict literals, one way to do this is to implement a serializer and a deserializer. Since your format closely follows object literal syntax, you could try ast.literal_eval, which safely parses a literal from a string. Note that it will not handle variable names.
import ast

def split_assignment(string):
    '''Split on a variable assignment, only splitting on the first =.'''
    return string.split('=', 1)

def deserialize_collection(string):
    '''Deserialize the collection to a key as a string and a value as a dict.'''
    key, value = split_assignment(string)
    return key, ast.literal_eval(value)

def dict_doublequote(dictionary):
    '''Print dictionary using double quotes.'''
    pairs = [f'"{k}": "{v}"' for k, v in dictionary.items()]
    return f'{{{", ".join(pairs)}}}'

def serialize_collection(key, value):
    '''Serialize the collection to a string.'''
    return f'{key}={dict_doublequote(value)}'
An example using the data above produces:
>>> data = 'parameter={"sql_memory":"900", "sql_hostname":"1234" }'
>>> key, value = deserialize_collection(data)
>>> key, value
('parameter', {'sql_memory': '900', 'sql_hostname': '1234'})
>>> serialize_collection(key, value)
'parameter={"sql_memory": "900", "sql_hostname": "1234"}'
Please note you'll probably want to use json.dumps rather than the hack I implemented to serialize the value, since the latter may incorrectly quote some complicated values. If single quotes are fine, a much more preferable solution would be:
def serialize_collection(key, value):
    '''Serialize the collection to a string.'''
    return f'{key}={str(value)}'
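For completeness, the json-based variant mentioned above might look like this (a sketch; json.dumps emits double-quoted strings, matching the input format):
import json

def serialize_collection(key, value):
    '''Serialize the collection to a string, letting json handle the quoting.'''
    return f'{key}={json.dumps(value)}'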

Create a list containing files of each transaction

Good evening, I want to create a list while reading a text file (historique.txt) that contains the list of files associated with each taskid. Consider the following example: my text file contains these lines:
4,file1
4,file2
5,file1
5,file3
5,file4
6,file3
6,file4
(To explain the content of the text file: 4 is an idtask and file1 is a file used by idtask=4; so task 4 used (file1, file2).)
I want to obtain the list Transactions=[[file1,file2],[file1,file3,file4],[file3,file4]]
Any help is appreciated, thank you.
Note: this will not work if the input file is not ordered.
Exactly the same idea as #mad_'s answer, just showing the benefit of turning file_data_list into a list of lists instead of a list of strings. We only need to .split each line once, which is more readable and probably a bit faster as well.
Note that this can also be done while reading the file instead of after the fact like I show here; see the sketch after the code.
from itertools import groupby

file_data_list = ['4,file1',
                  '4,file2',
                  '5,file1',
                  '5,file3',
                  '5,file4',
                  '6,file3',
                  '6,file4']

file_data_list = [line.split(',') for line in file_data_list]

for k, v in groupby(file_data_list, key=lambda x: x[0]):
    print([x[1] for x in v])  # also no need to convert v to a list
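Building the split list directly while reading the file might look like this (a sketch, assuming the historique.txt layout from the question):
with open('historique.txt') as f:
    file_data_list = [line.strip().split(',') for line in f]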
After reading from the file, e.g. with f.readlines(), you will get a list similar to the one below:
file_data_list = ['4,file1',
                  '4,file2',
                  '5,file1',
                  '5,file3',
                  '5,file4',
                  '6,file3',
                  '6,file4']
Apply groupby:
from itertools import groupby

for k, v in groupby(file_data_list, key=lambda x: x.split(",")[0]):
    print([i.split(",")[1] for i in list(v)])
Output
['file1', 'file2']
['file1', 'file3', 'file4']
['file3', 'file4']
You can also create a mapping dict:
for k, v in groupby(file_data_list, key=lambda x: x.split(",")[0]):
    print({k: [i.split(",")[1] for i in list(v)]})
Output
{'4': ['file1', 'file2']}
{'5': ['file1', 'file3', 'file4']}
{'6': ['file3', 'file4']}
As pointed out by #DeepSpace, the above solution will work only if the ids are ordered. Modifying it for the case where they are not:
from collections import defaultdict

d = defaultdict(list)
file_data_list = ['4,file1',
                  '4,file2',
                  '5,file1',
                  '5,file3',
                  '5,file4',
                  '6,file3',
                  '6,file4',
                  '4,file3']

for k, v in groupby(file_data_list, key=lambda x: x.split(",")[0]):
    for i in list(v):
        d[k].append(i.split(",")[1])
print(d)
Output
defaultdict(list,
            {'4': ['file1', 'file2', 'file3'],
             '5': ['file1', 'file3', 'file4'],
             '6': ['file3', 'file4']})
We can use the csv module to process the lines into lists of values.
csv reads from a file-like object, which we can fake using StringIO for an example:
>>> from io import StringIO
>>> contents = StringIO('''4,file1
... 4,file2
... 5,file1
... 5,file3
... 5,file4
... 6,file3
... 6,file4''')
Just to note: depending upon the version of Python you are using you might need to import StringIO differently. The above code works for Python 3. For Python 2, replace the import with from StringIO import StringIO.
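If the same script must run on both versions, a common guard is (a small sketch):
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3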
csv.reader returns an iterable object. We can consume the whole thing into a list, just to see how it works. Later we will instead iterate over the reader object one line at a time.
We can use pprint to see the results nicely formatted:
>>> import csv
>>> lines = list(csv.reader(contents))
>>> from pprint import pprint
>>> pprint(lines)
[['4', 'file1'],
['4', 'file2'],
['5', 'file1'],
['5', 'file3'],
['5', 'file4'],
['6', 'file3'],
['6', 'file4']]
These lists can then be unpacked into a task and filename:
>>> task, filename = ['4', 'file1']
>>> task
'4'
>>> filename
'file1'
We want to build lists of filenames having the same task as key.
To organise this efficiently we can use a dictionary. The efficiency comes from the fact that we can ask the dictionary for the list of values belonging to a given key: the keys are stored in a hash table, and looking up a key there is quicker than a linear search.
The first time we look to add a value to the dictionary for a particular key, we would need to check to see whether it already exists.
If not we would add an empty list and append the new value to it. Otherwise we would just add the value to the existing list for the given key.
This pattern is so common that Python's builtin dictionary has a method dict.setdefault to help us achieve this.
However, I don't like the name, or the non-uniform syntax. You can read the linked documentation if you like, but I'd rather use
Python's defaultdict instead. This automatically creates a default value for a key if it doesn't already exist when you query it.
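For comparison, the setdefault version of the append would look like this (a quick sketch):
d = {}
d.setdefault('5', []).append('file1')  # creates the empty list on first use
d.setdefault('5', []).append('file3')  # reuses the existing list
# d is now {'5': ['file1', 'file3']}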
We create a defaultdict with a list as default:
>>> from collections import defaultdict
>>> d = defaultdict(list)
Then for any new key it will create an empty list for us:
>>> d['5']
[]
We can append to the list:
>>> d['5'].append('file1')
>>> d['7'].append('file2')
>>> d['7'].append('file3')
I'll convert the defaultdict to a dict just to make it pprint more nicely:
>>> pprint(dict(d), width=30)
{'5': ['file1'],
'7': ['file2', 'file3']}
So, putting all this together:
import csv
from collections import defaultdict
from io import StringIO
from pprint import pprint
contents = StringIO('''4,file1
4,file2
5,file1
5,file3
5,file4
6,file3
6,file4''')
task_transactions = defaultdict(list)

for row in csv.reader(contents):
    task, filename = row
    task_transactions[task].append(filename)

pprint(dict(task_transactions))
Output:
{'4': ['file1', 'file2'],
'5': ['file1', 'file3', 'file4'],
'6': ['file3', 'file4']}
Some final notes: In the example we've used StringIO to fake the file contents. You'll probably want to replace that in your actual code with something like:
with open('historique.txt') as contents:
    for row in csv.reader(contents):
        ...  # etc
Also, where we take each row from the csv reader, and then unpack it into a task and filename, we could do that all in one go:
for task, filename in csv.reader(contents):
So your whole code (without printing) would be quite simple:
import csv
from collections import defaultdict

task_transactions = defaultdict(list)
with open('historique.txt') as contents:
    for task, filename in csv.reader(contents):
        task_transactions[task].append(filename)
If you want a list of transactions (as you asked in the question!):
transactions = list(task_transactions.values())
However, this may not be in the same order of tasks as the original file. (On Python 3.7+, dicts preserve insertion order, so the lists follow the first occurrence of each task in the file.) If the order matters to you, clarify the question and comment so I can help.
An alternate solution without using itertools.groupby
(This solution does exactly what #mad_'s does, but it is more readable, especially for a beginner.)
As #mad_ said, the list read from the file will be as follows:
data = ['4,file1',
        '4,file2',
        '5,file1',
        '5,file3',
        '5,file4',
        '6,file3',
        '6,file4']
You could loop over the data and create a dict:
from collections import defaultdict

transactions = defaultdict(list)
for element in data:  # each element is 'idtask,file'
    id, file = element.split(',')
    transactions[id].append(file)
transactions will now contain the dictionary:
{'4': ['file1', 'file2'],
 '5': ['file1', 'file3', 'file4'],
 '6': ['file3', 'file4']}

Python - convert text file to dict and convert to json

How can I convert this text file to JSON? Ultimately, I'll be inserting the JSON blobs into a NoSQL database, but for now I plan to parse the text files, build a Python dict, then dump to JSON.
I think there has to be a way to do this with a dict comprehension that I'm just not seeing/following (I'm new to Python).
Example of a file:
file_1.txt
[namespace1] => metric_A = value1
[namespace1] => metric_B = value2
[namespace2] => metric_A = value3
[namespace2] => metric_B = value4
[namespace2] => metric_B = value5
Example of dict I want to build to convert to json:
{ "file1" : {
"namespace1" : {
"metric_A" : "value_1",
"metric_B" : "value_2"
},
"namespace2" : {
"metric_A" : "value_3",
"metric_B" : ["value4", "value5"]
}
}
I currently have this working, but my code is a total mess (and much more complex than this example, with cleanup etc.). I'm basically going line by line through the file, building a Python dict. I check each namespace for existence in the dict; if it exists, I check the metric. If the metric already exists, I know I have duplicates and need to convert the value to an array that contains the existing value and my new value(s). There has to be a simpler/cleaner way.
import glob
import json

answer = {}
for fname in glob.glob('file_*.txt'):  # loop over all matching filenames
    answer[fname] = {}
    with open(fname) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue
            splits = line.split()[::2]  # keep every other token: namespace, metric, value
            splits[0] = splits[0][1:-1]  # strip the surrounding brackets
            namespace, metric, value = splits  # all the values in the line that we're interested in
            answer[fname].setdefault(namespace, {})[metric] = value  # populate the dict
required_json = json.dumps(answer)  # turn the dict into proper JSON
You can use a regex for that. re.findall(r'\w+', line) will find all the text groups you are after; then the rest is saving them in a dictionary of dictionaries. The simplest way to do that is to use defaultdict from collections.
import re
from collections import defaultdict

answer = defaultdict(lambda: defaultdict(list))
with open('file_1.txt', 'r') as f:
    for line in f:
        namespace, metric, value = re.findall(r'\w+', line)
        answer[namespace][metric].append(value)
Since we expect exactly 3 alphanumeric groups per line, we unpack them into 3 variables: namespace, metric, value. Finally, the outer defaultdict returns an inner defaultdict the first time we see a namespace, and the inner defaultdict returns an empty list for the first append, making the code more compact.
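A quick demo of what the findall call returns for one line of the sample file:
>>> import re
>>> re.findall(r'\w+', '[namespace1] => metric_A = value1')
['namespace1', 'metric_A', 'value1']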

python dictionary creating keys on letters and values on frequency it appears

I have to read text that looks like
TCCATCTACT
GGGCCTTCCT
TCCATCTACC
etc...
I want to create a dictionary; how can I read through this and set T, C, A, or G as the keys, with the values being the frequency with which each letter appears throughout the text?
Simply pass the whole string to a collections.Counter() object and it'll count each character.
It may be more efficient to do so line by line, so as not to require too much memory:
from collections import Counter

counts = Counter()
with open('inputtextfilename') as infh:
    for line in infh:
        counts.update(line.strip())
The str.strip() call removes any whitespace (such as the newline character).
A quick demo using your sample input:
>>> from collections import Counter
>>> sample = '''\
... TCCATCTACT
... GGGCCTTCCT
... TCCATCTACC
... '''.splitlines(True)
>>> counts = Counter()
>>> for line in sample:
...     counts.update(line.strip())
...
>>> for letter, count in counts.most_common():
...     print(letter, count)
...
C 13
T 10
A 4
G 3
I used the Counter.most_common() method to get a sorted list of letter-count pairs (in order from most to least common).

How to maintain order with finditer()

There seems to be a problem with finditer(). I am repeatedly searching for a pattern in a line using finditer(), and I need to maintain the order in which the matches are gathered. Following is my code:
names = collections.OrderedDict()
line1 = 'XPAC3出口$<zho>$ASDSA1出口$<chn>$ExitA2$<eng>$YUTY1出口$<fre>'
names = {n.group(2):n.group(1) for n in re.finditer("\$?(.*?)\$<(.*?)>", line1, re.UNICODE)}
And then I print it out:
for key, value in names.iteritems():
    print key, ' ', value
And the output turns out to be
fre YUTY1出口
chn ASDSA1出口
zho XPAC3出口
eng ExitA2
But I need the following order,
zho XPAC3出口
chn ASDSA1出口
eng ExitA2
fre YUTY1出口
How do I go ahead? Do I need to change the regex or use something other than finditer()?
You overwrite the names dictionary with your dictionary comprehension, and a regular dictionary doesn't preserve insertion order. To preserve the order, build a list of pairs and pass it to OrderedDict like this:
import collections
import re

line1 = 'XPAC3出口$<zho>$ASDSA1出口$<chn>$ExitA2$<eng>$YUTY1出口$<fre>'
names = [(n.group(2), n.group(1)) for n in re.finditer("\$?(.*?)\$<(.*?)>", line1, re.UNICODE)]
names = collections.OrderedDict(names)
for key, value in names.iteritems():
    print key, ' ', value
When you say
names = {...}
you are dropping the reference to the empty OrderedDict (which will be garbage collected) and rebinding names to a regular dict (which is unordered, of course).
You should pass your matches to the constructor of the OrderedDict
names = collections.OrderedDict((n.group(2), n.group(1)) for n in re.finditer("\$?(.*?)\$<(.*?)>", line1, re.UNICODE))
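Iterating over the resulting OrderedDict then yields the pairs in match order, so the output comes out as desired (sketched; Python 2's print with the extra ' ' argument produces the spacing shown):
zho   XPAC3出口
chn   ASDSA1出口
eng   ExitA2
fre   YUTY1出口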
