I have a text file something.txt that holds data like:
sql_memory: 300
sql_hostname: server_name
sql_datadir: DEFAULT
I have a dict parameter = {"sql_memory": "900", "sql_hostname": "1234"}.
I need to write the values from the parameter dict into the txt file; if a key in the txt file has no matching key in the parameter dict, its value in the txt file should be left as it is.
For example, sql_datadir is not in the parameter dict, so its value in the txt file stays unchanged.
Here is what I have tried:
import json

def create_json_file():
    with open(something_txt_path, 'r') as meta_data:
        lines = meta_data.read().splitlines()
    lines_key_value = [line.split(':') for line in lines]
    final_dict = {}
    for line in lines_key_value:
        final_dict[line[0]] = line[1]
    with open(json_file_path, 'w') as foo:
        # json.dump (not dumps) writes to a file object
        json.dump(final_dict, foo, indent=4)

def generate_server_file(parameters):
    create_json_file()
    with open(json_file_path, 'r') as foo:
        server_json_data = json.load(foo)
    for key in parameters:
        if key not in server_json_data:
            raise KeyError("Cannot find keys")
    # Need to update the parameter in the json file
    # and convert the json file back into txt

x = {"sql_memory": "900", "sql_hostname": "1234"}
generate_server_file(x)
Is there a way I can do this without converting the txt file into a JSON ?
Expected output file (something.txt):
sql_memory: 900
sql_hostname: 1234
sql_datadir: DEFAULT
Using Python 3.6
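For reference, a minimal sketch of the direct rewrite the question asks about, with no JSON step (the helper name update_server_file is illustrative, not from the question):

def update_server_file(path, parameters):
    with open(path) as f:
        lines = f.read().splitlines()
    updated = []
    for line in lines:
        key, _, _ = line.partition(':')
        if key in parameters:
            line = f'{key}: {parameters[key]}'  # replace the value for matching keys
        updated.append(line)  # keys not in parameters are kept as-is
    with open(path, 'w') as f:
        f.write('\n'.join(updated) + '\n')

update_server_file('something.txt', {"sql_memory": "900", "sql_hostname": "1234"})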
If you want to import data from a text file, use numpy.genfromtxt.
My Code:
import numpy
data = numpy.genfromtxt("something.txt", dtype='str', delimiter=';')
print(data)
something.txt:
Name;Jeff
Age;12
My Output:
[['Name' 'Jeff']
 ['Age' '12']]
It's very useful and I use it all the time.
If your full example is using Python dict literals, a way to do this would be to implement a serializer and a deserializer. Since yours closely follows object literal syntax, you could try using ast.literal_eval, which safely parses a literal from a string. Notice, it will not handle variable names.
import ast

def split_assignment(string):
    '''Split on a variable assignment, only splitting on the first =.'''
    return string.split('=', 1)

def deserialize_collection(string):
    '''Deserialize the collection to a key as a string, and a value as a dict.'''
    key, value = split_assignment(string)
    return key, ast.literal_eval(value)

def dict_doublequote(dictionary):
    '''Print dictionary using double quotes.'''
    pairs = [f'"{k}": "{v}"' for k, v in dictionary.items()]
    return f'{{{", ".join(pairs)}}}'

def serialize_collection(key, value):
    '''Serialize the collection to a string.'''
    return f'{key}={dict_doublequote(value)}'
An example using the data above produces:
>>> data = 'parameter={"sql_memory":"900", "sql_hostname":"1234" }'
>>> key, value = deserialize_collection(data)
>>> key, value
('parameter', {'sql_memory': '900', 'sql_hostname': '1234'})
>>> serialize_collection(key, value)
'parameter={"sql_memory": "900", "sql_hostname": "1234"}'
Please note you'll probably want to use json.dumps rather than the hack I implemented to serialize the value, since dict_doublequote may incorrectly quote some complicated values. If single quotes are fine, a much more preferable solution would be:
def serialize_collection(key, value):
    '''Serialize the collection to a string.'''
    return f'{key}={str(value)}'
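For reference, a sketch of the json-based serializer mentioned above (assuming the values are JSON-serializable):

import json

def serialize_collection(key, value):
    '''Serialize the collection, letting json.dumps handle the quoting.'''
    return f'{key}={json.dumps(value)}'

# serialize_collection('parameter', {'sql_memory': '900'})
# -> 'parameter={"sql_memory": "900"}'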
Good evening. I want to create a list while reading a text file (historique.txt) that contains the list of files associated with each task id. Consider the following example: my text file contains these lines:
4,file1
4,file2
5,file1
5,file3
5,file4
6,file3
6,file4
(To explain the content of the text file: 4 is a task id and file1 is a file used by task id 4, so basically, task 4 used file1 and file2.)
I want to obtain the list Transactions = [[file1, file2], [file1, file3, file4], [file3, file4]].
Any help is appreciated. Thank you.
This will not work if the input file is not ordered
Exactly the same idea as @mad_'s answer, just showing the benefit of turning file_data_list into a list of lists instead of a list of strings. We only need to .split each line once, which is more readable and probably a bit faster as well.
Note that this can also be done while reading the file instead of after-the-fact like I show below.
from itertools import groupby

file_data_list = ['4,file1',
                  '4,file2',
                  '5,file1',
                  '5,file3',
                  '5,file4',
                  '6,file3',
                  '6,file4']

file_data_list = [line.split(',') for line in file_data_list]

for k, v in groupby(file_data_list, key=lambda x: x[0]):
    print([x[1] for x in v])  # also no need to convert v to list
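For reference, a sketch of the while-reading variant mentioned above (assuming the data lives in historique.txt):

from itertools import groupby

with open('historique.txt') as f:
    file_data_list = [line.strip().split(',') for line in f]

for k, v in groupby(file_data_list, key=lambda x: x[0]):
    print([x[1] for x in v])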
After reading from the file, e.g. with f.readlines(), you will get a list similar to the one below:
file_data_list = ['4,file1',
                  '4,file2',
                  '5,file1',
                  '5,file3',
                  '5,file4',
                  '6,file3',
                  '6,file4']
Apply groupby
from itertools import groupby

for k, v in groupby(file_data_list, key=lambda x: x.split(",")[0]):
    print([i.split(",")[1] for i in list(v)])
Output
['file1', 'file2']
['file1', 'file3', 'file4']
['file3', 'file4']
You can also create a mapping dict:
for k, v in groupby(file_data_list, key=lambda x: x.split(",")[0]):
    print({k: [i.split(",")[1] for i in list(v)]})
Output
{'4': ['file1', 'file2']}
{'5': ['file1', 'file3', 'file4']}
{'6': ['file3', 'file4']}
As pointed out by @DeepSpace, the above solution will work only if the ids are ordered. Modifying it for the case where they are not:
from collections import defaultdict

d = defaultdict(list)
file_data_list = ['4,file1',
                  '4,file2',
                  '5,file1',
                  '5,file3',
                  '5,file4',
                  '6,file3',
                  '6,file4',
                  '4,file3']

for k, v in groupby(file_data_list, key=lambda x: x.split(",")[0]):
    for i in list(v):
        d[k].append(i.split(",")[1])
print(d)
Output
defaultdict(list,
            {'4': ['file1', 'file2', 'file3'],
             '5': ['file1', 'file3', 'file4'],
             '6': ['file3', 'file4']})
We can use the csv module to process the lines into lists of values.
csv reads from a file-like object, which we can fake using StringIO for an example:
>>> from io import StringIO
>>> contents = StringIO('''4,file1
... 4,file2
... 5,file1
... 5,file3
... 5,file4
... 6,file3
... 6,file4''')
Just to note: depending upon the version of Python you are using you might need to import StringIO differently. The above code works for Python 3. For Python 2, replace the import with from StringIO import StringIO.
csv.reader returns an iterable object. We can consume the whole thing into a list, just to see how it works. Later we will instead iterate over the reader object one line at a time.
We can use pprint to see the results nicely formatted:
>>> import csv
>>> lines = list(csv.reader(contents))
>>> from pprint import pprint
>>> pprint(lines)
[['4', 'file1'],
 ['4', 'file2'],
 ['5', 'file1'],
 ['5', 'file3'],
 ['5', 'file4'],
 ['6', 'file3'],
 ['6', 'file4']]
These lists can then be unpacked into a task and filename:
>>> task, filename = ['4', 'file1']
>>> task
'4'
>>> filename
'file1'
We want to build lists of filenames keyed by task.
To organise this efficiently we can use a dictionary. The efficiency comes from the fact that a dictionary stores its keys in a hash table, and looking up a key there is much quicker than a linear search.
The first time we look to add a value to the dictionary for a particular key, we would need to check to see whether it already exists.
If not we would add an empty list and append the new value to it. Otherwise we would just add the value to the existing list for the given key.
This pattern is so common that Python's builtin dictionary has a method dict.setdefault to help us achieve this.
However, I don't like the name, or the non-uniform syntax. You can read the linked documentation if you like, but I'd rather use
Python's defaultdict instead. This automatically creates a default value for a key if it doesn't already exist when you query it.
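For comparison, the dict.setdefault pattern mentioned above looks like this (a minimal sketch):

d = {}
d.setdefault('5', []).append('file1')  # inserts an empty list on first use
d.setdefault('5', []).append('file3')  # reuses the existing list
# d is now {'5': ['file1', 'file3']}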
We create a defaultdict with a list as default:
>>> from collections import defaultdict
>>> d = defaultdict(list)
Then for any new key it will create an empty list for us:
>>> d['5']
[]
We can append to the list:
>>> d['5'].append('file1')
>>> d['7'].append('file2')
>>> d['7'].append('file3')
I'll convert the defaultdict to a dict just to make it pprint more nicely:
>>> pprint(dict(d), width=30)
{'5': ['file1'],
 '7': ['file2', 'file3']}
So, putting all this together:
import csv
from collections import defaultdict
from io import StringIO
from pprint import pprint
contents = StringIO('''4,file1
4,file2
5,file1
5,file3
5,file4
6,file3
6,file4''')
task_transactions = defaultdict(list)
for row in csv.reader(contents):
    task, filename = row
    task_transactions[task].append(filename)
pprint(dict(task_transactions))
Output:
{'4': ['file1', 'file2'],
 '5': ['file1', 'file3', 'file4'],
 '6': ['file3', 'file4']}
Some final notes: In the example we've used StringIO to fake the file contents. You'll probably want to replace that in your actual code with something like:
with open('historique.txt') as contents:
    for row in csv.reader(contents):
        ...  # etc
Also, where we take each row from the csv reader, and then unpack it into a task and filename, we could do that all in one go:
for task, filename in csv.reader(contents):
So your whole code (without printing) would be quite simple:
import csv
from collections import defaultdict

task_transactions = defaultdict(list)
with open('historique.txt') as contents:
    for task, filename in csv.reader(contents):
        task_transactions[task].append(filename)
If you want a list of transactions (as you asked in the question!):
transactions = list(task_transactions.values())
However, this may not be in the same order of tasks as the original file. If that's important to you, clarify the question, and comment so I can help.
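For reference, a sketch that makes first-seen task order explicit (using OrderedDict, since plain dict ordering is only an implementation detail in Python 3.6):

import csv
from collections import OrderedDict

task_transactions = OrderedDict()
with open('historique.txt') as contents:
    for task, filename in csv.reader(contents):
        # setdefault inserts an empty list the first time a task is seen
        task_transactions.setdefault(task, []).append(filename)

transactions = list(task_transactions.values())  # follows first appearance in the file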
An alternate solution without using itertools.groupby (it does exactly what @mad_'s answer does, but may be more readable, especially for a beginner):
As @mad_ said, the list read from the file will be as follows:
data = ['4,file1',
        '4,file2',
        '5,file1',
        '5,file3',
        '5,file4',
        '6,file3',
        '6,file4']
You could loop over the data and create a dict (note that defaultdict needs to be imported):
from collections import defaultdict

transactions = defaultdict(list)
for element in data:  # each element is 'idtask,file'
    id, file = element.split(',')
    transactions[id].append(file)
transactions will now contain the dictionary:
{'4': ['file1', 'file2'],
 '5': ['file1', 'file3', 'file4'],
 '6': ['file3', 'file4']}
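To get the list of transactions asked for in the question, take the dict's values:

transactions_list = list(transactions.values())
# [['file1', 'file2'], ['file1', 'file3', 'file4'], ['file3', 'file4']]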
How can I convert this text file to json? Ultimately, I'll be inserting the json blobs into a NoSQL database, but for now I plan to parse the text files and build a python dict, then dump to json.
I think there has to be a way to do this with a dict comprehension that I'm just not seeing/following (I'm new to python).
Example of a file:
file_1.txt
[namespace1] => metric_A = value1
[namespace1] => metric_B = value2
[namespace2] => metric_A = value3
[namespace2] => metric_B = value4
[namespace2] => metric_B = value5
Example of dict I want to build to convert to json:
{ "file1" : {
"namespace1" : {
"metric_A" : "value_1",
"metric_B" : "value_2"
},
"namespace2" : {
"metric_A" : "value_3",
"metric_B" : ["value4", "value5"]
}
}
I currently have this working, but my code is a total mess (and much more complex than this example, with clean-up etc.). I'm basically going line by line through the file, building a python dict. I check each namespace for existence in the dict; if it exists, I check the metric. If the metric already exists, I know I have duplicates and need to convert the value to an array that contains the existing value and my new value(s). There has to be a simpler/cleaner way.
import glob
import json

answer = {}
for fname in glob.glob('file_*.txt'):  # loop over all filenames
    answer[fname] = {}
    with open(fname) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue
            splits = line.split()[::2]  # keep '[namespace]', metric, value; drop '=>' and '='
            splits[0] = splits[0][1:-1]  # strip the surrounding brackets
            namespace, metric, value = splits  # all the values in the line that we're interested in
            answer[fname].setdefault(namespace, {})[metric] = value  # populate the dict (setdefault keeps the inner dict)
required_json = json.dumps(answer)  # turn the dict into proper JSON
You can use a regex for that. re.findall(r'\w+', line) will find all the word groups you are after; the rest is saving them in a dictionary of dictionaries. The simplest way to do that is to use defaultdict from collections.
import re
from collections import defaultdict

answer = defaultdict(lambda: defaultdict(list))
with open('file_1.txt', 'r') as f:
    for line in f:
        namespace, metric, value = re.findall(r'\w+', line)
        answer[namespace][metric].append(value)
Since we expect exactly 3 word groups per line, we unpack them into 3 variables: namespace, metric, value. The outer defaultdict returns a fresh inner defaultdict the first time we see a namespace, and the inner defaultdict returns an empty list for the first append, making the code more compact.
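To get the JSON shape from the question (top level keyed by filename), a minimal sketch on top of the above; note that it keeps every value as a list rather than collapsing single values to strings:

import json

result = {'file_1.txt': answer}  # defaultdicts serialize like plain dicts
print(json.dumps(result, indent=4))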
There seems to be some problem with finditer(). I am repeatedly searching for a pattern in a line using finditer(), and I need to maintain the order in which the matches are gathered. Following is my code:
names = collections.OrderedDict()
line1 = 'XPAC3出口$<zho>$ASDSA1出口$<chn>$ExitA2$<eng>$YUTY1出口$<fre>'
names = {n.group(2):n.group(1) for n in re.finditer("\$?(.*?)\$<(.*?)>", line1, re.UNICODE)}
And then I am printing it out:
for key, value in names.iteritems():
    print key, ' ', value
And the output turns out to be
fre YUTY1出口
chn ASDSA1出口
zho XPAC3出口
eng ExitA2
But I need the following order,
zho XPAC3出口
chn ASDSA1出口
eng ExitA2
fre YUTY1出口
How do I go ahead? Do I need to change the regex or use something other than finditer()?
You rebind the names dictionary with your dictionary comprehension, and a regular dictionary doesn't preserve insertion order. To preserve the order, build a list of pairs and give it to OrderedDict like this:
import collections
import re

line1 = 'XPAC3出口$<zho>$ASDSA1出口$<chn>$ExitA2$<eng>$YUTY1出口$<fre>'
names = [(n.group(2), n.group(1)) for n in re.finditer("\$?(.*?)\$<(.*?)>", line1, re.UNICODE)]
names = collections.OrderedDict(names)
for key, value in names.iteritems():
    print key, ' ', value
When you say
names = {...}
you are dropping the reference to the empty OrderedDict (which will be garbage collected) and rebinding names to a regular dict (which is unordered, of course).
You should pass your matches to the constructor of the OrderedDict
names = collections.OrderedDict((n.group(2), n.group(1)) for n in re.finditer("\$?(.*?)\$<(.*?)>", line1, re.UNICODE))
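A quick check of the resulting order (the expected output in the comments below is taken from the ordering requested in the question):

for key, value in names.iteritems():
    print key, ' ', value
# zho   XPAC3出口
# chn   ASDSA1出口
# eng   ExitA2
# fre   YUTY1出口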