I want to convert a string into a dictionary. I saved this dictionary previously in a text file.
The problem is now, that I am not sure, how the structure of the keys are. The values are generated with Counter(dictionaryName). The dictionary is really large, so I cannot check every key to see how it would be possible.
The keys can contain simple quotes like ', double quotes ", commas and maybe other characters. So is there any possibility to convert it back into a dictionary?
For example this is stored in the file:
Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23,...})
I found previous solutions with for example json, but I have problems with the double quotes and I cannot simply split for the commas.
If you trust the source, load from collections import Counter and eval() the string
How about something like:
>> from collections import Counter
>> line = '''Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23})'''
>> D = eval(line)
>> D
Counter({"'4,5'element1": 50, '4:55foobar': 23, 'element0': 512})
You could remove the Counter( and ) parts, then parse the rest with ast.literal_eval as long as it only involves basic Python data types:
import ast
def parse_Counter_string(s):
s = s.strip()
if not (s.startswith('Counter(') and s.endswith(')')):
raise ValueError('String does not match expected format')
# Counter( is 8 characters
# 12345678
s = s[8:-1]
return Counter(ast.literal_eval(s))
In the future, I recommend picking a different way to serialize your data.
you can use demjson library for doing this, you can have the text directly in your program
import demjson
counter = demjson.decode("enter your text here")
if it is in the file ,you can do the following steps :
WD = dirname(realpath(__file__))
file = open(WD, "filename"), "r")
counter = demjson.decode(file.read())
file.close()
Related
I have a script that reads through a log file that contains hundreds of these logs, and looks for the ones that have a "On, Off, or Switch" type. Then I output each log into its own list. I'm trying to find a way to extract the Out and In times into a separate list/array and then subtract the two times to find the duration of each separate log. This is what the outputted logs look like:
['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']
This is my current code:
logfile = '/path/to/my/logfile'
with open(logfile, 'r') as f:
text = f.read()
words = ["On", "Off", "Switch"]
text2 = text.split('\n')
for l in text.split('\n'):
if (words[0] in l or words[1] in l or words[2] in l):
log = l.split(',')[0:3]
I'm stuck on how to target only the Out and In time values from the logs and put them in an array and convert to a time value to find duration.
Initial log before script: everything after the "In" time is useless for what I'm looking for so I only have the first three indices outputted
2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a","Type":"Switch,"In":"2020-01-31T00:30:20.140Z","Path":"interface","message":"interface changed status from unknown to normal","severity":"INFORMATIONAL","display":true,"json_map":"{\"severity\":null,\"eventId\":\"65e-64d9-45-ab62-8ef98ac5e60d\",\"componentPath\":\"interface_css\",\"displayToGui\":false,\"originalState\":\"unknown\",\"closed\":false,\"eventType\":\"InterfaceStateChange\",\"time\":\"2019-04-18T07:04:32.747Z\",\"json_map\":null,\"message\":\"interface_css changed status from unknown to normal\",\"newState\":\"normal\",\"info\":\"Event created with current status\"}","closed":false,"info":"Event created with current status","originalState":"unknown","newState":"normal"}
Below is a possible solution. The wordmatch line is a bit of a hack, until I find something clearer: it's just a one-liner that create an empty or 1-element set of True if one of the words matches.
(Untested)
import re
logfile = '/path/to/my/logfile'
words = ["On", "Off", "Switch"]
dateformat = r'\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[Zz]?'
pattern = fr'Out:\s*\[(?P<out>{dateformat})\].*In":\s*\"(?P<in>{dateformat})\"'
regex = re.compile(pattern)
with open(logfile, 'r') as f:
for line in f:
wordmatch = set(filter(None, (word in s for word in words)))
if wordmatch:
match = regex.search(line)
if match:
intime = match.group('in')
outtime = match.group('out')
# whatever to store these strings, e.g., append to list or insert in a dict.
As noted, your log example is very awkward, so this works for the example line, but may not work for every line. Adjust as necessary.
I have also not included (if so wanted), a conversion to a datetime.datetime object. For that, read through the datetime module documentation, in particular datetime.strptime. (Alternatively, you may want to store your results in a Pandas table. In that case, read through the Pandas documentation on how to convert strings to actual datetime objects.)
You also don't need to read nad split on newlines yourself: for line in f will do that for you (provided f is indeed a filehandle).
Regex is probably the way to go (fastness, efficiency etc.) ... but ...
You could take a very simplistic (if very inefficient) approach of cleaning your data:
join all of it into a string
replace things that hinder easy parsing
split wisely and filter the split
like so:
data = ['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']
all_text = " ".join(data)
# this is inefficient and will create throwaway intermediate strings - if you are
# in a hurry or operate on 100s of MB of data, this is NOT the way to go, unless
# you have time
# iterate pairs of ("bad thing", "what to replace it with") (or list of bad things)
for thing in [ (": ",":"), (list('[]{}"'),"") ]:
whatt = thing[0]
withh = thing[1]
# if list, do so for each bad thing
if isinstance(whatt, list):
for p in whatt:
# replace it
all_text = all_text.replace(p,withh)
else:
all_text = all_text.replace(whatt,withh)
# format is now far better suited to splitting/filtering
cleaned = [a for a in all_text.split(" ")
if any(a.startswith(prefix) or "Switch" in a
for prefix in {"In:","Switch:","Out:"})]
print(cleaned)
Outputs:
['Out:2020-01-31T00:30:20.150Z', 'Type:Switch', 'In:2020-01-31T00:30:20.140Z']
After cleaning your data would look like:
2020-01-31T12:04:57.976Z 1234 Out:2020-01-31T00:30:20.150Z Id:Id:4-f-4-9-6a Type:Switch In:2020-01-31T00:30:20.140Z
You can transform the clean list into a dictionary for ease of lookup:
d = dict( part.split(":",1) for part in cleaned)
print(d)
will produce:
{'In': '2020-01-31T00:30:20.140Z',
'Type': 'Switch',
'Out': '2020-01-31T00:30:20.150Z'}
You can use datetime module to parse the times from your values as shown in 0 0 post.
I need to read a list of strings from a binary file and create a python list.
I'm using the below command to extract data from binary file:
tmp = f.read(100)
abc, = struct.unpack('100c',tmp)
The data that I can see in variable 'abc' is exactly as shown below, but I need to get the below data into a python list as strings.
Data that I need as a list: 'UsrVal' 'VdetHC' 'VcupHC' ..... 'Gravity_Axis'
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
Here is how i would suggest you to do it with one liner.
You need to decode binary string and then you can do a split based on "\x00" which will return the list you are looking for.
e.g
my_binary_out = b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
decoded_list = my_binary_out.decode("latin1", 'ignore').split('\x00')
#or
decoded_list = my_binary_out.decode("cp1252", 'ignore').split('\x00')
Output Will look like this :
['UsrVal', 'VdetHC', 'VcupHC', 'VdirHC', 'HdirHC', 'UpFlwHC', 'UxHC', 'UyHC', 'UzHC', 'VresHC', 'UxRP', 'UyRP', 'UzRP', 'VresRP', 'Gravity_Axis']
Hope this helps
If you're going for a quick and messy way here, AND assuming your string
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
is in fact interpreted as
" b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis' "
Then the following few lines of code result with 'b' having the array you want.
a = {YourStringHere}
b = a[2:-1].split("\x00")
I'm having trouble getting anything to write in my outut file (word_count.txt).
I expect the script to review all 500 phrases in my phrases.txt document, and output a list of all the words and how many times they appear.
from re import findall,sub
from os import listdir
from collections import Counter
# path to folder containg all the files
str_dir_folder = '../data'
# name and location of output file
str_output_file = '../data/word_count.txt'
# the list where all the words will be placed
list_file_data = '../data/phrases.txt'
# loop through all the files in the directory
for str_each_file in listdir(str_dir_folder):
if str_each_file.endswith('data'):
# open file and read
with open(str_dir_folder+str_each_file,'r') as file_r_data:
str_file_data = file_r_data.read()
# add data to list
list_file_data.append(str_file_data)
# clean all the data so that we don't have all the nasty bits in it
str_full_data = ' '.join(list_file_data)
str_clean1 = sub('t','',str_full_data)
str_clean_data = sub('n',' ',str_clean1)
# find all the words and put them into a list
list_all_words = findall('w+',str_clean_data)
# dictionary with all the times a word has been used
dict_word_count = Counter(list_all_words)
# put data in a list, ready for output file
list_output_data = []
for str_each_item in dict_word_count:
str_word = str_each_item
int_freq = dict_word_count[str_each_item]
str_out_line = '"%s",%d' % (str_word,int_freq)
# populates output list
list_output_data.append(str_out_line)
# create output file, write data, close it
file_w_output = open(str_output_file,'w')
file_w_output.write('n'.join(list_output_data))
file_w_output.close()
Any help would be great (especially if I'm able to actually output 'single' words within the output list.
thanks very much.
Would be helpful if we got more information such as what you've tried and what sorts of error messages you received. As kaveh commented above, this code has some major indentation issues. Once I got around those, there were a number of other logic errors to work through. I've made some assumptions:
list_file_data is assigned to '../data/phrases.txt' but there is then a
loop through all file in a directory. Since you don't have any handling for
multiple files elsewhere, I've removed that logic and referenced the
file listed in list_file_data (and added a small bit of error
handling). If you do want to walk through a directory, I'd suggest
using os.walk() (http://www.tutorialspoint.com/python/os_walk.htm)
You named your file 'pharses.txt' but then check for if the files
that endswith 'data'. I've removed this logic.
You've placed the data set into a list when findall works just fine with strings and ignores special characters that you've manually removed. Test here:
https://regex101.com/ to make sure.
Changed 'w+' to '\w+' - check out the above link
Converting to a list outside of the output loop isn't necessary - your dict_word_count is a Counter object which has an 'iteritems' method to roll through each key and value. Also changed the variable name to 'counter_word_count' to be slightly more accurate.
Instead of manually generating csv's, I've imported csv and utilized the writerow method (and quoting options)
Code below, hope this helps:
import csv
import os
from collections import Counter
from re import findall,sub
# name and location of output file
str_output_file = '../data/word_count.txt'
# the list where all the words will be placed
list_file_data = '../data/phrases.txt'
if not os.path.exists(list_file_data):
raise OSError('File {} does not exist.'.format(list_file_data))
with open(list_file_data, 'r') as file_r_data:
str_file_data = file_r_data.read()
# find all the words and put them into a list
list_all_words = findall('\w+',str_file_data)
# dictionary with all the times a word has been used
counter_word_count = Counter(list_all_words)
with open(str_output_file, 'w') as output_file:
fieldnames = ['word', 'freq']
writer = csv.writer(output_file, quoting=csv.QUOTE_ALL)
writer.writerow(fieldnames)
for key, value in counter_word_count.iteritems():
output_row = [key, value]
writer.writerow(output_row)
Something like this?
from collections import Counter
from glob import glob
def extract_words_from_line(s):
# make this as complicated as you want for extracting words from a line
return s.strip().split()
tally = sum(
(Counter(extract_words_from_line(line))
for infile in glob('../data/*.data')
for line in open(infile)),
Counter())
for k in sorted(tally, key=tally.get, reverse=True):
print k, tally[k]
Here is how I dump a file
with open('es_hosts.json', 'w') as fp:
json.dump(','.join(host_list.keys()), fp)
The results is
"a,b,c"
I would like:
a,b,c
Thanks
Before doing a string replace, you might want to strip the quotation marks:
print '"a,b,c"'.strip('"')
Output:
a,b,c
That's closer to what you want to achieve. Even just removing the first and the last character works: '"a,b,c"'[1:-1].
But have you looked into this question?
To remove the quotation marks in the keys only, which may be important if you are parsing it later (presumably with some tolerant parser or maybe you just pipe it directly into node for bizarre reasons), you could try the following regex.
re.sub(r'(?<!: )"(\S*?)"', '\\1', json_string)
One issue is that this regex expects fields to be seperated key: value and it will fail for key:value. You could make it work for the latter with a minor change, but similarly it won't work for variable amounts of whitespace after :
There may be other edge cases but it will work with outputs of json.dumps, however the results will not be parseable by json. Some more tolerant parsers like yaml might be able to read the results.
import re
regex = r'(?<!: )"(\S*?)"'
o = {"noquotes" : 127, "put quotes here" : "and here", "but_not" : "there"}
s = json.dumps(o)
s2 = json.dumps(o, indent=3)
strip_s = re.sub(regex,'\\1',s)
strip_s2 = re.sub(regex,'\\1',s2)
print(strip_s)
print(strip_s2)
assert(json.loads(strip_s) == json.loads(s) == json.loads(strip_s2) == json.loads(s2) == object)
Will raise a ValueError but prints what you want.
Well, that's not valid json, so the json module won't help you to write that data. But you can do this:
import json
with open('es_hosts.json', 'w') as fp:
data = ['a', 'b', 'c']
fp.write(json.dumps(','.join(data)).replace('"', ''))
That's because you asked for json, but since that's not json, this should suffice:
with open('es_hosts.json', 'w') as fp:
data = ['a', 'b', 'c']
fp.write(','.join(data))
Use python's built-in string replace function
with open('es_hosts.json', 'w') as fp:
json.dump(','.join(host_list.keys()).replace('\"',''), fp)
Just use for loop to assign list to string.
import json
with open('json_file') as f:
data = json.loads(f.read())
for value_wo_bracket in data['key_name']:
print(value_wo_bracket)
Note there is difference between json.load and json.loads
So I have a bunch of line of codes like these in a row in my program:
str = str.replace('ten', '10s')
str = str.replace('twy', '20s')
str = str.replave('fy', '40s')
...
I want to make it such that I don't have to manually open my source file to add new cases. For example ('sy', '70'). I know I have to put all these in a function somehow, but I'd like to map cases that are not in my "mapper lib" from the command line. Configuration file maybe? how?
Thanks!
You could use a config file in json format like this:
[
["ten", "10s"],
["twy", "20s"],
["fy", "40s"]
]
Save it as 'replacements.json' and then use it this way:
import json
with open('replacements.json') as i:
replacements = json.load(i)
text = 'ten, twy, fy'
for r in replacements:
text = text.replace(r[0], r[1])
Then when you need to change the values just edit the replacements.json file without touching any Python code.
The format for you replacements file could be anything but json is easy to use and edit.
a simple solution could be to put those in a file, read them in your program and do your replaces in a loop..
Many ways to do this, if it's a rarely changing thing you could consider doing it with a Python dict:
mappings = {
'ten': '10s',
'twy': '20s',
'fy': '40s',
}
def replace(str_):
for s, r in mappings.iteritems():
str_.replace(s, r)
return str_
Alternatively in a Text file (make sure you use a safe delimiter which isn't used in any of the keys!)
mappings.txt
ten|10s
twy|20s
fy|40s
And the Python part:
mappings = {}
for line in open('mappings.txt'):
k, v = line.split('|', 1)
mappings[k] = v
And use the replace from above :)
You could use csv to store the replacements in a human-editable form in a file:
import csv
with open('replacements.csv', 'rb') as f:
replacements = list(csv.reader(f))
for old, new in replacements:
your_string = your_string.replace(old, new)
where replacements.csv:
ten,10s
twy,20s
fy,40s
It avoids unnecessary markup such as ", [] in the json format and allows a delimiter (,) to be present in a string itself unlike the plain text format from #WoLpH's answer.
(live example)