Converting a raw list to JSON data with Python

I have a raw list sorteddict in the form of:
["with", 1]
["witches", 1]
["witchcraft", 3]
and I want to generate more legible data by making it a JSON object that looks like:
"Frequencies": {
"with": 1,
"witches": 1,
"witchcraft": 3,
"will": 2
}
Unfortunately, so far I have only found a manual way to create the data shown above, and I was wondering if there is a more elegant way of generating it than my messy script. I got to the point where I needed to retrieve the last item in the list to ensure there was no trailing comma on the last line before I thought I should seek some advice. Here's what I had:
comma_count = 0
for i in sorteddict:
    comma_count += 1
with open("frequency.json", 'w') as f:
    json_head = "\"Frequencies\": {\n"
    f.write(json_head)
    while comma_count > 0:
        for s in sorteddict:
            f.write('\t\"' + s[0] + '\"' + ":" + str(s[1]) + ",\n")
            comma_count -= 1
    f.write("}")
I have used json.JSONEncoder().encode(), which I thought was what I was looking for, but what ended up happening is that "Frequencies" would be prepended to each s[0] item. Any ideas to clean up the code?

You need to make a nested dict out of your current one, and use json.dumps. Not sure how sorteddict works, but:
json.dumps({"Frequencies": mySortedDict})
should work.
Additionally, you say that you want something json encoded, but your example is not valid json. So I will assume that you actually want legitimate json.
Here's some example code:
In [4]: import json
In [5]: # No idea what a sorteddict is, we assume it has the same interface as a normal dict.
In [6]: the_dict = dict([
   ...:     ["with", 1],
   ...:     ["witches", 1],
   ...:     ["witchcraft", 3],
   ...: ])
In [7]: the_dict
Out[7]: {'witchcraft': 3, 'witches': 1, 'with': 1}
In [8]: json.dumps({"Frequencies": the_dict})
Out[8]: '{"Frequencies": {"with": 1, "witches": 1, "witchcraft": 3}}'

I may not be understanding you correctly - but do you just want to turn a list of [word, frequency] lists into a dictionary?
frequency_lists = [
    ["with", 1],
    ["witches", 1],
    ["witchcraft", 3],
]
frequency_dict = dict(frequency_lists)
print(frequency_dict)  # {'with': 1, 'witches': 1, 'witchcraft': 3}
If you then want to write this to a file:
import json
with open('frequency.json', 'w') as f:
    f.write(json.dumps(frequency_dict))
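For a file that is both valid JSON and easy to read, json.dump can write to the file object directly and pretty-print via its indent parameter; a minimal sketch, assuming the same frequency data:

```python
import json

frequency_dict = {"with": 1, "witches": 1, "witchcraft": 3}

# json.dump serializes straight to the file object; indent=4 pretty-prints
with open("frequency.json", "w") as f:
    json.dump({"Frequencies": frequency_dict}, f, indent=4)
```

json.load reads the file back into a dict with the same nested structure.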

Related

Removing line breaks and writing lists without square brackets and commas to a text file in python

I'm facing a few issues with writing some values to a text file. Below are the outputs I need to see in my text file.
1. I want to write an output like this to the text file.
Input:
Hello
World
Output:
HelloWorld
2. I want to write an output like this into a text file.
Input:
[1, 2, 3, 4, 5]
Output:
1,2,3,4,5
I tried several ways to do this but couldn't find a proper one. I'm hoping for some help.
The Code:
progressList = [120, 0, 0]  # A variable which won't change (i.e. this variable only has this value)
resultList = ['Progress', 'Trailer']  # Each '' represents one user input

# loop for progress
with open("data.txt", "a") as f:  # Used append as per my requirement
    i = 0  # iterator
    while i < len(resultList):
        # f.write(resultList)
        if resultList[i] == "Progress":
            j = 0
            f.write("Progress - ")
            for j in range(3):
                while j < 2:
                    f.write(', ', join(progressList[j]))
                    break
                if j == 2:
                    f.write(progressList[j], end='')
        break
Output (textfile):
Progress - 120, 0, 0
Thanks.
1st case would be something like this
>>> s = '''hello
... world'''
>>> ''.join(s.split())
'helloworld'
>>>
2nd one is funny; the safe way is ast.literal_eval (exec would also work, but it executes arbitrary code):
>>> import ast
>>> s = "[1, 2, 3, 4, 5]"
>>> a = ast.literal_eval(s)
>>> ','.join([str(i) for i in a])
'1,2,3,4,5'
hope it helps
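For the specific Progress line the asker wants, the whole nested loop collapses into one join; str.join needs strings, so each number is converted first. A minimal sketch (file name taken from the question):

```python
progressList = [120, 0, 0]

# Build "Progress - 120, 0, 0": convert each number to str, then comma-join
line = "Progress - " + ", ".join(str(n) for n in progressList)

with open("data.txt", "a") as f:
    f.write(line + "\n")
```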

data accumulating from csv file using python

out_gate,useless_column,in_gate,num_connect
a,u,b,1
a,s,b,3
b,e,a,2
b,l,c,4
c,e,a,5
c,s,b,5
c,s,b,3
c,c,a,4
d,o,c,2
d,l,c,3
d,u,a,1
d,m,b,2
Shown above is a sample CSV file. My final goal is to get the answer in the form of a CSV file like the one below:
,a,b,c,d
a,0,4,0,0
b,2,0,4,0
c,9,8,0,0
d,1,2,5,0
I am trying to match each gate (a, b, c, d) against in_gate and accumulate num_connect; for example, out_gate 'c' -> in_gate 'b' gives 8 connections in total, and 'c' -> 'a' gives 9.
I want to solve it with lists (or tuples, dictionaries, sets) or collections.defaultdict, WITHOUT USING PANDAS OR NUMPY, and I want a solution that scales to many gates (around 10 to 40).
I understand there is a similar question, and it helped a lot, but I still have some trouble getting my version to run. Lastly, is there any way to do it with lists of columns and a for loop?
((ex) list1=[a,b,c,d], list2=[b,b,a,c,a,b,b,a,c,c,a,b])
What if there are some useless columns that are not related to the data, but the final goal remains the same?
thanks
I'd use a Counter for this task. To keep the code simple, I'll read the data from a string. And I'll let you figure out how to produce the output as a CSV file in the format of your choice.
import csv
from collections import Counter
data = '''\
out_gate,in_gate,num_connect
a,b,1
a,b,3
b,a,2
b,c,4
c,a,5
c,b,5
c,b,3
c,a,4
d,c,2
d,c,3
d,a,1
d,b,2
'''.splitlines()
reader = csv.reader(data)
# skip header
next(reader)

# A Counter to accumulate the data
counts = Counter()

# Accumulate the data
for ogate, igate, num in reader:
    counts[ogate, igate] += int(num)

# We could grab the keys from the data, but it's easier to hard-code them
keys = 'abcd'

# Display the accumulated data
for ogate in keys:
    print(ogate, [counts[ogate, igate] for igate in keys])
output
a [0, 4, 0, 0]
b [2, 0, 4, 0]
c [9, 8, 0, 0]
d [1, 2, 5, 0]
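The CSV-output step left as an exercise above can be sketched with csv.writer, reusing the counts and keys names from the answer (the output file name is made up here):

```python
import csv
from collections import Counter

# counts/keys as accumulated by the answer above
counts = Counter({('a', 'b'): 4, ('b', 'a'): 2, ('b', 'c'): 4,
                  ('c', 'a'): 9, ('c', 'b'): 8, ('d', 'a'): 1,
                  ('d', 'b'): 2, ('d', 'c'): 5})
keys = 'abcd'

with open('matrix.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([''] + list(keys))  # header row: ,a,b,c,d
    for ogate in keys:
        # Counter returns 0 for missing pairs, giving the zero cells for free
        writer.writerow([ogate] + [counts[ogate, igate] for igate in keys])
```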
If I understand your problem correctly, you could try and using a nested collections.defaultdict for this:
import csv
from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))

# Assumes gates.csv has three columns: out_gate,in_gate,num_connect
with open('gates.csv') as in_file:
    csv_reader = csv.reader(in_file)
    next(csv_reader)
    for row in csv_reader:
        outs, ins, connect = row
        d[outs][ins] += int(connect)

gates = sorted(d)
for outs in gates:
    print(outs, [d[outs][ins] for ins in gates])
Which Outputs:
a [0, 4, 0, 0]
b [2, 0, 4, 0]
c [9, 8, 0, 0]
d [1, 2, 5, 0]

Why is re not working on my file?

I am using a regular expression to remove all the apostrophes in my textfile. I need to encode it in utf-8 for my other functions to work. So when I try this:
import re
import codecs
import sys

dataset = []
with codecs.open(sys.argv[1], 'r', 'utf8') as fil:
    for line in fil:
        lines = [re.sub("'", "", line) for line in fil]
        print(lines)
        dataset.append(lines.lower().strip().split())
Output:
[] #on printing lines
Traceback (most recent call last):
File "preproc.py", line 112, in <module>
dataset.append(lines.lower().strip().split())
AttributeError: 'list' object has no attribute 'lower'
Textfile contains a string like this: It's an amazing day she's said
It returns the same thing back to me on printing line.
So after an SO chat session, the question is really this: given a list of lists of words, how do you replace the unicode apostrophes and maintain the original data structure?
Given this data structure, strip out the \u2019 unicode characters
s = [[u'wasn\u2019t', u'right', u'part', u'say', u'things',
      u'she\u2019s', u'hurt', u'terribly', u'she\u2019s',
      u'speaking']]
Here's one working example of how to do this:
quotes_to_remove = [u"'", u"\u2019", u"\u2018"]
new_s = []
for line in s:
    new_line = []
    for word in line:
        for quote in quotes_to_remove:
            word = word.replace(quote, "")
        new_line.append(word)
    new_s.append(new_line)
print(new_s)
produces:
[[u'wasnt', u'right', u'part', u'say', u'things', u'shes',
u'hurt', u'terribly', u'shes', u'speaking']]
Also worth noting is that the asker is working in python 2.7.10 and the code provided in this answer is not tested on python 3.
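The same transformation fits in a nested list comprehension with one compiled regex whose character class covers the three quote characters above; a sketch:

```python
import re

s = [[u'wasn\u2019t', u'right', u'part', u'say', u'things',
      u'she\u2019s', u'hurt', u'terribly', u'she\u2019s',
      u'speaking']]

# One character class covering ', \u2019 and \u2018
quote_re = re.compile(u"['\u2019\u2018]")
new_s = [[quote_re.sub(u"", word) for word in line] for line in s]
```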
I think it can work like this:
import re
import codecs

with codecs.open("textfile.txt", "r", "utf-8") as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    lines[i] = re.sub("'", "", line)
print(lines)
Your original method will not assign the new values back into a list. I have made two small experiments for you.
1.
list1 = [2,3,5,4,1,1,1,2,2,5,1]
for num in list1:
    num = 1
print(list1)
output: [2, 3, 5, 4, 1, 1, 1, 2, 2, 5, 1]
2.
list1 = [2,3,5,4,1,1,1,2,2,5,1]
for i, num in enumerate(list1):
    list1[i] = 1
print(list1)
output: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
So that is why your result is wrong. This is not a regex question! Hope it helps. :)
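Putting both answers together, the asker's original loop works once the substitution is done per line instead of re-iterating the already-exhausted file handle; a minimal sketch (the file name and its contents are assumed from the question):

```python
import re
import codecs

# Sample file matching the question (created here so the sketch runs standalone)
with codecs.open("textfile.txt", "w", "utf8") as out:
    out.write(u"It's an amazing day she\u2019s said\n")

dataset = []
with codecs.open("textfile.txt", "r", "utf8") as fil:
    for line in fil:
        # Strip both the plain and the curly apostrophe from this one line
        cleaned = re.sub(u"['\u2019]", u"", line)
        dataset.append(cleaned.lower().strip().split())
```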

Python split list if sequence of numbers is found

I've been trying to find a relevant question, though I can't seem to search for the right words and all I'm finding is how to check if a list contains an intersection.
Basically, I need to split a list once a certain sequence of numbers is found, similar to doing str.split(sequence)[0], but with lists instead. I have working code, though it doesn't seem very efficient (also no idea if raising an error was the right way to go about it), and I'm sure there must be a better way to do it.
For the record, long_list could potentially have a length of a few million values, which is why I think iterating through them all might not be the best idea.
long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
end_marker = [6,43,23,95]
end_marker_len = len(end_marker)

class SuccessfulTruncate(Exception):
    pass

try:
    counter = 0
    for i in range(len(long_list)):
        if long_list[i] == end_marker[counter]:
            counter += 1
        else:
            counter = 0
        if counter == end_marker_len:
            raise SuccessfulTruncate()
except SuccessfulTruncate:
    long_list = long_list[:2 + i - end_marker_len]
else:
    raise IndexError('sequence not found')

>>> long_list
[2,6,4,2,7,98,32,5,15,4,2]
Ok, timing a few answers with a big list of 1 million values (the marker is very near the end):
Tim: 3.55 seconds
Mine: 2.7 seconds
Dan: 0.55 seconds
Andrey: 0.28 seconds
Kasramvd: still executing :P
I have working code, though it doesn't seem very efficient (also no idea if raising an error was the right way to go about it), and I'm sure there must be a better way to do it.
I commented on the exception raising in a comment: instead of raising an exception and catching it in the same try/except, you can omit the try/except and just do if counter == end_marker_len: long_list = long_list[:2 + i - end_marker_len]. "Successful" is not a fitting word for an exception name; exceptions are used to indicate that something failed.
Anyway, here is a shorter way:
>>> long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
>>> end_marker = [6,43,23,95]
>>> index = [i for i in range(len(long_list)) if long_list[i:i+len(end_marker)] == end_marker][0]
>>> long_list[:index]
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
List comprehension inspired by this post
As a more Pythonic way, instead of multiple slicing you can use itertools.islice within a generator expression:
>>> from itertools import islice
>>> M, N = len(long_list), len(end_marker)
>>> long_list[:next((i for i in range(0, M) if list(islice(long_list, i, i + N)) == end_marker), M)]
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
Note that the default value of next here is M, so if no match is found the slice returns the whole of long_list (a default of 0 would return an empty list).
In my solution used approach with index method:
input = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
brk = [6,43,23,95]
brk_len = len(brk)
brk_idx = 0
brk_offset = brk_idx + brk_len

try:
    while input[brk_idx:brk_offset] != brk:
        brk_idx = input.index(brk[0], brk_idx + 1)
        brk_offset = brk_idx + brk_len
except ValueError:
    print("Not found")
else:
    print(input[:brk_idx])
If the values are of limited range, say fit in bytes (this can also be adapted to larger types), why not then encode the lists so that the string method find could be used:
long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
end_marker = [6,43,23,95]
import struct
long_list_p = struct.pack('B'*len(long_list), *long_list)
end_marker_p = struct.pack('B'*len(end_marker), *end_marker)
print long_list[:long_list_p.find(end_marker_p)]
Prints:
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
I also tried using bytes directly, but the find method didn't work that way (in Python 2, bytes is just an alias for str):
print long_list[:bytes(long_list).find(bytes(end_marker))]
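For what it's worth, the bytes idea does work on Python 3, where bytes(list_of_ints) builds a byte string (each value must fit in 0-255); a sketch:

```python
long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
end_marker = [6, 43, 23, 95]

# bytes.find returns the start index of the subsequence, or -1 if absent
idx = bytes(long_list).find(bytes(end_marker))
truncated = long_list[:idx] if idx != -1 else long_list
```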

Converting strings read from a file into an array or list

I have python code that generates sets of numbers as arrays and stores them in a file; the file looks as follows:
set([0, 2, 3])
set([0, 1, 3])
set([0, 1, 2])
I have another python script that reads this file and needs to convert each text line back to an array.
Method to read the file
def get_sets_from_file(self, file_name):
    file_handle = open(file_name, "r")
    all_sets_from_file = file_handle.read()
    print all_sets_from_file
Once the text line is read, I need a mechanism to convert it back to an array.
Thanks,
Bhavesh.
EDIT-1:
Based on the suggestions given below, I have changed the file format to be comma-separated:
set([8, 6, 7]),
set([8, 5, 7]),
set([8, 4, 7]),
set([8, 3, 7]),
you can apply this to each line in your file:
>>> line = "set([0, 2, 3])" #or "set([0, 2, 3]),"
>>> import re
>>> r = "set\(\[(.*)\]\)"
>>> m = re.search(r, line)
>>> match = m.group(1)
>>> a = [int(item.strip()) for item in match.split(',')]
>>> a
[0, 2, 3]
>>>
that could be implemented in your code as:
def get_sets_from_file(self, file_name):
    total = []
    with open(file_name, "r") as fhdl:
        for line in fhdl:
            a = do_the_regex_thing_above
            total.append(a)
    return total
edit (based on the comments from @Droogans):
this code will work with no changes for the comma-separated version of your document as depicted in the new edit.
However, the problem would be greatly simplified if you have access to the code that produces the current output. If that's the case, it would be more effective to pickle or JSON-encode your data; you could then recover your sets of lists simply by pickle- or json-loading the generated output.
It looks like your file contains properly formed python code. You can use this:
read each line of the file into a variable (m)
>>> m = "set([1, 3, 2])"
>>> eval(m)
set([1, 2, 3])
>>>
eval is considered very dangerous because it will do anything you ask it to (like reformat your disk or whatever). But since you know what is in the file you want to evaluate this might be the way for you to go.
If you just want to read and write simple lists of integers to/from a file:
import os
sets = [
    set([0, 2, 3]),
    set([0, 1, 3]),
    set([0, 1, 2]),
]

def write_sets(path, sets):
    with open(path, 'wb') as stream:
        for item in sets:
            item = ' '.join(str(number) for number in item)
            stream.write(item + os.linesep)

def read_sets(path, sets):
    sets = []
    with open(path, 'rb') as stream:
        for line in stream:
            sets.append(set(int(number) for number in line.split()))
    return sets

path = 'tmp/sets.txt'
write_sets(path, sets)
print read_sets(path, sets)
# [set([0, 2, 3]), set([0, 1, 3]), set([0, 1, 2])]
Why don't you serialize your data in a format that you can easily deserialize from a string?
JSON fits perfectly here.
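Since JSON has no set type, a sketch of that approach stores each set as a sorted list and rebuilds the sets on load (the file name is made up):

```python
import json

sets = [set([0, 2, 3]), set([0, 1, 3]), set([0, 1, 2])]

# Serialize: JSON has no set type, so store each set as a sorted list
with open("sets.json", "w") as f:
    json.dump([sorted(s) for s in sets], f)

# Deserialize: turn each list back into a set
with open("sets.json") as f:
    loaded = [set(lst) for lst in json.load(f)]
```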