How to sort file contents into list

I need a solution to sort my file like the following:
Super:1,4,6
Superboy:2,4,9
My file at the moment looks like this:
Super:1
Super:4
Super:6
I need to keep track of the scores each member of the class obtains in the quiz. There are
three classes in the school, and the data needs to be kept separately for each class.
My code is below:
className = className + ".txt"  # add .txt so the file is named after the class the user chose
# open in append mode so the existing scores are not deleted;
# the with block closes the file safely and saves the information
with open(className, 'a') as file:
    file.write("{}:{}\n".format(name, score))  # one "name:score" line per attempt

You can use a dict to group the data, in particular a collections.OrderedDict to keep the order the names are seen in the original file:
from collections import OrderedDict
with open("class.txt") as f:
    od = OrderedDict()
    for line in f:
        # n = name, s = score
        n, s = line.rstrip().split(":")
        # if n is already a key, append the score to its list;
        # otherwise create the key with an empty list and append
        od.setdefault(n, []).append(s)
It is just a matter of writing the dict keys and values to a file to get the output you want using the csv module to give you nice comma separated output.
from collections import OrderedDict
import csv
with open("class.txt") as f, open("whatever.txt", "w") as out:
    od = OrderedDict()
    for line in f:
        n, s = line.rstrip().split(":")
        od.setdefault(n, []).append(s)
    wr = csv.writer(out)
    wr.writerows([k] + v for k, v in od.items())
If you want to update the original files, you can write to a tempfile.NamedTemporaryFile and replace the original with the updated using shutil.move:
from collections import OrderedDict
import csv
from tempfile import NamedTemporaryFile
from shutil import move
with open("class.txt") as f, NamedTemporaryFile("w", dir=".", delete=False) as out:
    od = OrderedDict()
    for line in f:
        n, s = line.rstrip().split(":")
        od.setdefault(n, []).append(s)
    wr = csv.writer(out)
    wr.writerows([k] + v for k, v in od.items())
# replace original file (after both files have been closed)
move(out.name, "class.txt")
If you have more than one class just use a loop:
classes = ["foocls", "barcls", "foobarcls"]
for cls in classes:
    with open("{}.txt".format(cls)) as f, NamedTemporaryFile("w", dir=".", delete=False) as out:
        od = OrderedDict()
        for line in f:
            n, s = line.rstrip().split(":")
            od.setdefault(n, []).append(s)
        wr = csv.writer(out)
        wr.writerows([k] + v for k, v in od.items())
    move(out.name, "{}.txt".format(cls))

I'll provide some pseudocode to help you out.
First your data structure should look like this:
data = {'name': [score1, score2, score3]}
Then the logic you should follow should be something like this:
Read the file line-by-line
    if name is already in dict:
        append score to list. example: data[name].append(score)
    if name is not in dict:
        create new dict entry. example: data[name] = [score]
Iterate over dictionary and write each line to file
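The pseudocode above can be sketched as runnable Python. This is only a minimal illustration: the function names group_scores and format_scores and the sample lines are made up for the example, and it assumes every line has the form name:score. It renders the name:score1,score2 layout asked for in the question rather than CSV.

```python
from collections import OrderedDict

def group_scores(lines):
    """Group 'name:score' lines into {name: [score, ...]}, keeping first-seen order."""
    data = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, score = line.split(":", 1)
        # create the list on first sight of a name, then append the score
        data.setdefault(name.strip(), []).append(score.strip())
    return data

def format_scores(data):
    """Render each entry as 'name:score1,score2,...'."""
    return ["{}:{}".format(name, ",".join(scores)) for name, scores in data.items()]

# hypothetical sample input mirroring the question's file
sample = ["Super:1", "Superboy:2", "Super:4", "Superboy:4", "Super:6", "Superboy:9"]
grouped = group_scores(sample)
for row in format_scores(grouped):
    print(row)
```

To use it on a real file, pass the open file object to group_scores and write the formatted rows back out, one per line.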

Related

Making dictionary in dictionary to separate data by the same values in one column and then from second column

I am new to Python and I have been stuck on one problem for a few days now. I made a script that:
- takes data from a CSV file
- sorts it by the values in the first column of the data file
- inserts the sorted data at a specified line in a separate template text file
- saves the file in as many copies as there are distinct values in the first column of the data file
The picture below shows how it works:
But there are two more things I need to do. When, within one of the separate files shown above, the same value from the second column of the data file occurs more than once, the file should list the values from the third column together instead of repeating the second-column value. The picture below shows how it should look:
I also need to add, somewhere, the parts of the first-column value split on "_".
Here is the data file:
111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
and here is the code I wrote:
import shutil
with open("data.csv") as f:
    contents = f.read()
contents = contents.splitlines()
values_per_baseline = dict()
for line in contents:
    key = line.split(',')[0]
    values = line.split(',')[1:]
    if key not in values_per_baseline:
        values_per_baseline[key] = []
    values_per_baseline[key].append(values)
for file in values_per_baseline.keys():
    x = 3
    shutil.copyfile("of.txt", (f"of_%s.txt" % file))
    filename = f"of_%s.txt" % file
    for values in values_per_baseline[file]:
        with open(filename, "r") as f:
            contents = f.readlines()
        contents.insert(x, ' o = ' + values[0] + '\n ' + 'a = ' + values[1] +'\n')
        with open(filename, "w") as f:
            contents = "".join(contents)
            f.write(contents)
            f.close()
I have been trying to build something like a dictionary of dictionaries of lists, but I can't implement it correctly. Any help or suggestion will be much appreciated.

You could try the following:
import csv
from collections import defaultdict
values_per_baseline = defaultdict(lambda: defaultdict(list))
with open("data.csv", "r") as file:
    for key1, key2, value in csv.reader(file):
        values_per_baseline[key1][key2].append(value)
x = 3
for filekey, content in values_per_baseline.items():
    with open("of.txt", "r") as fin,\
         open(f"of_{filekey}.txt", "w") as fout:
        # copy the first x template lines, then insert the grouped values
        fout.writelines(next(fin) for _ in range(x))
        for key, values in content.items():
            fout.write(
                f' o = {key}\n'
                + ' a = '
                + ' <COMMA> '.join(values)
                + '\n'
            )
        # copy the rest of the template
        fout.writelines(fin)
The input-reading part is using the csv module from the standard library (for convenience) and a defaultdict. The file is read into a nested dictionary.
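To make the shape of that nested dictionary concrete, here is a small sketch that builds it from an in-memory stand-in for data.csv. The io.StringIO sample (the first four rows of the question's data) and the names nested and plain are only for illustration.

```python
import csv
import io
from collections import defaultdict

# in-memory stand-in for data.csv (first rows of the question's data)
sample = io.StringIO("111_0,3005,QWE\n111_0,3006,SDE\n111_0,3006,LFR\n111_1,3005,QWE\n")

# outer key: first column, inner key: second column, value: list of third-column entries
nested = defaultdict(lambda: defaultdict(list))
for key1, key2, value in csv.reader(sample):
    nested[key1][key2].append(value)

# convert to plain dicts for printing or comparison
plain = {k1: dict(inner) for k1, inner in nested.items()}
print(plain)
```

Because missing keys are created on first access, no "if key not in ..." checks are needed while reading.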

Content of datafile.csv:
111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
A possible solution is the following:
def nested_list_to_dict(lst):
    result = {}
    subgroup = {}
    if all(len(l) == 3 for l in lst):
        # first pass: group (second, third) pairs by the first column
        for first, second, third in lst:
            result.setdefault(first, []).append((second, third))
        # second pass: within each group, collect third-column values by second column
        for k, v in result.items():
            for item1, item2 in v:
                subgroup.setdefault(item1, []).append(item2.strip())
            result[k] = subgroup
            subgroup = {}
    else:
        print("Input data must have 3 items like '111_0,3005,QWE'")
    return result
with open("datafile.csv", "r", encoding="utf-8") as f:
    content = f.read().splitlines()
data = nested_list_to_dict([line.split(',') for line in content])
print(data)
# ... rest of your code ....
Prints
{'111_0': {'3005': ['QWE'], '3006': ['SDE', 'LFR']},
'111_1': {'3005': ['QWE'], '5345': ['JTR']},
'112_0': {'3103': ['JPP'], '3343': ['PDK']},
'113_0': {'2137': ['TRE', 'OMG']}}

Merging 2 json files

I'm trying to merge two JSON files: I want to append the timestamp from file2 to the corresponding frame number in file1. Please guide me.
JSON_FILE1
{"frameNumber":1,"classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":true,"bbox":{"top":157,"left":581,"height":390,"width":297},"classifications":[]}]}
{"frameNumber":2,"classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":false,"bbox":{"top":157,"left":581,"height":390.36,"width":297.16},"classifications":[]}]}
{"frameNumber":3,"classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":false,"bbox":{"top":157,"left":581,"height":390.72,"width":297.32},"classifications":[]}]}
{"frameNumber":4,"classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":false,"bbox":{"top":157,"left":581,"height":391.08,"width":297.48},"classifications":[]}]}
{"frameNumber":5,"classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":false,"bbox":{"top":157,"left":581,"height":391.44,"width":297.64},"classifications":[]}]}
JSON_FILE2
{
"frame1": "0:0:0:66",
"frame2": "0:0:0:100",
"frame3": "0:0:0:133",
"frame4": "0:0:0:166",
"frame5": "0:0:0:200"
}
expected output:
{"frameNumber":1,"frame1": "0:0:0:66","classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":true,"bbox":{"top":157,"left":581,"height":390,"width":297},"classifications":[]}]}
{"frameNumber":2,"frame2": "0:0:0:100","classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":false,"bbox":{"top":157,"left":581,"height":390.36,"width":297.16},"classifications":[]}]}
{"frameNumber":3,"frame3": "0:0:0:133","classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":false,"bbox":{"top":157,"left":581,"height":390.72,"width":297.32},"classifications":[]}]}
{"frameNumber":4,"frame4": "0:0:0:166","classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":false,"bbox":{"top":157,"left":581,"height":391.08,"width":297.48},"classifications":[]}]}
{"frameNumber":5,"frame5": "0:0:0:200","classifications":[],"objects":[{"featureId":"ckotybs4v00033b68edh8a6o5","schemaId":"ckoto8fzm16gj0y7uesrd0nzt","title":"Person 1","value":"person_1","color":"#1CE6FF","keyframe":false,"bbox":{"top":157,"left":581,"height":391.44,"width":297.64},"classification
I tried it this way, but I was unable to get the expected output.
import json
import glob
result = []
for f in glob.glob("*.json"):
    with open(f,"rb") as infile:
        result.append(json.load(infile))
with open("merged_file.json","wb") as outfile:
    json.dump(result,outfile)

A valid .json file would need the records wrapped in a pair of [] and separated by commas; then you could json.load it and process each record the same way as below. As the file stands:
The easiest solution is to turn every line into a dict, add the timestamp if the frame number matches, and write the line back.
import json
def fuse(file1, file2, nTargetPath):
    # file2 is one regular JSON object spread over several lines, so load it whole
    with open(file2, "r") as tSourceFileB:
        tDictB = json.load(tSourceFileB)
    with open(nTargetPath, "w") as tTargetFile:
        with open(file1, "r") as tSourceFileA:
            for tLineA in tSourceFileA:
                tDictA = json.loads(tLineA)  # loads a dict from a string
                # frameNumber is an int, so convert it before building the key
                # (but why not name this key timestampX?)
                tKey = "frame" + str(tDictA["frameNumber"])
                if tKey in tDictB:
                    tDictA[tKey] = tDictB[tKey]  # there is only one timestamp per frame
                tTargetFile.write(json.dumps(tDictA) + '\n')
This code can easily be optimized by improving the file access, for example when you know that the timestamp key in file2 is always in the same row as the matching record in file1.

As was pointed out, one file is ndjson and the other file is json. You need to implement some logic to add the json to the ndjson:
# https://pypi.org/project/ndjson/
# pip install ndjson
import ndjson
import json
with open('path/to/file/im_a_ndjson.ndjson') as infile:
    ndjson_object = ndjson.load(infile)
with open('path/to/file/json_file2.json') as infile:
    dict_object = json.load(infile)
print(type(ndjson_object[0]['frameNumber']))
# output: <class 'int'>
for key in dict_object:
    # int needed as you can see above
    framenumber = int(key.strip('frame'))
    # find the matching ndjson object
    for ndjs in ndjson_object:
        if ndjs['frameNumber'] == framenumber:
            # add the key/value pair
            ndjs[key] = dict_object[key]
            # we can break as we've found it
            break
with open('path/to/file/new_ndjson.ndjson', 'w') as outfile:
    ndjson.dump(ndjson_object, outfile)
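If installing a third-party package is not an option, the same merge can be sketched with only the standard json module. This is an illustrative variant, not the answerer's code: merge_timestamps is a made-up name, the sample records are shortened stand-ins for the question's data, and it assumes file1 holds one JSON object per line.

```python
import json

def merge_timestamps(ndjson_lines, timestamps):
    """Add timestamps['frameN'] to the record whose frameNumber is N."""
    merged = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        # frameNumber is an int; format() turns it into the "frameN" key
        key = "frame{}".format(record["frameNumber"])
        if key in timestamps:
            record[key] = timestamps[key]
        merged.append(json.dumps(record))
    return merged

# shortened, hypothetical records standing in for JSON_FILE1 and JSON_FILE2
frames = ['{"frameNumber": 1, "objects": []}', '{"frameNumber": 2, "objects": []}']
stamps = {"frame1": "0:0:0:66", "frame2": "0:0:0:100"}
for out_line in merge_timestamps(frames, stamps):
    print(out_line)
```

Looking the key up in the timestamp dict directly avoids the inner loop over all records, which matters once the files hold many frames.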

Same python code block gives different outputs at different time

I want to create a word dictionary. The dictionary looks like
words_meanings = {
    "rekindle": "relight",
    "pesky": "annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability": "responsibility",
}
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
Output: rekindle , pesky, verge, maneuver, accountability
Here rekindle, pesky, verge, maneuver, and accountability are the keys, and relight, annoying, border, activity, and responsibility are the values.
Now I want to create a CSV file, and my code will take its input from that file.
The file looks like
rekindle | pesky | verge | maneuver | accountability
relight | annoying| border| activity | responsibility
So far I use this code to load the file and read data from it.
from google.colab import files
uploaded = files.upload()
import pandas as pd
data = pd.read_csv("words.csv")
data.head()
import csv
reader = csv.DictReader(open("words.csv", 'r'))
words_meanings = []
for line in reader:
    words_meanings.append(line)
print(words_meanings)
This is the output of print(words_meanings)
[OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
It looks very odd to me.
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
Now I create an empty list and want to append only the keys. But the output is [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
I am confused. In the first code block the list only included keys, but now it includes both keys and their values. How can I overcome this situation?

I would suggest that you format your CSV with each key and its value on the same row, like this:
rekindle,relight
pesky,annoying
verge,border
This way the following code will work.
file_name = "words.csv"
words_meanings = {}
with open(file_name, 'r') as file:
    for line in file:
        key, value = line.split(",")
        words_meanings[key] = value.rstrip("\n")
If you want a list of the keys:
list_of_keys = list(words_meanings.keys())
To add keys and values to the file:
def add_values(key: str, value: str, file_name: str):
    with open(file_name, 'a') as file:
        file.write(f"\n{key},{value}")
key = input("Input the key you want to save: ")
value = input(f"Input the value you want to save to {key}:")
add_values(key, value, file_name)

You run the same block of code, but on different objects, and this gives different results.
First you use a normal dictionary (check type(words_meanings)):
words_meanings = {
    "rekindle": "relight",
    "pesky": "annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability": "responsibility",
}
and the for loop gives you the keys from this dictionary.
You could get the same with
keys_letter = list(words_meanings.keys())
or even
keys_letter = list(words_meanings)
Later you use a list with a single dictionary inside it (check type(words_meanings)):
words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
and the for loop gives you the elements of this list, not the keys of the dictionary inside it. So you copy the whole dictionary from one list to another.
You could get the same with
keys_letter = words_meanings.copy()
or, equivalently,
keys_letter = list(words_meanings)
Complete example:
from collections import OrderedDict
words_meanings = {
    "rekindle": "relight",
    "pesky": "annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability": "responsibility",
}
print(type(words_meanings))
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
#keys_letter = list(words_meanings.keys())
keys_letter = list(words_meanings)
print(keys_letter)
words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
print(type(words_meanings))
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
#keys_letter = words_meanings.copy()
keys_letter = list(words_meanings)
print(keys_letter)
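If the goal is to get only the keys out of what csv.DictReader returned, one way is to look inside the list and iterate over the row itself. This is a small sketch; the variable names rows and all_keys are made up, and the BOM has been left out of the sample for readability.

```python
from collections import OrderedDict

# what csv.DictReader produced in the question: a list holding one mapping per row
rows = [OrderedDict([("rekindle", "relight"), ("pesky", "annoying")])]

# keys of the first (and here only) row
keys_letter = list(rows[0])

# or, if there may be several rows, collect every key once, in order
all_keys = []
for row in rows:
    for key in row:
        if key not in all_keys:
            all_keys.append(key)

print(keys_letter)
print(all_keys)
```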

The default field separator for the csv module is a comma. Your CSV file uses the pipe or bar symbol |, and the fields also seem to be fixed width. So, you need to specify | as the delimiter to use when creating the CSV reader.
Also, your CSV file is encoded as big-endian UTF-16 (UTF-16-BE). The file contains a byte-order mark (BOM), but Python is not stripping it, which is why the string '\ufeffrekindle' starts with the U+FEFF BOM character. That can be dealt with by specifying encoding='utf-16' when you open the file. If the file turns out to be UTF-8 with a BOM, which produces the same leading '\ufeff', use encoding='utf-8-sig' instead.
import csv
with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(f, delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)
Running this on your CSV file produces this:
{'rekindle ': 'relight ', 'pesky ': 'annoying', 'verge ': 'border', 'maneuver ': 'activity ', 'accountability': 'responsibility'}
Notice that there is trailing whitespace in the key and values. skipinitialspace=True removed the leading whitespace, but there is no option to remove the trailing whitespace. That can be fixed by exporting the CSV file from Excel without specifying a field width. If that can't be done, then it can be fixed by preprocessing the file using a generator:
import csv
def preprocess_csv(f, delimiter=','):
    # assumes that fields can not contain embedded new lines
    for line in f:
        yield delimiter.join(field.strip() for field in line.split(delimiter))
with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(preprocess_csv(f, '|'), delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)
which now outputs the stripped keys and values:
{'rekindle': 'relight', 'pesky': 'annoying', 'verge': 'border', 'maneuver': 'activity', 'accountability': 'responsibility'}
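To see what the BOM does to the first key, here is a small self-contained check. It is only a sketch: the bytes in raw are a made-up, comma-separated UTF-8 file with a BOM, and first_header is a hypothetical helper. It decodes the same bytes once with utf-8 and once with utf-8-sig and compares the first column name.

```python
import csv
import io

# bytes of a small UTF-8 file that starts with a BOM (hypothetical content)
raw = b"\xef\xbb\xbfrekindle,pesky\nrelight,annoying\n"

def first_header(data, encoding):
    """Decode the bytes and return the first column name csv.DictReader sees."""
    text = io.StringIO(data.decode(encoding), newline="")
    return csv.DictReader(text).fieldnames[0]

print(repr(first_header(raw, "utf-8")))      # BOM kept as part of the key
print(repr(first_header(raw, "utf-8-sig")))  # BOM stripped
```

The utf-8-sig codec removes a leading BOM when one is present and otherwise behaves like utf-8, so it is a reasonable default when the origin of a file is unknown.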

As no one was able to help me with the answer, I am finally posting my own here. I hope this will help others.
file_name = "words.csv"
words_meanings = {}
# utf-8-sig strips the BOM at the start of the file
with open(file_name, newline='', encoding='utf-8-sig') as file:
    for line in file:
        key, value = line.split(",")
        # newline='' keeps the original line ending, so strip \r as well as \n
        words_meanings[key] = value.rstrip("\r\n")
print(words_meanings)
This code loads a CSV file into a dictionary. Enjoy!

Import dictionary from txt file

I have a function calcPearson that needs 2 dictionaries as input. The dictionaries are in txt files in the following format:
(22, 0.4271125909116274)
(14, 0.4212051728881959)
(3, 0.4144765342960289)
(26, 0.41114433561925906)
(39, 0.41043882384484764)
.....
How can I import the data from the files as dictionaries? Do I need to modify the files, or is there a simple function for this?
I tried with this code:
inf = open('Route_cc.txt','r')
inf2 = open('Route_h2.txt','r')
d1 = eval(inf.read())
d2 = eval(inf2.read())
print(calcPearson(d1,d2))
inf.close()
But I got an invalid syntax error at the second row of the first file that the code opened so I think I need a particular syntax in the file.

If you're certain that you are looking for a dictionary, you can use something like this:
with open('Route_cc.txt', 'r') as inf:
    content = inf.read().splitlines()
for line in range(len(content)):
    content[line] = content[line].strip('(').strip(')')
    content[line] = content[line].split(', ')
inf_dict = dict(content)  # note: keys and values are still strings here
Or more condensed:
inf = open('Route_cc.txt', 'r')
content = inf.read().splitlines()
inf_dict = dict(i.strip('(').strip(')').split(', ') for i in content)
Another option:
import re
inf = open('Route_cc.txt', 'r')
content = inf.read()
inf_dict = dict(i.split(', ') for i in re.findall("[^()\n-]+", content))
Note: Your original use of eval is unsafe and a poor practice.

Since you've mentioned that your dictionaries are in txt files, you'll have to tokenize your input by splitting into key/value pairs.
Read the file, line by line.
Remove the leading and trailing parentheses.
Split the stripped line using a comma as a delimiter.
Add each line to your dictionary.
I've written this code, and tested it for the sample input you have given. Have a look.
import collections
def addToDictionary(d, key, value):
    # named d rather than dict to avoid shadowing the built-in type
    if key in d:
        print("Key already exists")
    else:
        d[key] = value
def displayDictionary(d):
    d = collections.OrderedDict(sorted(d.items()))
    for k, v in d.items():
        print(k, v)
filename = "dictionary1.txt"
dict1 = {}
with open(filename, 'r') as f:
    for line in f:
        line = line.lstrip('(')
        line = line.rstrip(')\n')
        tokenizedLine = line.split(', ')
        # keys and values are stored as strings
        addToDictionary(dict1, tokenizedLine[0], tokenizedLine[1])
displayDictionary(dict1)

Don't use eval; it is dangerous (see the dangers of eval). Instead, use ast.literal_eval.
You can't create a dictionary directly from input in the form you have given. You have to go through the lines one by one, convert each into a tuple, and add it to a dictionary.
This process is shown below.
Code:
import ast
inf = open('Route_cc.txt','r')
d1 = {}
for line in inf:
    zipi = ast.literal_eval(line)
    d1[zipi[0]] = zipi[1]
inf2 = open('Route_h2.txt','r')
d2 = {}
for line1 in inf2:
    zipi1 = ast.literal_eval(line1)
    d2[zipi1[0]] = zipi1[1]
print(calcPearson(d1, d2))
inf.close()
inf2.close()
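Since the same parsing is needed for both files, it may be convenient to wrap it in a helper. The sketch below is one possible shape, not part of the answer above: parse_pairs is a made-up name, and skipping lines that do not start with "(" (such as the "....." filler in the question) is an assumption about the input.

```python
import ast

def parse_pairs(lines):
    """Build a dict from lines that each look like '(key, value)'."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("("):
            continue  # skip blanks and filler such as "....."
        # literal_eval only accepts Python literals, so arbitrary code is rejected
        key, value = ast.literal_eval(line)
        result[key] = value
    return result

# sample lines taken from the question
sample = ["(22, 0.4271125909116274)", "(14, 0.4212051728881959)", "....."]
d = parse_pairs(sample)
print(d)
```

With real files, something like `with open('Route_cc.txt') as inf: d1 = parse_pairs(inf)` would also take care of closing the file. Unlike the string-splitting approaches, this keeps the keys as int and the values as float.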

How do I create add new items to a dictionary while in a loop?

I'm writing a program that reads names and statistics related to those names from a file. Each line of the file is another person and their stats. For each person, I'd like to make their last name a key and everything else linked to that key in the dictionary. The program first stores data from the file in an array and then I'm trying to get those array elements into the dictionary, but I'm not sure how to do that. Plus I'm not sure if each time the for loop iterates, it will overwrite the previous contents of the dictionary. Here's the code I'm using to attempt this:
f = open("people.in", "r")
tmp = None
people
l = f.readline()
while l:
    tmp = l.split(',')
    print tmp
    people = {tmp[2] : tmp[0])
    l = f.readline()
people['Smith']
The error I'm currently getting is that the syntax is incorrect, however I have no idea how to transfer the array elements into the dictionary other than like this.

Use key assignment:
people = {}
for line in f:
    tmp = line.rstrip('\n').split(',')
    people[tmp[2]] = tmp[0]
This loops over the file object directly, no need for .readline() calls here, and removes the newline.
You appear to have CSV data; you could also use the csv module here:
import csv
people = {}
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        people[row[2]] = row[0]
or even a dict comprehension:
import csv
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    people = {r[2]: r[0] for r in reader}
Here the csv module takes care of the splitting and removing newlines.
The syntax error stems from trying to close the opening { with a ) instead of }:
people = {tmp[2] : tmp[0]) # should be }
If you need to collect multiple entries per row[2] value, collect these in a list; a collections.defaultdict instance makes that easier:
import csv
from collections import defaultdict
people = defaultdict(list)
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        people[row[2]].append(row[0])
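The snippets above open the file in "rb" mode, which matches the Python 2 code in the question (note the print statement). On Python 3 the csv module expects text, so the file would be opened with newline="" instead. Here is a minimal Python 3 sketch of the defaultdict variant; the in-memory sample and its column layout (first name, a statistic, last name) are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# stand-in for people.in; the column layout (first name, stat, last name) is assumed
sample = io.StringIO("John,12,Smith\nJane,7,Doe\nAnna,9,Smith\n", newline="")

people = defaultdict(list)
for row in csv.reader(sample):
    # last name is the key; every first name with that last name is collected
    people[row[2]].append(row[0])

print(dict(people))
```

With a real file the loop would sit inside `with open("people.in", newline="") as f:` and read from `csv.reader(f)`.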

In response to Generalkidd's comment above about multiple people with the same last name, here is an addition to Martijn Pieters' solution, posted as an answer for better formatting:
import csv
people = {}
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        if row[2] not in people:
            people[row[2]] = []
        people[row[2]].append(row[0])
