Python list data filtering

I have a list that holds names of files, some of which are almost identical except for their timestamp string section. The list is in the format of [name-subname-timestamp] for example:
myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
What I need is a list that holds for every name and subname, the most recent file derived by the timestamp. I have started by creating a list that holds every [name-subname]:
name_subname_list = []
for row in myList:
    name_subname_list.append(row.rpartition('-')[0])
name_subname_list = set(name_subname_list) # {'name1-001', 'name2-002', 'name1-002'}
Not sure if it is the right approach, moreover I am not sure how to continue. Any ideas?

The code below keeps, for each name-subname, the corresponding newest file:
from datetime import datetime as dt
dic = {}
for i in myList:
    sp = i.split('-')
    name_subname = sp[0]+'-'+sp[1]
    mytime = sp[2].split('.')[0]
    if name_subname not in dic:
        dic[name_subname] = mytime
    else:
        if dt.strptime(mytime, "%Y%m%d%H%M") > dt.strptime(dic[name_subname], "%Y%m%d%H%M"):
            dic[name_subname] = mytime
result = []
for name_subname in dic:
    result.append(name_subname+'-'+dic[name_subname]+'.txt')
which outputs result as:
['name1-001-202112021010.txt',
'name1-002-202112021010.txt',
'name2-002-202112020811.txt']

Try this:
myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
dic = {}
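for name in myList:
    parts = name.split('-')
    dic.setdefault(parts[0] + '-' + parts[1], []).append(parts[2])
unique_list = []
for key, value in dic.items():
    # max() compares the timestamps as strings, so this only picks the newest
    # file when every timestamp is zero-padded to the same width
    unique_list.append(key + '-' + max(value))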
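The two answers above can be combined into one pass that parses each timestamp and remembers the newest file per name-subname. This is only a minimal sketch: the function name latest_files is hypothetical, and the sample list assumes every timestamp is written as twelve digits (YYYYMMDDHHMM), which is not true of the first entry in the question.

```python
from datetime import datetime

def latest_files(names):
    """Return the newest file name for each 'name-subname' prefix."""
    latest = {}  # prefix -> (parsed timestamp, full file name)
    for name in names:
        # 'name1-001-202112021010.txt' -> prefix 'name1-001', tail '202112021010.txt'
        prefix, _, tail = name.rpartition('-')
        stamp = datetime.strptime(tail.split('.')[0], "%Y%m%d%H%M")
        if prefix not in latest or stamp > latest[prefix][0]:
            latest[prefix] = (stamp, name)
    return [name for _, name in latest.values()]

# Assumed sample data with zero-padded timestamps
files = ['name1-001-202112020811.txt', 'name1-001-202112021010.txt',
         'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
result = latest_files(files)
print(result)
```

Comparing parsed datetime values rather than raw strings keeps the comparison correct even if the timestamp format changes, as long as the format string is updated to match.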

Related

How to link lists in order

I have multiple lists whose entries are related by position: the first item of each list belongs together, the second as well, and so on and so forth. I need a way of linking the order of these lists together. I have a list of teams (some are duplicates), and I need an if statement that says: if there is a duplicate of this team, compare it to the duplicate, take the related value in the other list, and choose the better one.
import sys
import itertools
from itertools import islice
fileLocation = input("Input the file location of ScoreBoard: ")
T = []
N = []
L = []
timestamps = []
teamids = []
problemids = []
inputids = []
scores = []
dictionary = {}
amountOfLines = len(open('input1.txt').readlines())
with open('input1.txt') as input1:
    for line in islice(input1, 2, amountOfLines):
        parsed = line.strip().split()
        timestamps.append(parsed[0])
        teamids.append(parsed[1])
        problemids.append(parsed[2])
        inputids.append(parsed[3])
        scores.append(parsed[4])
def checkIfDuplicates(teamids):
    ''' Check if given list contains any duplicates '''
    if len(teamids) == len(set(teamids)):
        return False
    else:
        return True
for i in teamids:
    if checkIfDuplicates(i):
        dictionary['team%s' % i] = {}
    if dictionary < amountOfTeams:
        dictionary['team%s' %]
for i in score:
    dictionary[teamid][]
print(dictionary)

Loop through each list item and delete the item if it is a duplicate:
for i in list1[:]:  # iterate over a copy so that removing items is safe
    if i in list2:
        list1.remove(i)
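Another way to keep the lists linked is to walk them in parallel with zip and store the best value per team in a dictionary. The sketch below is not the asker's code: the function name best_score_per_team and the sample data are made up, and it assumes that "better" means a higher numeric score.

```python
def best_score_per_team(team_ids, scores):
    """Map each team id to its best (highest) score."""
    best = {}
    # zip pairs the i-th team id with the i-th score, keeping the lists linked
    for team, score in zip(team_ids, scores):
        if team not in best or score > best[team]:
            best[team] = score
    return best

# Hypothetical data: teams '1' and '2' each appear twice
team_ids = ['1', '2', '1', '3', '2']
scores = [10, 40, 25, 5, 30]
best = best_score_per_team(team_ids, scores)
print(best)
```

Because the dictionary holds one entry per team, duplicates are resolved as they are encountered and no separate duplicate check is needed.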

Trying to input text and values from excel into Python Lists

This is my code:
for i in range(1, maxRows+1):
    nameContent = str(sheet.cell(row = i, column = 1).value)
    nameList = []
    nameList.append(nameContent)
    print(nameList)
    rateContent = float(sheet.cell(row = i, column = 3).value)
    rateList = []
    rateList.append(rateContent)
    print(rateList)
    hoursContent = float(sheet.cell(row = i, column = 2).value)
    hoursList = []
    hoursList.append(hoursContent)
    print(hoursList)
When I print each list, it only prints the most recent text/value. How do I keep all values/text in the list so I can work with the lists in later code?
Note: I am using the openpyxl module

You're redefining each list inside the loop. This means that every iteration resets the list to an empty list before appending, so only the latest value survives. Put the list definitions (nameList = [], rateList = [], hoursList = []) before the loop instead.
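As a rough illustration of that fix, the sketch below creates the lists once, before the loop, and only appends inside it. To stay runnable without a workbook it uses a made-up list of (name, hours, rate) tuples in place of the sheet.cell(...) lookups, so the data and variable names are assumptions.

```python
# Stand-in for the spreadsheet rows: (name, hours, rate)
rows = [("Alice", 40.0, 15.5), ("Bob", 35.0, 20.0), ("Cara", 12.5, 18.0)]

# Create the lists once, before the loop
name_list = []
hours_list = []
rate_list = []

for name, hours, rate in rows:
    # Only append inside the loop; the lists keep growing across iterations
    name_list.append(str(name))
    hours_list.append(float(hours))
    rate_list.append(float(rate))

print(name_list)
print(hours_list)
print(rate_list)
```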

Replace item in string formatted as csv line

The goal is to replace the second field of csv_line with new_item in an elegant way. This question differs from the topics listed by Rawing because it deals with a different data structure, though those topics can still serve as inspiration.
# Please assume that csv_line has not been imported from a file.
csv_line = 'unknown_item1,unknown_old_item2,unknown_item3'
new_item = 'unknown_new_item2'
goal = 'unknown_item1,unknown_new_item2,unknown_item3'
# Works but error prone. Non-replaced items could be inadvertently swapped.
# In addition, not convenient if string has many fields.
item1, item2, item3 = csv_line.split(',')
result = ','.join([item1, new_item, item3])
print(result) # unknown_item1,unknown_new_item2,unknown_item3
# Less error prone but ugly.
result_list = []
new_item_idx = 1
for i, item in enumerate(csv_line.split(',')):
    result_list += [item] if i != new_item_idx else [new_item]
result = ','.join(result_list)
print(result) # unknown_item1,unknown_new_item2,unknown_item3
# Ideal (not-error prone) but not working.
csv_line.split(',')[1] = new_item
print(csv_line) # unknown_item1,unknown_old_item2,unknown_item3

The second item could be replaced using Python's csv library by making use of io.StringIO() objects. These behave like files but read from and write to an in-memory string:
import csv
import io
csv_line = 'unknown_item1,unknown_old_item2,unknown_item3'
new_item = 'unknown_new_item2'
row = next(csv.reader(io.StringIO(csv_line)))
row[1] = new_item
output = io.StringIO()
csv.writer(output).writerow(row)
goal = output.getvalue().rstrip('\r\n')  # csv.writer appends a line terminator; drop it
print(goal)
This would display goal as:
unknown_item1,unknown_new_item2,unknown_item3

l = csv_line.split(',')
l[1] = new_item
csv_line = ','.join(l)

In the line csv_line.split(',')[1] = new_item, you do not alter the csv_line variable at all. You need to assign the new list created with .split() to a variable before you can change the elements within it:
new_csv = csv_line.split(',')
new_csv[1] = new_item
print(','.join(new_csv))
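The same split, assign, join pattern can be wrapped in a small helper so the index is passed in rather than hard-coded. This is a sketch only: the name replace_field and its sep parameter are hypothetical, and it assumes no field contains the separator inside quotes (for that case the csv-based answer above is the safer route).

```python
def replace_field(csv_line, index, new_item, sep=','):
    """Return a copy of csv_line with the field at `index` replaced."""
    items = csv_line.split(sep)  # split() builds a new list; csv_line itself is untouched
    items[index] = new_item      # mutate the list, which is allowed
    return sep.join(items)       # build a new string from the updated list

csv_line = 'unknown_item1,unknown_old_item2,unknown_item3'
result = replace_field(csv_line, 1, 'unknown_new_item2')
print(result)
print(csv_line)  # the original string is unchanged because strings are immutable
```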

This seems the most pythonic:
csv_line = 'unknown_item1,old_item2,unknown_item3'
old_index = 1
new_item = 'new_item2'
goal = 'unknown_item1,new_item2,unknown_item3'
items = csv_line.split(',')
items[old_index] = new_item
print(','.join(items))
print(goal)
Output:
unknown_item1,new_item2,unknown_item3
unknown_item1,new_item2,unknown_item3

Keyerror with 2d dictionaries

I'm trying to step through a csv and assign date and time values to their own point in a 2d dictionary. This would be in a form such that an instance of:
'11/02/16' and '23:24' in their respective columns in a row would add '1' to the value in the position marked by 'X' in the dictionary 'Dates{11/02/16{23:X}}'.
Unfortunately I get a KeyError for the following code.
import csv
import sys
from sys import argv
from collections import defaultdict
script, ReadFile = argv
f = open(ReadFile,'r')
l = f.readlines()
f.close()
file_list = [row.replace('\n','').split(',') for row in l]
header = file_list[0]
Total = 0
Dates = defaultdict(dict)
print Dates
index_variable = header.index('Time')
index_variable2 = header.index('# Timestamp')
for row in file_list[1:]:
    t = row[index_variable][:2]
    d = row[index_variable2][:10]
    if row[index_variable2][:10] in Dates:
        Dates[d][t] = 1
        Total += 1
        print "true"
    else:
        Dates[d] = {}
        Dates[d][t] = 1
        Total =+ 1
        print "false"
print Dates
If I replace the local variable 't' with "'Test'" the code works, but obviously the results are not what I'm after.
Thanks in advance!
Update: If I replace 'd' with 'Test' and keep 't' as it is, the program works completely fine. It's only when the Dictionary is specifically called as 'Dates[d][t]' that the program returns a KeyError.
Update 2: I've updated the code above to show my work. Currently the script will work /as long as no numbers are added/.
Dates[d][t] = 1 #If I change this...
Dates[d][t] += 1 #To this...
A KeyError occurs.
Update 3:
I changed a portion of my code...
for row in file_list[1:]:
    t = row[index_variable][:2]
    d = row[index_variable2][:10]
    if d in Dates and t in Dates[d]:
        Dates[d][t] += 1
        print "true"
    else:
        Dates[d][t] = 1
        print "false"
And now the script works perfectly fine. I suppose that this means the KeyError was because I was not being specific enough (???).

Assuming that what we see above is just bad formatting of the if by the machine...
I think the problem is in the else:
Dates is a dict whose keys are dates.
d is the first 10 characters of the date field in your input, and t is the first 2 characters of the 'Time' field, i.e. the hour.
You want to count how many times each hour got hit on a specific date.
Dates[d] then is itself a dictionary whose keys are hours, and Dates[d][t] is the counter for hour t on date d.
Dates[d][t] += 1 reads the current value before adding to it. This implies that Dates[d] already exists and that it already contains the key t.
If the key t is not there yet, Python raises a KeyError, so the counter has to be created (or given a default) the first time an hour is seen.
I tried this on my system
import csv
import sys
from sys import argv
from collections import defaultdict
#script, ReadFile = argv
#f = open(ReadFile,'r')
#l = f.readlines()
#f.close()
#file_list = [row.replace('\n','').split(',') for row in l]
#header = file_list[0]
file_list = [['Date','Time','Otherstuff'],
['2016-02-01','23:12:00','Sillyme1'],
['2016-02-01','23:12:04','Sillyme2'],
['2016-02-02','22:10:00','Sillyme3']]
header = file_list[0]
Dates = defaultdict(dict)
print(Dates)
index_variable = header.index('Time')
index_variable2 = header.index('Date')
for row in file_list[1:]:
    t = row[index_variable][:2]
    d = row[index_variable2][:10]
    if d in Dates:
        # .get(t, 0) avoids a KeyError when this hour has not been seen yet on this date
        Dates[d][t] = Dates[d].get(t, 0) + 1
        print("true")
    else:
        Dates[d] = {}  # Now Dates[d] contains a dictionary
        Dates[d][t] = 1  # Now we put the first counter in the Dates[d] dictionary with key t.
print(Dates)
Return was:
defaultdict(<class 'dict'>, {})
true
defaultdict(<class 'dict'>, {'2016-02-01': {'23': 2}, '2016-02-02': {'22': 1}})
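A different technique from the one in the answer above is to let both levels of the dictionary create themselves, which removes the existence checks altogether. The sketch below nests a defaultdict(int) inside a defaultdict; the sample rows are invented, and it assumes the hour is the first two characters of the time string.

```python
from collections import defaultdict

# Invented sample rows: (date, time)
rows = [('2016-02-01', '23:12:00'),
        ('2016-02-01', '23:12:04'),
        ('2016-02-01', '08:00:00'),
        ('2016-02-02', '22:10:00')]

# Outer keys are dates; each inner defaultdict(int) starts missing hours at 0
counts = defaultdict(lambda: defaultdict(int))
for date, time in rows:
    hour = time[:2]
    counts[date][hour] += 1  # no KeyError: missing keys are created on access

# Convert to plain dicts for printing or comparison
plain = {date: dict(hours) for date, hours in counts.items()}
print(plain)
```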

Dictionaries overwriting in Python

This program is meant to take the grammar rules found in Binary.txt and store them in a dictionary, where the rules are:
N = N D
N = D
D = 0
D = 1
But the current code returns D: D = 1, N: N = D, whereas I want N: N D, N: D, D: 0, D: 1
import sys
import string
#default length of 3
stringLength = 3
#get last argument of command line(file)
filename1 = sys.argv[-1]
#get a length from user
try:
    stringLength = int(input('Length? '))
    filename = input('Filename: ')
except ValueError:
    print("Not a number")
#checks
print(stringLength)
print(filename)
def str2dict(filename="Binary.txt"):
    result = {}
    with open(filename, "r") as grammar:
        #read file
        lines = grammar.readlines()
        count = 0
        #loop through
        for line in lines:
            print(line)
            result[line[0]] = line
    print (result)
    return result
print (str2dict("Binary.txt"))

First, a plain dictionary is not the right data structure here. A dictionary in Python maps each key to a single value, so assigning to the same key again overwrites the earlier value. What you'd like is a map from a key to multiple values. For that you'll need:
from collections import defaultdict
result = defaultdict(list)
Next, where are you splitting on '='? You'll need to do that in order to get the key and value you are looking for:
key, value = line.split('=', 1) #Returns a list, which gets unpacked into 2 variables
Putting the above two together, you'd go about in the following way:
def str2dict(filename="Binary.txt"):
    result = defaultdict(list)
    with open(filename, "r") as grammar:
        #read file
        lines = grammar.readlines()
        count = 0
        #loop through
        for line in lines:
            print(line)
            key, value = line.split('=', 1)
            result[key.strip()].append(value.strip())
    return result

Dictionaries, by definition, cannot have duplicate keys. Therefore there can only ever be a single 'D' key. You could, however, store a list of values at that key if you'd like. Ex:
from collections import defaultdict
# rest of your code...
    result = defaultdict(list) # Use defaultdict so that an insert to an empty key creates a new list automatically
    with open(filename, "r") as grammar:
        #read file
        lines = grammar.readlines()
        count = 0
        #loop through
        for line in lines:
            print(line)
            result[line[0]].append(line.strip())  # strip() drops the trailing newline
    print (result)
    return result
This will result in something like:
{"N" : ["N = N D", "N = D"], "D" : ["D = 0", "D = 1"]}
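Combining both suggestions above, the rules can be parsed from any iterable of lines, which makes the logic easy to try without a file. This is a minimal sketch: the function name parse_rules is hypothetical, lines without an '=' are assumed to be skippable, and the input list stands in for the contents of Binary.txt.

```python
from collections import defaultdict

def parse_rules(lines):
    """Group grammar rules of the form 'LHS = RHS' by their left-hand side."""
    rules = defaultdict(list)
    for line in lines:
        if '=' not in line:
            continue  # assumption: blank or malformed lines are ignored
        key, value = line.split('=', 1)
        rules[key.strip()].append(value.strip())
    return dict(rules)

# Stand-in for the lines read from Binary.txt
grammar = parse_rules(["N = N D\n", "N = D\n", "\n", "D = 0\n", "D = 1\n"])
print(grammar)
```

With a real file, the same function can be called as parse_rules(open_file_object), since a file object yields its lines when iterated.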
