Tupling floats into pairs and adding them to a Python dictionary - python

I have a text file of countries and some describing coordinates, with the following format:
Country
57.7934235704;24.3128625831 58.3834133979;24.42892785 58.2573745795;24.0611983579 58.6127534044;23.4265600929
I'm having trouble converting the file into a Python dictionary with the country as the key and the values as a list of lists of float tuples, like so:
[[(57.7934235704, 24.3128625831), (58.3834133979, 24.42892785), (58.2573745795, 24.0611983579), (58.6127534044, 23.4265600929)]]
I've ended up with the following code, which as far as I understand adds the country as a key and converts the coordinates to floats individually, so what's missing is a way to pair the floats into tuples and add them to their corresponding country.
import re

def read_country_file(filename):
    with open(filename) as file:
        dict = {}
        for line in file:
            line = line.rstrip().split(' ')
            for element in line:
                if re.match('^[A-Z]', element):  #if the line starts with a letter make it a key
                    country = (element[0:])
                    dict[country] = country
                elif re.match('^[-0-9;. ]', element):  #if the line starts with a number make it a value
                    element = element.split(';')
                    for i in element:
                        flo = float(i)
                        #MISSING: Tuple floats in pairs and add them to the dictionary
        return dict
If I look up a country in this dictionary, it finds the country/key correctly, but it has no values attached. And if I type-test my "flo" value it's a float, so I have a feeling I'm almost there.

You can build each pair with a generator expression wrapped in tuple():
element = tuple(float(i) for i in element.split(';'))
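Plugged into your reader, a minimal sketch (keeping your regex checks and assuming one-word country names followed by a single coordinate line, as in your sample) could look like this:
import re

def read_country_file(filename):
    countries = {}
    country = None
    with open(filename) as file:
        for line in file:
            for element in line.rstrip().split(' '):
                if re.match('^[A-Z]', element):
                    # a country name: start a new entry holding a list of coordinate lists
                    country = element
                    countries[country] = [[]]
                elif re.match('^[-0-9;. ]', element):
                    # a coordinate pair: convert both halves to float and keep them as a tuple
                    pair = tuple(float(i) for i in element.split(';'))
                    countries[country][0].append(pair)
    return countries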
Additionally, here is my solution to your problem:
import re
text = ['Vietnam',
        '57.7934235704;24.3128625831 58.3834133979;24.42892785 58.2573745795;24.0611983579 58.6127534044;23.4265600929']

def get_tuples_of_float(string):
    return [tuple(map(float, j)) for j in re.findall(r'([\d.]+);([\d.]+)', string)]

it = iter(text)
output = {i: get_tuples_of_float(next(it)) for i in it if re.match('^[A-Z]', i)}
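For the sample text above, this should produce something like:
{'Vietnam': [(57.7934235704, 24.3128625831), (58.3834133979, 24.42892785), (58.2573745795, 24.0611983579), (58.6127534044, 23.4265600929)]}
(a flat list of tuples per country; wrap it in one more list if you need the exact [[...]] nesting from the question).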

You can use re.findall:
import re
s = """
57.7934235704;24.3128625831 58.3834133979;24.42892785 58.2573745795;24.0611983579 58.6127534044;23.4265600929
"""
new_data = list(map(float, re.findall(r'[\d.]+', s)))  # materialise the map so it can be indexed below
final_data = {new_data[i]:new_data[i+1] for i in range(0, len(new_data), 2)}
Output:
{58.6127534044: 23.4265600929, 58.2573745795: 24.0611983579, 58.3834133979: 24.42892785, 57.7934235704: 24.3128625831}

Why not first split each line on spaces, then split each element of the resulting list on the semicolon that separates the pair of coordinates, and finally add everything under the country key in the dictionary? A rough sketch of that idea follows below.
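A minimal sketch of that approach, without regex (assuming country lines start with a letter and coordinate lines with a digit or minus sign, as in the sample):
def read_country_file(filename):
    result = {}
    country = None
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line[0].isalpha():
                # country line: use it as the dictionary key
                country = line
                result[country] = []
            else:
                # coordinate line: split on spaces, then each pair on ';'
                pairs = [tuple(float(x) for x in chunk.split(';'))
                         for chunk in line.split()]
                result[country].append(pairs)
    return result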

Related

How to filter a collection by multiple conditions

I have a csv file named film.csv. Here is the header line, with a few lines to use as an example:
Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image
1990;111;Tie Me Up! Tie Me Down!;Comedy;Banderas, Antonio;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1991;113;High Heels;Comedy;Bosé, Miguel;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1983;104;Dead Zone, The;Horror;Walken, Christopher;Adams, Brooke;Cronenberg, David;79;No;NicholasCage.png
1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png
1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png
1983;140;Octopussy;Action;Moore, Roger;Adams, Maud;Glen, John;68;No;NicholasCage.png
I am trying to filter, and need to display the movie titles, for these criteria: first name contains "Richard", Year < 1985, Awards == "Y".
I am able to filter for the award, but not the rest. Can you help?
file_name = "film.csv"
lines = (line for line in open(file_name,encoding='cp1252')) #generator to capture lines
lists = (s.rstrip().split(";") for s in lines) #generators to capture lists containing values from lines
#browse lists and index them per header values, then filter all movies that have been awarded
#using a new generator object
cols=next(lists) #obtains only the header
print(cols)
collections = (dict(zip(cols,data)) for data in lists)
filtered = (col["Title"] for col in collections if col["Awards"][0] == "Y")
for item in filtered:
    print(item)
    # input()
This works for the award, but I don't know how to add additional filters. Also, when I try to filter with if col["Year"] < 1985 I get an error because a string and an int can't be compared. How do I treat the year as a number?
I believe for the first name I can filter like this:
if col["Actor"].split(", ")[-1] == "Richard"
You know how to add one filter. There is no such thing as "additional" filters. Just add your conditions to the current condition. Since you want all of the conditions to be True to select a record, you'd use the boolean and logic. For example:
filtered = (
    col["Title"]
    for col in collections
    if col["Awards"][0] == "Y"
    and col["Actor"].split(", ")[-1] == "Richard"
    and int(col["Year"]) < 1985
)
Notice I added an int() around the col["Year"] to convert it to an integer.
You've actually gone and reinvented csv.DictReader in the setup to this problem! Instead of
file_name = "film.csv"
lines = (line for line in open(file_name,encoding='cp1252')) #generator to capture lines
lists = (s.rstrip().split(";") for s in lines) #generators to capture lists containing values from lines
#browse lists and index them per header values, then filter all movies that have been awarded
#using a new generator object
cols=next(lists) #obtains only the header
print(cols)
collections = (dict(zip(cols,data)) for data in lists)
filtered = ...
You could have just done:
import csv
file_name = "film.csv"
with open(file_name, encoding='cp1252') as f:
    collections = csv.DictReader(f, delimiter=";")
    filtered = ...
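Putting the two answers together, a sketch of the full DictReader version (the int() conversion and name check are carried over from the filter above; the encoding and newline arguments are assumptions to match the question's file):
import csv

file_name = "film.csv"
with open(file_name, encoding='cp1252', newline='') as f:
    reader = csv.DictReader(f, delimiter=";")
    filtered = [
        row["Title"]
        for row in reader
        if row["Awards"][0] == "Y"
        and row["Actor"].split(", ")[-1] == "Richard"
        and int(row["Year"]) < 1985
    ]

for title in filtered:
    print(title)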

Can I import dictionary items with the same values into a list?

I'm importing data from a text file, and then made a dictionary out of that. I'm now trying to make a separate one, with the entries that have the same value only. Is that possible?
Sorry if that's a little confusing! But basically, the text file looks like this:
"Andrew", "Present"
"Christine", "Absent"
"Liz", "Present"
"James", "Present"
I made it into a dictionary first, so I could group them into keys and values, and now I'm trying to make a list of the people who were 'present' only (I don't want to delete the absent ones, I just want a separate list), and then pick one from that list randomly.
This is what I tried:
import random

d = {}
with open('directory.txt') as f:
    for line in f:
        name, attendance = line.strip().split(',')
        d[name.strip()] = attendance.strip()

present_list = []
present_list.append({"name": str(d.keys), "attendance": "Present"})
print(random.choice(present_list))
When I tried running it, I only get:
{'name': '<built-in method keys of dict object at 0x02B26690>', 'attendance': 'Present'}
Which part should I change? Thank you so much in advance!
You can try this:
present_list = [key for key in d if d[key] == "Present"]
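If you also want the random pick from the question, a small usage sketch (assuming import random, as in your own code):
import random

present_list = [key for key in d if d[key] == "Present"]
if present_list:
    print(random.choice(present_list))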
First, you have to change the way you read the lines, so that your initial dict uses the attendance as the key:
import random
from collections import defaultdict

d = defaultdict(list)
with open('directory.txt') as f:
    for line in f.readlines():
        name, attendance = line.strip().split(',')
        d[attendance.strip()].append(name.strip())

present_list = d["Present"]
print(random.choice(present_list) if present_list else "All absent")
dict.keys is a method, not a field, so you must call it instead:
d.keys()
This returns a dictionary view object, not a string: str(list(d.keys())) gives a comma-separated list with square brackets, while ','.join(d.keys()) gives a simple comma-separated list without the brackets.
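For example, with the dictionary from the question (a quick illustration, assuming the surrounding quotes have already been stripped from the names):
>>> str(list(d.keys()))
"['Andrew', 'Christine', 'Liz', 'James']"
>>> ','.join(d.keys())
'Andrew,Christine,Liz,James'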
UPDATE:
You also have no filtering in place; instead I'd try something like this, where you group the names under their attendance status as you read the file:
d = {}
with open('directory.txt') as f:
    for line in f:
        name, attendance = line.strip().split(',')
        if attendance.strip() not in d:
            d[attendance.strip()] = [name.strip()]
        else:
            d[attendance.strip()].append(name.strip())
This way you don't need to go through all those intermediate steps, and you will have something like {"Present": ["Andrew", "Liz", "James"]}

How to transfer plain text headings and listings to Python dictionary object?

My question:
I want to parse a plain text with headings and listings into a single Python object, where the headings become dict keys and the listings become lists of values. The text is shown below:
Playing cricket is my hobby:
(a) true.
(b) false.
Furthermore, the heading does not include:
(a) Singlets.
(b) fabrics.
(c) Smocks.
My desired output is:
{"Playing cricket is my hobby:":["(a)true.","(b)false."],"Furthermore, the heading does not include:":["(a) Singlets.","(b) Garments.","(c) Smocks."]}
What I have done
I first convert the text to a list of strings:
plaintxtlist=['Playing cricket is my hobby:','(a) true.','(b) false.','Furthermore, the heading does not include:','(a) Singlets.',' (b) fabrics.','(c) Smocks.']
I tried to convert the list above into a dictionary whose keys are the index of each heading and whose values are lists of text. Here is the code:
import re

data = {}  #dictionary
lst = []  #list
regalter = r"^\s*\(([^\)]+)\).*|^\s*\-.*"  #regex to identify (a)/(A) or - type of lines
j = 0
sub = []  #list
plaintxtlist = ['Playing cricket is my hobby:', '(a) true.', '(b) false.', 'Furthermore, the heading does not include:', '(a) Singlets.', ' (b) fabrics.', '(c) Smocks.']
for i in plaintxtlist:  #the data from the text file is converted to a list of strings and passed to the code
    if sub:
        match = re.match(regalter, i)  #pattern matching using the regex
        if match:
            sub.append(i)  #if the line contains (a)/(A) it is appended to the list called sub
        else:
            j = j + 1  #each group of lines gets an index from 0 to n (n is the last group)
            sub = [i]  #the heading line starts a new sub list
            data[str(j)] = sub  #the sub list is added to the dictionary named data under that index
    else:
        if sub:
            data[str(j)] = sub  #if sub has content it is added to the dictionary named data
        sub = [i]  #each line is appended to the sub list
        data[str(j)] = i  #if there is no match with the regex the plain text is added to the dictionary
print(data)  #print the result
And the output from the code above:
{"0":["Playing cricket is my hobby:","(a)true.","(b)false."],"1":["Furthermore, the heading does not include:","(a) Singlets.","(b) Garments.","(c) Smocks."]}
You don't need to turn each line into a list element first. To make it simpler, you can organize the raw text content with a regex, then parse the matches into the dictionary you want.
You can express the grouping by matching text up to a period that is not followed by a "(" on the next line.
Suppose the text content is saved in a file called a_text_file.txt. The full code lies here:
import re
with open('a_text_file.txt') as f:
    s = f.read()

pattern = re.compile(r'[\w\s\().:,]+?\.(?!\n\()')
data = dict()
for m in re.findall(pattern, s):
    # Group the raw content by the regex,
    # and fit each line into a list
    group = m.strip()
    lst = group.split('\n')
    # Strip out spaces in `key` and `value`
    key = lst[0].strip()
    value = [i.strip() for i in lst[1:]]
    # Fit into the final output
    data.update({key: value})
print(data)
The final output:
{'Playing cricket is my hobby:': ['(a) true.', '(b) false.'], 'Furthermore, the heading does not include:': ['(a) Singlets.', '(b) fabrics.', '(c) Smocks.']}

How to sort a large number of lists to get a top 10 of the longest lists

So I have a text file with around 400,000 lists that mostly look like this.
100005 127545 202036 257630 362970 376927 429080
10001 27638 51569 88226 116422 126227 159947 162938 184977 188045
191044 246142 265214 290507 296858 300258 341525 348922 359832 365744
382502 390538 410857 433453 479170 489980 540746
10001 27638 51569 88226 116422 126227 159947 162938 184977 188045
191044 246142 265214 290507 300258 341525 348922 359832 365744 382502
So far I have a for loop that goes line by line and turns the current line into a temporary list.
How would I create a top-ten list of the lists with the most elements in the whole file?
This is the code I have now.
file = open('node.txt', 'r')
adj = {}
top_ten = []
at_least_3 = 0
for line in file:
    data = line.split()
    adj[data[0]] = data[1:]
And this is what one of the lists looks like:
['99995', '110038', '330533', '333808', '344852', '376948', '470766', '499315']
# collect the lines
lines = []
with open("so.txt") as f:
for line in f:
# split each line into a list
lines.append(line.split())
# sort the lines by length, descending
lines = sorted(lines, key=lambda x: -len(x))
# print the first 10 lines
print(lines[:10])
Why not use collections.Counter to display the 10 most common numbers? i.e.:
import re
import collections
file = open('numbers.txt', 'r')
content = file.read()
numbers = re.findall(r"\d+", content)
counter = collections.Counter(numbers)
print(counter.most_common(10))
When wanting to count and then find the one(s) with the highest counts, collections.Counter comes to mind:
from collections import Counter
lists = Counter()
with open('node.txt', 'r') as file:
    for line in file:
        values = line.split()
        lists[tuple(values)] = len(values)

print('Length Data')
print('====== ====')
for values, length in lists.most_common(10):
    print('{:2d} {}'.format(length, list(values)))
Output (using sample file data):
Length Data
====== ====
10 ['191044', '246142', '265214', '290507', '300258', '341525', '348922', '359832', '365744', '382502']
10 ['191044', '246142', '265214', '290507', '296858', '300258', '341525', '348922', '359832', '365744']
10 ['10001', '27638', '51569', '88226', '116422', '126227', '159947', '162938', '184977', '188045']
7 ['382502', '390538', '410857', '433453', '479170', '489980', '540746']
7 ['100005', '127545', '202036', '257630', '362970', '376927', '429080']
Use a for loop and max() maybe? You say you've got a for loop that's placing the values into a temp array. From that you could use "max()" to pick out the largest value and put that into a list.
As an open for loop, something like appending max() to a new list:
newlist = []
for x in data:
    largest = max(x)
    newlist.append(largest)
Or as a list comprehension:
newlist = [max(x) for x in data]
Then from there you have to do the same process on the new list(s) until you get to the desired top 10 scenario.
EDIT: I've just realised that I've misread your question. You want the lists with the most elements, not the highest values. OK.
len() is a good one for this.
longest = []
for templist in data:
    if len(templist) > len(longest):
        longest = templist
That would give you the current longest, and from there you could build a top 10 of the lengths, of the temp lists themselves, or both; see the sketch below.
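One way to turn that into an actual top 10 is heapq.nlargest keyed on length (my own suggestion, not something from the original answer):
import heapq

with open('node.txt') as f:
    data = [line.split() for line in f]

# the ten longest lists, longest first
top_ten = heapq.nlargest(10, data, key=len)
for lst in top_ten:
    print(len(lst), lst)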
If your data is really as shown, with each number the same length, then I would make a dictionary with key = line, value = length, take the top key/value pairs from the dictionary, and voila. Sounds easy enough; a rough sketch of that idea follows below.
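A minimal sketch of that dictionary idea (node.txt as the file name is an assumption carried over from the question):
lengths = {}
with open('node.txt') as f:
    for line in f:
        line = line.rstrip()
        # key = the line itself, value = how many numbers it holds
        # note: identical lines collapse into a single dict key
        lengths[line] = len(line.split())

# sort the (line, length) pairs by length, descending, and keep the first ten
top_ten = sorted(lengths.items(), key=lambda kv: kv[1], reverse=True)[:10]
for line, length in top_ten:
    print(length, line)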

Could you explain this?

Quick question: in Python 3, if I have the following code
def file2dict(filename):
    dictionary = {}
    data = open(filename, 'r')
    for line in data:
        [ key, value ] = line.split(',')
        dictionary[key] = value
    data.close()
    return dictionary
It means that the file MUST contain exactly 2 values (strings, numbers, or whatever) on every line, because of this line:
[ key, value ] = line.split(',')
So, if in my file I have something like this
John,45,65
Jack,56,442
The function throws an exception.
The question: why are key, value in square brackets? Why, for example, does
adr, port = s.accept()
not use square brackets?
And how would I modify this code if I want to attach 2 values to every key in the dictionary? Thank you.
The [ and ] around key, value aren't getting you anything.
The error that you're getting, ValueError: too many values to unpack, is because you are splitting text like John,45,65 on the commas. Do "John,45,65".split(',') in a shell. You get:
>>> "John,45,65".split(',')
['John', '45', '65']
Your code is trying to assign 3 values, "John", 45, and 65, to two variables, key and value, thus the error.
There are a few options:
1) str.split has an optional maxsplit parameter:
>>> "John,45,65".split(',', 1)
['John', '45,65']
if "45,65" is the value you want to set for that key in the dictionary.
2) Cut the extra value.
If the 65 isn't what you want, then you can do something either like
>>> name, age, unwanted = "John,45,65".split(',')
>>> name, age, unwanted
('John', '45', '65')
>>> dictionary[name] = age
>>> dictionary
{'John': '45'}
and just not use the unwanted variable, or split into a list and don't use the last element:
>>> data = "John,45,65".split(',')
>>> dictionary[data[0]] = data[1]
>>> dictionary
{'John': '45'}
You can use three variables instead of two, and make the first one the key:
def file2dict(filename):
    dictionary = {}
    data = open(filename, 'r')
    for line in data:
        key, value1, value2 = line.split(',')
        dictionary[key] = [int(value1), int(value2)]
    data.close()
    return dictionary
When doing a line split to a dictionary, consider limiting the number of splits by specifying maxsplit, and checking to make sure that the line contains a comma:
def file2dict(filename):
    data = open(filename, 'r')
    # strip the trailing newline, then split each line at the first comma only
    dictionary = dict(item.rstrip().split(",", 1) for item in data if "," in item)
    data.close()
    return dictionary
