How to read line in python using defaultdict? - python

I have a dictionary that searches for an ID name and reads tokens after it. But I want to know if there is a way to read and print out the whole line that contains that ID name as well.
Here is what I have so far:
import csv
import re
from collections import defaultdict

lookup = defaultdict(list)
wholelookup = defaultdict(list)
mydata = open('summaryfile.txt')
for line in csv.reader(mydata, delimiter='\t'):
    code = re.match(r'[a-z](\d+)[a-z]', line[-1], re.I)
    if code:
        lookup[line[-2]].append(code.group(1))
        wholelookup[line[-2]].append(code.group(0))

Your code calls csv.reader(), which yields a parsed version of each line: a list of field values. If that list will do as the "whole line", you can save it directly.
You have a line where you append to wholelookup. I think you want to save line there instead of code.group(0). code.group(0) returns only the text matched by the regular expression, which is at most the contents of line[-1], not the whole line.
So maybe put this line in your code:
wholelookup[line[-2]].append(line)
Or maybe you need to join together the values from line to make a single string:
s = ' '.join(line)
wholelookup[line[-2]].append(s)
If you want the whole line, not the parsed version, then do something like this:
import re
from collections import defaultdict

lookup = defaultdict(list)
wholelookup = defaultdict(list)
pat = re.compile(r'[a-z](\d+)[a-z]', re.I)
with open('summaryfile.txt') as mydata:
    for s_line in mydata:
        values = s_line.split('\t')
        code = pat.match(values[-1])
        if code:
            lookup[values[-2]].append(code.group(1))
            wholelookup[values[-2]].append(s_line)
This example pre-compiles the pattern for the slight speed advantage.
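As a rough, self-contained sketch of the same idea, the snippet below runs against made-up in-memory data instead of summaryfile.txt. The sample rows and the use of io.StringIO are assumptions for illustration only; the real file layout may differ.

```python
import io
import re
from collections import defaultdict

# Hypothetical sample data: tab-separated, ID in the second-to-last
# column, code in the last column. io.StringIO stands in for the file.
sample = io.StringIO(
    "first\trow\tID1\ta123b\n"
    "second\trow\tID2\tx45y\n"
    "third\trow\tID1\tc678d\n"
    "no\tmatch\tID3\t999\n"
)

pat = re.compile(r'[a-z](\d+)[a-z]', re.I)
lookup = defaultdict(list)       # ID -> digits extracted from the code
wholelookup = defaultdict(list)  # ID -> raw lines that contain that ID

for s_line in sample:
    s_line = s_line.rstrip('\n')          # drop the newline before storing
    values = s_line.split('\t')
    code = pat.match(values[-1])
    if code:
        lookup[values[-2]].append(code.group(1))
        wholelookup[values[-2]].append(s_line)

# Print every whole line recorded for one ID
for line in wholelookup['ID1']:
    print(line)
```

Rows whose last column does not match the pattern (the ID3 row here) are simply skipped, so they never appear in either dictionary.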

If you have enough memory, the easiest way is to simply save the lines in another defaultdict:
wholeline = defaultdict(list)
...
idname = line[-2]
wholeline[idname].append(line)

Related

Using methods to append text file contents into a dictionary in Python?

So I have a .txt file and I want to use methods within this class, Map, to append its contents into aDictionary.
class Map:
    def __init__(self, dataText):
        self.dataText = dataText
        self.aDictionary = {}

dataFile = open('data.txt', 'r')
c1 = Map(dataFile)
My data.txt file looks something like this:
hello, world
how, are
you, today
and I want aDictionary to print this output:
{how: are, you: today}
I'm not very good at manipulating files, as I keep getting type errors and whatnot. Is there an easy way of performing this task using methods within the class?
First you need to read the content of the file. Once you have the content of the file, you could create the dictionary like this (assuming content contains the content of data.txt):
content = """hello, world
how, are
you, today"""
d = {}
for line in content.splitlines():
    if line:
        key, value = map(str.strip, line.split(','))
        d[key] = value
print(d)
Output
{'you': 'today', 'how': 'are', 'hello': 'world'}
The idea is to iterate over the lines using a for loop and check that each line is not empty (if line). If the line is not empty, split it on the comma (line.split(',')) and remove the leading and trailing whitespace (str.strip) from each of the values in the list using map.
Or using a dictionary comprehension:
content = """hello, world
how, are
you, today"""
it = (map(str.strip, line.split(',')) for line in content.splitlines() if line)
d = {key: value for key, value in it}
print(d)
To read the content of the file you can do the following:
content = self.dataText.read()
Further
Reading entire file in Python
How to read a file line-by-line into a list?
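Putting the pieces together inside the class from the question might look roughly like the sketch below. The method name load is hypothetical (it is not in the question), and io.StringIO is used here only as a stand-in for open('data.txt', 'r') so that the example runs on its own.

```python
import io


class Map:
    def __init__(self, dataText):
        self.dataText = dataText   # an open, file-like object
        self.aDictionary = {}

    def load(self):
        """Fill aDictionary from 'key, value' lines (hypothetical helper)."""
        for line in self.dataText:
            line = line.strip()
            if not line:
                continue           # skip blank lines
            # Split on the first comma only, then trim both parts
            key, value = (part.strip() for part in line.split(',', 1))
            self.aDictionary[key] = value
        return self.aDictionary


# io.StringIO stands in for open('data.txt', 'r')
dataFile = io.StringIO("hello, world\nhow, are\nyou, today\n")
c1 = Map(dataFile)
print(c1.load())
```

This sketch assumes every non-blank line contains a comma; a line without one would raise a ValueError at the unpacking step.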

How to import a special format as a dictionary in python?

I have text files in the format below, all on a single line:
username:password;username1:password1;username2:password2;
etc.
What I have tried so far is
with open('list.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
but I get an error saying that the length is 1 and 2 is required, which indicates the file is not being read as key:value pairs.
Is there any way to fix this or should I just reformat the file in another way?
thanks for your answers.
What I have so far is:
with open('tester.txt') as f:
    password_list = dict(x.strip(":").split(";", 1) for x in f)
for user, password in password_list.items():
    print(user + " - " + password)
The result comes out as username:password - username1:password1
What I need is to split username:password so that key = user and value = password.
Since the variable f in this case is a file object and not a list, the first thing to do is get the lines from it. You could use the file.readlines method* (https://docs.python.org/2/library/stdtypes.html?highlight=readline#file.readlines) for this.
Furthermore, I think I would use split with the semicolon (";") as the separator. This gives you a list of "username:password" strings, provided your entire file looks like this.
I think you will figure out what to do after that.
EDIT
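* I automatically assumed you use Python 2.7 for some reason. In Python 3.x, file objects returned by open() also provide readlines(). The "distutils.text_file" (https://docs.python.org/3.7/distutils/apiref.html?highlight=readlines#distutils.text_file.TextFile.readlines) class has a readlines method too, but distutils is deprecated and was removed in Python 3.12.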
Load the text of the file in Python with open() and read() as a string.
Apply split(';') to that string to create a list like ['username:password', 'username1:password1', 'username2:password2'].
Do a dict comprehension where you apply split(':') to each item of the above list to split those pairs.
with open('list.txt', 'rt') as f:
    raw_data = f.readline().strip()
# Skip the empty string left behind by the trailing semicolon
list_data = [x for x in raw_data.split(';') if x]
user_dict = {x.split(':')[0]: x.split(':')[1] for x in list_data}
print(user_dict)
Dictionary comprehension is useful here.
One liner to pull all the info out of the text file. As requested. Hope your tutor is impressed. Ask him How it works and see what he says. Maybe update your question to include his response.
If you want me to explain, feel free to comment and I shall go into more detail.
The error you're probably getting:
ValueError: dictionary update sequence element #3 has length 1; 2 is required
is because the text line ends with a semicolon. Splitting it on semicolons then results in a list that contains some pairs, and an empty string:
>>> "username:password;username1:password1;username2:password2;".split(";")
['username:password', 'username1:password1', 'username2:password2', '']
Splitting the empty string on colons then results in a single empty string, rather than two strings.
To fix this, filter out the empty string. One example of doing this would be
[element for element in x.split(";") if element != ""]
In general, I recommend you do the work one step at a time and assign to intermediary variables.
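A step-by-step version with intermediate variables could look something like this minimal sketch. The function name parse_credentials is made up for illustration, and the input is an in-memory string rather than the real file.

```python
def parse_credentials(text):
    """Turn 'user:pass;user1:pass1;' into a dict (illustrative helper)."""
    line = text.strip()                           # drop the trailing newline
    entries = [e for e in line.split(';') if e]   # drop the empty pieces
    # Split on the first colon only, so a password may itself contain ':'
    pairs = [entry.split(':', 1) for entry in entries]
    return dict(pairs)


raw = "username:password;username1:password1;username2:password2;\n"
users = parse_credentials(raw)
for user, password in users.items():
    print(user + " - " + password)
```

The sketch assumes every non-empty entry contains at least one colon; an entry without one would still trigger the "length 1; 2 is required" error.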
Here's a simple (but long) answer. You need to get the line from the file, and then split it and the items resulting from the split:
results = {}
with open('file.txt') as file:
    for line in file:
        # Only one line, but that's fine
        # strip() removes the trailing newline so the last entry is really empty
        entries = line.strip().split(';')
        for entry in entries:
            if entry != '':
                # The last item in entries will be blank, due to how split works in this example
                user, password = entry.split(':')
                results[user] = password
Try this.
f = open('test.txt').read().strip()
data = f.split(";")
d = {}
for i in data:
    if i:
        value = i.split(":")
        d.update({value[0]: value[1]})
print(d)

Parsing key values pairs from text as dictionary

I'm still quite new to Python and I was wondering how would I convert something that is already in key:value form in a text file into a Python dictionary?
Eg.
2:red
3:orange
5:yellow
6:green
(each key:value on a separate line)
I've looked at other posts but none of them seem to work and I know I'm doing something wrong. So far, I have:
def create_colours_dictionary(filename):
    colours_dict = {}
    file = open(filename,'r')
    contents = file.read()
    for key in contents:
        #???
    return colours_dict
The straight-forward way to do this is to use a traditional for loop, and the str.split method.
Rather than reading from a file, I'll embed the input data into the script as a multi-line string, and use str.splitlines to convert it to a list of strings, so we can loop over it, just like looping over the lines of a file.
# Use a list of strings to simulate the file
contents = '''\
2:red
3:orange
5:yellow
6:green
'''.splitlines()
colours_dict = {}
for s in contents:
    k, v = s.split(':')
    colours_dict[k] = v
print(colours_dict)
output
{'2': 'red', '3': 'orange', '5': 'yellow', '6': 'green'}
Be aware that this code will only work correctly if there are no spaces surrounding the colon. If there could be spaces (or spaces at the start or end of the line), then you can use the str.strip method to remove them.
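For instance, a variant that tolerates stray spaces might look like the following sketch. The messy sample data is invented for illustration.

```python
# Simulated file contents with inconsistent spacing (made-up sample)
contents = '''\
2 : red
 3:orange
5 :yellow
6: green
'''.splitlines()

colours_dict = {}
for s in contents:
    k, v = s.split(':')
    # strip() removes leading and trailing whitespace from each part
    colours_dict[k.strip()] = v.strip()

print(colours_dict)
```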
There are a couple of ways to make this more compact.
We could use a list comprehension nested inside a dictionary comprehension:
colours_dict = {k: v for k, v in [s.split(':') for s in contents]}
But it's even more compact to use the dict constructor on a generator expression:
colours_dict = dict(s.split(':') for s in contents)
If you aren't familiar with comprehensions, please see List Comprehensions and Dictionaries in the official tutorial.
Iterate over your file and build a dictionary.
def create_colours_dictionary(filename):
    colours_dict = {}
    with open(filename) as file:
        for line in file:
            k, v = line.rstrip().split(':')
            colours_dict[k] = v
    return colours_dict

dct = create_colours_dictionary('file.txt')
Or, if you're looking for something compact, you can use a dict comprehension over a generator expression that splits each line on the colon.
colours_dict = {k: v for k, v in (
    line.rstrip().split(':') for line in open(filename)
)}
This approach will need some modification if the colon is surrounded by spaces—perhaps regex?
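One possible regex-based variant is sketched below, using re.split so that any whitespace around the colon is consumed together with it. The list lines is a made-up stand-in for iterating over the open file.

```python
import re

# Made-up stand-in for the lines of the file
lines = ["2 : red\n", "3:orange\n", "5 :yellow\n", "6:  green\n"]

# Match a colon together with any whitespace on either side of it
splitter = re.compile(r'\s*:\s*')

colours_dict = dict(
    splitter.split(line.strip(), maxsplit=1) for line in lines
)
print(colours_dict)
```

Calling str.strip on the key and the value after a plain split(':') would achieve the same result without a regex; which one reads better is a matter of taste.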
Assuming the text file has the stated 'key:value' format and the name of the file is contained in the variable fname, you could write a function that reads the file and returns a dict, or just use a simple with statement.
A function is probably a better choice if this operation is performed in several places in your code. If it is only done once, a two-liner will do fine.
# Example with fname being the path to the textfile
def dict_from(fname):
    return dict(line.strip().split(':') for line in open(fname))

fname = '...'
# ...
d1 = dict_from(fname)

# Alternative solution
with open(fname) as fd:
    d2 = dict(line.strip().split(':') for line in fd)
Both suggested solutions use the built-in dict constructor and a generator expression to parse each line. strip removes whitespace at both the start and end of the line, and split creates a (key, value) pair from each line.

Trouble sorting a list with python

I'm somewhat new to python. I'm trying to sort through a list of strings and integers. The lists contains some symbols that need to be filtered out (i.e. ro!ad should end up road). Also, they are all on one line separated by a space. So I need to use 2 arguments; one for the input file and then the output file. It should be sorted with numbers first and then the words without the special characters each on a different line. I've been looking at loads of list functions but am having some trouble putting this together as I've never had to do anything like this. Any takers?
So far I have the basic stuff
#!/usr/bin/python
import sys
try:
    infilename = sys.argv[1]
    #outfilename = sys.argv[2]
except:
    print "Usage: ",sys.argv[0], "infile outfile"; sys.exit(1)
ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
                                   if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file but exactly as the file is written and not sorted at all. The goal is to take a file (arg1.txt) and sort it and make a new file (arg2.txt) which will be cmd line variables. I used print in this case to speed up the editing but need to have it write to a file. That's why the output file areas are commented but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file.
data = ifile.readlines()
is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to
data = ifile.readlines()[0].split()
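You may not even need the key function any more, since strings that start with digits sort before letters by default. Note, though, that strings are compared character by character, so "10" sorts before "9" unless the key converts them to int. I don't see anything in your code to remove special characters, though.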
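A quick check of how plain string sorting treats numbers is shown below, with made-up items. Sorting the numeric tokens separately with key=int is one way to get numeric order; it is only one of several options.

```python
items = ["10", "9", "2", "road", "apple"]

# Default sort compares strings character by character
print(sorted(items))          # ['10', '2', '9', 'apple', 'road']

# Sort the all-digit tokens numerically, then the remaining words
numbers = sorted((x for x in items if x.isdigit()), key=int)
words = sorted(x for x in items if not x.isdigit())
print(numbers + words)        # ['2', '9', '10', 'apple', 'road']
```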
Since they are all on the same line, you don't really need readlines:
with open('some.txt') as f:
    data = f.read()  # now data = "item 1 item2 etc..."
You can use re to filter out unwanted characters:
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]","",data)
partition may be overkill:
data = "hello 23frank sam wilbur"
my_list = data.split() # ["hello","23frank","sam","wilbur"]
print(sorted(my_list))
However, you will need to do more to force numbers to sort numerically, maybe something like:
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers,key=lambda x:int(re.sub("[^0-9]","",x))) + sorted(strings)
Also, they are all on one line separated by a space.
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list.
To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).
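The whole pipeline (read, split, clean, sort, write) could be sketched as below. The function name sort_file, the temporary files, and the choice to strip every non-alphanumeric character are all assumptions for illustration. In the real script the two paths would come from sys.argv[1] and sys.argv[2].

```python
import os
import re
import tempfile


def sort_file(in_path, out_path):
    """Read one line of tokens, clean them, write numbers then words."""
    with open(in_path) as f:
        tokens = f.read().split()          # split on any whitespace
    # Remove everything except letters and digits (ro!ad -> road)
    cleaned = [re.sub(r'[^A-Za-z0-9]', '', t) for t in tokens]
    cleaned = [t for t in cleaned if t]    # drop tokens that became empty
    numbers = sorted((t for t in cleaned if t.isdigit()), key=int)
    words = sorted(t for t in cleaned if not t.isdigit())
    with open(out_path, 'w') as f:
        f.write('\n'.join(numbers + words) + '\n')


# Demo with temporary files standing in for arg1.txt and arg2.txt
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'arg1.txt')
out_path = os.path.join(tmp_dir, 'arg2.txt')
with open(in_path, 'w') as f:
    f.write("ro!ad 42 ap#ple 7 ze?bra 100\n")

sort_file(in_path, out_path)
with open(out_path) as f:
    output = f.read()
print(output)
```

Tokens that mix digits and letters (such as 23frank) are treated as words in this sketch; whether that is the desired behaviour depends on the actual data.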

Dictionaries in Python

I have a problem. I want to make a dictionary that translates English words to Estonian. I started, but don't know how to continue. Please, help.
The dictionary is a text document where a tab separates the English and Estonian words.
file = open("dictionary.txt","r")
eng = []
est = []
while True :
    lines = file.readline()
    if lines == "" :
        break
    pair = lines.split("\t")
    eng.append(pair[0])
    est.append(pair[1])
for i in range......
Please, help.
For a dictionary, you should use the dictionary type which maps keys to values and is much more efficient for lookups. I also made some other changes to your code, keep them if you wish:
engToEstDict = {}
# The with statement automatically closes the file afterwards. Furthermore, one shouldn't
# overwrite builtin names like "file", "dict" and so on (even though it's possible).
with open("dictionary.txt", "r") as f:
    for line in f:
        # Strip the trailing newline, then skip lines that are empty
        line = line.rstrip("\n")
        if not line:
            continue
        # Map the English word to the Estonian word in the dictionary-typed variable
        pair = line.split("\t")
        engToEstDict[pair[0]] = pair[1]
# Then, lookup of English words is simple
print(engToEstDict["hello"]) # should probably print "tere", says Google Translator
Mind that the reverse lookup (Estonian to English) is not so easy. If you need that, too, you might be better off creating a second dictionary variable with the reversed key-value mapping (estToEngDict[pair[1]] = pair[0]) because lookup will be a lot faster than your list-based approach.
It would be better to use the appropriately named dict instead of two lists:
d = {}
# ...
d[pair[0]] = pair[1]
Then to use it:
translated = d["hello"]
You should also note that when you call readline() that the resulting string includes the trailing new-line so you should strip this before storing the string in the dictionary.
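A minimal sketch that strips the newline and builds both directions at once is shown below. The three word pairs and the io.StringIO stand-in for dictionary.txt are sample data chosen for illustration, and the variable names are not from the question.

```python
import io

# io.StringIO stands in for open("dictionary.txt"); the pairs are sample data
f = io.StringIO("hello\ttere\ndog\tkoer\ncat\tkass\n")

eng_to_est = {}
est_to_eng = {}
for line in f:
    line = line.rstrip("\n")      # remove the trailing newline first
    if not line:
        continue                  # skip blank lines
    eng, est = line.split("\t", 1)
    eng_to_est[eng] = est
    est_to_eng[est] = eng         # reverse mapping for Estonian -> English

print(eng_to_est["hello"])
print(est_to_eng["koer"])
```

If several English words share one Estonian translation, the reverse dictionary keeps only the last pair seen; a defaultdict(list) would be needed to keep them all.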
