I have a text file that has some antonyms in this format:
able || unable
unable || able
abaxial || adaxial
adaxial || abaxial
and I need to check whether one word is an antonym of another or not.
What I did is write code like this:
def antonyms():
    f = open('antonyms.txt', 'r+')
    for line in f:
        a = line.split('||')
        x = a[0]
        y = a[1]
    return x, y
but all I got was the last pair, so then I tried to indent the return a bit:
def antonyms():
    f = open('antonym_adjectives.txt', 'r+')
    for line in f:
        a = line.split('||')
        x = a[0]
        y = a[1]
        return x, y
But that way I only got the first pair.
How can I get all of the pairs?
And how can I do something like:
>>> antonyms(x, y)
to tell me whether x and y are antonyms (True or False)?
Since no other answer addresses BOTH of your questions:
Firstly: if you return, the function stops there. So you want to yield each pair instead, which turns the whole function into a generator:
def antonyms():
    f = open('antonyms.txt', 'r+')
    for line in f:
        a = line.split(' || ')  # Think you also need surrounding spaces here.
        x = a[0]
        y = a[1].strip()        # strip the trailing newline so comparisons work
        yield x, y
To use this function to check for is_antonym(a,b):
def is_antonym(a, b):
    for x, y in antonyms():
        if a == x and b == y:
            return True
    return False
Other answers have good tips too:
A good replacement for instance would be: [x,y] = line.split(' || ')
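For completeness, a quick usage sketch of the two helpers above (the exact True/False results naturally depend on what is in antonyms.txt):

print(is_antonym('able', 'unable'))    # True if 'able || unable' is a line in the file
print(is_antonym('able', 'adaxial'))   # False, assuming no such line exists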
Your problem is that with return you exit the function.
Instead of x = a[0] and y = a[1], append those values to a list, and then return that list.
the_array = []
for line in f:
    a = line.split('||')
    the_array.append((a[0].strip(), a[1].strip()))  # strip spaces and the newline
return the_array
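If that loop is wrapped in a function called, say, antonyms() that returns the_array, a small sketch of the True/False check from the question could look like this (the names here are just illustrative):

def is_antonym(first, second):
    return (first, second) in antonyms()

print(is_antonym('able', 'unable'))   # True if that pair is in the file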
It would make more sense to write a function that gets the antonym of a given word:
def antonym(word):
    with open('antonym_adjectives.txt', 'r+') as f:
        for line in f:
            a, b = line.split('||')
            a = a.strip()
            b = b.strip()
            if a == word:
                return b
            if b == word:
                return a
You can then write antonym(x) == y to check if x and y are antonyms. (However this assumes each word has a single unique antonym).
This reads the file from the beginning each time. If your list of antonyms is manageable in size, it might make more sense to read it into memory as a list or dictionary.
If you can't assume that each word has a single unique antonym, you could turn this into a generator that will return all the antonyms for a given word.
def antonym(word):
    with open('antonym_adjectives.txt', 'r+') as f:
        for line in f:
            a, b = line.split('||')
            a = a.strip()
            b = b.strip()
            if a == word:
                yield b
            if b == word:
                yield a
Then y in antonym(x) will tell you whether x and y are antonyms, and list(antonym(x)) will give you the list of all the antonyms of x.
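For example, a quick sketch (assuming the file contains the pairs shown in the question):

print('unable' in antonym('able'))   # True
print(list(antonym('able')))         # ['unable']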
You could use yield:
def antonyms():
    f = open('antonyms.txt', 'r+')
    for line in f:
        a = line.split('||')
        x = a[0].strip()   # strip spaces and the trailing newline
        y = a[1].strip()
        yield x, y
for a, b in antonyms():
    # rest of code here
By the way, you can assign directly to x and y:
x,y = line.split('||')
A simple check if a and b are antonyms could be:
(a == 'un' + b) or (b == 'un' + a)
Here is my code:
def antonyms(first, second):
    f = open('text.txt', 'r')
    for line in f.readlines():
        lst = [s.strip() for s in line.split('||')]
        if lst and len(lst) == 2:
            x = lst[0]
            y = lst[1]
            if first == x and second == y:
                return True
    return False
I don't think list is a good data structure for this problem. After you've read all the antonym pairs into a list, you still have to search the whole list to find the antonym of a word. A dict would be more efficient.
antonym = {}
with open('antonym_adjectives.txt') as infile:
    for line in infile:
        x, y = [s.strip() for s in line.split('||')]   # strip spaces and newlines
        antonym[x] = y
        antonym[y] = x
Now you can just look up an antonym in the dict:
try:
    opposite = antonym[word]
except KeyError:
    print("%s not found" % word)
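An equivalent sketch using dict.get(), if you prefer to avoid the try/except:

opposite = antonym.get(word)
if opposite is None:
    print("%s not found" % word)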
Related
I have written Python code that reads a text file (Test.txt); the text file contains data like the below:
10 20
15 90
22 89
12 33
I can read a specific line by using this code:

import linecache

particular_line = linecache.getline('Test.txt', 1)
print(particular_line)
I tried to use this code to split the text file into x, y values, but it got all lines, not the specific line that I need:
with open('Test.txt') as f:
    x, y = [], []
    for l in f:
        row = l.split()
        x.append(row[0])
        y.append(row[1])
So how can I get a specific line and split it into the two values x and y?
You might do:

import linecache

particular_line = linecache.getline('Test.txt', 1)
print(particular_line)

x, y = particular_line.split()
print(x)  # 10
print(y)  # 20
.split() gives a list with 2 elements; I used so-called unpacking to put the 1st element into variable x and the 2nd into y.
You are missing the readlines() call in your code:
with open('Test.txt') as f:
    x, y = [], []
    for l in f.readlines():
        row = l.split()
        x.append(row[0])
        y.append(row[1])
import pandas as pd
xy = pd.read_csv("Test.txt", sep=" ", header=None).rename(columns={0:"x", 1:"y"})
Now you can access all the x and y values with xy.x.values and xy.y.values.
Note that I chose sep=" " as the separator because I suppose that your x and y are separated by a single space in your file.
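If the two columns might be separated by more than one space (or by tabs), a regex separator is a possible variation; this is just a sketch under that assumption:

import pandas as pd

# \s+ matches any run of whitespace between the two columns
xy = pd.read_csv("Test.txt", sep=r"\s+", header=None, names=["x", "y"])
print(xy.x.values, xy.y.values)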
This is a rather condensed example:
with open("input.txt", "r") as f:
data = f.read()
# Puts data into array line by line, then word by word
words = [y.split() for y in data.split("\n")]
# Gets first word (x)
x = [x[0] for x in words]
# Gets second word (y)
y = [x[1] for x in words]
Here [y.split() for y in data.split("\n")] gets each line by splitting at \n, and then splits each line into its x and y values (.split()) at the space in between them.
To get the specific line you can use this code:
with open('Test.txt') as f:
    particular_line = f.readlines()[1]
    print(particular_line)
Notice the index inside []; it starts from 0, the same as in most other languages. For example, if you want to get the first line, you'd change it to 0.
To parse it into two variables you can use:
x, y = particular_line.split()
print(x)
print(y)
So, putting them together:
with open('Test.txt') as f:
    particular_line = f.readlines()[1]
    print(particular_line)
    x, y = particular_line.split()
    print(x)
    print(y)
You might also want it in function form:
def get_particular_line_to_x_y(filename, line_number):
    with open(filename) as f:
        particular_line = f.readlines()[line_number]
    return particular_line.split()

if __name__ == '__main__':
    x, y = get_particular_line_to_x_y('Test.txt', 0)
    print(x)
    print(y)
I've been trying to figure this out for about a year now and I'm really burnt out on it so please excuse me if this explanation is a bit rough.
I cannot include job data, but it would be accurate to imagine 2 csv files both with the first column populated with values (Serial numbers/phone numbers/names, doesn't matter - just values). Between both csv files, some values would match while other values would only be contained in one or the other (Timmy is in both files and is a match, Robert is only in file 1 and does not match any name in file 2).
I can successfully output a CSV value ONCE if it exists in both CSV files (i.e. if both files contain "Value78", the output file will contain "Value78" only once).
When I try to tack on an else statement to my if condition, to handle non-matching items, the program will output 1 entry for every item it does not match with (makes 100% sense, matches happen once but every other comparison result besides the match is a non-match).
I cannot envision a structure or method to hold back the fields that don't match so that they can be output once and not overrun my terminal or output file.
My goal is to output two csv files, matches and non-matches, with the non-matches having only one entry per value.
Anyways, onto the code:
import csv

MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'

with open(MYUNITS, mode='r') as MFile, \
     open(VENDORUNITS, mode='r') as VFile, \
     open(MATCHES, mode='w') as OFile, \
     open(NONMATCHES, mode='w') as NFile:
    MyReader = csv.reader(MFile, delimiter=',', quotechar='"')
    MyList = list(MyReader)
    VendorReader = csv.reader(VFile, delimiter=',', quotechar='"')
    VList = list(VendorReader)
    for x in range(len(MyList)):
        for y in range(len(VList)):
            if str(MyList[x][0]) == str(VList[y][0]):
                OFile.write(MyList[x][0] + '\n')
            else:
                pass
The "else: pass" is where the logic of filtering out non-matches is escaping me. Outputting from this else statement will write the non-matching value (len(VList) - 1) times for an iteration that DOES produce 1 match, the entire len(VList) for an iteration with no match. I've tried using a counter and only outputting if the counter equals the len(VList), (incrementing in the else statement, writing output under the scope of the second for loop), but received the same output as if I tried outputting non-matches.
Below is one way you might go about deduplicating and then writing to a file:
import csv

MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'

list_of_non_matches = []

with open(MYUNITS, mode='r') as MFile, \
     open(VENDORUNITS, mode='r') as VFile, \
     open(MATCHES, mode='w') as OFile, \
     open(NONMATCHES, mode='w') as NFile:
    MyReader = csv.reader(MFile, delimiter=',', quotechar='"')
    MyList = list(MyReader)
    VendorReader = csv.reader(VFile, delimiter=',', quotechar='"')
    VList = list(VendorReader)
    for x in range(len(MyList)):
        for y in range(len(VList)):
            if str(MyList[x][0]) == str(VList[y][0]):
                OFile.write(MyList[x][0] + '\n')
            else:
                list_of_non_matches.append(MyList[x][0])

    # Remove duplicates from the non matches
    new_list = []
    [new_list.append(x) for x in list_of_non_matches if x not in new_list]

    # Write the new list to a file
    for i in new_list:
        NFile.write(i + '\n')
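As a side note, the list-comprehension deduplication above is quadratic; a shorter sketch that keeps the original order (Python 3.7+) would be dict.fromkeys:

# dict.fromkeys drops duplicates while preserving insertion order
new_list = list(dict.fromkeys(list_of_non_matches))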
Does this work?
import csv

MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'

with open(MYUNITS, 'r') as MFile, \
     open(VENDORUNITS, 'r') as VFile, \
     open(MATCHES, 'w') as OFile, \
     open(NONMATCHES, 'w') as NFile:
    MyReader = csv.reader(MFile, delimiter=',', quotechar='"')
    MyList = list(MyReader)
    MyVals = [x for x in MyList]
    MyVals = [x[0] for x in MyVals]
    VendorReader = csv.reader(VFile, delimiter=',', quotechar='"')
    VList = list(VendorReader)
    vVals = [x for x in VList]
    vVals = [x[0] for x in vVals]
    for val in MyVals:
        if val in vVals:
            OFile.write(val + '\n')
        else:
            NFile.write(val + '\n')

    #for x in range(len(MyList)):
    #    for y in range(len(VList)):
    #        if str(MyList[x][0]) == str(VList[y][0]):
    #            OFile.write(MyList[x][0] + '\n')
    #        else:
    #            pass
Sorry, I had some issues with my PC. I was able to solve my own question the night I posted. The solution I used is so simple I'm kicking myself for not figuring it out way sooner:
import csv

MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'

with open(MYUNITS, mode='r') as MFile, \
     open(VENDORUNITS, mode='r') as VFile, \
     open(MATCHES, mode='w') as OFile, \
     open(NONMATCHES, mode='w') as NFile:
    MyReader = csv.reader(MFile, delimiter=',', quotechar='"')
    MyList = list(MyReader)
    VendorReader = csv.reader(VFile, delimiter=',', quotechar='"')
    VList = list(VendorReader)
    for x in range(len(MyList)):
        tmpStr = ''
        for y in range(len(VList)):
            if str(MyList[x][0]) == str(VList[y][0]):
                tmpStr = ''  # Sets to blank so the comparison below fails, works because of the break
                OFile.write(MyList[x][0] + '\n')
                break
            else:
                tmpStr = str(MyList[x][0])
        if tmpStr != '':
            NFile.write(tmpStr + '\n')
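For what it's worth, since only the first column matters, the whole match/non-match split can also be sketched with sets; this assumes duplicate values and the original row order don't need to be preserved:

import csv

with open('MyUnits.csv') as MFile, open('VendorUnits.csv') as VFile:
    my_vals = {row[0] for row in csv.reader(MFile) if row}
    vendor_vals = {row[0] for row in csv.reader(VFile) if row}

with open('Matches.csv', 'w') as OFile, open('NonMatches.csv', 'w') as NFile:
    for val in sorted(my_vals & vendor_vals):    # values present in both files
        OFile.write(val + '\n')
    for val in sorted(my_vals - vendor_vals):    # values only in MyUnits.csv
        NFile.write(val + '\n')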
I am trying to write code that compares each string in a list to the others and then generates a regex expressing their similarity.
list = ["LONDON-UK-L16-N1",
"LONDON-UK-L17-N1",
"LONDON-UK-L16-N2",
"LONDON-UK-L17-N2",
"PARIS-France-L16-N2"]
I am trying to get an output as below
LONDON-UK-L(16|17)-N(1|2)
Is that possible? Thanks.
Update: just to make it clear, I am trying to do this:
Input: a list of strings.
Action: compare the list items to each other and check for similarity (keeping the first group of a string fixed), and use a regex for any part of an item that is not similar, so instead of having four items we can have a single output (using regex).
Output: a regex matching the parts that differ.
input:
tez15-3-s1-y2
tez15-3-s2-y2
bro40-55-s1-y2
output:
tez15-3-s(1|2)-y2
,bro40-55-s1-y2
It's not entirely clear from your question what the exact problem is. Since the data you gave as an example is consistent and well ordered, this problem can be solved easily by splitting up the items in the list and categorising them.
loc_list = ["LONDON-UK-L16-N1", "LONDON-UK-L17-N1", "LONDON-UK-L16-N2",
"LONDON-UK-L16-N2", "PARIS-France-L16-N2"]
split_loc_list = [location.split("-") for location in loc_list]
locs = {}
for loc in split_loc_list:
locs.setdefault("-".join(loc[0:2]), {}).\
setdefault("L", set()).add(loc[2].strip("L"))
locs.setdefault("-".join(loc[0:2]), {}).\
setdefault("N", set()).add(loc[3].strip("N"))
for loc, vals in locs.items():
L_vals_sorted = sorted(list(map(int,vals["L"])))
L_vals_joined = "|".join(map(str,L_vals_sorted))
N_vals_sorted = sorted(list(map(int,vals["N"])))
N_vals_joined = "|".join(map(str,N_vals_sorted))
print(f"{loc}-L({L_vals_joined})-N({N_vals_joined})")
will output:
LONDON-UK-L(16|17)-N(1|2)
PARIS-France-L(16)-N(2)
Since there were only two tags here ("L" and "N"), I just wrote them into the code. If many tags are possible, you can instead split each field into its letter and number parts using:
import re

split = re.findall(r'\d+|\D+', loc[2])
key, val = split[0], split[1]
locs.setdefault("-".join(loc[0:2]), {}).\
    setdefault(key, set()).add(val)
Then iterate through all the tags instead of just fetching "L" and "N" in the second loop.
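A sketch of what that generalised second loop might look like (assuming every field splits into a letter tag followed by a number):

for loc, vals in locs.items():
    parts = []
    for tag, nums in vals.items():
        nums_sorted = sorted(map(int, nums))                        # e.g. {'16', '17'} -> [16, 17]
        parts.append(f"{tag}({'|'.join(map(str, nums_sorted))})")   # -> "L(16|17)"
    print(f"{loc}-{'-'.join(parts)}")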
I'm posting this new (second) implementation for this problem; I think it's more accurate and hope it's helpful:
import re

data = [
    'LONDON-UK-L16-N1',
    'LONDON-UK-L17-N1',
    'LONDON-UK-L16-N2',
    'LONDON-UK-L17-N2',
    'LONDON-UK-L18-N2',
    'PARIS-France-L16-N2',
]

def merge(data):
    data.sort()
    data = [y for y in [x.split('-') for x in data]]
    for col in range(len(data[0]) - 1, -1, -1):
        result = []

        def add_result():
            result.append([])
            if headstr:
                result[-1] += headstr.split('-')
            if len(list(findnum)) > 1:
                result[-1] += [f'{findstr}({"|".join(sorted(findnum))})']
            elif len(list(findnum)) == 1:
                result[-1] += [f'{findstr}{findnum[0]}']
            if tailstr:
                result[-1] += tailstr.split('-')

        _headstr = lambda x, y: '-'.join(x[:y])
        _tailstr = lambda x, y: '-'.join(x[y + 1:])
        _findstr = lambda x: re.findall(r'(\D+)', x)[0] if re.findall(r'(\D+)', x) else ''
        _findnum = lambda x: re.findall(r'(\d+)', x)[0] if re.findall(r'(\d+)', x) else ''

        headstr = _headstr(data[0], col)
        tailstr = _tailstr(data[0], col)
        findstr = _findstr(data[0][col])
        findnum = []
        for row in data:
            if headstr + findstr + tailstr != _headstr(row, col) + _findstr(row[col]) + _tailstr(row, col):
                add_result()
                headstr = _headstr(row, col)
                tailstr = _tailstr(row, col)
                findstr = _findstr(row[col])
                findnum = []
            if _findnum(row[col]) not in findnum:
                findnum.append(_findnum(row[col]))
        else:
            add_result()
        data = result[:]
    return ['-'.join(x) for x in result]

print(merge(data))  # ['LONDON-UK-L(16|17)-N(1|2)', 'LONDON-UK-L18-N2', 'PARIS-France-L16-N2']
I've implemented the following solution:
import re

data = [
    'LONDON-UK-L16-N1',
    'LONDON-UK-L17-N1',
    'LONDON-UK-L16-N2',
    'LONDON-UK-L16-N2',
    'PARIS-France-L16-N2'
]

def deconstruct(data):
    data = [y for y in [x.split('-') for x in data]]
    result = dict()
    for x in data:
        pointer = result
        for y in x:
            substr = re.findall(r'(\D+)', y)
            if substr:
                substr = substr[0]
                if not substr in pointer:
                    pointer[substr] = {0: set()}
                pointer = pointer[substr]
            substr = re.findall(r'(\d+)', y)
            if substr:
                substr = substr[0]
                pointer[0].add(substr)
    return result

def construct(data, level=0):
    result = []
    for key in data.keys():
        if key != 0:
            if len(data[key][0]) == 1:
                nums = list(data[key][0])[0]
            elif len(data[key][0]) > 1:
                nums = '(' + '|'.join(sorted(list(data[key][0]))) + ')'
            else:
                nums = ''
            deeper_result = construct(data[key], level + 1)
            if not deeper_result:
                result.append([key + nums])
            else:
                for d in deeper_result:
                    result.append([key + nums] + d)
    return result if level > 0 else ['-'.join(x) for x in result]

print(construct(deconstruct(data)))
# ['LONDON-UK-L(16|17)-N(1|2)', 'PARIS-France-L16-N2']
Don't use 'list' as a variable name... it shadows the built-in list type.
import re

lst = ['LONDON-UK-L16-N1', 'LONDON-UK-L17-N1', 'LONDON-UK-L16-N2', 'LONDON-UK-L16-N2', 'PARIS-France-L16-N2']

def check_it(string):
    return re.search(r'[a-zA-Z\-]*L(\d)*-N(\d)*', string)

[check_it(x).group(0) for x in lst]
will output:
['LONDON-UK-L16-N1',
'LONDON-UK-L17-N1',
'LONDON-UK-L16-N2',
'LONDON-UK-L16-N2',
'PARIS-France-L16-N2']
From there, look into groups and define a group to cover the pieces that you want to use for similarity.
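A possible sketch with named groups (the group names here are just illustrative, not anything the question specified):

import re

pattern = re.compile(r'(?P<prefix>[a-zA-Z\-]*)L(?P<l>\d+)-N(?P<n>\d+)')

m = pattern.search('LONDON-UK-L16-N1')
if m:
    print(m.group('prefix'), m.group('l'), m.group('n'))   # LONDON-UK- 16 1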
The problem is to read the file, look for integers using re.findall() with the regular expression '[0-9]+', convert the extracted strings to integers, and sum them up.
MY CODE (sample.txt is my text file):
import re

hand = open('sample.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[0-9]+', line)
    print x

x = [int(i) for i in x]
add = sum(x)
print add
OUTPUT:
You need to append the find results to another list, so that the numbers found on the current line are kept when iterating over to the next line.
import re

hand = open('sample.txt')
l = []
for line in hand:
    x = re.findall('[0-9]+', line)
    l.extend(x)

j = [int(i) for i in l]
add = sum(j)
print add
or
with open('sample.txt') as f:
    print sum(map(int, re.findall(r'\d+', f.read())))
Try this:
import re

hand = open("a.txt")
x = list()
for line in hand:
    y = re.findall('[0-9]+', line)
    x = x + y

sum = 0
for z in x:
    sum = sum + int(z)
print(sum)
How can you get the nth line of a string in Python 3?
For example
getline("line1\nline2\nline3",3)
Is there any way to do this using stdlib/builtin functions?
I prefer a solution in Python 3, but Python 2 is also fine.
Try the following:
s = "line1\nline2\nline3"
print s.splitlines()[2]
A functional approach:
>>> import StringIO
>>> from itertools import islice
>>> s = "line1\nline2\nline3"
>>> gen = StringIO.StringIO(s)
>>> print next(islice(gen, 2, 3))
line3
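Since the question prefers Python 3: the same approach works there with io.StringIO and the built-in next(), for example:

import io
from itertools import islice

s = "line1\nline2\nline3"
print(next(islice(io.StringIO(s), 2, 3)))   # line3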
`my_string.strip().split("\n")[-1]`
Use a string buffer:
import io

def getLine(data, line_no):
    buffer = io.StringIO(data)
    for i in range(line_no - 1):
        try:
            next(buffer)
        except StopIteration:
            return ''  # Reached EOF
    try:
        return next(buffer)
    except StopIteration:
        return ''  # Reached EOF
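For example (note that this version counts lines from 1 and keeps the trailing newline, if there is one):

print(getLine("line1\nline2\nline3", 2))   # line2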
A more efficient solution than splitting the string would be to iterate over its characters, finding the positions of the Nth and the (N - 1)th occurrence of '\n' (taking into account the edge case at the start of the string). The Nth line is the substring between those positions.
Here's a messy piece of code to demonstrate it (the line number is 1-indexed):
def getLine(data, line_no):
    n = 0
    lastPos = -1
    for i in range(0, len(data) - 1):
        if data[i] == "\n":
            n = n + 1
            if n == line_no:
                return data[lastPos + 1:i]
            else:
                lastPos = i
    if n == line_no - 1:
        return data[lastPos + 1:]
    return ""  # end of string
This is also more efficient than the solution which builds up the string one character at a time.
From the comments it seems as if this string is very large.
If there is too much data to comfortably fit into memory one approach is to process the data from the file line-by-line with this:
N = ...
with open('data.txt') as inf:
    for count, line in enumerate(inf, 1):
        if count == N:  # search for the N'th line
            print line
Using enumerate() gives you the index and the value of the object you are iterating over, and you can specify a starting value, so I used 1 (instead of the default value of 0).
The advantage of using with is that it automatically closes the file for you when you are done or if you encounter an exception.
Since you brought up the point of memory efficiency, is this any better:
s = "line1\nline2\nline3"
# number of the line you want
line_number = 2
i = 0
line = ''
for c in s:
if i > line_number:
break
else:
if i == line_number-1 and c != '\n':
line += c
elif c == '\n':
i += 1
I wrote it as two functions for readability:
string = "foo\nbar\nbaz\nfubar\nsnafu\n"
def iterlines(string):
word = ""
for letter in string:
if letter == '\n':
yield word
word = ""
continue
word += letter
def getline(string, line_number):
for index, word in enumerate(iterlines(string),1):
if index == line_number:
#print(word)
return word
print(getline(string, 4))
My solution (efficient and compact):
def getLine(data, line_no):
    index = -1
    for _ in range(line_no):
        index = data.index('\n', index + 1)
    return data[index + 1:data.index('\n', index + 1)]