I am making a dictionary application using argparse in Python 3. I'm using difflib to find the closest matches to a given word. The candidate words are in a list, though, and each one has a newline character at the end, like:
['hello\n', 'hallo\n', 'hell\n']
And when I put a word in, it gives an output like this:
hellllok could be spelled as hello
hellos
hillock
Question:
I'm wondering if there is a reverse or inverse \n so I can counteract these \n's.
Any help is appreciated.
There's no "reverse newline" in the standard character set but, even if there were, you would have to apply it to each string in turn.
And, if you can do that, you can equally modify the strings to remove the newline. In other words, create a new list using the current one, with newlines removed. That would be something like:
>>> oldlist = ['hello\n', 'hallo\n', 'hell\n']
>>> oldlist
['hello\n', 'hallo\n', 'hell\n']
>>> newlist = [s.replace('\n','') for s in oldlist]
>>> newlist
['hello', 'hallo', 'hell']
That will remove all newlines from each of the strings. If you want to ensure you only replace a single newline at the end of the strings, you can instead use:
import re
newlist = [re.sub(r'\n\Z','',s) for s in oldlist]
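As a minimal sketch of how this could fit together with difflib from the question: the word list and the misspelled input below are made-up sample data, and str.rstrip('\n') is swapped in as another way to drop the trailing newline. Note that rstrip('\n') removes every trailing newline, which is fine when each entry ends in exactly one.

```python
import difflib

# Hypothetical sample data standing in for the lines read from a word file.
words = ['hello\n', 'hallo\n', 'hell\n']

# rstrip('\n') removes newline characters from the right-hand end only.
cleaned = [w.rstrip('\n') for w in words]

# get_close_matches returns the closest candidates, best match first.
matches = difflib.get_close_matches('helo', cleaned)
print(', '.join(matches))
```

Cleaning the list once, before matching, means the printed suggestions no longer carry their own line breaks.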
list1 = ['192,3.2', '123,54.2']
yx = ([float(i) for i in list1])
print(list1)
This is the code I have. For future reference, I am trying to learn how to remove the , from each string in a list. I tried various things like mapping, but the mapping would not work due to the comma within the number.
If you want to remove commas from a string, use:
list1 = string.split(",")
Here the string variable contains your input string, and the output is a list. Join the list if you want the original string without the commas:
string_joined = "".join(list1)
string_joined will contain your string without the commas.
If you want to remove each comma but keep a space at that position, use:
string = string.replace(","," ")
Also, the first two steps above can be shortened to a single call:
string = string.replace(",","")
Now, if you want to iterate over your list of strings, consider each element (string) in the list one at a time:
for string in list1:
    <your code goes here>
Hope this answers what you are looking for.
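Applied to the list from the question, the replace approach might look like the sketch below. It assumes the commas are separators to be dropped (so '192,3.2' is read as 1923.2); the function name to_floats is made up for illustration.

```python
def to_floats(strings):
    """Drop every comma from each string, then convert it to a float."""
    return [float(s.replace(",", "")) for s in strings]

list1 = ['192,3.2', '123,54.2']
values = to_floats(list1)
print(values)
```

Because replace returns a new string, the original list1 is left untouched.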
We can use a regex to remove every character that is not a digit or a decimal point:
import re
print([float(re.sub("[^0-9.]", "", s)) for s in list1])
Without regex:
[float(s.replace(',','')) for s in list1 ]
output:
[1923.2, 12354.2]
I have a text file test.txt which has in it 'a 2hello 3fox 2hen 1dog'.
I want to read the file and then add all the items into a list, then strip the integers so it will result in the list looking like this 'a hello fox hen dog'
I tried this, but my code is not working. The result is ['a 2hello 3fox 2hen 1dog']. Thanks.
newList = []
filename = input("Enter a file to read: ")
openfile = open(filename,'r')
for word in openfile:
    newList.append(word)
for item in newList:
    item.strip("1")
    item.strip("2")
    item.strip("3")
print(newList)
openfile.close()
From the Python docs:
str.strip([chars]): Return a copy of the string with the leading and
trailing characters removed. The chars argument is a string specifying
the set of characters to be removed. If omitted or None, the chars
argument defaults to removing whitespace. The chars argument is not a
prefix or suffix; rather, all combinations of its values are stripped:
strip won't modify the string; it returns a copy of the string with the given characters removed from both ends.
>>> text = '132abcd13232111'
>>> text.strip('123')
'abcd'
>>> text
'132abcd13232111'
You can try:
out_put = []
for item in newList:
    out_put.append(item.strip("123"))
If you want to remove every 1, 2, and 3, not just those at the ends of the string, use the regular expression function re.sub:
import re
newList = [re.sub('[123]', '', word) for word in openfile]
Note: This will remove every 1, 2, and 3 from each line.
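The difference between the two approaches can be seen in a small sketch. The sample word is made up and deliberately has a digit in the middle:

```python
import re

word = '2he3llo1'  # made-up sample with a digit in the middle

ends_only = word.strip('123')           # removes 1, 2, 3 from both ends only
everywhere = re.sub('[123]', '', word)  # removes 1, 2, 3 at any position

print(ends_only)   # he3llo
print(everywhere)  # hello
```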
Pointers:
strip returns a new string, so you need to assign that to something. (better yet, just use a list comprehension)
Iterating over a file object gives you lines, not words;
so instead you can read the whole thing then split on spaces.
The with statement saves you from having to call close manually.
strip accepts multiple characters, so you don't need to call it three times.
Code:
filename = input("Enter a file to read: ")
with open(filename, 'r') as openfile:
    new_list = [word.strip('123') for word in openfile.read().split()]
print(new_list)
This will give you a list that looks like ['a', 'hello', 'fox', 'hen', 'dog']
If you want to turn it back into a string, you can use ' '.join(new_list)
Python has several strip methods: strip removes the specified characters from both ends of a string, lstrip from the left end only, and rstrip from the right end only. In your case you could use lstrip or just strip:
s = 'a 2hello 3fox 2hen 1dog'
' '.join([word.strip('0123456789') for word in s.split()])
Output:
'a hello fox hen dog'
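A short sketch of how the three variants differ, using a made-up word with a digit on each side:

```python
digits = '0123456789'
word = '2hello3'  # made-up sample

left = word.lstrip(digits)   # strips digits from the left end only
right = word.rstrip(digits)  # strips digits from the right end only
both = word.strip(digits)    # strips digits from both ends

print(left)   # hello3
print(right)  # 2hello
print(both)   # hello
```

For the data in the question the digits only ever lead the word, so lstrip and strip give the same result there.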
A function in Python is called in this way:
result = function(arguments...)
This calls function with the arguments and stores the result in result.
If you discard the function call result, as you do in your case, it will be lost.
Another way to use it is:
l = []
for x in range(5):
    l.append("something")
l = [s.strip() for s in l]
This removes any leading and trailing whitespace from every string in the list.
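Since strings are immutable, the point about discarded results can be illustrated with a tiny sketch (the sample value is made up):

```python
item = '2hello'
item.strip('2')         # result is discarded; item is unchanged
unchanged = item

item = item.strip('2')  # result is kept by rebinding the name
print(unchanged, item)  # 2hello hello
```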
I have a list of strings like
lst = ['foo000bar111', 'foo000bar1112', 'foo000bar1113']
and I want to extract the last numbers from each string to get
nums = ['111', '1112', '1113']
I have other numbers earlier in the string that I don't care about (000 in this example). There aren't spaces, so I can't lst.split() and I believe doing something like that without spacing is difficult. The numbers are of different lengths, so I can't just do str[-3:]. For what it's worth, the characters before the numbers I care about are the same in each string, and the numbers are at the end of the string.
I'm looking for a way to say 'ok, read until you find bar and then tell me what's the rest of the string.' The best I've come up with is [str[(str.index('bar')+3):] for str in lst], which works, but I doubt that's the most pythonic way to do it.
Your method is accurate. You can also try using re
>>> import re
>>> lst = ['foo000bar111', 'foo000bar1112', 'foo000bar1113']
>>> [re.search(r'(\d+$)',i).group() for i in lst]
['111', '1112', '1113']
You can also try rindex
>>> [i[i.rindex('r')+1:] for i in lst]
['111', '1112', '1113']
Your solution is not bad at all, but you could improve it in a couple of ways:
Use rindex() instead of index; if bar should happen to occur twice (or more) in a string, you want to find the last instance.
Or you can use rsplit():
[ s.rsplit("bar", 1)[1] for s in lst ]
Edit: @Bas beat me to the second solution by a few seconds! :-)
Your own solution works well enough, but I think the main problem with it is that you have to hard-code the length of the search string you are using. This could be solved using a temporary variable like this:
tag = 'bar'
[s[(s.index(tag)+len(tag)):] for s in lst]
One alternative way using rsplit:
[x.rsplit('bar', 1)[1] for x in lst]
This always splits on the last occurrence of bar, even if it occurs more than once.
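To see why the last occurrence matters, here is a small sketch with a made-up string in which bar appears twice. It also shows str.rpartition, swapped in as another standard way to get the text after the last occurrence:

```python
s = 'barfoo000bar111'  # made-up string where the tag appears twice
tag = 'bar'

after_first = s[s.index(tag) + len(tag):]  # text after the first 'bar'
after_last = s.rsplit(tag, 1)[1]           # text after the last 'bar'
also_last = s.rpartition(tag)[2]           # same result via rpartition

print(after_first)  # foo000bar111
print(after_last)   # 111
```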
I wrote a function which takes a .txt file.
The first thing it does is split the file at ',' and add the pieces to a list, which creates a list of lists. I used:
lst = s.split(',')
I do get a list of lists, except every second one is a list containing only a newline, ['\n']. I need to find a way to get rid of these lists as they muck up the rest of the code.
Is there any simple way of doing this? Or is it just that I'm doing something wrong?
Any help would be greatly appreciated.
Thank You!
Sample Data:
1,2,3,4
,3,4,
Expected Output:
['1','2','3','4\n']
['','3','4','\n']
Current Output:
['1','2','3','4\n']
['\n']
['','3','4','\n']
Output after using sshashank124's suggestion:
['1','2','3','4\n']
[]
['','3','4','\n']
Output after using Alex Thornton's suggestion:
['1','2','3','4\n']
[]
['','3','4','\n']
Use strip() (or rstrip()) to get rid of new-line characters:
lst = s.strip().split(',')
See also: How can I remove (chomp) a newline in Python?
You can simply do it as:
lst = [i for i in s.split(',') if i != '\n']
Example
>>> s = 'hello,\n,world,\n,bye,\n,world'
>>> lst = [i for i in s.split(',') if i != '\n']
>>> print(lst)
['hello', 'world', 'bye', 'world']
lst = list(filter(lambda a: len(a)==1 and a[0]!='\n' or len(a)>1, lst))
This will clear the list of empty lists [] and of lists containing only \n. (In Python 3, filter returns an iterator, hence the list() call.)
But it's probably better to clear the text of unwanted strings BEFORE splitting it to a list.
Edit
Note: at the end of the code above, replace lst with your own list containing the unwanted elements.
You could also try removing the \n from the original text with this code:
text_file = text_file.replace('\n', '')
If your sample data doesn't have a blank line between the two populated lines, then it seems to me you are having a line-end issue upon reading the text file in the first place. If you post how you're reading in the text file, you could get answers that fix your problem before it ever gets to the point where you have any empty lists to remove.
That said, if lst really does just have ['\n'] as every other element, you can simply skip them as follows:
lst = s.strip().split(',')[::2]
(Note I've already incorporated the strip mentioned in previous answers to remove the newline characters.)
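A sketch of handling this while reading, rather than cleaning up afterwards. The asker's reading code was not posted, so this is only an assumption about its shape: io.StringIO stands in for the opened file, and the contents are made-up data with a blank line between the two populated lines.

```python
import io

# io.StringIO stands in for the opened text file (contents are made up).
fake_file = io.StringIO('1,2,3,4\n\n,3,4,\n')

rows = []
for line in fake_file:
    line = line.rstrip('\n')  # drop the line ending before splitting
    if not line:              # skip lines that were only a newline
        continue
    rows.append(line.split(','))

print(rows)  # [['1', '2', '3', '4'], ['', '3', '4', '']]
```

Unlike the expected output in the question, this version also drops the trailing \n from the last field of each row.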
I would like to do something like:
temp=a.split()
#do some stuff with this new list
b=" ".join(temp)
where a is the original string, and b is after it has been modified. The problem is that when performing such methods, the newlines are removed from the new string. So how can I do this without removing newlines?
I assume in your third line you mean join(temp), not join(a).
To split and yet keep the exact "splitters", you need the re.split function (or split method of RE objects) with a capturing group:
>>> import re
>>> f='tanto va\nla gatta al lardo'
>>> re.split(r'(\s+)', f)
['tanto', ' ', 'va', '\n', 'la', ' ', 'gatta', ' ', 'al', ' ', 'lardo']
The pieces you'd get from just re.split are at index 0, 2, 4, ... while the odd indices have the "separators" -- the exact sequences of whitespace that you'll use to re-join the list at the end (with ''.join) to get the same whitespace the original string had.
You can either work directly on the even-spaced items, or you can first extract them:
>>> x = re.split(r'(\s+)', f)
>>> y = x[::2]
>>> y
['tanto', 'va', 'la', 'gatta', 'al', 'lardo']
then alter y as you will, e.g.:
>>> y[:] = [z+z for z in y]
>>> y
['tantotanto', 'vava', 'lala', 'gattagatta', 'alal', 'lardolardo']
then reinsert and join up:
>>> x[::2] = y
>>> ''.join(x)
'tantotanto vava\nlala gattagatta alal lardolardo'
Note that the \n is exactly in the position equivalent to where it was in the original, as desired.
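The steps above can be wrapped into one small helper. This is only a sketch; the name map_words is made up, and func can be any function that takes a word and returns a string:

```python
import re

def map_words(text, func):
    """Apply func to each non-whitespace chunk, keeping whitespace as-is."""
    parts = re.split(r'(\s+)', text)  # even indices: words, odd: separators
    parts[::2] = [func(word) for word in parts[::2]]
    return ''.join(parts)

result = map_words('tanto va\nla gatta al lardo', str.upper)
print(repr(result))  # 'TANTO VA\nLA GATTA AL LARDO'
```

If the text starts or ends with whitespace, re.split yields an empty string at that end, so func should cope with '' as input.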
You need to use regular expressions to rip your string apart. The resulting match object can give you the character ranges of the parts that match various sub-expressions.
Since you might have an arbitrarily large number of sections separated by whitespace, you're going to have to match the string multiple times at different starting points within the string.
If this answer is confusing to you, I can look up the appropriate references and put in some sample code. I don't really have all the libraries memorized, just what they do. :-)
It depends on what you want to split on.
By default, split uses any whitespace (including '\n' and ' ') as the delimiter. You can use
a.split(" ")
if you only want spaces as the delimiter.
http://docs.python.org/library/stdtypes.html#str.split
I don't really understand your question. Can you give an example of what you want to do?
Anyway, maybe this can help:
b = '\n'.join(a)
First of all, I assume that when you say
b = " ".join(a)
You actually mean
b = " ".join(temp)
When you call split() without specifying a separator, the function will interpret whitespace of any length as a separator. Whitespace includes newlines, so those disappear when you split the string. Try explicitly passing a separator (such as a simple " " space character) to split(). If you have multiple spaces in a row, splitting this way will remove them all and include a series of "" empty strings in the returned list.
To restore the original spacing, just make sure that you call join() from the same string which you used as your separator in split(), and that you don't remove any elements from your intermediary list of strings.
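A minimal sketch of that round trip, using a made-up string with a double space and a newline:

```python
a = 'one  two\nthree four'  # made-up sample: double space and a newline

temp = a.split(' ')         # split on single spaces only
print(temp)                 # ['one', '', 'two\nthree', 'four']

b = ' '.join(temp)          # join with the same separator
print(b == a)               # True
```

Note that 'two\nthree' stays a single element here, because the newline is no longer treated as a separator.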