This code appends each /name to the base URL, opens the resulting page, and writes the matched string to test1.csv:
import urllib2
import re
import csv
url = ("http://www.example.com")
bios = [u'/name1', u'/name2', u'/name3']
csvwriter = csv.writer(open("/test1.csv", "a"))
for l in bios:
    OpenThisLink = url + l
    response = urllib2.urlopen(OpenThisLink)
    html = response.read()
    item = re.search('(JD)(.*?)(\d+)', html)
    if item:
        JD = item.group()
        csvwriter.writerow(JD)
    else:
        NoJD = "NoJD"
        csvwriter.writerow(NoJD)
But I get this result:
J,D,",", ,C,o,l,u,m,b,i,a, ,L,a,w, ,S,c,h,o,o,l,....
If I change the string to ("JD", "Columbia Law School" ....) then I get
JD, Columbia Law School...)
I couldn't find in the documentation how to specify the delimiter.
If I try to use delimeter I get this error:
TypeError: 'delimeter' is an invalid keyword argument for this function
writerow expects a sequence (e.g. a list or tuple) of strings, but you're giving it a single string. A string is itself a sequence, but a sequence of one-character strings, which isn't what you want.
If you just want one string per row you could do something like this:
csvwriter.writerow([JD])
This wraps JD (a string) with a list.
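A minimal sketch of the difference, assuming Python 3 and an in-memory buffer in place of /test1.csv (the sample value of JD is made up for illustration):

```python
import csv
import io

# Hypothetical sample value standing in for the regex match from the question.
JD = "JD, Columbia Law School, 2005"

buf = io.StringIO()
writer = csv.writer(buf)

writer.writerow(JD)    # a bare string: every character becomes its own field
writer.writerow([JD])  # a one-item list: the whole string is a single field

rows = buf.getvalue().splitlines()
print(rows[0])  # J,D,",", ,C,o,l,u,m,b,i,a, ...
print(rows[1])  # "JD, Columbia Law School, 2005"
```

The second row is quoted by the csv module because the field itself contains commas.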
The writerow method of a csv.writer object takes an iterable as its argument. Strings in Python are iterable character by character, so a string is an acceptable argument to writerow, but you get the output above.
To correct this, you could split the value based on whitespace (I'm assuming that's what you want)
csvwriter.writerow(JD.split())
This happens because the group() method of a match object returns a single string when it is called with no argument or with one group number. Only when several group numbers are passed does it return a tuple of strings.
If you are writing a row, I guess, csv.writer iterates over the object you pass to it. If you pass a single string (which is an iterable), it iterates over its characters, producing the result you are observing. If you pass a tuple of strings, it gets an actual string, not a single character on every iteration.
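To see what group() actually hands to writerow, here is a small sketch using the regex from the question on a made-up line of text (the html value is an assumption, not real page content):

```python
import re

html = "Education: JD, Columbia Law School, 2005"  # made-up sample text
match = re.search(r'(JD)(.*?)(\d+)', html)

whole = match.group()      # no arguments: the entire match, as one string
parts = match.group(1, 3)  # several group numbers: a tuple of strings
every = match.groups()     # all capture groups, as a tuple

print(repr(whole))  # 'JD, Columbia Law School, 2005'
print(parts)        # ('JD', '2005')
print(every)        # ('JD', ', Columbia Law School, ', '2005')
```

Passing parts or every to writerow would give one field per group, while passing whole gives one field per character.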
To put it another way - if you add square brackets around the whole output, it will be treated as one item, so commas won't be added. e.g. instead of:
spamwriter.writerow(matrix[row]['id'],matrix[row]['value'])
use:
spamwriter.writerow([matrix[row]['id'] + ',' + matrix[row]['value']])
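As for the TypeError in the question: the keyword that csv.writer accepts is spelled delimiter, not delimeter. A small sketch, assuming Python 3 and a semicolon as the separator you want:

```python
import csv
import io

buf = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buf, delimiter=';')
writer.writerow(["JD", "Columbia Law School"])

line = buf.getvalue()
print(repr(line))  # 'JD;Columbia Law School\r\n'
```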
I am trying to slice URLs from the last symbol "/".
For example, I have a URL http://google.com/images/54152352.
Now I need the last part of that URL, which is 54152352.
I understand that I could simply slice it with slicing from a certain character, but I have a list of URLs and each of them is different.
Other examples of URLs:
https://google.uk/images/kfakp3ok2 #I would need kfakp3ok2
bing.com/img/3525236236 #I would need 3525236236
wwww.google.com/img/1osdkg23 #I would need 1osdkg23
Is there a way to slice the characters from the last character "/" in a string in Python3? Each part from a different URL has a different length.
All the help will be appreciated.
target=url.split("/")[-1]
The split method returns a list of the substrings separated by the separator given as its argument,
and [-1] selects the last element of that list.
You can use the rsplit() functionality.
Syntax:
string.rsplit(separator, maxsplit)
Reference
https://www.w3schools.com/python/ref_string_rsplit.asp
rsplit() splits the string from the right using the given separator. With maxsplit set to 1 it splits only once, which may save a little work compared to split(), since you don't need to split more than once.
>>> url = 'https://google.uk/images/kfakp3ok2'
>>> url.rsplit('/', 1)[-1]
'kfakp3ok2'
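Applied to the whole list of URLs from the question, that could look like the sketch below. The helper name last_segment is made up for illustration:

```python
def last_segment(url):
    """Return the text after the final '/' in url (hypothetical helper)."""
    return url.rsplit('/', 1)[-1]

urls = [
    'http://google.com/images/54152352',
    'https://google.uk/images/kfakp3ok2',
    'bing.com/img/3525236236',
    'wwww.google.com/img/1osdkg23',
]

segments = [last_segment(u) for u in urls]
print(segments)  # ['54152352', 'kfakp3ok2', '3525236236', '1osdkg23']
```

Note that a URL ending in "/" would give an empty string here, so strip trailing slashes first if your data can contain them.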
I tried to write a very simple piece of Python code because I did not understand how line.split works when it is given two parameters. However, this code returns an error, and I also do not understand the purpose of line.split. Can you please help me?
from operator import itemgetter
import sys
word = None
count = 0
line = 'foo 1' \
       ''
line = line.strip()
try:
    word, count = line.split('\t', 1)
except:
    print('error')
count = int(count)
print(word, count)
Split returns a single list of strings split based on the delimiter you pass it.
'hey, you'.split(',') returns ['hey', ' you'] which can be unpacked if you provide two variables.
If the string splits into more than two values, unpacking into two variables raises ValueError: too many values to unpack.
In your case, check what split actually returns: it is probably a list containing the single element 'foo 1', which cannot be unpacked into two variables.
line.split('\t') is returning only one value, and by word, count you are expecting two, so it results in the error: ValueError: not enough values to unpack (expected 2, got 1)
If you change that line to word, count = line.split() without any parameter in the split function, it will split on any whitespace by default, and it correctly returns the two values you expect with no errors.
What error message is returned, exactly? If it's a ValueError caused by the line.split('\t') line, then it probably means that the string doesn't actually contain a tab character, but a number of spaces. If you want to split the string on either tabs or spaces, omit the argument: line.split()
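A short sketch of both behaviours, assuming Python 3 and the 'foo 1' line from the question (which contains a space, not a tab):

```python
line = 'foo 1'  # separated by a space, not a tab

print(line.split('\t', 1))  # ['foo 1']: one item only

try:
    word, count = line.split('\t', 1)  # cannot unpack one item into two names
except ValueError as exc:
    error = str(exc)
    print(error)

word, count = line.split()  # no argument: split on any run of whitespace
count = int(count)
print(word, count)  # foo 1
```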
The general form is:
result = your_string.split(separator, max)
separator: Optional. Specifies the separator to use when splitting the string. By default, any run of whitespace is a separator.
max: Optional. Specifies how many splits to do. The default value is -1, which means "all occurrences".
Python's string split returns a list of strings after breaking the given string at the specified separator.
So word, count = line.split('\t', 1) only works when the line actually contains a tab.
Otherwise, just use words = line.split('\t') and check how many items you got.
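To illustrate what the second argument does, here is a sketch on a made-up tab-separated record (the variable names are illustrative only):

```python
record = 'foo\t1\textra'  # made-up line with two tabs

all_fields = record.split('\t')     # split at every tab
head, rest = record.split('\t', 1)  # at most one split, so exactly two items here

print(all_fields)  # ['foo', '1', 'extra']
print(head)        # foo
print(repr(rest))  # '1\textra'
```

With the limit of 1, everything after the first tab stays together in the second item, which is why two-name unpacking is safe whenever at least one tab is present.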
f = open('sentences.txt')
lines = [line.lower() for line in f]
print lines[0:5]
words = re.split("\s+", lines[0:5])
With print it works perfectly well, but when I try to do the same inside re.split(), I get the error "TypeError: expected string or buffer".
I think you're searching for join, i.e.:
words = "".join(lines[0:5]).split()
Note:
No need for re module, split() is enough.
Why not just:
words = re.split(r"\s+", ''.join(lines))
The split function expects a string, which is then split into substrings based on the regex and returned as a list. Passing a list would not make a whole lot of sense. If you're expecting it to take your list of strings and split each string element individually and then return a list of lists of strings, you'll have to do that yourself:
lines_split = []
for line in lines:
    lines_split.append(re.split(r"\s+", line))
As you see, you are getting a TypeError in your function call, which means that the argument you are passing is not of the type the function expects. So you need to think about what you are passing.
If you have a debugger or IDE you can step through and see what type your parameter has, or even use type to print it, via
print(type(lines[0:5]))
which returns
<class 'list'>
so you need to turn that into a string. Each element in your list is a string, so think of a way to process each row of the list in turn. An example would be
words = [re.split(r'\s+', line) for line in lines]
where I am using a list comprehension to step through lines and process each row individually.
Your re.split(r'\s+', line) is nearly equivalent to line.split(), except that line.split() also drops the empty strings produced by leading or trailing whitespace, so you can write
words = [line.split() for line in lines]
See the documentation for str.split.
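Here is a small comparison of the approaches, assuming Python 3 and two made-up lines standing in for the contents of sentences.txt:

```python
import re

# Made-up stand-in for the lines read from the file.
lines = ["the quick  brown fox\n", "  jumps over\n"]

per_line = [line.split() for line in lines]  # one list of words per line
flat = " ".join(lines).split()               # a single flat list of words
with_re = re.split(r"\s+", lines[1])         # regex split keeps empty edge strings

print(per_line)  # [['the', 'quick', 'brown', 'fox'], ['jumps', 'over']]
print(flat)      # ['the', 'quick', 'brown', 'fox', 'jumps', 'over']
print(with_re)   # ['', 'jumps', 'over', '']
```

The last line shows why str.split() is usually the more convenient choice for plain whitespace splitting.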
There is a string, for example EXAMPLE.
How can I remove the middle character, i.e., M from it? I don't need the code. I want to know:
Do strings in Python end in any special character?
Which is a better way - shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?
In Python, strings are immutable, so you have to create a new string. You have a few options of how to create the new string. If you want to remove the 'M' wherever it appears:
newstr = oldstr.replace("M", "")
If you want to remove the central character:
midlen = len(oldstr) // 2
newstr = oldstr[:midlen] + oldstr[midlen+1:]
You asked if strings end with a special character. No, you are thinking like a C programmer. In Python, strings are stored with their length, so any byte value, including \0, can appear in a string.
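A quick sketch of those three points, assuming Python 3 (variable names are illustrative):

```python
s = "EXAMPLE"

try:
    s[3] = ""  # strings do not support item assignment
except TypeError as exc:
    message = str(exc)
    print(message)

mid = len(s) // 2  # 3 for a 7-character string
without_mid = s[:mid] + s[mid + 1:]
print(without_mid)  # EXAPLE

with_nul = "ab\0cd"   # a NUL character is ordinary data, not a terminator
print(len(with_nul))  # 5
```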
To remove the character at a specific position:
s = s[:pos] + s[(pos+1):]
To remove every occurrence of a specific character:
s = s.replace('M','')
This is probably the best way:
original = "EXAMPLE"
removed = original.replace("M", "")
Don't worry about shifting characters and such. Most Python code takes place on a much higher level of abstraction.
Strings are immutable. But you can convert them to a list, which is mutable, and then convert the list back to a string after you've changed it.
s = "this is a string"
l = list(s) # convert to list
l[1] = ""           # "delete" letter h (the item still exists but is empty)
l[1:2] = []         # really delete the item at index 1 (it is removed from the list)
del l[1]            # another way to delete the item at index 1
p = l.index("a")    # find the position of the letter "a"
del l[p]            # delete it
s = "".join(l) # convert back to string
You can also create a new string, as others have shown, by taking everything except the character you want from the existing string.
How can I remove the middle character, i.e., M from it?
You can't, because strings in Python are immutable.
Do strings in Python end in any special character?
No. They are similar to lists of characters; the length of the list defines the length of the string, and no character acts as a terminator.
Which is a better way - shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?
You cannot modify the existing string, so you must create a new one containing everything except the middle character.
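Use the translate() method (this two-argument form works only on a Python 2 str; in Python 3, str.translate takes a single mapping table):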
>>> s = 'EXAMPLE'
>>> s.translate(None, 'M')
'EXAPLE'
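On Python 3 the same idea can be sketched with a translation table that maps the unwanted characters to None, built either with str.maketrans or as a plain dict:

```python
s = 'EXAMPLE'

# The third argument of str.maketrans lists characters to delete.
table = str.maketrans('', '', 'M')
result = s.translate(table)

# Equivalent: a dict mapping code points to None.
also = s.translate({ord('M'): None})

print(result)  # EXAPLE
print(also)    # EXAPLE
```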
def kill_char(string, n):  # n = position of the character you want to remove
    begin = string[:n]     # from beginning to n (n not included)
    end = string[n+1:]     # n+1 through end of string
    return begin + end
print(kill_char("EXAMPLE", 3))  # "M" removed
I have seen this somewhere here.
card = random.choice(cards)
cardsLeft = cards.replace(card, '', 1)
How to remove one character from a string:
Here is an example where there is a stack of cards represented as characters in a string.
One of them is drawn (import random module for the random.choice() function, that picks a random character in the string).
A new string, cardsLeft, is created to hold the remaining cards given by the string function replace() where the last parameter indicates that only one "card" is to be replaced by the empty string...
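A self-contained version of that snippet might look like this. The deck string is made up, and it deliberately contains duplicates to show that only one copy is removed:

```python
import random

cards = "AAKKQQ"  # made-up deck, one character per card, with duplicates

card = random.choice(cards)             # draw one card at random
cardsLeft = cards.replace(card, '', 1)  # remove only the first matching card

print(card, cardsLeft)
```

Whichever card is drawn, cardsLeft is one character shorter than cards and still holds the other copy of the drawn card.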
On Python 2, you can use UserString.MutableString to do it in a mutable way:
>>> import UserString
>>> s = UserString.MutableString("EXAMPLE")
>>> type(s)
<class 'UserString.MutableString'>
>>> del s[3] # Delete 'M'
>>> s = str(s) # Turn it into an immutable value
>>> s
'EXAPLE'
MutableString was removed in Python 3.
Another way is to wrap it in a function.
Below is a way to remove all vowels from a string just by calling the function (Python 2 only, because of the two-argument translate):
def disemvowel(s):
    return s.translate(None, "aeiouAEIOU")
Here's what I did to slice out the "M":
s = 'EXAMPLE'
s1 = s[:s.index('M')] + s[s.index('M')+1:]
To delete a char or a sub-string once (only the first occurrence):
main_string = main_string.replace(sub_str, replace_with, 1)
NOTE: Here 1 can be replaced with any int giving the number of occurrences you want to replace.
You can simply use a list comprehension.
Assume that you have the string my name is and you want to remove the character m. Use the following code:
"".join([x for x in "my name is" if x != 'm'])
If you want to delete or ignore characters in a string, say you have this string,
"[11:L:0]"
from a web API response or something similar, such as a CSV file. Let's say you are using requests:
import requests
udid = 123456
url = 'http://webservices.yourserver.com/action/id-' + str(udid)
s = requests.Session()
s.verify = False
resp = s.get(url, stream=True)
content = resp.content
Loop and get rid of the unwanted characters:
for line in resp.iter_lines():
    line = line.replace("[", "")
    line = line.replace("]", "")
    line = line.replace('"', "")
Optional split, and you will be able to read values individually:
listofvalues = line.split(':')
Now accessing each value is easier:
print listofvalues[0]
print listofvalues[1]
print listofvalues[2]
This will print
11
L
0
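If the unwanted characters only ever sit at the two ends of the string, str.strip with a set of characters is a shorter alternative to the chained replace calls. A sketch without the network request, assuming the payload is exactly the sample string above including its double quotes:

```python
raw = '"[11:L:0]"'  # sample payload, assumed to include the surrounding quotes

cleaned = raw.strip('"[]')  # removes any of these characters from both ends
listofvalues = cleaned.split(':')

print(listofvalues)  # ['11', 'L', '0']
```

Unlike replace, strip leaves brackets or quotes in the middle of the string untouched.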
Two new string removal methods were introduced in Python 3.9:
# str.removeprefix("prefix_to_be_removed")
# str.removesuffix("suffix_to_be_removed")
s = 'EXAMPLE'
In this case the position of 'M' is 3:
s = s[:3] + s[3:].removeprefix('M')
or:
s = s[:4].removesuffix('M') + s[4:]
# output: 'EXAPLE'
from random import randint
def shuffle_word(word):
    newWord = ""
    for i in range(0, len(word)):
        pos = randint(0, len(word) - 1)
        newWord += word[pos]
        word = word[:pos] + word[pos+1:]
    return newWord
word = "Sarajevo"
print(shuffle_word(word))
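The function above removes one character at a time by slicing. For comparison, here is a shorter sketch that swaps in random.sample from the standard library instead. The name shuffle_word_sample is made up for illustration:

```python
import random

def shuffle_word_sample(word):
    """Return the characters of word in random order (hypothetical helper)."""
    # random.sample picks len(word) positions without repetition.
    return "".join(random.sample(word, len(word)))

shuffled = shuffle_word_sample("Sarajevo")
print(shuffled)
```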
Strings are immutable in Python so both your options mean the same thing basically.
x = re.findall(r'FROM\s(.*?\s)(WHERE|INNER|OUTER|JOIN|GROUP)', data, re.DOTALL)
I am using the above expression to parse an Oracle SQL query and get the result.
I get multiple matches and want to print each of them on its own line.
How can I do that?
Some results even have "," in them.
You can try this:
for elt in x:
    print('\n'.join(elt.split(',')))
split returns a list of the comma-separated elements, which are then joined with \n (new line). Therefore, you get one result per line.
Your result is returned in a list.
from https://docs.python.org/2/library/re.html:
re.findall(pattern, string, flags=0) Return all non-overlapping
matches of pattern in string, as a list of strings.
If you are not familiar with data structures, more information here
You should be able to easily iterate over the returned list with a for loop:
for matchedString in x:
    # replace commas
    n = matchedString.replace(',', '')
    # add to new list or print, do something, any other logic
    print(n)
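One thing to watch: when the pattern has more than one capture group, as the one in the question does, re.findall returns a list of tuples rather than plain strings, so each match has to be unpacked before calling string methods on it. A sketch assuming Python 3 and a made-up query in data:

```python
import re

# Made-up query standing in for the real SQL text.
data = "SELECT a, b FROM emp e, dept d WHERE e.id = d.id"
pattern = r'FROM\s(.*?\s)(WHERE|INNER|OUTER|JOIN|GROUP)'

matches = re.findall(pattern, data, re.DOTALL)
print(matches)  # [('emp e, dept d ', 'WHERE')]

# Each match is a (tables, keyword) tuple because of the two groups.
for tables, keyword in matches:
    for name in tables.split(','):
        print(name.strip())  # one table per line
```

If only the table list is needed, turning the second group into a non-capturing one, (?:WHERE|...), would make findall return plain strings again.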