python convert unicode to string - python

I got my results from sqlite in Python as tuples like this: (u'PR:000017512',)
However, I want to print it as 'PR:000017512'. At first, I tried selecting the first item of the tuple with index [0], but the printed result is still u'PR:000017512'. Then I used str() to convert it and nothing changed. How can I print this without the u''?

You're confusing the string representation with its value. When you print a unicode string the u doesn't get printed:
>>> foo=u'abc'
>>> foo
u'abc'
>>> print foo
abc
Update:
Since you're dealing with a tuple, you don't get off this easy: You have to print the members of the tuple:
>>> foo=(u'abc',)
>>> print foo
(u'abc',)
>>> # If the tuple really only has one member, you can just subscript it:
>>> print foo[0]
abc
>>> # Join is a more realistic approach when dealing with iterables:
>>> print '\n'.join(foo)
abc
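For readers on Python 3, where every str is already unicode and the u prefix no longer shows up, the same idea can be sketched as below. This is only an illustration, not the asker's code: the in-memory database, the items table, and the code column are assumptions made up for the example.

```python
import sqlite3

# Hypothetical in-memory database standing in for the asker's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (code TEXT)")
conn.execute("INSERT INTO items (code) VALUES (?)", ("PR:000017512",))

# fetchall() returns a list of tuples, one tuple per row,
# even when only a single column is selected.
rows = conn.execute("SELECT code FROM items").fetchall()

# Pull the single column out of each row before printing.
codes = [row[0] for row in rows]
for code in codes:
    print(code)

conn.close()
```

Printing rows directly would show the tuple's repr, quotes and trailing comma included; printing each unpacked value shows only the text.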

Don't see the problem:
>>> x = (u'PR:000017512',)
>>> print x
(u'PR:000017512',)
>>> print x[0]
PR:000017512
>>>
The string is in unicode format, but it still means PR:000017512.
Check out the docs on string literals:
http://docs.python.org/2/reference/lexical_analysis.html#string-literals

In [22]: unicode('foo').encode('ascii','replace')
Out[22]: 'foo'
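The 'replace' error handler only makes a visible difference when the text contains characters outside ASCII. A small sketch in Python 3 syntax (where unicode() no longer exists and str takes its place); the sample word is just an illustration:

```python
# "cafe" with an acute accent on the final e, which is not ASCII.
text = "caf\u00e9"

# errors="replace" substitutes "?" for anything that cannot be encoded.
replaced = text.encode("ascii", errors="replace")

# errors="ignore" silently drops such characters instead.
ignored = text.encode("ascii", errors="ignore")

print(replaced)
print(ignored)
```

Either way information is lost, so encoding to ASCII is only worth doing when the consumer really cannot handle anything else.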

Related

output for str.join() method is not consistent

Let's assign two variables:
>>> a_id = 'c99faf24275d476d84e0c8f0ad953582'
>>> u_id = '59958a11a6ad4d8b39707a70'
Right output:
>>> a_id+u_id
'c99faf24275d476d84e0c8f0ad95358259958a11a6ad4d8b39707a70'
Wrong output:
>>> str.join(a_id,u_id)
'5c99faf24275d476d84e0c8f0ad9535829c99faf24275d476d84e0c8f0ad9535829c99faf24275d476d84e0c8f0ad9535825c99faf24275d476d84e0c8f0ad9535828c99faf24275d476d84e0c8f0ad953582ac99faf24275d476d84e0c8f0ad9535821c99faf24275d476d84e0c8f0ad9535821c99faf24275d476d84e0c8f0ad953582ac99faf24275d476d84e0c8f0ad9535826c99faf24275d476d84e0c8f0ad953582ac99faf24275d476d84e0c8f0ad953582dc99faf24275d476d84e0c8f0ad9535824c99faf24275d476d84e0c8f0ad953582dc99faf24275d476d84e0c8f0ad9535828c99faf24275d476d84e0c8f0ad953582bc99faf24275d476d84e0c8f0ad9535823c99faf24275d476d84e0c8f0ad9535829c99faf24275d476d84e0c8f0ad9535827c99faf24275d476d84e0c8f0ad9535820c99faf24275d476d84e0c8f0ad9535827c99faf24275d476d84e0c8f0ad953582ac99faf24275d476d84e0c8f0ad9535827c99faf24275d476d84e0c8f0ad9535820'
Now consider this case, the output is correct now:
>>> a="asdf"
>>> b="asdfsdfsd"
>>> str.join(a,b)
'aasdfsasdfdasdffasdfsasdfdasdffasdfsasdfd'
Confirming the type of all variables in the example:
>>> type(a)
<class 'str'>
>>> type(a_id)
<class 'str'>
>>> type(u_id)
<class 'str'>
Edit
I just realized the output in the second case was not what I expected either. I was using the join method the wrong way.
str.join(a, b) is equivalent to a.join(b), provided a is a str object and b is an iterable of strings. Strings are always iterable: iterating over a string yields each of its characters in turn.
This basically means "insert a copy of a between every pair of adjacent elements of b (as an iterable)", so if a and b are both strings, a copy of a is inserted between every pair of adjacent letters in b. For example:
>>> str.join(".", "123456")
'1.2.3.4.5.6'
If you simply want to concatenate two strings, + is enough, not join:
>>> "." + "123456"
'.123456'
If you really want join, put the strings in a list and use an empty string as "delimiter":
>>> str.join('', ['123', '456', '7890'])
'1234567890'
>>> ''.join(['123', '456', '7890'])
'1234567890'
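To make the "insert a copy between every pair of elements" description concrete, here is a rough hand-written equivalent of join. The helper name manual_join is hypothetical and exists only for this illustration; real code should simply call str.join.

```python
def manual_join(separator, iterable):
    """Rebuild what separator.join(iterable) does, one item at a time."""
    result = ""
    for position, item in enumerate(iterable):
        if position > 0:
            # The separator goes between items, never before the first one.
            result += separator
        result += item
    return result


# A string is an iterable of its characters, so the separator
# ends up between every pair of adjacent letters.
print(manual_join(".", "123456"))

# With a list of strings and an empty separator, this is plain concatenation.
print(manual_join("", ["123", "456", "7890"]))
```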

How to get the first 2 letters of a string in Python?

Let's say I have a string
str1 = "TN 81 NZ 0025"
two = first2(str1)
print(two) # -> TN
How do I get the first two letters of this string? I need the first2 function for this.
It is as simple as string[:2]. A function can easily be written to do it, if you need one.
Even the function is as simple as
def first2(s):
    return s[:2]
In general, you can get the characters of a string from i until j with string[i:j].
string[:2] is shorthand for string[0:2]. This works for lists as well.
Learn about Python's slice notation at the official tutorial
t = "your string"
Play with the first N characters of a string with
def firstN(s, n=2):
    return s[:n]
which, with the default n, makes firstN(t) equivalent to
t[:2]
Here's what the simple function would look like:
def firstTwo(string):
    return string[:2]
In Python, strings behave like lists of characters, but they are not of type list; they are just list-like (i.e. they can be indexed and sliced like a list). More formally, they are sequences (see http://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange):
>>> a = 'foo bar'
>>> isinstance(a, list)
False
>>> isinstance(a, str)
True
Since strings are sequences, you can use slicing to access parts of them, written sequence[start_index:end_index]; see Explain Python's slice notation. For example:
>>> a = [1,2,3,4]
>>> a[0]
1 # first element, NOT a sequence.
>>> a[0:1]
[1] # a slice from first to second, a list, i.e. a sequence.
>>> a[0:2]
[1, 2]
>>> a[:2]
[1, 2]
>>> x = "foo bar"
>>> x[0:2]
'fo'
>>> x[:2]
'fo'
When omitted, the start position defaults to 0 and the end position to len(sequence).
In the olden C days, a string was an array of characters; the whole issue of dynamic vs. static lists sounds like legend now, see Python List vs. Array - when to use?
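Note that slicing does not raise an exception when the string is shorter than the requested length; it simply returns as many characters as are available. Only indexing a single out-of-range position (e.g. s[100]) raises IndexError.
Another approach is to use
'yourstring'.ljust(100)[:100].strip()
This gives you at most the first 100 characters.
The result can be shorter than a plain slice would give, because strip() also removes any leading or trailing spaces of the original string.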
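As a quick check of how slicing treats strings shorter than the requested length, here is a small sketch in Python 3 syntax. The helper name first_n is made up for the example.

```python
def first_n(text, n=2):
    """Return at most the first n characters of text."""
    return text[:n]


short = "T"

# Slicing past the end is fine: it returns whatever is there.
print(repr(first_n(short)))
print(repr(first_n("")))

# Indexing a single position past the end is what raises.
try:
    short[5]
except IndexError as exc:
    print("indexing failed:", exc)
```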
For completeness: Instead of using def you could give a name to a lambda function:
first2 = lambda s: s[:2]

significance of using print() in python

What is the difference between using print() and not using it?
For example, say a = ("first", "second", "third"), what is the difference between
print (a[0] a[2])
and
a[0] a[2]
?
>>> s = 'foo'
>>> s
'foo'
>>> print s
foo
When you type any expression into the Python interpreter, if said expression returns a value, the interpreter will output that value's representation, or repr. reprs are primarily used for debugging, and are intended to show the value in a way that is useful for the programmer. A typical example of a value's repr is how repr('foo') would output 'foo'.
When you use print, you aren't returning a value and so the interpreter is not actually outputting anything; instead, print is writing the value's str to sys.stdout (or an alternative stream, if you specify it with the >> syntax, e.g. print >>sys.stderr, x). strs are intended for general output, not just programmer use, though they may be the same as repr. A typical example of a value's str is how str('foo') would output foo.
The difference between what the interpreter does and what print does comes more into play when you write modules or scripts. print statements will continue to produce output, while expression values are not output unless you do so explicitly. You can still output a value's repr, though: print repr(value)
You can also control str and repr in your own objects:
>>> class MyThing(object):
...     def __init__(self, value):
...         self.value = value
...     def __str__(self):
...         return str(self.value)
...     def __repr__(self):
...         return '<MyThing value=' + repr(self.value) + '>'
...
>>> mything = MyThing('foo')
>>> mything
<MyThing value='foo'>
>>> print mything
foo
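The same class works under Python 3, where print is a function rather than a statement. The sketch below is an adaptation for illustration, not part of the original answer; it also shows that containers and the !r conversion use the repr.

```python
class MyThing:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        # Used by print() and str().
        return str(self.value)

    def __repr__(self):
        # Used by the interactive prompt, repr(), and containers.
        return "<MyThing value=" + repr(self.value) + ">"


thing = MyThing("foo")
print(thing)                             # goes through __str__
print(repr(thing))                       # goes through __repr__
print([thing])                           # a list shows the repr of its items
print("{} / {!r}".format(thing, thing))  # str first, then repr
```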
In interactive mode, the difference is negligible, as the other answers indicate.
However, in a script, print a[0] will actually print output to the screen, while a bare a[0] is evaluated and its value discarded, which has no visible effect.
For example, consider the following script, printtest.py:
myList = ["first", "second", "third"]
print "with print:", myList[0], myList[2]
"without print:", myList[0], myList[2]
If you run this script in a terminal (python printtest.py), the output is:
with print: first third
>>> a=("first","second")
>>> print a[0],a[1]
first second
>>> a[0],a[1]
('first', 'second')
You can do this (with a = ("first", "second", "third") from the question):
>>> print (a[0], a[1], a[2])
('first', 'second', 'third')
try it :)
print() and not using it?
print prints the value (what I mean is shown in the following example; read the comments I added):
>>> print a[1]
second # prints without '
>>> a[1]
'second' # prints with '
more useful:
print:
>>> print "a\nb"
a # print value
b
but the interpreter shows the repr:
>>> "a\na" # the escape sequence is shown, not a line break
'a\na'
which is the same as printing the repr explicitly:
>>> print repr("a\na")
'a\na'
difference: print (a[0] a[2]) and a[0] a[2]?
This prints two elements of a tuple, as below:
>>> print a[0], a[2]
first third
This is similar to printing two strings, like below:
>>> print "one", "two"
one two
Whereas this first creates a tuple (a[0], a[2]), which is then printed:
>>> print (a[0], a[2])
('first', 'third')
It first makes a tuple of 2 strings and then prints it, like below:
>>> print ("one", "two")
('one', 'two')
Additionally, if you add a trailing comma, the expression becomes a tuple.
This is a simple string:
>>> a[0]
'first'
and this is a tuple:
>>> a[0],
('first',)
similarly,
>>> a[0], a[1]
('first', 'second')
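Under Python 3 the parentheses matter in a slightly different way, because print is a function: print(a[0], a[2]) passes two arguments, while a tuple needs its own extra pair of parentheses. The sketch below captures the output to show the difference; the captured helper is hypothetical and only there for the demonstration.

```python
import io
from contextlib import redirect_stdout

a = ("first", "second", "third")


def captured(func):
    """Run func and return whatever it wrote to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    return buffer.getvalue()


# Two separate arguments: print writes their str values, space separated.
two_args = captured(lambda: print(a[0], a[2]))

# One tuple argument: print writes the tuple's repr, quotes included.
one_tuple = captured(lambda: print((a[0], a[2])))

print(two_args, end="")
print(one_tuple, end="")
```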

Python error: could not convert string to float

I have some Python code that pulls strings out of a text file:
[2.467188005806714e-05, 0.18664554919828535, 0.5026880460053854, ....]
Python code:
v = string[string.index('['):].split(',')
for elem in v:
    new_list.append(float(elem))
This gives an error:
ValueError: could not convert string to float: [2.974717463860223e-06
Why can't [2.974717463860223e-06 be converted to a float?
You've still got the [ in front of your "float" which prevents parsing.
Why not use a proper module for that? For example:
>>> a = "[2.467188005806714e-05, 0.18664554919828535, 0.5026880460053854]"
>>> import json
>>> b = json.loads(a)
>>> b
[2.467188005806714e-05, 0.18664554919828535, 0.5026880460053854]
or
>>> import ast
>>> b = ast.literal_eval(a)
>>> b
[2.467188005806714e-05, 0.18664554919828535, 0.5026880460053854]
You can do the following to convert the string you read from your file into a list of floats:
>>> instr="[2.467188005806714e-05, 0.18664554919828535, 0.5026880460053854]"
>>> [float(e) for e in instr.strip("[] \n").split(",")]
[2.467188005806714e-05, 0.18664554919828535, 0.5026880460053854]
The reason your code is failing is that you are not stripping the '[' from the string.
You are capturing the first bracket; change string.index("[") to string.index("[") + 1.
This will give you a list of floats without the need for extra imports etc.
s = '[2.467188005806714e-05, 0.18664554919828535, 0.5026880460053854]'
s = s[1:-1]
float_list = [float(n) for n in s.split(',')]
[2.467188005806714e-05, 0.18664554919828535, 0.5026880460053854]
v = string[string.index('[') + 1:].split(',')
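index() returns the index of the given character, so without the + 1 the '[' itself is included in the slice. If the string also ends with ']', that bracket has to be removed from the last element as well before calling float().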
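Putting the pieces from the answers above together, a minimal sketch that cuts out the text between the brackets before splitting might look like this. The function name parse_float_list and the sample line are assumptions for the example, and it assumes exactly one bracketed list per string.

```python
def parse_float_list(text):
    """Extract the floats from the first [...] group found in text."""
    start = text.index("[") + 1      # skip the opening bracket itself
    end = text.index("]", start)     # stop before the closing bracket
    inner = text[start:end]
    # float() tolerates surrounding whitespace, so only empty pieces
    # (e.g. from a trailing comma) need to be skipped.
    return [float(piece) for piece in inner.split(",") if piece.strip()]


line = "values: [2.467188005806714e-05, 0.18664554919828535, 0.5026880460053854]\n"
print(parse_float_list(line))
```

If the file content is a well-formed list literal, json.loads or ast.literal_eval, as suggested earlier, remain the less fragile choice.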

Interpreting Strings as Other Data Types in Python

I'm reading a file into python 2.4 that's structured like this:
field1: 7
field2: "Hello, world!"
field3: 6.2
The idea is to parse it into a dictionary that takes fieldfoo as the key and whatever comes after the colon as the value.
I want to convert whatever is after the colon to its "actual" data type, that is, '7' should be converted to an int, "Hello, world!" to a string, etc. The only data types that need to be parsed are ints, floats and strings. Is there a function in the Python standard library that would allow one to make this conversion easily?
The only things this should be used to parse were written by me, so (at least in this case) safety is not an issue.
First parse your input into a list of pairs like fieldN: some_string. You can do this easily with re module, or probably even simpler with slicing left and right of the index line.strip().find(': '). Then use a literal eval on the value some_string:
>>> import ast
>>> ast.literal_eval('6.2')
6.2
>>> type(_)
<type 'float'>
>>> ast.literal_eval('"Hello, world!"')
'Hello, world!'
>>> type(_)
<type 'str'>
>>> ast.literal_eval('7')
7
>>> type(_)
<type 'int'>
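The two steps described above (split each line at the first colon, then evaluate the value as a literal) can be sketched roughly as follows. The function name parse_fields and the fallback to the raw string are assumptions added for the example; the code uses Python 3 syntax, and the ast module is not available in Python 2.4, the version the question mentions.

```python
import ast


def parse_fields(lines):
    """Turn 'key: value' lines into a dict with typed values."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # partition splits on the first colon only, so colons
        # inside a quoted value are left alone.
        key, _, raw_value = line.partition(":")
        raw_value = raw_value.strip()
        try:
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Not a valid Python literal: keep the text unchanged.
            value = raw_value
        result[key.strip()] = value
    return result


sample = ['field1: 7', 'field2: "Hello, world!"', 'field3: 6.2']
print(parse_fields(sample))
```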
You can attempt to convert it to an int first using the built-in function int(). If the string cannot be interpreted as an int, a ValueError exception is raised. You can then attempt to convert it to a float using float(). If this also fails, just return the initial string:
def interpret(val):
    try:
        return int(val)
    except ValueError:
        try:
            return float(val)
        except ValueError:
            return val
You can use yaml (PyYAML, a third-party package) to parse the literals, which is better than ast in that it does not throw an error if strings are not wrapped in an extra pair of apostrophes or quotation marks.
>>> import yaml
>>> yaml.safe_load('7')
7
>>> yaml.safe_load('Hello')
'Hello'
>>> yaml.safe_load('7.5')
7.5
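For older Python versions, like the one in the question, the eval function can be used. To reduce the evilness, pass a dict as the second argument to serve as the global namespace, with __builtins__ disabled, so that built-in functions cannot be called. This is not a real sandbox, so only use it on input you trust.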
>>> [eval(i, {"__builtins__":None}) for i in ['6.2', '"Hello, world!"', '7']]
[6.2, 'Hello, world!', 7]
Since the "only data types that need to be parsed are int, float and str", maybe something like this will work for you:
entries = {'field1': '7', 'field2': "Hello, world!", 'field3': '6.2'}
for k,v in entries.items():
    if v.isdecimal():
        conv = int(v)
    else:
        try:
            conv = float(v)
        except ValueError:
            conv = v
    entries[k] = conv
print(entries)
# {'field2': 'Hello, world!', 'field3': 6.2, 'field1': 7}
There is the strconv library:
In [22]: import strconv
/home/tworec/.local/lib/python2.7/site-packages/strconv.py:200: UserWarning: python-dateutil is not installed. As of version 0.5, this will be a hard dependency of strconv fordatetime parsing. Without it, only a limited set of datetime formats are supported without timezones.
warnings.warn('python-dateutil is not installed. As of version 0.5, '
In [23]: strconv.convert('1.2')
Out[23]: 1.2
In [24]: type(strconv.convert('1.2'))
Out[24]: float
In [25]: type(strconv.convert('12'))
Out[25]: int
In [26]: type(strconv.convert('true'))
Out[26]: bool
In [27]: type(strconv.convert('tRue'))
Out[27]: bool
In [28]: type(strconv.convert('12 Jan'))
Out[28]: str
In [29]: type(strconv.convert('12 Jan 2018'))
Out[29]: str
In [30]: type(strconv.convert('2018-01-01'))
Out[30]: datetime.date
Hope this helps to do what you are trying to do:
#!/usr/bin/python
a = {'field1': 7}
b = {'field2': "Hello, world!"}
c = {'field3': 6.2}
temp1 = type(a['field1'])
temp2 = type(b['field2'])
temp3 = type(c['field3'])
print temp1
print temp2
print temp3
Thanks to wim for helping me figure out what I needed to search for to figure this out.
One can just use eval():
>>> a=eval("7")
>>> b=eval("3")
>>> a+b
10
>>> b=eval("7.2")
>>> a=eval("3.5")
>>> a+b
10.699999999999999
>>> a=eval('"Hello, "')
>>> b=eval('"world!"')
>>> a+b
'Hello, world!'
