Forgive me for the poor title, I really can't come up with a proper title.
Here is my problem. Say I was given a list of strings:
['2010.01.01',
'1b',
'`abc',
'12:20:33.000']
And I want to do a 'type check' so that given the first string it returns type date, for the second one boolean, for the third one a symbol, for the fourth one a time, etc. The returned value can be a string or anything else, since all I want to do is cast to the correct ctypes.
Is there any way to do it?
PS: I'm using Python 2.5.
>>> strings = ['2010.01.01',
... '1b',
... '`abc',
... '12:20:33.000']
>>> [type(x) for x in strings]
[<type 'str'>, <type 'str'>, <type 'str'>, <type 'str'>]
You could use eval to determine the types in this list.
If you are completely certain you can trust the content (that it is not, say, coming from a user who could sneak code into the list somehow), you could map the list onto eval, which will catch native types like numbers. However, there is no simple way to know what those strings should all mean. For example, if you try to eval '2010.01.01', Python will think you're trying to parse a number and then fail with a SyntaxError because of the extra decimal point.
So you could try a two-stage strategy: first cast the list to strings vs. numbers using eval:
def try_cast(input_string):
    # Evaluate the string; if that fails, keep it as a plain string.
    try:
        val = eval(input_string)  # only acceptable for trusted input
        val_type = type(val)
        return val, val_type
    except Exception:
        return input_string, type('')

cast_list = map(try_cast, original_list)
That would give a list of tuples where the first item is the converted value and the second item is its type. For more specialized things like dates, you'd need to use the same strategy on the strings left over after the first pass, using a try/except block to attempt converting them with time.strptime(). You'll need to figure out what time formats to expect and write a format string for each one (you can check the Python docs or something like http://www.tutorialspoint.com/python/time_strptime.htm). You'd have to try all the options and see which ones convert correctly: if one works, the value is a date; if none does, it's just a string.
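The two-stage idea above could be sketched roughly as follows. This is only an illustration under several assumptions: it is written for Python 3 rather than 2.5 (ast.literal_eval and the %f directive are not available in 2.5), it swaps eval for ast.literal_eval so that no arbitrary code is executed, and the format table, the function name guess_type, and the rules for "boolean" and "symbol" are guesses based on the four sample strings.

```python
import ast
from datetime import datetime

# Hypothetical format table: guesses based on the sample strings in the question.
TIME_FORMATS = [
    ("date", "%Y.%m.%d"),
    ("time", "%H:%M:%S.%f"),
]

def guess_type(text):
    """Return a label such as 'int', 'date', 'time', 'boolean', 'symbol' or 'string'."""
    # Stage 1: native literals (numbers, quoted strings, lists, ...).
    try:
        return type(ast.literal_eval(text)).__name__
    except (ValueError, SyntaxError, TypeError):
        pass
    # Stage 2: try each known date/time format in turn.
    for label, fmt in TIME_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return label
        except ValueError:
            continue
    # Stage 3: assumed conventions for the remaining sample values.
    if text in ("0b", "1b"):
        return "boolean"
    if text.startswith("`"):
        return "symbol"
    return "string"

samples = ["2010.01.01", "1b", "`abc", "12:20:33.000"]
print([guess_type(s) for s in samples])
# ['date', 'boolean', 'symbol', 'time']
```

Anything that matches none of the rules falls through to 'string', so new formats can be added to the table without touching the rest of the function.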
Consider the following lines:
a = 5e7
print(str(a))
I get 50000000.0 instead of 5e7 itself. Can I get 5e7 using the str function itself?
You can write it another way, too.
I prefer this method because it is clearer.
a = 5e7
print(f"{a:0.0e}")
However, if you need to format more than one int or float in a sentence, you can write something like this:
print("{a:0.0e} ... {a:0.0e}".format(a = 5e7))
The code above prints the value of a twice.
For more format codes, see
https://www.w3schools.com/python/ref_string_format.asp
Hopefully this helps.
There is no way of knowing the format of the original input, since only the type and value are remembered during parsing.
However, if you are looking for how to display a number in exponential form, use the e format specifier, like this:
a = 5e7
print("{:0.0e}".format(a))
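To make the difference between str and the format specifiers concrete, here is a small comparison, assuming Python 3. The variable names are only for illustration.

```python
a = 5e7

plain = str(a)                    # default conversion, no exponent at this magnitude
no_digits = "{:.0e}".format(a)    # exponent form, no fractional digits
two_digits = "{:.2e}".format(a)   # exponent form, two fractional digits
general = "{:g}".format(a)        # picks fixed or exponent form by magnitude

print(plain, no_digits, two_digits, general)
# 50000000.0 5e+07 5.00e+07 5e+07
```

Note that the exponent is always printed with a sign and at least two digits, so the result is 5e+07 rather than the literal 5e7 from the source code.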
I started learning Python three days ago, and I want to use a number (an integer) as a key in a dictionary, but I am getting an error:
a= dict(1=one,2=two)
It gives me "SyntaxError: expression cannot contain assignment" (pointing at the int 1),
but when I do
b=dict(one=1,two=2)
it works perfectly.
I know I could use this alternative:
a={1:'one',2:'two'}
but it is too time-consuming, and I want to know what the error in the first one is.
You can't do it with keyword arguments to the dict constructor, since keyword names must be valid Python identifiers (and numeric values are not).
You can try this instead:
dict([(1,"one"),(2,"two")])
This is logical: assignments only work one way. 1 = myvar will not work, but myvar = 1 will. Here, you wrote 1=one, so it doesn't work! But when you reverse it, one = 1, it works. So you found the answer by yourself ;)
a= dict(1=one,2=two)
This form of the dictionary constructor accepts only keys that are valid Python identifiers. One of the rules: an identifier cannot start with a digit.
But you can use another form of the constructor, e.g.:
a = dict([(1, 'one'), (2, 'two')])
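For comparison, here are a few equivalent ways to build a dictionary with integer keys, next to the keyword form that only accepts identifier keys. This is a short sketch; the variable names are chosen only for illustration.

```python
# Integer keys: pass key/value pairs instead of keyword arguments.
pairs = [(1, "one"), (2, "two")]
from_pairs = dict(pairs)
from_zip = dict(zip([1, 2], ["one", "two"]))
literal = {1: "one", 2: "two"}

# Keyword form: every key must be a valid identifier and becomes a str.
keyword = dict(one=1, two=2)

print(from_pairs == from_zip == literal)  # True
print(keyword)                            # {'one': 1, 'two': 2}
```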
I'm trying to parse user input that specifies a value to look up in a dictionary.
For example, this user input may be the string fieldA[5]. I need to then look up the key fieldA in my dictionary (which would be a list) and then grab the element at position 5. This should be generalizable to any depth of lists, e.g. fieldA[5][2][1][8], and properly fail for invalid inputs fieldA[5[ or fieldA[5][7.
I have investigated doing:
import re
key = "fieldA[5]"
re.split(r'\[|\]', key)
which results in:
['fieldA', '5', '']
This works as an output (I can look up values in the dictionary with it), but it does not actually enforce or check pairing of brackets, and it is unintuitive in how it reads and in how this output would be used to look up values in the dictionary.
Is there a parser library, or alternative approach available?
If you don't expect it to get much more complicated than what you described, you might be able to get away with some regex. I usually wouldn't recommend it for this purpose, but I want you to be able to start somewhere...
import re
key = input('Enter field and indices (e.g. "fieldA[5][3]"): ')
field = re.match(r'[^[]*', key).group(0)
indices = [int(num) for num in re.findall(r'\[(\d+)]', key)]
This does not detect a missing field or incorrect bracketing; in those cases it silently returns an empty string or an empty list, respectively.
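If malformed input should be rejected rather than silently ignored, one option is to validate the whole string with re.fullmatch before extracting the indices. The sketch below assumes Python 3, field names made of word characters, and non-negative integer indices; KEY_PATTERN, parse_key and lookup are hypothetical names, not part of any library.

```python
import re

# A field name followed by zero or more well-formed [number] groups.
KEY_PATTERN = re.compile(r"(\w+)((?:\[\d+\])*)")

def parse_key(key):
    """Split 'fieldA[5][2]' into ('fieldA', [5, 2]); raise ValueError if malformed."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError("malformed key: %r" % key)
    field, index_part = match.groups()
    indices = [int(num) for num in re.findall(r"\[(\d+)\]", index_part)]
    return field, indices

def lookup(data, key):
    """Walk into data[field] following each index in turn."""
    field, indices = parse_key(key)
    value = data[field]
    for index in indices:
        value = value[index]
    return value

data = {"fieldA": [0, 1, 2, 3, 4, [10, 20]]}
print(parse_key("fieldA[5][2][1][8]"))  # ('fieldA', [5, 2, 1, 8])
print(lookup(data, "fieldA[5][1]"))     # 20
```

Because fullmatch must consume the entire string, inputs such as fieldA[5[ or fieldA[5][7 raise ValueError instead of producing a partial result.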
Here is a conceptual problem that I have been having regarding the cleaning of data and how to interact with lists and tuples that I'm not sure completely how to explain, but if I can get a fix for it, I can conceptually be better at using python.
Here it is: (using python 3 and sqlite3)
I have an SQLite database with a date column which has text in it in the format MM-DD-YY 24:00. When viewed in DB Browser, the text looks fine. However, when using fetchall() in Python, the code prints the dates out in the format 'MM-DD-YY\xa0'. I want to clean the \xa0 out of the data, and I tried some code that is a combination of what I think I should do plus another post I read on here. This is the code:
print(dates)
Output (typed in by hand here to show you the output): [('MM-DD-YY\xa0',), ('MM-DD-YY\xa0',), ...]
dates_clean = []
for i in dates:
    clean = str(i).replace(u'\xa0', u' ')
    dates_clean.append(clean)
Now when I print dates_clean I get:
["('MM-DD-YY\xa0',)", "('MM-DD-YY\xa0',)", ...]
So as you can see, when I tried to clean it, it did what I wanted it to do, but now the actual tuple that it was originally contained in has become part of the text itself and is contained inside another string. Therefore, when I write this list back into SQLite using an UPDATE statement, all of the date values are contained inside a tuple.
It frustrates me because I have been facing issues like this for a while, where I want to edit something inside a list or a tuple and have the new value just replace the old one, instead of keeping all of the characters that say it is a tuple and turning them into text. Sorry if that is confusing; like I said, it's hard for me to explain. I always end up making my data dirtier when trying to clean it up.
Any insights in how to efficiently clean data inside lists and tuples would be greatly appreciated. I think I am confused about the difference between accessing the tuple or accessing what is inside the tuple. It could also be helpful if you could suggest the name of the conceptual problem I'm dealing with so I can do more research on my own.
Thanks!
You are garbling the output by calling str() on the tuple, either implicitly when printing the whole list at once, or explicitly when trying to “clean” it.
See (python3):
>>> print("MM-DD-YY\xa024:00")
MM-DD-YY 24:00
but:
>>> print(("MM-DD-YY\xa024:00",))
('MM-DD-YY\xa024:00',)
This is because tuple.__str__ calls repr on the content, escaping non-printable characters (such as the non-breaking space) in the process.
However, if you print the tuple elements as separate arguments, the result will be correct. So you want to replace the printing with something like:
for row in dates:
    print(*row)
The * expands the tuple to separate parameters. Since these are strings, they will be printed as is:
>>> row = ("MM-DD-YY\xa023:00", "MM-DD-YY\xa024:00")
>>> print(*row)
MM-DD-YY 23:00 MM-DD-YY 24:00
You can add a separator if you wish:
>>> print(*row, sep=', ')
MM-DD-YY 23:00, MM-DD-YY 24:00
... or you can format it:
>>> print('from {0} to {1}'.format(*row))
from MM-DD-YY 23:00 to MM-DD-YY 24:00
Here I am using the * again to expand the tuple to separate arguments and then simply {0} for zeroth member, {1} for first, {2} for second etc. (you can also use {} for next if you don't need to change the order, but giving the indices is clearer).
Ok, so now if you actually need to get rid of the non-breaking space anyway, replace is the right tool. You just need to apply it to each element of the tuple. There are two ways:
Explicit destructuring; applicable when the number of elements is fixed (should be; it is a row of known query):
Given:
>>> row = ('foo', 2, 5.5)
you can destructure it and construct a new tuple:
>>> (a, b, c) = row
>>> (a.replace('o', '0'), b + 1, c * 2)
('f00', 3, 11.0)
This lets you apply a different transformation to each column.
Mapping; applicable when you want to do the same transformation on all elements:
Given:
>>> row = ('foo', 'boo', 'zoo')
you just wrap a generator expression in a tuple constructor:
>>> tuple(x.replace('o', '0') for x in row)
('f00', 'b00', 'z00')
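Putting the pieces together for the database case, the cleaning and the UPDATE could look roughly like the sketch below. It assumes Python 3 and uses an in-memory database; the table name events, the columns id and event_date, and the sample values are all made up for illustration, so adapt them to the real schema.

```python
import sqlite3

# Hypothetical schema and sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, event_date TEXT)")
conn.executemany(
    "INSERT INTO events (event_date) VALUES (?)",
    [("01-02-20\xa013:00",), ("03-04-20\xa009:30",)],
)

# Each fetched row is a tuple; unpack it instead of calling str() on it.
rows = conn.execute("SELECT id, event_date FROM events").fetchall()
cleaned = [(date.replace("\xa0", " "), row_id) for row_id, date in rows]

# Parameters are passed as tuples of plain values, one tuple per row.
conn.executemany("UPDATE events SET event_date = ? WHERE id = ?", cleaned)

result = [row[0] for row in conn.execute("SELECT event_date FROM events ORDER BY id")]
print(result)
# ['01-02-20 13:00', '03-04-20 09:30']
```

The key point is that the replacement is applied to the string inside each tuple, and a new tuple is built for the UPDATE parameters.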
On a side note, SQLite has some date and time functions, and they expect timestamps to be in strict ISO 8601 format, i.e. %Y-%m-%dT%H:%M:%S (optionally with %z at the end; this is strftime notation; in TR#35 notation it is yyyy-MM-ddTHH:mm:ss(xx)).
In your case, dates is actually a list of tuples, with each tuple containing one string element. The , at the end of the date string is how you identify a single element tuple.
The for loop you have needs to work on each element within the tuples, instead of the tuples themselves. Something along the lines of:
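dates_clean = []
for i in dates:
    date = i[0]  # the single string inside the tuple
    clean = str(date).replace('\xa0', '')
    dates_clean.append(clean)  # append the cleaned value, not the original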
I am not sure this is the best solution to your actual problem of manipulating data in the db, but it should answer your question.
EDIT: Also, see Jan's reply about unicode strings and Python 2 vs Python 3.
I have a large CSV that is the result of a Python script I wrote. Each row contains a list of entries that were strings or ints when I wrote them. Please note that the files from my script are sometimes created on Linux and sometimes on Windows (which might be the problem, hence the mention; I'm new at multi-platform Python, so please forgive me).
Now, I'm trying to read the .csv in, but some of the ints come in as long objects, according to type(whatiwant). I've tried everything I and my google-fu can think of to convert these objects to int (int(), str(), and replace() for " ", "L", "\r", and "\n"). Nevertheless, when I test the list via a for loop and type(), the output says some things are still long objects.
What am I missing here? I tried looking for background info on long objects but couldn't find anything useful, hence the post.
I'm new at all this, so again, please forgive my ignorance.
When it rains, it pours. Sorry for screwing up the edit rofl:
I'm reading in the values like this (they are written in rows, as a list containing values that are ints and strings):
# Illustrative rows; each one is a "|"-separated line read from the file
rows = ["header|subheader", "15662466|2831811638", "5662466|27044023"]  # ...
data = []
people_list = []
for entry in rows:
    data.append(entry)
for row in data:
    holder = row.split("|")
    person = str(holder[1])
    people_list.append(person.replace("\r", "").replace("\n", "").replace("L", ""))
people_list.pop(0)  # drop the header entry
for person in people_list:
    strperson = str(person)
    intperson = int(strperson)
    print intperson
    print type(intperson)
output:
2831811638
<type 'long'>
27044023
<type 'int'>
They are being treated as longs. Python 2 has two integer types: int, which has a maximum and a minimum value, and long, which is unbounded. It's not really a problem if the numerical data is a long instead of an int.
A long is an integer type that can hold values too large for an int.
More formally, long is the type Python 2 uses when an int would overflow: any value greater than sys.maxint is automatically promoted to a long. Since sys.maxint is 2**31 - 1 on Windows and on 32-bit builds, but 2**63 - 1 on 64-bit Linux, the same value can be an int on one platform and a long on the other.
Docs: https://docs.python.org/2/library/stdtypes.html#typesnumeric
Note that in Python 3 there is no distinction between the two, as Python 3 unifies them into a single int type.
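As a quick check of the Python 3 behaviour, the snippet below parses the value from the question and also goes past sys.maxsize. It assumes Python 3, where sys.maxint no longer exists; the variable names are illustrative.

```python
import sys

value = int("2831811638")  # larger than 2**31 - 1
beyond_native = sys.maxsize + 1  # larger than the native word size

print(type(value).__name__)          # int
print(value > 2**31 - 1)             # True
print(type(beyond_native).__name__)  # int
```

Both values are plain int objects, so the int/long distinction from Python 2 never shows up.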