Converting unicode object to list after retrieval from database - python

In Django, I'm getting some values from a select field using request.POST.getlist('tags'), so when I store this information in MySQL I end up with something like this: u"['literature']". I think this is pretty reasonable and even desirable since I don't want to use another table to store this information. Obviously, the problem comes when I try to retrieve that information because, as expected, I get this:
u'['
u'u'
u"'"
u'l'
u'i'
u't'
u'e'
.
.
.
(assuming this tag is literature, for example).
How can I transform this unicode object into a Python list? Is there a better approach?
Thanks in advance

Short answer: Create another table.
Databases are designed to be used in a particular way; why try to force them to store information in a way they are not meant to?
There are other solutions to this, but the best answer is to use the database as it was intended; it will be easier in the long run.

Use the json module to serialize the list to JSON before writing, and deserialize it after reading. Or use one of the several implementations of JSONField in the wild.
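As a sketch of that round trip (the tags value here is hypothetical):

```python
import json

# Hypothetical value, as returned by request.POST.getlist('tags')
tags = ['literature', 'history']

# Serialize before writing to the database ...
stored = json.dumps(tags)    # '["literature", "history"]'

# ... and deserialize after reading it back
restored = json.loads(stored)
print(restored)              # ['literature', 'history']
```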

Does this help?
>>> import ast
>>> lst = ast.literal_eval(u"['literature']")
>>> lst
['literature']
>>> isinstance(lst, list)
True
But the better approach would be to properly serialize the list before storing it as a string. You could use one of the existing pickle implementations, json, or roll your own (since it does not have to be generic, it could be a simple one-liner like "SENTINEL".join(lst)... not that I'd recommend the latter, though).


get/access each chunk of dask.dataframe(df, chunksize=100)

I used below code to split a dataframe using dask:
result=dd.from_pandas(df, chunksize=75)
I use below code to create a custom json file:
for z in result:
    createjson(z)
It just didn't work! How can I access each chunk?
There may be a more native way (feels like there should be) but you can do:
for i in range(result.npartitions):
    partition = result.get_partition(i)
    # your code here
We do not know what your createjson function does, but perhaps it is covered by to_json().
Alternatively, if you really want to do something unique to each of your partitions, and this is not specific to JSON, then you will want the map_partitions() method.

Extending pyyaml to find and replace like xml ElementTree

I'd like to extend this SO question to treat a non-trivial use-case.
Background: pyyaml is pretty sweet insofar as it eats YAML and poops Python-native data structures. But what if you want to find a specific node in the YAML? The referenced question would suggest that, hey, you just know where in the data structure the node lives and index right into it. In fact pretty much every answer to every pyyaml question on SO seems to give this same advice.
But what if you don't know where the node lives in the YAML in advance?
If I were working with XML I'd solve this problem with an xml.etree.ElementTree. These provide nice facilities for loading an XML document into memory and finding elements based on certain search criteria. See find() and findall().
Questions:
Does pyyaml provide search capabilities analogous to ElementTree? (If yes, feel free to yell at me for being bad at Google.)
If no, does anyone have nice recipe for extending pyyaml to achieve similar things? (Bonus points for not traversing the deserialized YAML all over again.)
Note that one important thing that ElementTree provides in addition to just being able to find things is the ability to modify the XML document given an element reference. I'd like to be able to do this on YAML as well.
The answer to question 1 is: no. PyYAML implements the YAML 1.1 language standard and there is nothing about finding scalars by any path in the standard nor in the library.
However, if you safe-load a YAML structure, everything is either a mapping, a sequence, or a scalar. Even such a simplistic representation (simple, compared to full-fledged object instantiation with !type markers) can already contain recursive, self-referencing structures:
&a x: *a
This is not possible in XML without external semantic interpretation, and it makes writing a generic tree walker much harder in YAML than in XML.
YAML's type-loading mechanism makes a generic tree walker more difficult still, even if you exclude the problem of self-references.
If you don't know where a node lives in advance, you still need to know how to identify it, and since you don't know how you would walk from the parent to it (the path might be represented in multiple layers of combined mappings and sequences), it is almost useless to have a generic mechanism that depends on context.
Without being able to rely on context (in general) the thing that is left is a uniquely identifiable value (like the HTML id attribute). If all your objects in YAML have such a unique id, then it is possible to search the (safeloaded) tree for such an id value and extract any structure underneath it (mappings, sequences) until you hit a leaf (scalar), or some structure that has an id of its own (another object).
I have been following YAML development for quite some time now (the earliest emails from the YAML mailing list in my YAML folder are from 2004) and I have not seen anything generic evolve since then. I do have some tools to walk the trees and find things that I use for extracting parts of the simplified structure for testing my ruamel.yaml library, but no code that is in a releasable shape (it would already be on PyPI if it were), and nothing close to a generic solution like you can make for XML (which is, IMO, on its own syntactically less complex than YAML).
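To illustrate the id-based approach, a minimal recursive finder over the plain dicts and lists that yaml.safe_load() returns might look like this (find_by_id and the sample tree are hypothetical names; note that it would recurse forever on the self-referencing structures shown above):

```python
def find_by_id(node, target):
    """Return the first mapping whose 'id' key equals target, searching depth-first."""
    if isinstance(node, dict):
        if node.get('id') == target:
            return node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # scalar leaf, nothing to search
    for child in children:
        found = find_by_id(child, target)
        if found is not None:
            return found
    return None

# e.g. the result of yaml.safe_load() on some document
tree = {'a': [{'id': 'n1', 'v': 1}, {'b': {'id': 'n2', 'v': 2}}]}
node = find_by_id(tree, 'n2')
node['v'] = 99                 # mutating the result modifies the tree in place
print(tree['a'][1]['b'])       # {'id': 'n2', 'v': 99}
```

Because the finder returns a reference into the loaded structure, mutating the result also covers the "modify given an element reference" requirement.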
Do you know how to search through Python objects? Then you know how to search through the results of a yaml.load()...
YAML is different from XML in two important ways: one is that while every element in XML has a tag and a value, in YAML, there can be some things that are only values. But secondly... again, YAML creates python objects. There is no intermediate in-memory format to use.
E.g., if you load a YAML file like this:
- First
- Second
- Third
you'll get a list like ['First', 'Second', 'Third']. Want to find 'Third' and don't know where it is? You can use [x for x in my_list if 'Third' in x] to find it. Need to look up an item in a dictionary? Just do it.
If you want to modify an object, you don't modify the YAML, you modify the object. E.g., say I now want the second entry to be in German. I just do my_list[1] = 'zweite', modifying it in place. Now the Python list looks like ['First', 'zweite', 'Third'], and dumping it to YAML looks like
- First
- zweite
- Third
Note that PyYAML is pretty smart... you can even create objects with loops:
>>> import yaml
>>> a = [1,2,3]
>>> b = {}
>>> b[1] = a
>>> b[2] = a
>>> print yaml.dump(b)
1: &id001 [1, 2, 3]
2: *id001
>>> b[2] = [3,4,5]
>>> print yaml.dump(b)
1: [1, 2, 3]
2: [3, 4, 5]
In the first case, it even figured out that b[1] and b[2] point to the same object, so it created an anchor and an alias linking one to the other. In the original object, if you did something like a.pop(), both b[1] and b[2] would show that one entry was gone. If you dump that object to YAML and then load it back in, that will still be true.
(and note in the second one, where they aren't the same, PyYAML doesn't create the extra notations, as it doesn't need to).
In short: Most likely, you're just overthinking it.

List expressions in Python

I'm parsing an XML file ... so there's a field called case: sometimes it's a single OrderedDict, other times it's a list of OrderedDicts. For example:
OrderedDict([(u'duration', u'2.111'), (u'className', u'foo'), (u'testName', u'runTest'), (u'skipped', u'false'), (u'failedSince', u'0')])
[OrderedDict([(u'duration', u'0.062'), (u'className', u'foo'), (u'testName', u'runTest'), (u'skipped', u'false'), (u'failedSince', u'0')]), OrderedDict([(u'duration', u'0.461'), (u'className', u'bar'), (u'testName', u'runTest'), (u'skipped', u'false'), (u'failedSince', u'0')])]
I want to always have that expression as a single list. The reason is to have a for loop to take care of that. I thought about doing something like:
[case]
But when case is already a list, I would end up with [[case]]. I don't think list joins or concatenations would help me. A trivial solution would be to check whether case is of type list or OrderedDict, but I was looking for a simpler, one-line, Pythonic solution like the one I described above. How can I accomplish that?
Since list and OrderedDict are both kinds of containers, checking the type sounds like it might be the simplest solution, if you're sure that the xml parse will always use the list type.
There's no reason you can't do this in a one-liner:
case = [case] if not isinstance(case, list) else case
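A quick sketch of that pattern, with literals standing in for the parsed XML values:

```python
from collections import OrderedDict

def as_list(case):
    # Wrap a bare item in a list; leave an existing list untouched.
    return [case] if not isinstance(case, list) else case

single = OrderedDict([('duration', '2.111')])
many = [OrderedDict([('duration', '0.062')]), OrderedDict([('duration', '0.461')])]

print(as_list(single) == [single])  # True: bare dict gets wrapped
print(as_list(many) is many)        # True: already a list, returned unchanged
```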

How do I get a formatted list of methods from a given object?

I'm a beginner, and the answers I've found online so far for this have been too complicated to be useful, so I'm looking for an answer in vocabulary and complexity similar to this writing.
I'm using python 2.7 in ipython notebook environment, along with related modules as distributed by anaconda, and I need to learn about the library-specific objects in the course of my daily work. The case I'm using here is a pandas dataframe object but the answer must work for any object of python or of an imported module.
I want to be able to print a list of methods for the given object, directly from my program, in a concise and readable format. Even if it's just the method names in alphabetical order, that would be great. A bit more detail would be even better; an ordering based on what each method does is fine, but I'd like the output to look like a table, one row per method, and not big blocks of text. What I've tried is below, and it fails for me because it's unreadable: it puts copies of my data between each line, and it has no formatting.
(I love stackoverflow. I aspire to have enough points someday to upvote all your wonderful answers.)
import pandas
import inspect
data_json = """{"0":{"comment":"I won\'t go to school"}, "1":{"note":"Then you must stay in bed"}}"""
data_df = pandas.io.json.read_json(data_json, typ='frame', dtype=True,
                                   convert_axes=True, convert_dates=True,
                                   keep_default_dates=True, numpy=False,
                                   precise_float=False, date_unit=None)
inspect.getmembers(data_df, inspect.ismethod)
Thanks,
- Sharon
Create an object of type str:
name = "Fido"
List all its attributes (there are no “methods” in Python) in alphabetical order:
for attr in sorted(dir(name)):
    print attr
Get more information about the lower (function) attribute:
print(name.lower.__doc__)
In an interactive session, you can also use the more convenient
help(name.lower)
function.
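Back to the original question: to get one method name per row without echoing the object's data, print only the names from inspect.getmembers. A sketch using a throwaway class standing in for the DataFrame:

```python
import inspect

class Example(object):
    """Stand-in for whatever object you want to inspect."""
    def foo(self):
        pass
    def bar(self):
        pass

obj = Example()
# getmembers returns (name, member) pairs sorted by name
for name, method in inspect.getmembers(obj, inspect.ismethod):
    if not name.startswith('_'):  # skip private/dunder attributes
        print(name)
# bar
# foo
```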

Python sas7bdat module usage

I have to dump data from SAS datasets. I found a Python module called sas7bdat.py that says it can read SAS .sas7bdat datasets, and I think it would be simpler and more straightforward to do the project in Python rather than SAS due to the other functionality required. However, the help(sas7bdat) in interactive Python is not very useful and the only example I was able to find to dump a dataset is as follows:
import sas7bdat
from sas7bdat import *
# following line is sas dataset to convert
foo = SAS7BDAT('/support/sas/locked_data.sas7bdat')
#following line is txt file to create
foo.convertFile('/support/textfiles/locked_data.txt','\t')
This doesn't do what I want because a) it uses the SAS variable names as column headers and I need it to use the variable labels, and b) it uses "nan" to denote missing numeric values where I'd rather just leave the value blank.
Can anyone point me to some useful documentation on the methods included in sas7bdat.py? I've Googled every permutation of key words that I could think of, with no luck. If not, can someone give me an example or two of using readColumnAttributes(), readColumnLabels(), and/or readColumnNames()?
Thanks, all.
As time passes, solutions become easier. I think this one is easiest if you want to work with pandas:
import pandas as pd
df = pd.read_sas('/support/sas/locked_data.sas7bdat')
Note that it is easy to get a numpy array by using df.values
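That also takes care of the "nan" complaint: pandas represents missing values as NaN, but you can blank them out before writing text. A sketch with a toy frame standing in for the read_sas result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, np.nan, 3.0]})  # stand-in for pd.read_sas(...) output

# Replace NaN with empty strings before exporting as tab-delimited text
out = df.fillna('')
print(out.to_csv(sep='\t', index=False))
```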
This is only a partial answer as I've found no [easy to read] concrete documentation.
You can view the source code here
This shows some basic info regarding what arguments the methods require, such as:
readColumnAttributes(self, colattr)
readColumnLabels(self, collabs, coltext, colcount)
readColumnNames(self, colname, coltext)
I think most of what you are after is stored in the "header" class returned when creating an object with SAS7BDAT. If you just print that class you'll get a lot of info, but you can also access class attributes as well. I think most of what you may be looking for would be under foo.header.cols. I suspect you use various header attributes as parameters for the methods you mention.
Maybe something like this will get you closer?
from sas7bdat import SAS7BDAT
foo = SAS7BDAT(inFile) #your file here...
for i in foo.header.cols:
    print '"Attributes"', i.attr
    print '"Labels"', i.label
    print '"Name"', i.name
Edit: Unrelated to this specific question, but the type() and dir() functions come in handy when trying to figure out what is going on in an unfamiliar class/library.
I know I'm late with the answer, but in case someone searches for a similar question, the best option is:
import sas7bdat
from sas7bdat import *
foo = SAS7BDAT('/support/sas/locked_data.sas7bdat')
# This converts to dataframe:
ds = foo.to_data_frame()
Personally I think the better approach would be to export the data using SAS then process the external file as needed using Python.
In SAS, you can do this...
libname datalib "/support/sas";
filename sasdump "/support/textfiles/locked_data.txt";
proc export
data = datalib.locked_data
outfile = sasdump
dbms = tab
label
replace;
run;
The downside to this is that while the column labels are used rather than the variable names, the labels are enclosed in double quotes. When processing in Python, you may need to programmatically remove them if they cause a problem. I hope that helps even though it doesn't use Python like you wanted.
