I am trying to modify a series of programs to use dictionaries instead of arrays. I have columns of raw data in a file, which is then written to an ASCII CSV file. I need to convert this file into a dictionary so that it can be fed into another program.
I used numpy.genfromtxt to pull the information I need out of the CSV file, following this format:
a, b, c, d = np.genfromtxt("file", delimiter=',', unpack=True)
This step works completely fine.
I then attempt to build a dictionary:
outputDict = dict([a,a],[b,b],[c,c],[d,d])
As I understand it, this should make the key "a" in the dictionary correspond to the array "a".
Thus if:
a = [1,2,3,4]
then:
outputDict[a][0] = 1
However, when I attempt to create this dictionary I receive the following error:
TypeError: unhashable type: 'numpy.ndarray'
Why can't I construct a dictionary in this fashion, and what is the workaround, if any? Any help will be greatly appreciated!
You can do this with a plain dictionary; you don't need collections.
Declare your dictionary as:
Dictionary = {}  # {} creates an empty key/value dictionary
Add the array under a string key:
Dictionary['a'] = [1, 2, 3, 4]  # [] creates a list
Now your dictionary looks like this:
{'a': [1, 2, 3, 4]}
This means that for the key 'a' you have a list, which you can index: Dictionary['a'][0] gives the value 1, and so on.
By the way, looking at examples of dictionaries, arrays, key/value pairs, and nested dictionaries will make the concept clearer.
Copied from my comment:
Correct dictionary formats:
{'a': a, 'b': b, ...}, or
dict(a=a, b=b, ...), or
dict([('a', a), ('b', b), ...])
The goal is to make the strings 'a', 'b', etc. the keys, not the variable values.
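As a minimal sketch of the formats above: plain lists stand in here for the arrays that np.genfromtxt returns (an assumption made so the snippet runs without NumPy), and the variable names are illustrative. It also shows why using the data itself as a key raises an "unhashable" TypeError.

```python
# Plain lists stand in for the columns returned by np.genfromtxt (assumption).
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]

# String keys name each column; the column data is stored as the value.
output_dict = {'a': a, 'b': b}
same_dict = dict([('a', a), ('b', b)])

# Look up a column by name, then index into it.
first_of_a = output_dict['a'][0]

# Using the mutable data itself as a key fails: dictionary keys must be hashable.
try:
    bad_dict = {}
    bad_dict[a] = a
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(first_of_a)
print(error_message)
```

NumPy arrays are unhashable for the same reason lists are, which is where the error in the question comes from.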
I am writing a Python script that works with Excel, and I am fairly new to programming.
Representation of data
I want to use the values in columns C, D and E to get the value in column B.
I tried using a dictionary, but it seems you can only use one key with a dictionary.
What data structure can I use for this situation?
You can use a tuple for your dictionary's key. So it would look something like this:
myDict = {}
myDict[(C, D, E)] = B
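A short sketch of the tuple-key idea, assuming the spreadsheet rows have already been read into Python; the rows and values below are hypothetical and only illustrate the lookup.

```python
# Hypothetical rows: (B, C, D, E) values as they might be read from the sheet.
rows = [
    ("result-1", "x", 10, "north"),
    ("result-2", "y", 20, "south"),
]

# Build the lookup: the (C, D, E) tuple is the key, B is the value.
lookup = {}
for b_value, c, d, e in rows:
    lookup[(c, d, e)] = b_value

# Retrieve B from a combination of C, D and E.
found = lookup[("y", 20, "south")]

# .get() returns None instead of raising KeyError for an unknown combination.
missing = lookup.get(("z", 0, "east"))

print(found, missing)
```

Tuples work as keys because they are immutable and therefore hashable, as long as every element inside them is hashable too.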
This is a question that has two parts:
First, I have a python UDF that creates a list of strings of unknown length. The input to the UDF is a map (dict in python) and the number of keys is essentially unknown (it is what I'm trying to obtain).
What I don't know is how to output that in a schema that lets me return it as a list (or some other iterable data structure). This is what I have so far:
@outputSchema("?????") #WHAT SHOULD THE SCHEMA BE!?!?
def test_func(input):
    output = []
    for k, v in input.items():
        output.append(str(k))
    return output
Now, the second part of the question. Once in Pig I want to apply a SHA hash to each element in the "list" for all my users. Some Pig pseudo code:
USERS = LOAD 'something' AS (my_map:map[chararray]);
UDF_OUT = FOREACH USERS GENERATE my_udfs.test_func(my_map);
SHA_OUT = FOREACH UDF_OUT GENERATE SHA(UDF_OUT);
The last line is likely wrong as I want to apply the SHA to each element in the list, NOT to the whole list.
To answer your question: since you are returning a Python list whose contents are strings, you will want your decorator to be
@outputSchema('name_of_bag:{(keys:chararray)}')
It can be confusing when specifying this structure because you only need to define what one element in the bag would look like.
That being said, there is a much simpler way to do what you need. The function KEYSET() (see the question I answered on this) extracts the keys from a Pig map. Using the data set from that example, with a few more keys added to the first map since you said your maps vary in length:
maps
----
[a#1,b#2,c#3,d#4,e#5]
[green#sam,eggs#I,ham#am]
Query:
REGISTER /path/to/jar/datafu-1.2.0.jar;
DEFINE SHA datafu.pig.hash.SHA();
A = LOAD 'data' AS (M:[]);
B = FOREACH A GENERATE FLATTEN(KEYSET(M));
hashed = FOREACH B GENERATE $0, SHA($0);
DUMP hashed;
Output:
(d,18ac3e7343f016890c510e93f935261169d9e3f565436429830faf0934f4f8e4)
(e,3f79bb7b435b05321651daefd374cdc681dc06faa65e374e38337b88ca046dea)
(b,3e23e8160039594a33894f6564e1b1348bbd7a0088d42c4acb73eeaed59c009d)
(c,2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6)
(a,ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb)
(ham,eccfe263668d171bd19b7d491c3ef5c43559e6d3acf697ef37596181c6fdf4c)
(eggs,46da674b5b0987431bdb496e4982fadcd400abac99e7a977b43f216a98127721)
(green,ba4788b226aa8dc2e6dc74248bb9f618cfa8c959e0c26c147be48f6839a0b088)
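To sanity-check digests like these outside Pig, a small Python sketch can hash each key with hashlib. This assumes the SHA UDF is producing SHA-256, which is consistent with the digest shown for a above; hash_keys is a hypothetical helper name, not part of Pig or DataFu.

```python
import hashlib


def hash_keys(input_map):
    """Return (key, SHA-256 hex digest) pairs for every key in the map."""
    return [
        (str(k), hashlib.sha256(str(k).encode("utf-8")).hexdigest())
        for k in input_map
    ]


# Hash each key individually, not the collection of keys as a whole.
pairs = dict(hash_keys({"a": 1, "b": 2}))

for key, digest in pairs.items():
    print(key, digest)
```

The point mirrors the FLATTEN step in the query: the keys are separated first, and the hash is applied to one key at a time.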
I'm new to Python. I need a data structure to contain a tuple of two elements: date and file path. I need to be able to change their values from time to time, hence I'm not sure a tuple is a good idea as it is immutable. Every time I need to change it I must create a new tuple and reference it, instead of really changing its values; so, we may have a memory issue here: a lot of tuples allocated.
On the other hand, I thought of a list, but a list isn't fixed in size, so the user could potentially enter more than two elements, which is not ideal.
Lastly, I would also want to reference each element in a reasonable name; that is, instead of list[0] (which maps to the date) and list[1] (which maps to the file path), I would prefer a readable solution, such as associative arrays in PHP:
tuple = array()
tuple['Date'] = "12.6.15"
tuple['FilePath'] = "C:\somewhere\only\we\know"
What is the Pythonic way to handle such a situation?
Sounds like you're describing a dictionary (dict)
# Creating a dict
>>> d = {'Date': "12.6.15", 'FilePath': r"C:\somewhere\only\we\know"}
# Accessing a value based on a key
>>> d['Date']
'12.6.15'
# Changing the value associated with that key
>>> d['Date'] = '12.15.15'
# Displaying the representation of the updated dict
>>> d
{'FilePath': 'C:\\somewhere\\only\\we\\know', 'Date': '12.15.15'}
Why not use a dictionary? Dictionaries allow you to map a key to a value.
For example, you can define a dictionary like this (avoid naming it dict, which would shadow the built-in type):
record = {'Date': "12.6.15", 'FilePath': r"C:\somewhere\only\we\know"}
and you can easily change it like this:
record['Date'] = 'newDate'
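If a fixed pair of named, mutable fields is the main requirement, a dataclass is another option alongside a dictionary. This is only a sketch of that alternative; FileRecord and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical container with exactly two named fields."""
    date: str
    file_path: str


# The constructor accepts only the declared fields.
record = FileRecord(date="12.6.15", file_path=r"C:\somewhere\only\we\know")

# Fields are mutable, so values can be changed in place.
record.date = "12.15.15"

print(record)
```

Access is by attribute name (record.date, record.file_path) rather than by index, which covers the readability concern in the question.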
I'm currently storing data into a dictionary as a tuple, but I don't know how to unpack the tuple from the dictionary itself. I get a ValueError saying too many values to unpack in the way I am trying to do it. Here is the code:
for row in csvReader:
    if row['de_description'] and row['nh_description']:
        if 'XT2R' in row['de_description']:
            id = (row['de_description'], row['nh_description'])
            if 'TCN' in row['de_description'] and '77880' in row['src_dp']:
                rounded_time = int(float(row['rr_polltime']))
                dataDict[id].append((rounded_time, row['rr_age']))
                timeSet.add(rounded_time)
fileHandle.close()
#unpacking tuple?
for id, (valX,valY) in dataDict.iteritems():
    ageSet.add(valY)
print "ageSet=", ageSet
I also realize there is a lot of redundancy in my code but that is not currently my issue. If anyone has ever worked with unpacking tuples from a dictionary, pointing me in the right direction would be great.
Referring to
dataDict[id].append((rounded_time, row['rr_age']))
your dataDict seems to be a dictionary of lists, since you append values (tuples, in this case) to dataDict[id].
dataDict.iteritems(), however, returns an iterator over key/value pairs, where each value is the whole list of tuples rather than a single tuple.
Trying to unpack that list into (valX, valY) raises the ValueError you see whenever the list does not contain exactly two items.
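One way to handle this is to loop over the list inside each dictionary value and unpack each tuple there. The sketch below uses Python 3 syntax (items() instead of iteritems()) and hypothetical sample data shaped like dataDict in the question.

```python
from collections import defaultdict

# Hypothetical data: each key maps to a list of (rounded_time, age) tuples.
data_dict = defaultdict(list)
data_dict[("XT2R TCN", "hop1")].append((100, "5"))
data_dict[("XT2R TCN", "hop1")].append((101, "7"))
data_dict[("XT2R TCN", "hop2")].append((100, "5"))

age_set = set()
for key, pairs in data_dict.items():
    # pairs is the whole list, so unpack one tuple at a time.
    for rounded_time, age in pairs:
        age_set.add(age)

print("age_set =", age_set)
```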
I'm trying to add items to an array in Python.
I run
array = {}
Then, I try to add something to this array by doing:
array.append(valueToBeInserted)
There doesn't seem to be a .append method for this. How do I add items to an array?
{} represents an empty dictionary, not an array/list. For lists or arrays, you need [].
To initialize an empty list do this:
my_list = []
or
my_list = list()
To add elements to the list, use append
my_list.append(12)
To extend the list to include the elements from another list use extend
my_list.extend([1,2,3,4])
my_list
--> [12,1,2,3,4]
To remove an element from a list use remove
my_list.remove(2)
Dictionaries represent a collection of key/value pairs also known as an associative array or a map.
To initialize an empty dictionary use {} or dict()
Dictionaries have keys and values
my_dict = {'key':'value', 'another_key' : 0}
To extend a dictionary with the contents of another dictionary you may use the update method
my_dict.update({'third_key' : 1})
To remove a value from a dictionary
del my_dict['key']
If you do it this way:
array = {}
you are making a dictionary, not an array.
If you need an array (which is called a list in Python), you declare it like this:
array = []
Then you can add items like this:
array.append('a')
Arrays (called lists in Python) use the [] notation. {} is for a dict (also called a hash table, associative array, etc. in other languages), so you won't have 'append' for a dict.
If you actually want an array (list), use:
array = []
array.append(valueToBeInserted)
Just for sake of completion, you can also do this:
array = []
array += [valueToBeInserted]
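Be careful with strings, though: += iterates over its right-hand side, so the following adds each character of 'string' as a separate element. Use array += ['string'] to add the whole string as one item.
array += 'string'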
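A quick sketch of how += and append behave on a list, using illustrative values:

```python
array = []

# += with a one-element list adds that element.
array += ["first"]

# += with a bare string iterates over it and adds each character.
array += "ab"

# append always adds its argument as a single element.
array.append("string")

print(array)
```

After these steps the list holds "first", "a", "b" and "string", in that order.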
In some languages, such as Java, you define an array using curly braces as follows, but in Python they have a different meaning:
Java:
int[] myIntArray = {1,2,3};
String[] myStringArray = {"a","b","c"};
However, in Python, curly braces are used to define dictionaries, which need key:value pairs, as in {'a':1, 'b':2}
To actually define an array (which is called a list in Python) you can do:
Python:
mylist = [1,2,3]
or other examples like:
mylist = list()
mylist.append(1)
mylist.append(2)
mylist.append(3)
print(mylist)
>>> [1,2,3]
You can also do:
array = numpy.append(array, value)
Note that the numpy.append() function returns a new object, so if you want to modify your initial array, you have to write: array = ...
Isn't it a good idea to learn how to create an array in the most performant way?
It's really simple to create an array and insert values into it:
my_array = ["B","C","D","E","F"]
But, now we have two ways to insert one more value into this array:
Slow mode:
my_array.insert(0,"A") - shifts every existing value one position to the right to place "A" at index zero:
"A" --> "B","C","D","E","F"
Fast mode:
my_array.append("A")
Adds the value "A" to the last position of the array, without touching the other positions:
"B","C","D","E","F", "A"
If you need to display the sorted data, do so later when necessary. Use the way that is most useful to you, but it is interesting to understand the performance of each method.
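The two modes can be compared side by side in a short sketch. collections.deque is mentioned here as an assumption about what might help if fast insertion at the front is really needed; it is not part of the answer above.

```python
from collections import deque

my_array = ["B", "C", "D", "E", "F"]

# Fast mode: append adds to the end without moving existing items.
appended = list(my_array)
appended.append("A")

# Slow mode: insert(0, ...) shifts every existing item one place right.
inserted = list(my_array)
inserted.insert(0, "A")

# A deque supports efficient insertion at either end.
queue = deque(my_array)
queue.appendleft("A")

print(appended)
print(inserted)
print(list(queue))
```

For small lists the difference is negligible; it matters mainly when inserting at the front many times on a long list.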
Note that neither array = array[] nor array.append ["hello"] is valid Python. Define the list with:
array = []
and then add to it with:
array.append("hello")