I have a list of fields and am trying to create an unpivot expression with stack() in PySpark.
stack() requires the params: number, field name, then field value.
stack(30, 'field1', field1...)
I have a list of lists like
[['field1','field1'],['field2','field2']...]
I then can make a single list
['field1','field1','field2','field2']
But I need to remove the single quotes from the second occurrence of each name, so that it works as the "field value":
unpivot_Expr = "stack(30, 'field1',field1,'field2',field2...)"
So far I'm getting stack(30, 'field1','field1','field2','field2'...)
I'm not sure how to remove the single quotes, or where the easiest place to do it is. Any help is much appreciated.
Edit:
Sorry, I should've given context: I need to insert this string into a PySpark select expression.
unpivot_df = df.select("hashKey", expr(unpivot_Expr))
Currently I drop the list into the string and replace the [] like this
unpivot_Expr = "stack({0}, {1})".format(str(len(fieldList)), str(fieldList).replace("[","").replace("]",""))
How about building up the string unpivot_Expr piece by piece via:
all_fields = [
['field1','field1'],
['field2','field2']
]
unpivot_Expr = "stack(30"
for pair in all_fields:
    # quote the field name, leave the field value (column reference) bare
    unpivot_Expr += f", '{pair[0]}', {pair[1]}"
unpivot_Expr += ")"
print(unpivot_Expr)
I think that will give you the string you seek:
stack(30, 'field1', field1, 'field2', field2)
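A slightly more general sketch of the same idea, assuming the count passed to stack() should be the number of name/value pairs rather than a hard-coded 30. The names all_fields and unpivot_expr are illustrative, and the column names are assumed to be plain identifiers that need no extra quoting:

```python
# Hypothetical field list: [label, column] pairs, as in the question.
all_fields = [
    ["field1", "field1"],
    ["field2", "field2"],
]

# Quote the label (a string literal) but leave the column name bare,
# so the second item of each pair is read as a column reference.
pairs = ", ".join(f"'{label}', {column}" for label, column in all_fields)

# Assumption: one output row per pair, so the count is len(all_fields).
unpivot_expr = f"stack({len(all_fields)}, {pairs})"

print(unpivot_expr)  # stack(2, 'field1', field1, 'field2', field2)
```

The resulting string can then be passed to expr() in the select, as in the question.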
I'm creating a class that simulates an ATM and I don't know why I'm getting a syntax error on the colon between 'id' and 'zwhite'.
import pickle
import sys
import os
class ATM(object):
    def __init__(self):
        self.users = ['id':'zwhite','name':'Zack White','pin':'4431','balance':845,
                      'id':'jzimmerman','name':'John Zimmerman','pin':'6780','balance':59,
                      'id':'cbenson','Carly Benson','pin':'8991','balance':720]
    def check(self,ids):
        print "Your balance: "+str(ids['balance'])
    def withdraw(self,person):
        for i in self.users:
            if i['id'] == person['id']:
                print "Your balance: "+str(i['balance'])
I suspect you want to be creating a list of dictionaries. This would make sense of the duplicated keys on separate lines:
Try:
self.users = [{'id':'zwhite','name':'Zack White','pin':'4431','balance':845},
{'id':'jzimmerman','name':'John Zimmerman','pin':'6780','balance':59},
{'id':'cbenson','name':'Carly Benson','pin':'8991','balance':720}]
I also added the 'name' key for the value 'Carly Benson' in the last row.
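As a rough sketch of how a list of dictionaries like this can be used, the helper below looks a user up by id. The function name find_user is made up for illustration and is not part of the original class:

```python
users = [
    {'id': 'zwhite', 'name': 'Zack White', 'pin': '4431', 'balance': 845},
    {'id': 'jzimmerman', 'name': 'John Zimmerman', 'pin': '6780', 'balance': 59},
    {'id': 'cbenson', 'name': 'Carly Benson', 'pin': '8991', 'balance': 720},
]

def find_user(users, user_id):
    """Return the first user dict whose 'id' matches, or None if there is none."""
    return next((u for u in users if u['id'] == user_id), None)

user = find_user(users, 'jzimmerman')
if user is not None:
    print("Your balance: " + str(user['balance']))  # Your balance: 59
```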
You should use a dictionary, not a list.
self.users = {...}
Square brackets ([ ]) create a list. Lists store elements only, such as [1,2,3,"ab","cd",3.23]; you cannot store keys and values in them. To store data in key-value pairs, you have to use a dictionary. Looking at the code, I think you know the concept of a dictionary but are confused about how to create one. Curly braces ({ }) create a dictionary, so change the square brackets to curly braces. Note that a single dictionary can hold each key only once, so with three users you need one dictionary per user rather than one pair of braces around everything.
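A small sketch of why one pair of curly braces around all three users is not enough, and of one possible alternative: a dictionary keyed by user id. The name users_by_id is an assumption for illustration, not something from the question:

```python
# A repeated key in one dict literal silently keeps only the last value.
collapsed = {'id': 'zwhite', 'balance': 845, 'id': 'cbenson', 'balance': 720}
print(collapsed)  # {'id': 'cbenson', 'balance': 720}

# One possible layout: an outer dict keyed by id, one inner dict per user.
users_by_id = {
    'zwhite': {'name': 'Zack White', 'pin': '4431', 'balance': 845},
    'jzimmerman': {'name': 'John Zimmerman', 'pin': '6780', 'balance': 59},
    'cbenson': {'name': 'Carly Benson', 'pin': '8991', 'balance': 720},
}
print(users_by_id['cbenson']['balance'])  # 720
```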
I have a list of personal data (id_code, birth_year, born_in) and I want to sort the fields within each string in the list, but I'm having trouble.
My list data:
data = [
'id_code:3211238576;birth_year:1350;born_in:Boushehr',
'id_code:9801233575;born_in:Argentina;birth_year:1360',
'born_in:Portugal;id_code:0219206431;birth_year:1358',
'id_code:0021678913;born_in:Shiraz;birth_year:1120',
'id_code:1101102135;born_in:Gilan;birth_year:1152',
]
The code I wrote, which has a bug:
for i in data:
    s = ''.join(sorted(i))
    print(s)
my code output:
01112233355678:::;;B___abbcddeeehhhiiinnooorrrrstuy
00112333556789:::;;A___aabbcddeeeghiiiinnnnoorrrrtty
00111223345689:::;;P___aabbcddeeghiiilnnooorrrrttuy
00011112236789:::;;S___aabbcddeehhiiiinnoorrrrtyz
00111111122355:::;;G___aabbcddeehiiiilnnnoorrrty
But the output I want (the correct answer) is:
id_code:3211238576,born_in:Boushehr,birth_year:1350
id_code:9801233575,born_in:Argentina,birth_year:1360
id_code:0219206431,born_in:Portugal,birth_year:1358
id_code:0021678913,born_in:Shiraz,birth_year:1120
id_code:1101102135,born_in:Gilan,birth_year:1152
Please help me to solve this problem
Assuming you want your fields to be in specific order, try this one: (I put comments in code for clarification):
data = [
'id_code:3211238576;birth_year:1350;born_in:Boushehr',
'id_code:9801233575;born_in:Argentina;birth_year:1360',
'born_in:Portugal;id_code:0219206431;birth_year:1358',
'id_code:0021678913;born_in:Shiraz;birth_year:1120',
'id_code:1101102135;born_in:Gilan;birth_year:1152',
]
def sorter(x: str):
    # getting the field name
    field = x.split(':')[0]
    # returning its index from the "sorted_by" list
    return sorted_by.index(field)
# The index of these fields will be used for sorting in the "sorter" function.
sorted_by = ['id_code', 'born_in', 'birth_year']
result = []
for item in data:
    # splitting the fields
    splited = item.split(';')
    splited.sort(key=sorter)
    # building the line back and appending it
    result.append(';'.join(splited))
for i in result:
    print(i)
output :
id_code:3211238576;born_in:Boushehr;birth_year:1350
id_code:9801233575;born_in:Argentina;birth_year:1360
id_code:0219206431;born_in:Portugal;birth_year:1358
id_code:0021678913;born_in:Shiraz;birth_year:1120
id_code:1101102135;born_in:Gilan;birth_year:1152
Now you can easily change the fields order in sorted_by list and see the result.
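An alternative sketch of the same reordering, assuming every record contains each field exactly once: parse each record into a dictionary and emit the fields in the wanted order. The names field_order and reorder are illustrative only, and a record with a missing field would raise a KeyError here:

```python
data = [
    'id_code:3211238576;birth_year:1350;born_in:Boushehr',
    'id_code:9801233575;born_in:Argentina;birth_year:1360',
    'born_in:Portugal;id_code:0219206431;birth_year:1358',
]

field_order = ['id_code', 'born_in', 'birth_year']

def reorder(record, order):
    # Split each "key:value" part on the first colon only.
    fields = dict(part.split(':', 1) for part in record.split(';'))
    # Rebuild the record with the fields in the requested order.
    return ';'.join(f"{key}:{fields[key]}" for key in order)

result = [reorder(record, field_order) for record in data]
for line in result:
    print(line)
```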
Try
out = [';'.join(reversed(sorted(x.split(';')))) for x in data]
print(out)
This takes every element of the data list and splits it in three strings, each of which contains one of the three attributes. Then, it arranges the three strings in reversed alphabetical order and joins them back into one string, separated by ';'
I am working on an advanced search filter for my web app, and want to know if there's a way to implement a filter like this. What I need is to be able to specify a list of substrings, and then filter the query for values that contain any substring from the specified list.
Account.name
[Cheyenne,
Dan,
Fran,
Sharon,
Karen]
filter_list = ['an', 'ar']
Post Query
[Dan,
Fran,
Sharon,
Karen]
So in the example above, Cheyenne would be removed because that name doesn't contain 'an' or 'ar'.
What I've Tried
What I've tried so far are the contains and in_ operators, although I'm not sure how to combine the two. in_ will only work if the strings are an exact match, and contains will only work with one string at a time. Any ideas?
query = session.query(Account).filter(Account.name.in_(filter_list))
query = session.query(Account).filter(Account.name.contains(filter_list[0]))
I ended up figuring it out on my own. Here's how I did it, if anybody is curious.
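from sqlalchemy import or_
mylist = ['ar', 'an']
# one contains() condition per substring
filter_list = [Account.name.contains(x) for x in mylist]
# or_() matches rows that satisfy any of the conditions
q = session.query(Account).filter(
    or_(
        *filter_list
    )
)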
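To sanity-check what such a query should return, the same "contains any of these substrings" rule can be sketched in plain Python. This only mirrors the logic on an in-memory list using the example names from the question; it does not touch the database and is case-sensitive:

```python
names = ['Cheyenne', 'Dan', 'Fran', 'Sharon', 'Karen']
substrings = ['an', 'ar']

# Keep a name if at least one of the substrings occurs in it.
matches = [name for name in names if any(s in name for s in substrings)]

print(matches)  # ['Dan', 'Fran', 'Sharon', 'Karen']
```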
I have a list with one item in it, which I try to dismantle and rebuild.
Not really sure if it is the 'right' way, but for now it will do.
I tried using replace/substitute and other means of manipulating the list, but didn't get very far, so this is what I came up with:
This is the list I get : alias_account = ['account-12345']
I then use this code to remove the [' in the front , and '] from the back.
NAME = ('%s' % alias_account).split(',')
for x in NAME:
    key = x.split("-")[0]
    value = x.split("-")[1]
    alias_account = value[:-2]
    alias_account1 = key[2:]
    alias_account = ('%s-%s') % (alias_account1, alias_account)
This works beautifully when running print alias_account.
The problem starts when I have a list like ['acc-ount-12345'] or ['account'].
So my question is, how do I handle all of the possibilities?
Should I use try/except with other split options?
Or are there fancier split options?
To access a single list element, you can index its position in square brackets:
alias_account[0]
To hide the quotes marking the result as a string, you can use print():
print(alias_account[0])
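If the name and the number are still needed separately, one possible sketch is to split on the last hyphen only. This assumes the number, when present, is always the part after the final hyphen; split_alias is a made-up helper name:

```python
def split_alias(alias):
    """Split on the last hyphen; the number is None when there is no hyphen."""
    name, sep, number = alias.rpartition('-')
    if not sep:
        # No hyphen at all, e.g. 'account'
        return alias, None
    return name, number

for alias_account in (['account-12345'], ['acc-ount-12345'], ['account']):
    print(split_alias(alias_account[0]))
```

Because rpartition never raises when the separator is missing, no try/except is needed for the ['account'] case.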
For example, given a list of strings prices = ["US$200", "CA$80", "GA$500"],
I am trying to only return ["US", "CA", "GA"].
Here is my code - what am I doing wrong?
def get_country_codes(prices):
    prices = ""
    list = prices.split()
    list.remove("$")
    "".join(list)
    return list
Since each of the strings in the prices argument has the form '[country_code]$[number]', you can split each of them on '$' and take the first part.
Here's an example of how you can do this:
def get_country_codes(prices):
    return [p.split('$')[0] for p in prices]
So get_country_codes(['US$200', 'CA$80', 'GA$500']) returns ['US', 'CA', 'GA'].
Also, as a side note, I would recommend against naming a variable list, as this shadows the built-in name list, which is the list type itself.
There are multiple problems with your code, and you have to fix all of them to make it work:
def get_country_codes(prices):
    prices = ""
Whatever value your caller passed in, you're throwing that away and replacing it with "". You don't want to do that, so just get rid of that last line.
list = prices.split()
You really shouldn't be calling this list list. Also, split with no argument splits on spaces, so what you get may not be what you want:
>>> "US$200, CA$80, GA$500".split()
['US$200,', 'CA$80,', 'GA$500']
I suppose you can get away with having those stray commas, since you're just going to throw them away. But it's better to split with your actual separators, the ', '. So, let's change that line:
prices = prices.split(", ")
list.remove("$")
This removes every value in the list that's equal to the string "$". There are no such values, so it does nothing.
More generally, you don't want to throw away any of the strings in the list. Instead, you want to replace the strings, with strings that are truncated at the $. So, you need a loop:
countries = []
for price in prices:
    country, dollar, price = price.partition('$')
    countries.append(country)
If you're familiar with list comprehensions, you can rewrite this as a one-liner:
countries = [price.partition('$')[0] for price in prices]
"".join(list)
This just creates a new string and then throws it away. You have to assign it to something if you want to use it, like this:
result = "".join(countries)
But… do you really want to join anything here? It sounds like you want the result to be a list of strings, ['US', 'CA', 'GA'], not one big string 'USCAGA', right? So, just get rid of this line.
return list
Just change the variable name to countries and you're done.
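Putting those steps together, a minimal sketch of the finished function might look like this. It assumes prices is already a list of strings, as in the question; if it arrives as one comma-separated string, it would need to be split on ", " first as described above:

```python
def get_country_codes(prices):
    """Return the text before the first '$' in each price string."""
    countries = []
    for price in prices:
        # partition() returns (before, separator, after)
        country, _sep, _amount = price.partition('$')
        countries.append(country)
    return countries

print(get_country_codes(["US$200", "CA$80", "GA$500"]))  # ['US', 'CA', 'GA']
```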
Since your data is structured so that the first two characters are the country code, you can use simple string slicing.
def get_country_codes(prices):
    return [p[:2] for p in prices]
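One caveat with slicing is that it depends on every code being exactly two characters long. The sketch below compares it with splitting on '$', using a hypothetical three-letter prefix ("USD$10") that is not in the original data:

```python
prices = ["US$200", "USD$10"]  # "USD$10" is a made-up example

# Slicing always takes two characters, whatever comes before the '$'.
by_slice = [p[:2] for p in prices]

# Splitting takes everything before the '$'.
by_split = [p.split('$')[0] for p in prices]

print(by_slice)  # ['US', 'US']
print(by_split)  # ['US', 'USD']
```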
You call the function with the prices parameter, but your first line reassigns it to an empty string:
prices = ''
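I would also suggest using the '$' character as the split character. Since prices is a list, split each string in it rather than the list itself, like:
parts = price.split('$')
Try something like this:
def get_country_codes(prices):
    codes = []
    for price in prices:
        codes.append(price.split('$')[0])
    return codes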