Check if a list contains elements of a CSV file - Python

I have a CSV file with items like these: some,foo,bar, and I have a different list in Python with items like att1,some,bar,try,other. Is it possible, for every such list, to create a row in the same CSV file with 1 where the 'key' is present and 0 otherwise? So in this case the resulting CSV file would be:
some,foo,bar
1,0,1

Here's one approach, using Pandas.
Let's say the contents of example.csv are:
some,foo,bar
Then we can represent sample data with:
import pandas as pd
keys = ["att1","some","bar","try","other"]
data = pd.read_csv('~/Desktop/example.csv', header=None)
print(data)
      0    1    2
0  some  foo  bar
matches = data.apply(lambda x: x.isin(keys).astype(int))
print(matches)
   0  1  2
0  1  0  1
newdata = pd.concat([data, matches])
print(newdata)
      0    1    2
0  some  foo  bar
0     1    0    1
Now write back to CSV:
newdata.to_csv('example.csv', index=False, header=False)
# example.csv contents
some,foo,bar
1,0,1
Given data and keys, we can condense it all into one chained command:
(pd.concat([data,
            data.apply(lambda x: x.isin(keys).astype(int))])
 .to_csv('example1.csv', index=False, header=False))

You could create a dictionary with the keys as the column names.
csv = {
    "some": [],
    "foo": [],
    "bar": []
}
Now, for each list check if the values are in the dict's keys, and append as necessary.
for lst in lists:  # renamed from `list` to avoid shadowing the built-in
    for key in csv.keys():
        if key in lst:
            csv[key].append(1)
        else:
            csv[key].append(0)
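Without pandas, the same idea can be sketched with the standard csv module. This is a minimal sketch; the in-memory StringIO buffers stand in for the real file, and keys_lists is a made-up input:

```python
import csv
import io

# Hypothetical stand-ins for the real CSV file and the lists of keys.
source = io.StringIO("some,foo,bar\n")
keys_lists = [["att1", "some", "bar", "try", "other"]]

# Read the header row of the existing CSV.
header = next(csv.reader(source))

# Write the header back, then one 0/1 row per key list:
# 1 if the column name appears in the list, 0 otherwise.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
for keys in keys_lists:
    writer.writerow([1 if col in keys else 0 for col in header])

print(out.getvalue())
```

With a real file you would open it with open(..., newline="") instead of the StringIO buffers.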

Related

name/save data frames that are currently in a dictionary in a for loop pandas

I have a dictionary of dataframes (the key is the name of the dataframe and the value is its rows/columns). Each dataframe within the dictionary has just 2 columns and a varying number of rows. I also have a list of all the keys.
I need to use a for-loop to iteratively name each dataframe with its key and have it saved outside of the dictionary. I know I can access each dataframe through the dictionary, but I don't want to do it that way. I am using Spyder, so I like to look at my tables in the Variable Explorer, and I don't like printing them to the console. Additionally, I would like to modify some of the completed dataframes, and I need them to be standalone objects for that.
Here is my code to make the dictionary (I did this because I wanted to look at all of the categories in each column along with the frequency of those values):
import pandas as pd

mydict = {
    "dummy": [1, 1, 1],
    "type": ["new", "old", "new"],
    "location": ["AB", "BC", "ON"]
}
mydf = pd.DataFrame(mydict)
colnames = mydf.columns.tolist()
mydict2 = {}
for i in colnames:
    mydict2[i] = pd.DataFrame(mydf.groupby([i, 'dummy']).size())
print(mydict2)
mydf looks like this:
   dummy type location
0      1  new       AB
1      1  old       BC
2      1  new       ON
the output of print(mydict2) looks like this:
{'dummy': 0
dummy dummy
1 1 3, 'type': 0
type dummy
new 1 2
old 1 1, 'location': 0
location dummy
AB 1 1
BC 1 1
ON 1 1}
I want the final output to look like this:
Type:
 type  dummy
  new      2
  old      1

Location:
 location  dummy
       AB      1
       BC      1
       ON      1
I am basically just trying to generate a frequency table for each column in the original table, using a loop. Any help would be much appreciated!
I believe this yields the correct output:
type_count = mydf[["type", "dummy"]].groupby(by=['type'])['dummy'].sum().reset_index()
loca_count = mydf[["location", "dummy"]].groupby(by=['location'])['dummy'].sum().reset_index()
Edit:
Dynamically, you could build all the dataframes in a loop like below (assuming you want to aggregate based on the dummy column):
df_list = []
for name in colnames:
    if name != "dummy":
        df_list.append(mydf[[name, "dummy"]].groupby(by=[name])['dummy'].sum().reset_index())
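Since the asker specifically wants each frequency table as its own named variable (for Spyder's Variable Explorer), one hedged sketch is to build a dictionary of tables and then inject them into the module namespace. The names freq, type_count, and location_count are illustrative, and writing to globals() is generally discouraged in favour of keeping the dictionary:

```python
import pandas as pd

mydf = pd.DataFrame({
    "dummy": [1, 1, 1],
    "type": ["new", "old", "new"],
    "location": ["AB", "BC", "ON"],
})

# One frequency table per non-dummy column, keyed by column name.
freq = {
    col: mydf.groupby(col)["dummy"].sum().reset_index()
    for col in mydf.columns if col != "dummy"
}

# If standalone variables are really needed, inject them into the module
# namespace; keeping the dictionary is usually the cleaner design.
for name, frame in freq.items():
    globals()[f"{name}_count"] = frame

print(type_count)
```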

Pandas dataframe writing to excel as list. But I don't want data as list in excel

I have code which iterates through an Excel file and extracts column values, which end up loaded as lists in a DataFrame. When I write the DataFrame to Excel, I see the data wrapped in brackets, and strings with quotes as well (['']). How can I remove the [''] when I write to Excel?
Also, I want to write only the first value of the Product_ID column to Excel. How can I do that?
result = pd.DataFrame.from_dict(result) # result has list of data
df_t = result.T
writer = pd.ExcelWriter(path)
df_t.to_excel(writer, 'data')
writer.save()
My output to Excel:
I am expecting output as below, and the Product_ID column should only contain the first value in each list.
I tried the following and got an error:
path = path to excel
df = pd.read_excel(path, engine="openpyxl")
def data_clean(x):
    for index, data in enumerate(x.values):
        item = eval(data)
        if len(item):
            x.values[index] = item[0]
        else:
            x.values[index] = ""
    return x

new_df = df.apply(data_clean, axis=1)
new_df.to_excel(path)
I am getting below error:
item = eval(data)
TypeError: eval() arg 1 must be a string, bytes or code object
df_t['id'] = df_t['id'].str[0]  # shortcut if you only want the 0th element
# "unlist" the lists you have fed into a pandas column:
df_t['other_columns'] = df_t['other_columns'].apply(lambda x: " ".join(x))
This should produce the effect you want, but you have to make sure that the data in each cell is in ['', ...] form; if it differs, you can adjust the handling in the data_clean function:
import pandas as pd

df = pd.read_excel("1.xlsx", engine="openpyxl")

def data_clean(x):
    for index, data in enumerate(x.values):
        item = eval(data)
        if len(item):
            x.values[index] = item[0]
        else:
            x.values[index] = ""
    return x

new_df = df.apply(data_clean, axis=1)
new_df.to_excel("new.xlsx")
The following is an example of df and the modified new_df (some randomly generated data):
# df
name Product_ID xxx yyy
0 ['Allen'] ['AF124', 'AC12414'] [124124] [222]
1 ['Aaszflen'] ['DF124', 'AC12415'] [234125] [22124124,124125]
2 ['Allen'] ['CF1sdv24', 'AC12416'] [123544126] [33542124124,124126]
3 ['Azdxven'] ['BF124', 'AC12417'] [35127] [333]
4 ['Allen'] ['MF124', 'AC12418'] [3528] [12352324124,124128]
5 ['Allen'] ['AF124', 'AC12419'] [122359] [12352324124,124129]
# new_df
name Product_ID xxx yyy
0 Allen AF124 124124 222
1 Aaszflen DF124 234125 22124124
2 Allen CF1sdv24 123544126 33542124124
3 Azdxven BF124 35127 333
4 Allen MF124 3528 12352324124
5 Allen AF124 122359 12352324124
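The TypeError (eval() arg 1 must be a string...) suggests some cells are already plain numbers rather than list-shaped strings. A hedged sketch that handles this: only parse string cells, and use ast.literal_eval, which is safer than eval for spreadsheet data. The sample DataFrame here is made up to match the shape shown above:

```python
import ast

import pandas as pd

# Made-up sample: some columns hold list-shaped strings, one holds numbers.
df = pd.DataFrame({
    "name": ["['Allen']", "['Aaszflen']"],
    "Product_ID": ["['AF124', 'AC12414']", "['DF124', 'AC12415']"],
    "xxx": [124124, 234125],  # already plain numbers, not list strings
})

def first_item(cell):
    # Only parse strings; leave numbers and other types untouched,
    # which avoids the TypeError that eval() raised on numeric cells.
    if isinstance(cell, str):
        parsed = ast.literal_eval(cell)
        return parsed[0] if parsed else ""
    return cell

# Series.map applies first_item elementwise in every column.
clean = df.apply(lambda col: col.map(first_item))
print(clean)
```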

Pandas groupby: Nested loop fails with key error

I have CSV file with the following test content:
Name;N_D;mu;set
A;10;20;0
B;20;30;0
C;30;40;0
x;5;15;1
y;15;25;1
z;25;35;1
I'm reading the file with pandas, group the data and then iterate through the data. Within each group, I want to iterate through the rows of the data set:
import pandas as pd

df = pd.read_csv("samples_test.csv", delimiter=";", header=0)
groups = df.groupby("set")
for name, group in groups:
    somestuff = [group["N_D"], group["mu"], name]
    for i, txt in enumerate(group["Name"]):
        print(txt, group["Name"][i])
The code fails on the line print(txt, group["Name"][i]) at the first element of the second group with a KeyError. I don't understand why...
Your code fails because the Series index does not match the enumerate counter in each loop, so the keys cannot be matched for filtering. (Note: also use .loc[] or .iloc[] and avoid chained indexing like group["Name"][i].)
groups = df.groupby("set")
for name, group in groups:
    somestuff = [group["N_D"], group["mu"], name]
    for i, txt in enumerate(group["Name"]):
        print(i, group["Name"])
0 0    A
1      B
2      C
Name: Name, dtype: object
1 0    A
1      B
2      C
Name: Name, dtype: object
...
Your code should be changed as below, using .iloc[] and get_loc to obtain the column index:
groups = df.groupby("set")
for name, group in groups:
    somestuff = [group["N_D"], group["mu"], name]
    for i, txt in enumerate(group["Name"]):
        print(txt, group.iloc[i, group.columns.get_loc('Name')])
A A
B B
C C
x x
y y
z z
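An alternative sketch, not from the answer above: reset each group's index so that the positional counter from enumerate lines up with the label-based .loc lookup. The sample DataFrame reproduces the CSV from the question:

```python
import pandas as pd

# Same data as samples_test.csv in the question.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "x", "y", "z"],
    "N_D": [10, 20, 30, 5, 15, 25],
    "mu": [20, 30, 40, 15, 25, 35],
    "set": [0, 0, 0, 1, 1, 1],
})

collected = []
for name, group in df.groupby("set"):
    # reset_index gives each group a fresh 0..n-1 index, so the
    # enumerate counter and the .loc labels agree even in later groups.
    group = group.reset_index(drop=True)
    for i, txt in enumerate(group["Name"]):
        collected.append((txt, group.loc[i, "Name"]))

print(collected)
```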

Dictionary in Pandas DataFrame, how to split the columns

I have a DataFrame that consists of one column ('Vals') which is a dictionary. The DataFrame looks more or less like this:
In[215]: fff
Out[213]:
Vals
0 {u'TradeId': u'JP32767', u'TradeSourceNam...
1 {u'TradeId': u'UUJ2X16', u'TradeSourceNam...
2 {u'TradeId': u'JJ35A12', u'TradeSourceNam...
When looking at an individual row the dictionary looks like this:
In[220]: fff['Vals'][100]
Out[218]:
{u'BrdsTraderBookCode': u'dffH',
u'Measures': [{u'AssetName': u'Ie0',
u'DefinitionId': u'6dbb',
u'MeasureValues': [{u'Amount': -18.64}],
u'ReportingCurrency': u'USD',
u'ValuationId': u'669bb'}],
u'SnapshotId': 12739,
u'TradeId': u'17304M',
u'TradeLegId': u'31827',
u'TradeSourceName': u'xxxeee',
u'TradeVersion': 1}
How can I split the columns and create a new DataFrame, so that I get one column with TradeId and another with MeasureValues?
try this:
l = []
for idx, row in df['Vals'].items():  # .iteritems() was removed in pandas 2.0
    temp_df = pd.DataFrame(row['Measures'][0]['MeasureValues'])
    temp_df['TradeId'] = row['TradeId']
    l.append(temp_df)
pd.concat(l, axis=0)
Here's a way to get TradeId and MeasureValues (using twice your sample row above to illustrate the iteration):
new_df = pd.DataFrame()
for id, data in fff.iterrows():
    d = {'TradeId': data.iloc[0]['TradeId']}  # .iloc replaces the removed .ix
    d.update(data.iloc[0]['Measures'][0]['MeasureValues'][0])
    new_df = pd.concat([new_df, pd.DataFrame.from_dict(d, orient='index').T])
   Amount TradeId
0  -18.64  17304M
0  -18.64  17304M
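Another hedged sketch, not from the answers above: with a reasonably recent pandas, pd.json_normalize can flatten the nested structure in one call. The sample dicts here are made up to match the shape shown in the question:

```python
import pandas as pd

# Hypothetical stand-in for the contents of fff['Vals'].
vals = [
    {
        "TradeId": "17304M",
        "Measures": [{"AssetName": "Ie0",
                      "MeasureValues": [{"Amount": -18.64}]}],
    },
    {
        "TradeId": "JP32767",
        "Measures": [{"AssetName": "Xy1",
                      "MeasureValues": [{"Amount": 5.0}]}],
    },
]

flat = pd.json_normalize(
    vals,
    record_path=["Measures", "MeasureValues"],  # drill down to the amounts
    meta=["TradeId"],                           # carry TradeId along per row
)
print(flat)
```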

Create and instantiate python 2d dictionary

I have two python dictionaries:
ccyAr = {'AUDCAD','AUDCHF','AUDJPY','AUDNZD','AUDUSD','CADCHF','CADJPY','CHFJPY','EURAUD','EURCAD','EURCHF','EURGBP','EURJPY','EURNZD','EURUSD','GBPAUD','GBPCAD','GBPCHF','GBPJPY','GBPNZD','GBPUSD','NZDCAD','NZDCHF','NZDJPY','NZDUSD','USDCAD','USDCHF','USDJPY'}
data = {'BTrades', 'BPips', 'BProfit', 'STrades', 'SPips', 'SProfit', 'Trades', 'Pips', 'Profit', 'Won', 'WonPC', 'Lost', 'LostPC'}
I've been trying to work out how to most elegantly create a construct in which each element of 'data' exists for each element of 'ccyAr'. The following two attempts are the closest I've got, but the first results (I now realise) in lists as values, and the second is more like pseudocode:
1.
table = { { data: [] for d in data } for ccy in ccyAr }
2.
for ccy in ccyAr:
    for d in data:
        table['ccy']['d'] = 0
I also want to set each of the entries to int 0, and I'd like to do it in one go. I'm struggling with the comprehension approach, as I end up creating each value of the inner dictionary as a list instead of the value 0.
I've seen the autovivification piece, but I don't want to mimic Perl; I want to do it the Pythonic way. Any help = cheers.
for ccy in ccyAr:
    for d in data:
        table['ccy']['d'] = 0
Is close.
table = {}
for ccy in ccyAr:
    table[ccy] = {}
    for d in data:
        table[ccy][d] = 0
Also, ccyAr and data in your question are sets, not dictionaries.
What you are searching for is a pandas DataFrame of shape data x ccyAr. I give a minimal example here:
import numpy as np
import pandas as pd

data = {'1', '2'}
ccyAr = {'a', 'b', 'c'}
df = pd.DataFrame(np.zeros((len(data), len(ccyAr))))
Then the most important step is to set both the columns and the index. If your two so-called dictionaries are in fact sets (as it seems in your code), use:
df.columns = ccyAr
df.index = data
If they are indeed dictionaries, you instead have to call their keys method:
df.columns = ccyAr.keys()
df.index = data.keys()
You can print df to see that this is actually what you wanted:
  | a | c | b
--------------
1 | 0   0   0
2 | 0   0   0
And now if you try to access it via df['a'][1], it returns 0. It is the best solution to your problem.
How to do this using a dictionary comprehension:
table = {ccy:{d:0 for d in data} for ccy in ccyAr}
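To see the comprehension in action (with the sets trimmed for brevity), note that each currency pair gets its own independent inner dict, so updating one counter leaves the others at 0:

```python
ccyAr = {'AUDCAD', 'AUDCHF', 'EURUSD'}   # trimmed from the full set
data = {'BTrades', 'BPips', 'Profit'}    # trimmed from the full set

# One inner dict per currency pair, every counter initialised to int 0.
table = {ccy: {d: 0 for d in data} for ccy in ccyAr}

# The inner dicts are independent: this only touches EURUSD.
table['EURUSD']['Profit'] += 5
print(table['EURUSD']['Profit'])
print(table['AUDCAD']['Profit'])
```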
