Change substring in a column from a dict (Python Pandas) - python

I have two DataFrames read from two sheets of the same Excel file. I want to replace the name of each molecule in the first sheet with its "official id" from a database stored in the second sheet.
import pandas as pd
reactions = pd.read_excel("/Users/Python/reactions.xlsx")
molecules = pd.read_excel("/Users/Python/reactions.xlsx", sheet_name='METS')
d = molecules.set_index('MOLID')['MOLNAME'].to_dict()
# does not work
reactions['EQUATION'] = reactions['EQUATION'].str.replace('\d+','').replace(d)
The old and new molecule names are in a dictionary, which I also created from the second sheet:
d
It looks like this:
{....'glucose[c]': 'glc_D',
'glucose[s]': 'glc_D',
'glucose[x]': 'glc_D', ....}
In the first DataFrame, the column whose molecule names I want to change is called EQUATION, and its values look like: "ATP[c] + glucose[c] => ADP[c] + glucose6phosphate[c]"
I tried the code above. It raises no error, but the molecule names in my DataFrame have not changed.
Thank you for the time

How to replace multiple substrings in a Pandas series using a dictionary?
I think you can do it by adjusting the code to something like this (assign the result back to the column, otherwise it is discarded):
reactions['EQUATION'] = reactions['EQUATION'].apply(lambda x: ' '.join([d.get(i, i) for i in x.split()]))
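A likely reason the original attempt changed nothing: Series.replace with a plain dictionary matches whole cell values, not substrings, so a full equation string never equals a key such as 'glucose[c]'. The token-based approach above can be sketched as follows. The sample DataFrame and dictionary are made up to mirror the question, and the sketch assumes that molecules are separated by whitespace and that no dictionary key contains a space.

```python
import pandas as pd

# Hypothetical sample data standing in for the two Excel sheets.
reactions = pd.DataFrame(
    {"EQUATION": ["ATP[c] + glucose[c] => ADP[c] + glucose6phosphate[c]"]}
)
d = {"glucose[c]": "glc_D", "glucose[s]": "glc_D", "glucose[x]": "glc_D"}

# Split each equation on whitespace, look every token up in the dictionary
# (keeping the token unchanged when it is not a key), and join it back.
reactions["EQUATION"] = reactions["EQUATION"].apply(
    lambda eq: " ".join(d.get(token, token) for token in eq.split())
)

print(reactions["EQUATION"].iloc[0])
```

Tokens such as "+" and "=>" are not keys in the dictionary, so they pass through untouched.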

Related

How to replace the blank cells in Excel with 0 using Python?

I'm trying to replace the blank cells in Excel with 0 using Python. I have a script that loops over 2 Excel files with the same worksheet, column headers and values. As shown in the attached picture,
the script writes the count to column Count in Excel 2 if the value in column A of Excel 2 matches a value in column A of Excel 1. The problem is that for values in column A of Excel 2 that have no match in column A of Excel 1, the Count column in Excel 2 is left blank.
Below is the part of the script that checks the 2 Excel files. I tried the suggestion from the linked question "Pandas: replace empty cell to 0", but it doesn't work for me and I get the error message result.fillna(0, inplace=True) NameError: name 'result' is not defined. Guidance on how to achieve my goal would be appreciated. Thank you in advance.
import pandas as pd
import os
import openpyxl
daily_data = openpyxl.load_workbook('C:/Test.xlsx')
master_data = openpyxl.load_workbook('C:/Source.xlsx')
daily_sheet = daily_data['WorkSheet']
master_sheet = master_data['WorkSheet']
for i in daily_sheet.iter_rows():
    Column_A = i[0].value
    row_number = i[0].row
    for j in master_sheet.iter_rows():
        if j[0].value == Column_A:
            daily_sheet.cell(row=row_number, column=6).value = j[1].value
            #print(j[1].value)
daily_data.save('C:/Test.xlsx')
daily_data = pd.read_excel('C:/Test.xlsx')
daily_data.fillna(0, inplace=True)
It seems you've made a few fundamental mistakes in your approach. First, "result" is an object: specifically, it is a DataFrame that someone else created in that other post, not your DataFrame. You therefore need to call the method on your own DataFrame. Python is object-oriented, meaning that methods belong to objects, and .fillna() is a method that operates on your DataFrame. A toy example of the usage:
my_df = pd.read_csv(my_path_to_my_df_)
my_df.fillna(0, inplace=True)
Also, this method is for DataFrames, so you will need to convert your data from the object that the openpyxl library creates (at least that is what I would assume; I haven't used this library before). With your data you would therefore do this:
daily_data = pd.read_excel('C:/Test.xlsx')
daily_data.fillna(0, inplace=True)
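As a rough sketch of the same idea done entirely in pandas, the lookup and the fill can be combined. The two DataFrames below are invented stand-ins for the worksheets, and the column names "A" and "Count" are assumptions taken from the description in the question, not from the actual files.

```python
import pandas as pd

# Hypothetical stand-ins for the two worksheets; column names are assumptions.
master = pd.DataFrame({"A": ["x", "y"], "Count": [3, 7]})
daily = pd.DataFrame({"A": ["x", "z", "y"]})

# Look up each daily key in the master sheet; keys without a match become NaN.
lookup = master.set_index("A")["Count"]
daily["Count"] = daily["A"].map(lookup)

# Replace the unmatched (NaN) counts with 0 and store whole numbers again.
daily["Count"] = daily["Count"].fillna(0).astype(int)

print(daily)
```

Note that fillna only changes the DataFrame in memory. For the zeros to appear in the workbook, the result still has to be written back, for example with DataFrame.to_excel.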

Changing data in CSV cells with Python

I'm looking to insert a few characters into the beginning of a cell on a CSV using Python. Python needs to do this with the same cell on each row.
As an example, see:
Inserting values into specific cells in csv with python
So:
row 1 - cell 3 'Qwerty' - add 3 characters (HAL) to beginning of the cell. So cell now reads 'HALQwerty'
row 2 - cell 3 'Qwerty' - add 3 characters (HAL) to beginning of the cell. So cell now reads 'HALQwerty'
row 3 - cell 3 'Qwerty' - add 3 characters (HAL) to beginning of the cell. So cell now reads 'HALQwerty'
Does anyone know how to do this?
I found this link:
https://www.protechtraining.com/blog/post/python-for-beginners-reading-manipulating-csv-files-737
But it doesn't go into enough detail.
Simplest way is probably to use Pandas. First run 'pip install pandas'
import pandas as pd
# read the CSV file and store into dataframe
df = pd.read_csv("test.csv")
# change the value of a single cell directly
# this selects the row with index label 4 and then the column name
df.at[4, 'column-name'] = 'HALQwerty'
# change multiple values simultaneously
# here we have a range of rows (0:4, both ends included with .loc) and a couple of columns
df.loc[0:4, ['Num', 'NAME']] = [100, 'HALQwerty']
# write out the CSV file
df.to_csv("output.csv")
Pandas allows for a lot of control over your CSV files, and it's well documented.
https://pandas.pydata.org/docs/index.html
Edit: To prepend text conditionally:
df = pd.DataFrame({'col1':['a', 'QWERTY', "QWERTY", 'b'], 'col2':['c', 'tortilla', 'giraffe', 'monkey'] })
mask = (df['col1'] == 'QWERTY')
df.loc[mask, 'col1'] = 'HAL' + df['col1'].astype(str)
The mask is a boolean Series that marks the rows matching a condition (here, where the cell value equals "QWERTY"). The ".loc" indexer selects those rows in the DataFrame so that the change is applied only to them.
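For the case described in the question, where the same cell in every row gets the prefix, no mask is needed. A minimal sketch, assuming a made-up CSV whose third column is the one to change (the CSV text and column names here are hypothetical, and io.StringIO stands in for a real file):

```python
import io
import pandas as pd

# Hypothetical CSV content; the column names are assumptions for illustration.
csv_text = "id,code,name\n1,a,Qwerty\n2,b,Qwerty\n3,c,Qwerty\n"
df = pd.read_csv(io.StringIO(csv_text))

# Prepend "HAL" to the third column (position 2) of every row.
target = df.columns[2]
df[target] = "HAL" + df[target].astype(str)

# index=False avoids writing the row index as an extra column.
print(df.to_csv(index=False))
```

With a real file, pd.read_csv("test.csv") and df.to_csv("output.csv", index=False) would replace the in-memory text.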

Generate Excel files for each of element in a column using Python

I want to generate an Excel file for each element in a column.
I will describe what I want with pictures:
I have an Excel file like this:
For each element of the column "FRUIT", I want the matching rows to be in a separate Excel file, like this:
excel-BANANA.xlsx :
excel-CHERRY.xlsx :
excel-APPLE.xlsx :
Is it possible?
As BoarGules mentioned, you can use pandas.
Try this:
import pandas as pd
# Removes the default header style so it looks like your screenshots
pd.io.formats.excel.header_style = None
df = pd.read_excel(r"fruits.xlsx")
for fruit in df['FRUIT'].unique():
    df.loc[df['FRUIT'] == fruit].to_excel(f"excel-{fruit}.xlsx", index=False)
It reads the fruits.xlsx, iterates over the unique values in the FRUIT column (in this case: BANANA, CHERRY, and APPLE), and filters the dataframe to include only that one fruit each iteration.
to_excel(f"excel-{fruit}.xlsx", index=False) saves the dataframe as an Excel file without including an index (row numbers).
See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
See also https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals
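An alternative way to do the same split is groupby, which hands back one sub-DataFrame per distinct value. The sketch below uses invented data (only the FRUIT column name comes from the question) and collects the pieces in a dictionary instead of writing files, so it runs without an Excel engine installed.

```python
import pandas as pd

# Hypothetical data; only the FRUIT column name comes from the question.
df = pd.DataFrame({
    "FRUIT": ["BANANA", "CHERRY", "APPLE", "BANANA"],
    "QTY": [1, 2, 3, 4],
})

# groupby yields one (value, sub-frame) pair per distinct FRUIT.
# sort=False keeps the groups in order of first appearance.
parts = {
    f"excel-{fruit}.xlsx": group.reset_index(drop=True)
    for fruit, group in df.groupby("FRUIT", sort=False)
}

for filename, part in parts.items():
    print(filename, len(part))
    # part.to_excel(filename, index=False)  # uncomment to write the files
```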

How to export a dictionary to excel using Pandas

I am trying to export some data from python to excel using Pandas, and not succeeding. The data is a dictionary, where the keys are a tuple of 4 elements.
I am currently using the following code:
df = pd.DataFrame(data)
df.to_excel("*file location*", index=False)
and I get an exported 2-column table as follows:
I am trying to get an excel table where the first 3 elements of the key are split into their own columns, and the 4th element of the key (Period in this case) becomes a column name, similar to the example below:
I have tried using different additions to the above code but I'm a bit new to this, and so nothing is working so far
Based on what you show us (which is not reproducible), you need pandas.MultiIndex:
df_ = df.set_index(0)  # `0` since your tuples seem to be located in the first column
df_.index = pd.MultiIndex.from_tuples(df_.index)  # convert the simple index into a multi-level index
# `~.unstack` does the job of placing your periods as columns
df_.unstack(level=-1).droplevel(0, axis=1).to_excel(
    "file location", index=True
)
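Since the original data is not shown, here is a self-contained sketch of the same MultiIndex idea. It assumes the dictionary maps each 4-element tuple to a single value. The keys, values and level names (Region, Product, Unit, Period) are invented for illustration.

```python
import pandas as pd

# Hypothetical data: keys are (region, product, unit, period) tuples.
data = {
    ("North", "A", "kg", 1): 10,
    ("North", "A", "kg", 2): 12,
    ("South", "B", "kg", 1): 7,
    ("South", "B", "kg", 2): 9,
}

# A Series built from a dict with tuple keys gets a MultiIndex automatically.
s = pd.Series(data)
s.index.names = ["Region", "Product", "Unit", "Period"]

# Move the last key element (Period) into the columns, then turn the
# remaining three index levels back into ordinary columns.
table = s.unstack("Period").reset_index()

print(table)
```

The resulting table could then be exported with table.to_excel(..., index=False), as in the code from the question.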
You could try exporting to a CSV instead:
df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', index = False)
which can then be converted to an Excel file easily.

how to only read rows with the first column containing name i

I have two Excel worksheets that I am reading in Python. The first worksheet has a list of company names. The second is a sheet where the same company names appear multiple times, with data to the right that corresponds to each row.
[![Worksheet 1][1]][1]
[![Worksheet 2][2]][2]
I want to make some kind of condition, if the name in column A WS 2 matches the name in WS 1, then print the data (columns A:F WS 2) only for the rows corresponding to the name.
I am pretty new to coding, so I've been playing with it a lot without finding much luck. Right now I don't have much code because I tried restarting again. Been trying to use just pandas to read, sometimes I've been trying openpyxl.
import pandas as pd
import xlsxwriter as xlw
import openpyxl as xl
TickList = pd.read_excel("C:\\Users\\Ashley\\Worksheet1.xlsx",sheet_name='Tickers', header=None)
stocks = TickList.values.ravel()
Data = pd.read_excel("C:\\Users\\Ashley\\Worksheet2.xlsx", sheet_name='Pipeline', header=None, usecols="A:F")
data = Pipeline.values.ravel()
for i in stocks:
    for t in data:
        if i == t:
            print(data)
[1]: https://i.stack.imgur.com/f6mXI.png
[2]: https://i.stack.imgur.com/4vKGR.png
I would imagine that the first thing you are doing wrong is not stipulating the key value on which the "i" in stocks is meant to match on the values in "t". Remember - "t" are the values - all of them. You have to specify that you wish to match the value of "i" to (probably) the first column of "t". What you appear to be doing here is akin to a vlookup without the target range properly specified.
Whilst I do not know the exact method in which the ravel() function stores the data, I have to believe something like this would be more likely to work:
for i in stocks:
    for t in data:
        if i == t[0]:
            print(t)
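One caveat: ravel() flattens the values into a one-dimensional array, so after flattening the second sheet there are no rows left to index with t[0]. A possible alternative, sketched here with invented data in place of the two worksheets, is to keep the second sheet as a DataFrame and filter its first column with isin.

```python
import pandas as pd

# Hypothetical stand-ins for the two worksheets (read with header=None,
# so the columns are numbered 0, 1, 2, ...).
tick_list = pd.DataFrame(["AAPL", "MSFT"])
pipeline = pd.DataFrame([
    ["AAPL", 1, 2],
    ["GOOG", 3, 4],
    ["MSFT", 5, 6],
    ["AAPL", 7, 8],
])

# Flatten the ticker sheet into a 1-D array of names.
stocks = tick_list.values.ravel()

# Keep only the rows whose first column holds one of those names.
matches = pipeline[pipeline[0].isin(stocks)]

print(matches)
```

This keeps every matching row with all of its columns, which corresponds to printing columns A:F for the matched names.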
