Seeking help. Hi guys, I haven't written any code yet because I think I need some idea of how to access the CSV and its rows. Technically, I want to replace the text with the ID from the CSV file.
import pandas as pd
df = pd.read_csv('replace.csv')
print(df)
Please kindly view the photo. As you can see, there are three columns. I want to replace the names in the Replace column: if a value there is equal to a name in column A, replace it with the corresponding ID (column B). I'm looking for an idea of what the first step should be, or a guide. Thanks.
In the photo:
name | id | Replace
james | 5 | James,James,Tom
tom | 2 | Tom,James,James
jerry | 10 | Tom,Tom,Tom
Expected result:
name | id | Replace
james | 5 | 5,5,2
tom | 2 | 2,5,5
jerry | 10 | 2,2,2
Excel 365:
As per my comment, if it's OK to get the data in a new column and you're on MS365, try:
Formula in E2:
=MAP(C2:C4,LAMBDA(x,TEXTJOIN(",",,XLOOKUP(TEXTSPLIT(x,","),A2:A4,B2:B4,"",0))))
Or, if all values will be present anyways:
=MAP(C2:C4,LAMBDA(x,TEXTJOIN(",",,VLOOKUP(TEXTSPLIT(x,","),A2:B4,2,0))))
Google-Sheets:
The Google-Sheets equivalent, as per your request, could be:
=MAP(C2:C4,LAMBDA(x,INDEX(TEXTJOIN(",",,VLOOKUP(SPLIT(x,","),A2:B4,2,0)))))
Python/Pandas:
After some trial and error I came up with:
import pandas as pd
df = pd.read_csv('replace.csv', sep=';')
df['Replace'] = df['Replace'].replace(pd.Series(dict(zip(df.name, df.id))).astype(str), regex=True)
print(df)
Prints:
name id Replace
0 James 5 5,5,2
1 Tom 2 2,5,5
2 Jerry 10 2,2,2
Note: I used the semicolon as the separator in the function call that opens the CSV.
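For comparison, a minimal alternative sketch (assuming the same semicolon-separated replace.csv): build an explicit name-to-id mapping and rebuild each comma-separated list. Splitting first avoids accidental substring matches, and matching case-insensitively copes with the lowercase/capitalised mix shown in the photo.

import pandas as pd

# Assumed input: the same semicolon-separated replace.csv with columns name, id, Replace.
df = pd.read_csv('replace.csv', sep=';')

# Map lowercase names to their ids as strings, e.g. {'james': '5', 'tom': '2', 'jerry': '10'}.
mapping = dict(zip(df['name'].str.lower(), df['id'].astype(str)))

# Rebuild each comma-separated list, swapping in the id wherever a name is known.
df['Replace'] = df['Replace'].apply(
    lambda cell: ','.join(mapping.get(part.strip().lower(), part) for part in str(cell).split(',')))
print(df)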
Nested SUBSTITUTE functions would make this easy.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(D2,A2,B2),A3,B3),A4,B4)
I am somewhat of a beginner to Python and have encountered the following problem working with openpyxl. For example, I have the sample worksheet below:
Worksheet
| Boat ID | Emp ID | Emp Name | Start Date | Manager |
|---------|--------|----------|------------|---------|
| 1       | 16044  | Derrick  | ASAP       | Anthony |
| 1       | 16045  | John     | ASAP       | Anthony |
| 1       | 16046  | Bill     | ASAP       | Anthony |
| 1       | 16047  | Joe      | ASAP       | Anthony |
| 2       | 16048  | Justin   | ASAP       | Jacob   |
| 2       | 16049  | Sandy    | ASAP       | Jacob   |
| 2       | 16050  | Omar     | ASAP       | Jacob   |
| 3       | 16051  | Michael  | ASAP       | Nathan  |
| 3       | 16052  | Bill     | ASAP       | Nathan  |
What I am trying to do is loop through the Boat ID column and, while the cell values are the same, take the respective row data to the right, open a new worksheet/workbook, and copy the rows in columns B:E across.
So in theory, for every Boat ID = 1 we would take every row unique to ID 1 from columns B:E, open a new workbook, and paste them accordingly. Next, for every Boat ID = 2 we would take the rows with ID = 2 in columns B:E, open a new workbook, and paste accordingly. Similarly, we would repeat the process for every Boat ID = 3.
P.S. To keep it simple I have ordered the table by Boat ID in ascending order, but if someone wants bonus points they could opine on how it would be done if the table was not ordered.
Any help here would be appreciated as I am still learning and a complex problem like this would be beneficial to further enhance my skills.
I know I am way off, but this is as far as my own logic has got.
I used a dictionary to categorize all of the boats read from the original file according to their Boat ID: the keys are the Boat IDs and the values are lists of the rows for that boat. As you can see, the code will work even if the original Excel file is not sorted by Boat ID.
import openpyxl

wb = openpyxl.load_workbook("boats.xlsx", read_only=True)
ws = wb.active

# Group rows by Boat ID: key = Boat ID, value = list of rows (columns B:E).
boat_dict = {}
for row_index in range(2, ws.max_row + 1):  # start at 2 to skip the header row
    row = [cell.value for cell in ws[row_index]]
    boat_id = row[0]
    if boat_id in boat_dict:
        boat_dict[boat_id].append(row[1:])
    else:
        boat_dict[boat_id] = [row[1:]]

# Write one sheet per Boat ID into a new workbook.
new_wb = openpyxl.Workbook()
new_wb.remove(new_wb.active)  # drop the empty default sheet
for boat_id, boats in boat_dict.items():
    ws = new_wb.create_sheet(title="Boat id %s" % boat_id)
    for boat in boats:
        ws.append(boat)
new_wb.save("boats_ans.xlsx")
Hope I could help :)
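If you need one workbook per Boat ID rather than one sheet per ID, as the question originally asked, here is a minimal sketch that reuses boat_dict from the snippet above; the header row and the boat_<id>.xlsx filename pattern are just assumptions:

# One output workbook per Boat ID, reusing boat_dict built above.
for boat_id, boats in boat_dict.items():
    wb_out = openpyxl.Workbook()
    ws_out = wb_out.active
    ws_out.append(["Emp ID", "Emp Name", "Start Date", "Manager"])  # assumed header
    for boat in boats:
        ws_out.append(boat)
    wb_out.save("boat_%s.xlsx" % boat_id)  # hypothetical filename pattern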
I am working on dataframes with Python. In my first dataframe, df1, I have:
+-----+-------------------+--------------+------------------+
| ID  | PUBLICATION TITLE | DATE         | JOURNAL          |
+-----+-------------------+--------------+------------------+
| 1   | "a"               | "01/10/2000" | "book1"          |
| 2   | "b"               | "09/03/2005" | NaN              |
| NaN | "b"               | "09/03/2005" | "book2"          |
| 5   | "z"               | "21/08/1995" | "book4"          |
| 6   | "n"               | "15/04/1993" | "book9\xc3\x28"  |
+-----+-------------------+--------------+------------------+
Here I would like to clean my dataframe, but I don't know how to do it in this case. There are two points that block me. The first is that rows 2 and 3 seem to be the same record, because the publication title is the same and I think a publication title is unique to a journal. The second concerns the last row, whose journal value ends in \xc3\x28. How can I clean my dataframe in a smart way, so that I can reuse this code for other dataframes if possible?
First you should remove the row with ID = NaN. This can be done by:
df1 = df1[df1['ID'].notna()]
Then update the journal of the 2nd row:
df1.iloc[1, df1.columns.get_loc('JOURNAL')] = 'book2'
Finally, for the entry 'book9\xc3\x28' (the last row after the filter), you can update it by:
df1.iloc[-1, df1.columns.get_loc('JOURNAL')] = 'book9'
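If you want something more reusable than fixing rows one by one, here is a hedged sketch; it assumes the same column names, that rows sharing a title and date are duplicates of each other, and that stripping non-alphanumeric characters from JOURNAL is acceptable:

# Fill a missing JOURNAL from another row with the same title and date, if any.
df1['JOURNAL'] = df1.groupby(['PUBLICATION TITLE', 'DATE'])['JOURNAL'].transform(
    lambda s: s.ffill().bfill())

# Drop the duplicate rows that have no ID.
df1 = df1[df1['ID'].notna()]

# Heuristic clean-up of stray bytes such as \xc3\x28: keep only letters, digits and spaces.
df1['JOURNAL'] = df1['JOURNAL'].str.replace(r'[^A-Za-z0-9 ]+', '', regex=True)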
What type of encoding are you using? I recommend using "utf8" encoding for this purpose.
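For example, a minimal sketch (the publications.csv filename is hypothetical, and the encoding_errors argument needs pandas 1.3 or newer):

import pandas as pd

# Read with an explicit encoding; bytes that cannot be decoded (such as \xc3\x28)
# are replaced instead of leaking into the data.
df1 = pd.read_csv('publications.csv', encoding='utf-8', encoding_errors='replace')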
My dataframe looks like the one below:
name | salary
Tom | 10200
Kate |
Mi | 32311
Kate's value for salary and round_salary is ' ': I replaced it with ' ', so nothing shows in the cell.
Question:
I want to create a new salary column based on rounding the salary to the nearest 10,000.
The outcome would look like below:
name | salary | round_salary
Tom | 10200 | 10000
Kate | |
Mi | 32311 | 30000
My code is shown below:
def round_income(salary):
    if '' in salary:
        return ''
    else:
        return salary.round(decimals = -4)

income.apply(lambda x: round_salary(x['income']), axis=1)
The output error is:
KeyError: ('salary', 'occurred at index 0')
Does anyone know how to fix it? I found that a map or apply function can solve it; thanks in advance for anyone's help.
Solution if there are no missing values, only empty strings for the non-numeric entries:
income['salary'] = (pd.to_numeric(income['salary'], errors='coerce')
                      .round(decimals = -4)
                      .fillna(''))
print (income)
name salary
0 Tom 10000
1 Kate
2 Mi 30000
Solution with missing values - all data in column salary are numeric:
income['salary'] = income['salary'].round(decimals = -4).astype('Int64')
print (income)
name salary
0 Tom 10000
1 Kate <NA>
2 Mi 30000
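For completeness, here is a sketch closer to the original apply attempt; it assumes the blanks really are empty strings and that the dataframe is called income with a salary column, as in the question:

def round_salary(salary):
    # Blank cells arrive as empty strings, so pass them through unchanged.
    if salary == '':
        return ''
    # Round a single value to the nearest 10,000.
    return int(round(float(salary), -4))

# Apply the function to the salary column directly instead of row-wise with axis=1,
# which avoids the KeyError from indexing a non-existent column.
income['round_salary'] = income['salary'].apply(round_salary)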
So, I'm new to Python. I've been trying to apply the things I've learnt to real-world problems. The task I've set myself is this: I want to capture the two cell values '01/01/2018' and '08/01/2018' and print them into a new CSV file under a Date header. I also want to create a new column which shows the value associated with that date in the original CSV file.
Any help, or a pointer in the right direction, would be greatly appreciated.
Original table
Hierarchy | Dept | Emp | Alpha | Bravo | Charlie | 01/01/2018 | 08/01/2018|
Hierarchy 1 | Dept 1 | JC | h | o | l | 0 | 2 |
New table
Hierarchy |Dept | Emp | Alpha | Bravo | Charlie | Date |Value |
Hierarchy 1 |Dept 1 | JC | h | o | l | 01/01/2018 | 0 |
Hierarchy 1 |Dept 1 | JC | h | o | l | 08/01/2018 | 2 |
As @ChristianSloper mentions in his comment, pd.melt is designed for this. In your case, here is a one-liner:
df
Hierarchy Dept Emp Alpha Bravo Charlie 01/01/2018 08/01/2018
0 Hierarchy_1 Dept_1 JC h o l 0 2
pd.melt(df,
        id_vars=df.columns[:-2],
        value_vars=df.columns[-2:],
        var_name='Date',
        value_name='Value')
Hierarchy Dept Emp Alpha Bravo Charlie Date Value
0 Hierarchy_1 Dept_1 JC h o l 01/01/2018 0
1 Hierarchy_1 Dept_1 JC h o l 08/01/2018 2
OK, I am just going to go ahead and assume that your table is stored in a CSV file, so we will start by reading that in:
import pandas as pd
df = pd.read_csv('mytable.csv',sep='|')
pd.melt(df,
        id_vars = ['Hierarchy ', ' Dept ', ' Emp ', ' Alpha ', ' Bravo ', ' Charlie '],
        value_vars=[' 01/01/2018 ',' 08/01/2018' ],
        var_name='Date',
        value_name='Value')
Gives the desired result.
After the help of the contributors I have completed my task, below is the code I've used!
Thanks to the community for offering help!
"""
Transforms Data into Desired Format
"""
# import pandas module
import pandas as pd

# read data.csv into a dataframe
df = pd.read_csv('data.csv')

# keep a reference to the column names
cols = df.columns

# use .melt() to reshape the data: the first six columns stay as identifiers,
# the date columns are unpivoted into Date/Value pairs
transformed_df = pd.melt(df,
                         id_vars=cols[:6],
                         value_vars=cols[6:],
                         var_name='Date',
                         value_name='Value')

# check the data has been formatted correctly
print(transformed_df)

# create a new csv file with the reshaped data (without the row index)
transformed_df.to_csv('melted_data.csv', index=False)
print("\nData has been Melted!")
I've got a basic dictionary that gives me a count of how many times data shows up, e.g. Adam: 10, Beth: 3, ..., Zack: 1.
If I do df = pd.DataFrame([dataDict]).T then the keys from the dictionary become the index of the dataframe and I only have one true column of data. I've looked, but I haven't found a way around this, so any help would be appreciated.
Edit: More detail
The dictionary was formed from a count function on another dataframe, e.g. dataDict = df1.Name.value_counts().to_dict()
This is my expected output.
   | Name | Count
---|------|------
 0 | Adam |    10
 1 | Beth |     3
What I'm getting at the moment is this:
     | Count
-----|------
Adam |    10
Beth |     3
Try reset_index:
dataDict = dict(Adam=10, Beth=3, Zack=1)
df = pd.Series(dataDict).rename_axis('Name').reset_index(name='Count')
df
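Since dataDict was built from value_counts in the first place, you can also skip the dictionary step entirely; a short sketch under that assumption:

# Go straight from the value counts to a two-column dataframe.
df = df1.Name.value_counts().rename_axis('Name').reset_index(name='Count')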