Exporting dataframe to csv not showing first column - python

I'm trying to export my df to a .csv file. The df has just two columns of data: the image name (.jpg) and the 'value_counts' of how many times that .jpg name occurs in the 'concat_Xenos.csv' file, e.g.:
M116_13331848_13109013329679.jpg 19
M116_13331848_13109013316679.jpg 14
M116_13331848_13109013350679.jpg 12
M116_13331848_13109013332679.jpg 11
etc. etc. etc....
However, whenever I export the df, the .csv file only displays the 'value_counts' column. How do I fix this?
My code is as follows:
concat_Xenos = r'C:\file_path\concat_Xenos.csv'
df = pd.read_csv(concat_Xenos, header=None, index_col=False)[0]
counts = df.value_counts()
export_csv = counts.to_csv (r'C:\file_path\concat_Xenos_valuecounts.csv', index=None, header=False)
Thanks! If any clarification is needed please ask :)
R

This is because the image names are stored in the index of the value_counts() result, and index=None drops them when writing.
Use index=True:
export_csv = counts.to_csv(r'C:\file_path\concat_Xenos_valuecounts.csv', index=True, header=False)
or you can reset the index before exporting:
counts = counts.reset_index()
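Putting it together, a minimal sketch of the corrected flow (the paths are the ones from the question):
import pandas as pd

concat_Xenos = r'C:\file_path\concat_Xenos.csv'

# The first (and only used) column of the CSV, as a Series of image names
names = pd.read_csv(concat_Xenos, header=None, index_col=False)[0]

# value_counts() puts the image names in the index, so keep the index when writing,
# otherwise only the counts end up in the file
counts = names.value_counts()
counts.to_csv(r'C:\file_path\concat_Xenos_valuecounts.csv', index=True, header=False)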

Related

df = pd.read_csv('file.csv') adds random numbers and commas to file. Pandas in python

I am trying to read a csv file using pandas as so:
df = pd.read_csv('file.csv')
Here is the file before:
,schoolId,Name,Meetings Present
0,991,Jimmy Nuetron,2
1,992,Jimmy Fuetron,6
2,993,Cam Nuetron,4
Here is the file after:
,Unnamed: 0,schoolId,Name,Meetings Present
0,0.0,991.0,Jimmy Nuetron,2.0
1,1.0,992.0,Jimmy Fuetron,6.0
2,2.0,993.0,Cam Nuetron,4.0
0,,,,3
Why is it adding the numbers and columns when I run the read_csv method?
How can I prevent this without adding a separator?
pandas.read_csv is actually not adding the Unnamed: 0 column; it already exists in your .csv (which was apparently/probably generated by pandas.DataFrame.to_csv with its default index=True).
You can get rid of this extra column by using it as the index:
df = pd.read_csv('file.csv', index_col=0)
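For completeness, a small sketch: if you also control the step that writes the file, saving without the index prevents the Unnamed: 0 column from being created in the first place.
import pandas as pd

# Read the existing file, using the unnamed first column as the index
df = pd.read_csv('file.csv', index_col=0)

# When saving, skip the index so no "Unnamed: 0" column appears next time
df.to_csv('file.csv', index=False)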

How to remove numbering from output after extracting an xls file with pandas [Python]

I have a Python script that extracts a specific column from an Excel .xls file, but the output has numbering next to the extracted information, so I would like to know how to format the output so that it doesn't appear.
My actual code is this:
for i in sys.argv:
    file_name = sys.argv[1]
workbook = pd.read_excel(file_name)
df = pd.DataFrame(workbook, columns=['NOM_LOGR_COMPLETO'])
df = df.drop_duplicates()
df = df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
print(df)
My current output:
1 Street Alpha
2 Street Bravo
But the result I need is:
Street Alpha
Street Bravo
without the numbering, just the name of the streets.
Thanks!
I believe you want to display the DataFrame without its index. Note that you cannot have a DataFrame without an index; it is a core part of a DataFrame. For your case, you can use:
print(df.values)
to see the values without the index column. To save the output without the index, use:
writer = pd.ExcelWriter("dataframe.xlsx", engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
writer.save()
where "dataframe.xlsx" is the output file name in your case.
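Putting the pieces together, a minimal sketch of the full script using the column name from the question (the output file name and the use of to_string(index=False) for printing are assumptions on my part, not from the original answer):
import sys
import pandas as pd

file_name = sys.argv[1]
df = pd.read_excel(file_name, usecols=['NOM_LOGR_COMPLETO'])
df = df.drop_duplicates().dropna(how='any')

# Print without the index column (an alternative to print(df.values))
print(df.to_string(index=False))

# Save without the index column
df.to_excel('dataframe.xlsx', sheet_name='Sheet1', index=False)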
Further references can be found at:
How to print pandas DataFrame without index
Printing a pandas dataframe without row number/index
disable index pandas data frame
Python to_excel without row names (index)?

Convert an Excel file with many sheets (with spaces in the sheet names) into a pandas DataFrame

I would like to convert an Excel file to a pandas DataFrame. All the sheet names have spaces in them, for instance 'part 1 of 22', 'part 2 of 22', and so on. In addition, the first column is the same for all the sheets.
I would like to convert this Excel file to a single DataFrame, but I don't know what happens with the names in Python. I was able to import the sheets, but I do not know the names of the resulting DataFrames.
The sheets are imported, but I do not know their names. After this I would like to use another 'for' loop and pd.merge() in order to create a single DataFrame:
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    print(sheet_name.info())
Using only the code snippet you have shown, each sheet (each DataFrame) will be assigned to the variable sheet_name. Thus, this variable is overwritten on each iteration and you will only have the last sheet as a DataFrame assigned to that variable.
To achieve what you want to do you have to store each sheet, loaded as a DataFrame, somewhere, a list for example. You can then merge or concatenate them, depending on your needs.
Try this:
all_my_sheets = []
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    all_my_sheets.append(sheet_name)
Or, even better, using list comprehension:
all_my_sheets = [pd.read_excel(Matrix, sheet_name) for sheet_name in Matrix.sheet_names]
You can then concatenate them into one DataFrame like this:
final_df = pd.concat(all_my_sheets, sort=False)
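Since the question mentions that the first column is shared across all sheets and that a pd.merge() is intended, here is a hedged sketch of merging the sheets on that column instead of stacking rows (it assumes every sheet's first column has the same name):
from functools import reduce
import pandas as pd

# all_my_sheets is the list of DataFrames built above;
# the shared first column is used as the join key
key_col = all_my_sheets[0].columns[0]
merged_df = reduce(
    lambda left, right: pd.merge(left, right, on=key_col, how='outer'),
    all_my_sheets,
)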
You might consider using the openpyxl package:
from openpyxl import load_workbook
import pandas as pd

wb = load_workbook(filename=file_path, read_only=True)
all_my_sheets = wb.sheetnames

# Assuming your sheets have the same headers and footers
records = []
n = 1
for sheet_name in all_my_sheets:
    ws = wb[sheet_name]
    for row in ws.iter_rows(min_col=1,
                            min_row=n,
                            max_col=ws.max_column,
                            max_row=ws.max_row,
                            values_only=True):
        records.append(list(row))
    # Make sure you don't duplicate the header on the following sheets
    n = 2

# Set the column names
header = records.pop(0)

# Create your df
df = pd.DataFrame(records, columns=header)
It may be easiest to call read_excel() once and save the contents into a dict of DataFrames, one per sheet.
So, the first step would look like this:
dfs = pd.read_excel(Matrix, sheet_name=["Sheet 1", "Sheet 2", "Sheet 3"])
Note that the sheet names you use in the list should be the same as those in the Excel file. Then, if you wanted to vertically concatenate these sheets, you would just call:
final_df = pd.concat(dfs.values())
Note that this solution would result in a final_df that includes column headers from all three sheets. So, ideally they would be the same. It sounds like you want to merge the information, which would be done differently; we can't help you with the merge without more information.
I hope this helps!
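Not from the original answers, but for completeness: since the question says the sheet names are not known in advance, passing sheet_name=None reads every sheet into a dict keyed by sheet name. A minimal sketch, where 'matrix.xlsx' is a placeholder for the actual file:
import pandas as pd

# Read all sheets without listing their names; the dict keys are the sheet names
dfs = pd.read_excel('matrix.xlsx', sheet_name=None)
final_df = pd.concat(dfs.values(), sort=False)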

How to prepend new rows at the beginning of an existing csv file?

Assume this is my csv file: (df)
id,name,version,ct_id
1,testing,version1,245
2,testing1,version2,246
3,testing2,version3,247
4,testing3,version4,248
5,testing1,version5,249
Now I've performed some operations on the file and written it to another csv file.
df = pd.read_csv('op.csv')
df1 = df.groupby('name').agg({'version': ', '.join, 'ct_id': 'first'}).reset_index()
df1.to_csv('test.csv', index=False)
Now I've another csv file. (df_1)
id,name,version,ct_id
36,testing17,version16,338
37,testing18,version17,339
I want to write this to my existing test.csv file which I created earlier but I want to insert these two rows at the beginning of the file rather than at the end.
I tried something like this.
df_1.iloc[:, 1:].to_csv('test.csv', mode='a', index=False)
# This does append but at the end.
I would appreciate it if someone could help.
Prepending A to B is the same as appending B to A.
The below code should work for the above case.
test_df = pd.read_csv('test.csv')
df_1 = pd.read_csv('df_1.csv')
df_1 = df_1.append(test_df, sort=False)
df_1.to_csv('test.csv', index=False)
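Note that DataFrame.append has since been deprecated in pandas, so here is a sketch of the same idea with pd.concat (file names as in the question):
import pandas as pd

test_df = pd.read_csv('test.csv')
df_1 = pd.read_csv('df_1.csv')

# New rows first, existing rows after: this effectively prepends df_1
combined = pd.concat([df_1, test_df], sort=False, ignore_index=True)
combined.to_csv('test.csv', index=False)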

Issue with columns in csv using pandas groupby

I have the columns below in my csv. Usually all these columns have values like below and the code works smoothly.
dec list_namme list device Service Gate
12 food cookie 200.56.57.58 Shop 123
Now I encountered an issue: I got one csv file that has all these columns but no content in them. Here is how it looks:
dec list_namme list device Service Gate
and once the code runs over it, it creates a new csv with the columns below, which was not expected. I get a new column named index, and instead of the 3 columns (device, Service, Gate) I am wrongly getting 2:
index Gate
For the csv that has contents I did not face any issue; even the columns come out correctly.
Below is the code:
if os.path.isfile(client_csv_file):
    df = pd.read_csv(csv_file)  # Read CSV
    df['Gate'] = df.Gate.astype(str)
    df = df.groupby(['device', 'Service'])['Gate'].apply(lambda x: ', '.join(set(x))).reset_index()
    df.to_csv(client_out_file, index=False)
Please help me fix this in the code.
Performing a groupby on an empty dataframe results in a dataframe without the groupby-key columns.
One solution is to test if your dataframe is empty before performing manipulations:
if os.path.isfile(client_csv_file):
    df = pd.read_csv(csv_file)
    if df.empty:
        df = df[['device', 'Service', 'Gate']]
    else:
        df['Gate'] = df.Gate.astype(str)
        df = df.groupby(['device', 'Service'])['Gate'] \
               .apply(lambda x: ', '.join(set(x))).reset_index()
    df.to_csv(client_out_file, index=False)
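For illustration, a hypothetical demonstration of why this happens (behavior as reported in the question; newer pandas versions may differ):
import pandas as pd

# An empty frame with the expected columns
empty = pd.DataFrame(columns=['dec', 'list_namme', 'list', 'device', 'Service', 'Gate'])

grouped = empty.groupby(['device', 'Service'])['Gate'] \
    .apply(lambda x: ', '.join(set(x))).reset_index()

# As reported in the question, the groupby keys are lost and the result
# ends up with columns like ['index', 'Gate'] instead of ['device', 'Service', 'Gate']
print(grouped.columns.tolist())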
