I have a .csv file that has (45211 rows, 1 column),
but I need to create a new .csv file with (45211 rows, 17 columns).
These are the column names:
age;"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y"
I have attached a screenshot of the .csv file that I already have.
In pandas, the read_csv method has an option for setting the separator, which is a comma (,) by default. To override it, you can use:
pandas.read_csv(<PATH_TO_CSV_FILE>, sep=';', header=0)
This will return a new dataframe with the correct format. The header=0 might not be needed, but it makes pandas read the first line of the CSV file as the column headers.
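As a minimal sketch of the difference the separator makes, the snippet below parses the same semicolon-delimited text twice. The two data rows are made up for illustration, and only four of the question's 17 column names are used to keep it short.

```python
import io

import pandas as pd

# Made-up sample in the same semicolon-delimited layout as the question
# (only four of the 17 columns, and invented values).
raw = (
    'age;"job";"marital";"balance"\n'
    '58;"management";"married";2143\n'
    '44;"technician";"single";29\n'
)

# Default separator (comma): each line ends up in a single column.
wrong = pd.read_csv(io.StringIO(raw))

# Semicolon separator: each field gets its own column.
df = pd.read_csv(io.StringIO(raw), sep=';', header=0)

print(wrong.shape)
print(df.shape)
print(list(df.columns))

# Comma-separated text that could be saved as the new .csv file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To save the result to disk instead, pass a file name to to_csv in place of keeping the returned string.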
Open the CSV in Excel.
Select all the data.
Choose the Data tab atop the ribbon.
Select Text to Columns.
Ensure Delimited is selected and click Next.
Clear each box in the Delimiters section and choose Semicolon instead.
Click Finish.
I have a dataframe with 3 columns, and one of the columns contains data separated by semicolons (;). I am trying to export the dataframe to a CSV, but the output keeps getting split into the following format when I open it in Excel:
import pandas as pd

my_dict = {
    'name': ["a", "b"],
    'age': [20, 27],
    'tag': ["Login Location;Visit Location;Appointment Location",
            "Login Location;Visit Location;Appointment Location"],
}
df = pd.DataFrame(my_dict)
df.to_csv('output.csv', index=False)
print('done')
I would like the output in Excel to keep the data in the tag column intact. I've tried adding sep=',' or delimiter=',' but it still gives me the same output.
Thank you in advance,
John
Thank you @Alex and @joao for your input, it pointed me in the right direction. I was able to get the output I needed by forcing Excel to use , as the separator. By default, Excel was using tab as the delimiter, which is why it was showing me an incorrect format. Here's the link on forcing Excel to use comma as a list separator: https://superuser.com/questions/606272/how-to-get-excel-to-interpret-the-comma-as-a-default-delimiter-in-csv-files
Excel treats the file specially because it has a .csv suffix, probably using ; as the default delimiter, as suggested in the comments.
One workaround is to use the .txt suffix instead:
df.to_csv('output.txt',index=False)
Then open the file in Excel, and in the Text Import Wizard specify "Delimited" and comma as the separator.
Do not pick the file from the list of previously opened files, even if it's there; that won't work. You really need to use File/Open and browse to the directory that contains your .txt file.
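To check that the text pandas writes is already fine and that the splitting happens on the Excel side, a round-trip sketch like this may help. It reuses the dataframe from the question and parses the CSV text with Python's csv module; it does not change anything about how Excel behaves.

```python
import csv
import io

import pandas as pd

my_dict = {
    'name': ["a", "b"],
    'age': [20, 27],
    'tag': ["Login Location;Visit Location;Appointment Location",
            "Login Location;Visit Location;Appointment Location"],
}
df = pd.DataFrame(my_dict)

# Same call as in the question, but kept in memory instead of a file.
csv_text = df.to_csv(index=False)

# Parse it back with a comma delimiter: every row should have 3 fields,
# and the semicolons should stay inside the tag field.
rows = list(csv.reader(io.StringIO(csv_text)))
for row in rows:
    print(len(row), row)
```

If every row shows 3 fields here, the file itself is a valid comma-separated file and only Excel's delimiter setting needs attention.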
I wanted to use an Excel file in Python, so I converted it to csv. I couldn't read it properly as I usually read csv files, so I read it like a txt file.
I've read each row and appended them to a list, and now I want to make a list with a particular column. When I write my code to retrieve the elements in position "col number", I get a single letter instead of getting a string with the words that are between commas, which I need to put in the column list.
I think it might be a problem from reading the file, but I'm really not sure. Is there a better way to convert an xls to csv for Python, or to read the file into my list? Can I combine letters to make one element out of several elements in a list?
Thank you!
There is no need to change .xlsx to .csv or .txt. You can read an Excel file in Python using the pd.read_excel() method.
import pandas as pd
df = pd.read_excel(r'Path where the Excel file is stored\File name.xlsx', sheet_name='your Excel sheet name')
print(df)
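If you do stay with the CSV route, a likely cause of the single-letter result is indexing into a line that is still one string rather than a list of fields. The sketch below shows the difference using the standard csv module; the sample text and the column position are invented for illustration.

```python
import csv
import io

# Invented sample standing in for the converted file.
sample = "name,city,score\nAnn,New York,3\nBob,Los Angeles,5\n"
col = 1  # hypothetical column position

# Reading the file as plain text gives one string per line, so indexing
# returns a single character of that string.
lines = sample.splitlines()
single_letter = lines[1][col]
print(repr(single_letter))

# csv.reader splits each line into a list of fields, so indexing returns
# the whole value between the commas.
rows = list(csv.reader(io.StringIO(sample)))
column = [row[col] for row in rows[1:]]  # skip the header row
print(column)
```

With a real file, open it with open(path, newline='') and pass the file object to csv.reader instead of the StringIO wrapper.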
Assume we have a file called 'teams.csv'. We want to apply the operation below to all the rows in 'teams.csv' and return a file with the same name, but now with only 3 columns instead of 5. We also need to name our new column 'sport'. In the file, '***' indicates that a person does not play that particular sport.
I have a CSV with the following columns:
And want the CSV file with only 3 cols as shown below
You could use something like this answer to create a list of objects based on the contents of the CSV file, manipulate the data as necessary and then write back to the CSV file.
Sharing the code you have already tried would also be a good idea ;-)
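A rough sketch of that read, manipulate and write-back idea is below. The question's actual columns are not shown, so the layout here (name, age and three sport columns) and the choice to join the played sports with "/" are assumptions made purely for illustration.

```python
import csv
import io

# Hypothetical layout: the real column names in teams.csv are not shown.
source = (
    "name,age,football,tennis,chess\n"
    "Ann,31,yes,***,yes\n"
    "Bob,27,***,yes,***\n"
)
KEEP = ["name", "age"]                    # assumed columns to keep as-is
SPORTS = ["football", "tennis", "chess"]  # assumed columns to collapse

rows = []
for record in csv.DictReader(io.StringIO(source)):
    # '***' marks a sport the person does not play, so leave it out.
    played = [sport for sport in SPORTS if record[sport] != "***"]
    new_row = {key: record[key] for key in KEEP}
    new_row["sport"] = "/".join(played)
    rows.append(new_row)

# Write the 3-column result; a real script would write to teams.csv.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=KEEP + ["sport"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
result = out.getvalue()
print(result)
```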
I am trying to code a function where I grab data from my database, which already works correctly.
This is my code for the headers prior to adding the actual records:
import csv

# newline='' avoids blank lines between rows on Windows
with open('csv_template.csv', 'a', newline='') as template_file:
    # declares the variable template_writer ready for appending
    template_writer = csv.writer(template_file, delimiter=',')
    # appends the column names of the excel table prior to adding the actual physical data
    template_writer.writerow(['Arrangement_ID', 'Quantity', 'Cost'])
# the with block closes the file on exit, so no explicit close() is needed
This is my code for the records which is contained in a while loop and is the main reason that the two scripts are kept separate.
with open('csv_template.csv', 'a', newline='') as template_file:
    # declares the variable template_writer ready for appending
    template_writer = csv.writer(template_file, delimiter=',')
    # appends the data of the current fetched values of the sql statement within the while loop to the csv file
    template_writer.writerow([transactionWordData[0], transactionWordData[1], transactionWordData[2]])
# the with block closes the file on exit, so no explicit close() is needed
Once this data is ready for Excel, I open the file in Excel and would like it to be in a format that I can print immediately. However, when I print, the column width of the Excel cells is too small and the contents get cut off.
I have tried altering the default column width within excel and hoping that it would keep that format permanently but that doesn't seem to be the case and every time that I re-open the csv file in excel it seems to reset completely back to the default column width.
Here is my code for opening the csv file in excel using python and the comment is the actual code I want to use when I can actually format the spreadsheet ready for printing.
#finds the os path of the csv file depending where it is in the file directories
file_path = os.path.abspath("csv_template.csv")
#opens the csv file in excel ready to print
os.startfile(file_path)
#os.startfile(file_path, 'print')
If anyone has any solutions to this or ideas please let me know.
Unfortunately I don't think this is possible for CSV files, since they are just plain-text comma-separated values and don't support formatting.
"I have tried altering the default column width within excel but every time that I re-open the csv file in excel it seems to reset back to the default column width."
If you save the file in an Excel format once you have edited it, that should solve this problem.
Alternatively, instead of the csv library you could use xlsxwriter, which does allow you to set the width of the columns in your code.
See https://xlsxwriter.readthedocs.io and https://xlsxwriter.readthedocs.io/worksheet.html#worksheet-set-column.
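A minimal sketch of the xlsxwriter route is below, assuming the package is installed (pip install XlsxWriter). The records, the padding of 2 characters, and the output name template.xlsx are invented for illustration; the width calculation is just one simple heuristic, not something xlsxwriter does for you.

```python
import os
import tempfile

# Invented rows standing in for the records fetched from the database.
header = ["Arrangement_ID", "Quantity", "Cost"]
records = [[1001, 3, 24.50], [1002, 12, 7.25]]


def column_widths(rows, padding=2):
    """Return one width per column: longest cell text plus some padding."""
    return [max(len(str(cell)) for cell in column) + padding
            for column in zip(*rows)]


widths = column_widths([header] + records)
print(widths)

try:
    import xlsxwriter  # third-party package, may not be installed
except ImportError:
    xlsxwriter = None

if xlsxwriter is not None:
    # Hypothetical output location, written to the temp directory.
    path = os.path.join(tempfile.gettempdir(), "template.xlsx")
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    for row_index, row in enumerate([header] + records):
        worksheet.write_row(row_index, 0, row)
    # set_column(first_col, last_col, width) sets the width of a column range.
    for col_index, width in enumerate(widths):
        worksheet.set_column(col_index, col_index, width)
    workbook.close()
```

The resulting .xlsx keeps its column widths when reopened, which a .csv cannot do.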
Hope this helps!
The csv format is nothing more than a text file in which the lines follow a given pattern, that is, a fixed number of fields (your data) delimited by commas. In contrast, an .xlsx file is a binary file that contains specifications about the format. Therefore you may want to write to an Excel file instead, using the rich pandas library.
Since the header values are plain strings, you can pad them with trailing spaces, the idea being that the longer text widens the column. Do it like this:
template_writer.writerow(['Arrangement_ID ','Quantity ','Cost '])
Hi, I can export and open the csv file in Windows if I do:
y.to_csv('sample.csv')
where y is a pandas dataframe.
However, this output file has an index column. I am able to export the output file without the index by doing:
y.to_csv('sample.csv',index=False)
But when I try to open that file, Excel shows an error message:
"The file format and extension of 'sample.csv' don't match. The file could be corrupted or unsafe. Unless you trust it's source, don't open it. Do you want to open it anyway?"
Sample of y:
Change the name of the ID column. That's a special name that Excel recognizes: if a CSV starts with ID in the first cell of the first column, Excel tries to interpret the file as another file type (SYLK). When you keep the index, the ID column ends up second, so the file opens fine. When you exclude the index, ID lands in the first cell of the first column, and Excel gets confused. You can either rename the column, keep the index column, or reorder the columns in the data frame so that the ID column doesn't come first.
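A small sketch of the renaming option is below. The dataframe is invented, since the sample of y is not shown, and record_id is just a hypothetical replacement name. The check looks at how the CSV text begins, on the understanding that a file starting with uppercase ID is what trips Excel up.

```python
import pandas as pd

# Invented stand-in for y, with ID as its first column.
y = pd.DataFrame({"ID": [1, 2], "value": [10.5, 20.1]})

# With the index, the header line starts with an empty index label.
with_index = y.to_csv()

# Without the index, the text starts with "ID".
without_index = y.to_csv(index=False)

# Renaming the column (hypothetical new name) avoids the leading "ID".
renamed = y.rename(columns={"ID": "record_id"}).to_csv(index=False)

for label, text in [("with index", with_index),
                    ("without index", without_index),
                    ("renamed", renamed)]:
    print(label, "starts with ID:", text.startswith("ID"))
```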