importing a csv file with clean columns using pandas? - python

so i'm trying to import this csv file and each value is seperated by a comma but how do i make new rows and columns from the imported data?
I tried importing it as normal and printing the data frame in different ways.

try the same with
df = pd.read_csv('file_name.csv', sep = ',')
this might work

Related

Problem with csv data imported on jupyter notebook

I'm new on this site so be indulgent if i make a mistake :)
I recently imported a csv file on my Jupyter notebook for a student work. I want use some of data of specific column of this file. The problem is that after import, the file appear as a table with 5286 lines (which represent dates and hours of measures) in a single column (that compiles all variables separated by ; that i want use for my work).
I don't know how to do to put this like a regular table.
I used this code to import my csv from my board :
import pandas as pd
data = pd.read_csv('/work/Weather_data/data 1998-2003.csv','error_bad_lines = false')
Output:
Desired output: the same data in multiple columns, separated on ;.
You can try this:
import pandas as pd
data = pd.read_csv('<location>', sep=';')

Unable to parse string quoted csv data using pandas

I am trying to parse this CSV data which has quotes in between in unusual pattern and semicolon in the end of each row.
I am not able to parse this file correctly using pandas.
Here is the link of data (The pastebin was for some reason not recognizing as text / csv so picked up any random formatting please ignore that)
https://paste.gnome.org/pr1pmw4w2
I have tried using the "," as delimiter, and normal call of pandas dataframe object construction by only giving file name as parameter.
header = ["Organization_Name","Organization_Name_URL","Categories","Headquarters_Location","Description","Estimated_Revenue_Range","Operating_Status","Founded_Date","Founded_Date_Precision","Contact_Email","Phone_Number","Full_Description","Investor_Type","Investment_Stage","Number_of_Investments","Number_of_Portfolio_Organizations","Accelerator_Program_Type","Number_of_Founders_(Alumni)","Number_of_Alumni","Number_of_Funding_Rounds","Funding_Status","Total_Funding_Amount","Total_Funding_Amount_Currency","Total_Funding_Amount_Currency_(in_USD)","Total_Equity_Funding_Amount","Total_Equity_Funding_Amount_Currency","Total_Equity_Funding_Amount_Currency_(in_USD)","Number_of_Lead_Investors","Number_of_Investors","Number_of_Acquisitions","Transaction_Name","Transaction_Name_URL","Acquired_by","Acquired_by_URL","Announced_Date","Announced_Date_Precision","Price","Price_Currency","Price_Currency_(in_USD)","Acquisition_Type","IPO_Status,Number_of_Events","SimilarWeb_-_Monthly_Visits","Number_of_Founders","Founders","Number_of_Employees"]
pd.read_csv("data.csv", sep=",", encoding="utf-8", names=header)
First, you can just read the data normally. Now all data would be in the first column. You can use pyparsing module to split based on ',' and assign it back. I hope this solves your query. You just need to do this for all the rows.
import pyparsing as pp
import pandas as pd
df = pd.read_csv('input.csv')
df.loc[0] = pp.commaSeparatedList.parseString(df['Organization Name'][0]).asList()
Output
df #(since there are 42 columns, pasting just a snipped)

Working with csv files that have delimiters

I have to read data from a csv file and I want to convert two columns by making use of one hot encoding.
The csv files data has one column with ';' in between the data (E.g. CITY;MONTH;SALES_AMOUNT). How do I load this in pandas dataframe in separate columns?
Desired result : E.g CITY MONTH SALES_AMOUNT
Instead of: CITY;MONTH;SALES_AMOUNT
You can use the delimiter parameter when reading the CSV file.
import pandas as pd
pd.read_csv('dataset.csv', delimiter = ';')

Split multiple times?

So I'm currently transferring a txt file into a csv. It's mostly cleaned up, but even after splitting there are still empty columns between some of my data.
Below is my messy CSV file
And here is my current code:
Sat_File = '/Users'
output = '/Users2'
import csv
import matplotlib as plt
import pandas as pd
with open(Sat_File,'r') as sat:
with open(output,'w') as outfile:
if "2004" in line:
line=line.split(' ')
writer=csv.writer(outfile)
writer.writerow(line)
Basically, I'm just trying to eliminate those gaps between columns in the CSV picture I've provided. Thank you!
You can use python Pandas library to clear out the empty columns:
import pandas as pd
df = pd.read_csv('path_to_csv_file').dropna(axis=1, how='all')
df.to_csv('path_to_clean_csv_file')
Basically we:
Import the pandas library.
Read the csv file into a variable called df (stands for data frame).
Than we use the dropna function that allows to discard empty columns/rows. axis=1 means drop columns (0 means rows) and how='all' means drop columns all of the values in them are empty.
We save the clean data frame df to a new, clean csv file.
$$$ Pr0f!t $$$

How to convert data from txt files to Excel files using python

I have a text file that contains data like this. It is is just a small example, but the real one is pretty similar.
I am wondering how to display such data in an "Excel Table" like this using Python?
The pandas library is wonderful for reading csv files (which is the file content in the image you linked). You can read in a csv or a txt file using the pandas library and output this to excel in 3 simple lines.
import pandas as pd
df = pd.read_csv('input.csv') # if your file is comma separated
or if your file is tab delimited '\t':
df = pd.read_csv('input.csv', sep='\t')
To save to excel file add the following:
df.to_excel('output.xlsx', 'Sheet1')
complete code:
import pandas as pd
df = pd.read_csv('input.csv') # can replace with df = pd.read_table('input.txt') for '\t'
df.to_excel('output.xlsx', 'Sheet1')
This will explicitly keep the index, so if your input file was:
A,B,C
1,2,3
4,5,6
7,8,9
Your output excel would look like this:
You can see your data has been shifted one column and your index axis has been kept. If you do not want this index column (because you have not assigned your df an index so it has the arbitrary one provided by pandas):
df.to_excel('output.xlsx', 'Sheet1', index=False)
Your output will look like:
Here you can see the index has been dropped from the excel file.
You do not need python! Just rename your text file to CSV and voila, you get your desired output :)
If you want to rename using python then -
You can use os.rename function
os.rename(src, dst)
Where src is the source file and dst is the destination file
XLWT
I use the XLWT library. It produces native Excel files, which is much better than simply importing text files as CSV files. It is a bit of work, but provides most key Excel features, including setting column widths, cell colors, cell formatting, etc.
saving this is:
df.to_excel("testfile.xlsx")

Categories