Read a CSV file in Python correctly using pandas

I am trying to read this file using read_csv in pandas (Python), but I am not able to capture all columns.
Can you help?
Here is the code:
file = r'path of file'
df = pd.read_csv(file, encoding='cp1252', on_bad_lines='skip')
Thank you

I tried to read your file, and I first noticed that the encoding you specified does not correspond to the one used in your file. I also noticed that the separator is not a comma (,) but a tab (\t).
First, to get the file encoding (on Linux), you just need to run:
$ file -i kopie.csv
kopie.csv: text/plain; charset=utf-16le
In Python:
import pandas as pd
path_to_file = 'kopie.csv'
df = pd.read_csv(path_to_file, encoding='utf-16le', sep='\t')
And when I print the shape of the loaded dataframe:
>>> df.shape
(869, 161)
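As a sanity check, the fix can be verified with a small round trip (the file name and contents below are made up for illustration, not the asker's actual data):

```python
import pandas as pd

# Write a tiny tab-separated sample in UTF-16LE, mimicking the kopie.csv format
sample = "col_a\tcol_b\n1\tx\n2\ty\n"
with open("sample_utf16.csv", "w", encoding="utf-16le", newline="") as f:
    f.write(sample)

# Reading back with the matching encoding and separator recovers both columns
df = pd.read_csv("sample_utf16.csv", encoding="utf-16le", sep="\t")
print(df.shape)  # (2, 2)
```

If either parameter is wrong you get the asker's symptom: all the data lands in a single mangled column.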

Related

How to pass custom column names to pandas dataframe with Latin1 character encoding?

Goal
I have multiple csv files with latin1 encoding. I want to load them into dataframes with custom column names, concatenate them, and write them to a new large latin1 csv file that I then import into MSSQL.
Issue
Pandas reads & writes the data just fine, and MSSQL loads the data fine as well. However, when I import the file with MSSQL, the column names are removed because of "invalid characters".
How can I make sure not only the data, but also the custom column names are encoded correctly? I have tried passing column names with str.encode(encoding='latin1'), to no avail.
Code
import os
import pandas as pd
cols = ["name", "name2", "etc"]
dfs = []
for folder in folders:  # folders: list of directories to scan
    root, _, files = next(os.walk(folder))
    dfs += [pd.read_csv(os.path.join(root, file), names=cols, encoding='latin1')
            for file in files]
pd.concat(dfs).to_csv(path_to_file, index=False, encoding='latin1')
You can apply unidecode to pandas.DataFrame.columns to get rid of punctuation and/or symbols.
Try this:
#pip install unidecode
from unidecode import unidecode
df = pd.concat(dfs)
df.columns = [unidecode(col) for col in df.columns]
df.to_csv(path_to_file, index=False, encoding='latin1')
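If you would rather avoid the extra dependency, a rough standard-library approximation could look like the following (unlike unidecode, it only handles characters that decompose into an ASCII letter plus accents; the column name here is hypothetical):

```python
import unicodedata

def ascii_fold(name):
    # Decompose accented characters (e.g. 'é' -> 'e' + combining accent),
    # then drop anything that still falls outside ASCII
    return unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")

# hypothetical accented column name for illustration
print(ascii_fold("Prénom_étudiant"))  # Prenom_etudiant
```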

Pandas: problems when converting .txt to .csv

I am trying to use tables in pandas.
The original data looks like this (.txt file):
µm nm
1.34E+00 1.39E+00
1.34E+00 1.61E+00
...
When I manually convert the file from .txt to .csv, by opening it in Excel and saving as a .csv file, I obtain something like this:
µm;nm
1.339216;1.388997
1.340324;1.612847
1.341462;1.587352
1.342533;1.686544
...
Which is working fine in pandas, using the following code:
file = 'filename.csv'
df = pd.read_csv(file, sep = ";")
df
dataframe from manually obtained .csv file
Which is what I want. But since I am planning to deal with a lot of those files, I need to process them as batch. So I need to obtain the same dataframe from the original files, which come as .txt.
But if I try to do that from the original data, it looks like this:
The code is as follows:
df2 = pd.read_csv('filename.txt', sep = ";", encoding = 'unicode_escape')
df2.to_csv('filename-2.csv', sep='\t', index=None)
df2
Please note that I use the 'unicode_escape' value to avoid the error message "'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte".
I tried to specify various separators, but without success so far.
I hope someone will be able to help.
Thanks,
Sébastien.
When reading df2, you should use the same separator as the one used when writing it (assuming it was written from df):
df2 = pd.read_csv('filename.txt', sep = ';')
Or if .txt is a separate file altogether,
df2 = pd.read_csv('filename.txt', sep = '\t')
This should give a correctly formatted dataframe.
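Since the original .txt is whitespace-separated and contains the 0xb5 byte (µ), both issues can be handled in one read_csv call. A minimal sketch, assuming the file is latin-1 encoded (which is what the 0xb5 error suggests; the file contents below just recreate the question's sample):

```python
import pandas as pd

# Recreate a file shaped like the original .txt: space-separated, latin-1 'µ'
with open("sample.txt", "w", encoding="latin-1") as f:
    f.write("µm nm\n1.34E+00 1.39E+00\n1.34E+00 1.61E+00\n")

# sep=r"\s+" handles runs of spaces; latin-1 decodes the 0xb5 byte that utf-8 rejects
df2 = pd.read_csv("sample.txt", sep=r"\s+", encoding="latin-1")
print(df2.shape)  # (2, 2)
```

This avoids the Excel round trip entirely, which matters when processing the files as a batch.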

Python pandas: create dataframe from csv embedded within a web txt file

I am trying to import CSV-formatted data into a Pandas dataframe. The CSV data is located within a .txt file that is located at a web URL. The issue is that I only want to import the part (or parts) of the .txt file that is formatted as CSV (see image below). Essentially I need to skip the first 9 rows and then import rows 10-16 as CSV.
My code
import csv
import pandas as pd
import io
url = "http://www.bom.gov.au/climate/averages/climatology/windroses/wr15/data/086282-3pmMonth.txt"
df = pd.read_csv(io.StringIO(url), skiprows = 9, sep =',', skipinitialspace = True)
df
I get a lengthy error msg that ultimately says "EmptyDataError: No columns to parse from file"
I have looked at similar examples Read .txt file with Python Pandas - strings and floats but this is different.
The code above attempts to read a CSV file from the URL string itself rather than from the text file fetched from that URL. To see what I mean, take out the skiprows parameter and then show the data frame. You'll see this:
Empty DataFrame
Columns: [http://www.bom.gov.au/climate/averages/climatology/windroses/wr15/data/086282-3pmMonth.txt]
Index: []
Note that the columns are the URL itself.
Import requests (you may have to install it first) and then try this:
import io
import requests
content = requests.get(url).content
df = pd.read_csv(io.StringIO(content.decode('utf-8')), skiprows=9)
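The skiprows logic itself can be checked offline by feeding StringIO some stand-in text (the real code would pass the decoded requests payload instead of this made-up sample):

```python
import io
import pandas as pd

# Nine lines of non-CSV preamble, then a CSV block, as in the BOM .txt layout
text = "\n".join(["some header junk"] * 9 + ["a,b,c", "1,2,3", "4,5,6"])
df = pd.read_csv(io.StringIO(text), skiprows=9, skipinitialspace=True)
print(df.shape)  # (2, 3)
```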

How to convert data from txt files to Excel files using python

I have a text file that contains data like this. It is just a small example, but the real one is pretty similar.
I am wondering how to display such data in an "Excel Table" like this using Python?
The pandas library is wonderful for reading csv files (which is the file content in the image you linked). You can read in a csv or a txt file using the pandas library and output it to Excel in 3 simple lines.
import pandas as pd
df = pd.read_csv('input.csv') # if your file is comma separated
or, if your file is tab-delimited ('\t'):
df = pd.read_csv('input.csv', sep='\t')
To save to an Excel file, add the following:
df.to_excel('output.xlsx', 'Sheet1')
complete code:
import pandas as pd
df = pd.read_csv('input.csv') # can replace with df = pd.read_table('input.txt') for '\t'
df.to_excel('output.xlsx', 'Sheet1')
This will explicitly keep the index, so if your input file was:
A,B,C
1,2,3
4,5,6
7,8,9
Your output excel would look like this:
You can see your data has been shifted one column and your index axis has been kept. If you do not want this index column (because you have not assigned your df an index, so it has the arbitrary one provided by pandas):
df.to_excel('output.xlsx', 'Sheet1', index=False)
Your output will look like:
Here you can see the index has been dropped from the excel file.
You do not need Python! Just rename your text file to .csv and voilà, you get your desired output :) (note this only works if the text content is already comma-separated).
If you want to rename using python then -
You can use os.rename function
os.rename(src, dst)
Where src is the source file and dst is the destination file
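A minimal sketch of the rename-only approach (the file names are made up; renaming changes nothing inside the file, which is why the content must already be comma-separated):

```python
import os

# Create a hypothetical comma-separated text file
with open("data.txt", "w") as f:
    f.write("A,B\n1,2\n")

# Renaming only changes the extension, not the content
os.rename("data.txt", "data.csv")
print(os.path.exists("data.csv"))  # True
```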
XLWT
I use the XLWT library. It produces native Excel files, which is much better than simply importing text files as CSV files. It is a bit of work, but provides most key Excel features, including setting column widths, cell colors, cell formatting, etc.
Saving with pandas is:
df.to_excel("testfile.xlsx")

How to fetch input from the csv file in python

I have a csv (input.csv) file as shown below:
VM IP Naa_Dev Datastore
vm1 xx.xx.xx.x1 naa.ab1234 ds1
vm2 xx.xx.xx.x2 naa.ac1234 ds1
vm3 xx.xx.xx.x3 naa.ad1234 ds2
I want to use this csv file as an input file for my python script. In this file, the first line, i.e. (VM IP Naa_Dev Datastore), is the column heading, and each value is separated by a space.
So my question is: how can I use this csv file for input values in Python, so that if I search in the script for the IP of vm1 it picks up xx.xx.xx.x1, or, likewise, if I look for the VM whose Naa_Dev is naa.ac1234 it returns vm2?
I am using Python version 2.7.8
Any help is much appreciated.
Thanks
For tabular data like this, the best way is to use pandas.
Something like:
import pandas

# the file is space-separated, so pass sep=' '
dataframe = pandas.read_csv('csv_file.csv', sep=' ')
# finding IP by VM
print(dataframe[dataframe.VM == 'vm1'].IP)
# OUTPUT: xx.xx.xx.x1
# or find the VM by Naa_Dev
print(dataframe[dataframe.Naa_Dev == 'naa.ac1234'].VM)
# OUTPUT: vm2
For importing csv into Python you can use pandas; in your case the code would look like:
import pandas as pd
df = pd.read_csv('input.csv', sep=' ')
and for locating certain rows in the created dataframe you have multiple options (that you can easily find in the pandas docs or just by googling 'filter data python'), for example:
df['VM'].where(df['Naa_Dev'] == 'naa.ac1234')
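Note that .where keeps the original shape and fills non-matching rows with NaN, so you usually chain .dropna() to get just the hits. A small sketch with inline data mirroring the question's table:

```python
import pandas as pd

df = pd.DataFrame({"VM": ["vm1", "vm2"], "Naa_Dev": ["naa.ab1234", "naa.ac1234"]})
# .where keeps the full index, filling non-matches with NaN...
masked = df["VM"].where(df["Naa_Dev"] == "naa.ac1234")
# ...so chain .dropna() to keep only the matching rows
print(masked.dropna().tolist())  # ['vm2']
```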
Use the pandas module to read the file into a DataFrame. There are a lot of parameters for reading csv files with pandas.read_csv. The dataframe.to_string() function is extremely useful.
Solution:
# import module with alias 'pd'
import pandas as pd
# Open the CSV file; the delimiter is set to a space, the first line
# is treated as the header row, and we then specify the column names.
dframe = pd.read_csv("file.csv",
                     delimiter=" ",
                     header=0,
                     names=["VM", "IP", "Naa_Dev", "Datastore"])
# print will output the table
print(dframe)
# to_string will allow you to align and adjust content
# e.g justify = left to align columns to the left.
print(dframe.to_string(justify="left"))
Pandas is probably the best answer but you can also:
import csv

your_list = []
with open('dummy.csv') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=' ')
    for row in reader:
        your_list.append(row)
print(your_list)
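Putting it together with the question's sample rows (inlined here via StringIO instead of a file), the same lookups work without pandas at all:

```python
import csv
import io

data = ("VM IP Naa_Dev Datastore\n"
        "vm1 xx.xx.xx.x1 naa.ab1234 ds1\n"
        "vm2 xx.xx.xx.x2 naa.ac1234 ds1\n")
rows = list(csv.DictReader(io.StringIO(data), delimiter=" "))

# Look up the IP of vm1
ip = next(r["IP"] for r in rows if r["VM"] == "vm1")
print(ip)  # xx.xx.xx.x1

# Look up the VM that owns naa.ac1234
vm = next(r["VM"] for r in rows if r["Naa_Dev"] == "naa.ac1234")
print(vm)  # vm2
```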
