How to delete rows (NOT columns) in a csv file - python

I am trying to delete a particular row (NOT a column) in a csv file for a
class project. When I deleted columns I put:
    r = row
    r[22], r[21]
    # and so on
So how do I specify that I want to delete rows? I am working with census data and want to get rid of that extra row of headers that are always in census tables.
Thank you for any help.

Convert your csv.reader to a list and slice the header row off:

import csv

with open('file.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    rows = list(reader)[1:]  # no more header row
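To persist the result, the sliced rows can be written back out with csv.writer. A minimal end-to-end sketch; the filenames and sample data are placeholders standing in for the census file:

```python
import csv

# Sample input: a header row followed by two data rows.
with open('file.csv', 'w', newline='') as f:
    csv.writer(f).writerows([['id', 'name'], ['1', 'Dog'], ['2', 'Cat']])

# Read everything, slice off the header row, and write the rest back out.
with open('file.csv', 'r', newline='') as f:
    rows = list(csv.reader(f))[1:]

with open('file_no_header.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```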

Use pandas; it makes handling data and files easy.
You can open your csv file and convert it to a pandas DataFrame:

import pandas

df = pandas.read_csv('file.csv')

After that you can drop the unwanted row by its index label:

df = df.drop(df.index[0])

In this example I'm deleting the row with index 0. (Note that df.drop(df.columns[[0]], axis=1) would delete a column instead; rows are axis=0, the default.)
Pandas documentation
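Since the question mentions census tables with an extra header row, pandas can also drop that row at load time via the skiprows parameter of read_csv, so it never becomes a data row at all. A sketch with inline stand-in data (the column names are invented for illustration):

```python
import io
import pandas as pd

# Stand-in for a census table: a real header line, then a second
# descriptive header line that should not become a data row.
raw = "GEO_ID,POP\nGeography,Population\n0500000US01001,55200\n"

# skiprows=[1] drops the second physical line at parse time.
df = pd.read_csv(io.StringIO(raw), skiprows=[1])
```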

Related

How to write multiple arrays into a csv file

I am trying to write code that writes multiple arrays into one single data frame in pandas, appending the data frame row by row. For example I have a row of [1,2,3,4,5] and the next row of [6,7,8,9,10].
I want to print it as:
1,2,3,4,5
6,7,8,9,10
in a csv file. I want to write multiple rows like this in a single csv file, but the code I can find only appends a data frame column by column. Can I write this array row by row too?
Please help.
I tried using the pandas library but couldn't find a relevant command.
The following code snippet might help you:

import csv

with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows([[1, 2, 3], [4, 5, 6]])
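Since the question asks about pandas specifically, the same row-by-row output can be produced by stacking the lists into a DataFrame and calling to_csv. A sketch; the output filename is a placeholder:

```python
import pandas as pd

rows = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

# Each inner list becomes one row of the DataFrame.
df = pd.DataFrame(rows)

# header=False and index=False give exactly the bare comma-separated rows.
df.to_csv('out.csv', header=False, index=False)
```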

What would be an effective way to merge/join a csv file and txt file based on a common key?

Let's say I have a csv file that is such that:
Dog
Cat
Bird
is a common column with a txt file I have that has two columns of interest:
Cat 8.9
Bird 12.2
Dog 2.1
One column being the identifiers (species name), the other being let's say, speed in mph.
I want to parse through the rows of the csv file, look up each species' speed in my txt file, and then add it as an extra column to the csv file. I.e., I want to join these two files on the common species key, and I specifically JUST want to append the speed from the txt file (and not other columns that may be in that file) to the csv file. The csv file also has superfluous columns that I would like to keep; but I don't want any of the superfluous columns from the txt file (let's say it had heights, weights, and so on; I don't want to add those to my csv file).
Let's say that this file is very large; I have 1,000,000 species in both files.
What would be the shortest and cleanest way to do this?
Should I write some type of a nested loop, should I use a dictionary, or would Pandas be an effective method?
Note: let's say I also want to compute the mean speed of all the species; ie I want to take the mean of the speed column. Would numpy be the most effective method for this?
Additional note: All the species names are in common, but the order in the csv and txt files are different (as indicated in my example). How would I correct for this?
Additional note 2: This is a totally contrived example since I've been learning about dictionaries and input/output files, and reviewing loops that I wrote in the past.
Note 3: The txt file should be tab separated, but the csv file is (obviously) comma separated.
You could do everything needed with the built-in csv module. As I mentioned in a comment, it can also read the text file since it's tab-delimited (as well as read the csv file and write an updated version of it).
You seem to indicate there are other fields besides "animal" and "speed" in both files, but the code below assumes each file contains only one or both of those columns.
import csv

csv_filename = 'the_csv_file.csv'
updated_csv_filename = 'the_csv_file2.csv'
txt_filename = 'the_text_file.txt'

# Create a lookup table from the text file.
with open(txt_filename, 'r', newline='') as txt_file:
    # Use the csv module to read the tab-delimited text file.
    reader = csv.DictReader(txt_file, ('animal', 'speed'), delimiter='\t')
    lookup = {row['animal']: row['speed'] for row in reader}

# Read the csv file and create an updated version of it.
with open(csv_filename, 'r', newline='') as csv_file, \
     open(updated_csv_filename, 'w', newline='') as updated_csv_file:
    reader = csv.reader(csv_file)
    writer = csv.writer(updated_csv_file)
    for row in reader:
        # Insert the looked-up speed after the first column (animal)
        # and copy any columns that follow.
        row.insert(1, lookup[row[0]])
        writer.writerow(row)
Given the two input files in your question, here is the content of the updated csv file (assuming there were no additional columns):
Dog,2.1
Cat,8.9
Bird,12.2
This is probably most easily achieved with pandas DataFrames.
You can load both your csv and text file using the read_csv function (just set the separator to tab for the text file) and use the join function to combine the two DataFrames. Note that join aligns on the other frame's index, so set species as that frame's index first, something like:

import pandas as pd

column = pd.read_csv('data.csv')
data = pd.read_csv('data.txt', sep='\t')
joined = column.join(data.set_index('species'), on='species')
result = joined[['species', 'speed', 'other column you want to keep']]
If you want to conduct more in depth analysis of your data or your files are too large for memory, you may want to look into importing your data into a dedicated database management system like PostgreSQL.
EDIT: If your files don't contain column names, you can load them with custom names using pd.read_csv(file_path, header=None, names=['name1','name2']) as described here. Also, columns can be renamed after loading using dataframe.rename(columns = {'oldname':'newname'}, inplace = True) as seen here.
You can just use the merge() method of pandas like this:

import pandas as pd

df_csv = pd.read_csv('csv.csv', header=None)
df_txt = pd.read_csv('txt.txt', sep='\t', header=None)  # the txt file is tab-separated
result = pd.merge(df_txt, df_csv)
print(result)
Gives the following output:

      0     1
0   Cat   8.9
1  Bird  12.2
2   Dog   2.1
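The question also asks about computing the mean speed. Once the data is in a DataFrame there is no need for explicit numpy code: Series.mean() uses numpy under the hood. A sketch with the question's sample data inlined:

```python
import io
import pandas as pd

# Inline stand-in for the tab-separated txt file from the question.
raw = "Cat\t8.9\nBird\t12.2\nDog\t2.1\n"
data = pd.read_csv(io.StringIO(raw), sep='\t', header=None,
                   names=['species', 'speed'])

# Mean of the speed column; computed by numpy internally.
mean_speed = data['speed'].mean()
```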

how to import filtered rows before reading csv pandas

Hi, I have to load a large number of csv files into pandas dataframes. Can I filter out data from these csv files before loading, so that I don't get a memory error?
In the existing setup it gives me a memory error.
I have a column Location which has 32 values, but I only want 3-4 locations kept when importing.
Is this possible?
You can use the csv library to read line by line and keep only the records you need:

import csv

wanted = {'Boston', 'Chicago', 'Denver'}  # the 3-4 locations to keep
filtered = []
with open('names.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        if row['Location'] in wanted:
            filtered.append(row)

After that you can save the filtered rows to a csv file using csv.DictWriter and its writerows method.
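pandas can also do this without holding a whole file in memory, via the chunksize parameter of read_csv: each chunk is filtered and only the survivors are concatenated. A sketch with inline sample data; the column and location names are placeholders:

```python
import io
import pandas as pd

# Inline sample standing in for one large csv file.
raw = "name,Location\nA,London\nB,Oslo\nC,Paris\nD,Rome\n"
wanted = {'London', 'Paris'}

# chunksize=2 reads two rows at a time, so only a small part
# of the file is in memory at any moment.
chunks = pd.read_csv(io.StringIO(raw), chunksize=2)
df = pd.concat(chunk[chunk['Location'].isin(wanted)] for chunk in chunks)
```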

Parsing and saving the rows of excel file using pandas

I have an excel file with a column containing some text in each row.
I'm using pandas pd.read_excel() to open this .xlsx file. Then I would like to save every row of this column as a distinct .txt file (containing the text from that row). Is it possible to do this via pandas?
The basic idea would be to use an iterator to loop over the rows, opening each file and writing the value in, something like:
import pandas as pd

df = pd.read_excel('test.xlsx')
for i, value in enumerate(df['column']):
    with open(f'row-{i}.txt', 'w') as fd:
        fd.write(value)

read in csv and changing first value from 'ID' then write csv in python3

I am trying to import a csv, change the first value in the file, and then write the file out to another csv. I am doing this as excel opens the csv files as SYLK format files if 'ID' is in the first value. I therefore intend to change 'ID' to "Value_ID'. I can't figure out how to change the value of s[0][0] = 'Value_ID'. Any help would be greatly appreciated.
with open('input1.csv', 'r') as file1:
    reader = csv.reader(file1)
    s = ('output1.csv')
    filewriter = csv.writer(open(s, 'w', newline='\n'))
    for row in reader:
        filewriter.writerow(row)
        s=[0][0] = 'Match_ID'
You can use pandas for this and many more operations quite efficiently and easily.
To install pandas:
pip install pandas
This will install all of its dependencies as well.
Once this is done, open up the python shell:

import pandas as pd

df = pd.read_csv('input1.csv')
# Rename the first column from 'ID' to 'Value_ID' so Excel
# no longer mistakes the file for SYLK format.
df = df.rename(columns={'ID': 'Value_ID'})
df.to_csv('output1.csv', index=False)

If you are unsure what the column names are, type
df.head(5)
This shows you the top 5 rows of the pandas DataFrame.
Happy coding. Cheers!
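A stdlib-only alternative, close to the asker's original approach: read all rows into a list, change the first cell of the first row, and write everything out. The sample data here is invented; the filenames match the question:

```python
import csv

# Sample input whose first header cell is 'ID'
# (the value that trips Excel's SYLK-format detection).
with open('input1.csv', 'w', newline='') as f:
    csv.writer(f).writerows([['ID', 'name'], ['1', 'Dog']])

with open('input1.csv', 'r', newline='') as f:
    rows = list(csv.reader(f))

rows[0][0] = 'Value_ID'  # change only the first cell of the header row

with open('output1.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```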
