I am trying to read a CSV file in Python. I want to read the whole file except for the first two columns. The file has no column names, so I cannot drop or skip the columns by name.
What code do I need to read the file without the first two columns?
I have tried the code below:
with open("data2.csv", "r") as file:
    lines = [line.split() for line in file]
for i, x in enumerate(lines):
    print("line {0} = {1}".format(i, x))
The code above just reads the file line by line. How do I skip the first two columns while reading the file? I don't have the names of the columns.
You should use the csv module in the standard library. You might need to pass additional kwargs (keyword arguments) depending on the format of your csv file.
import csv
with open('my_csv_file', 'r') as fin:
    reader = csv.reader(fin)
    for line in reader:
        print(line[2:])
        # do something with rest of columns...
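As a minimal sketch of the csv-module approach, the following uses an in-memory sample instead of a real file so that it runs on its own. The sample data, the semicolon delimiter, and the helper name skip_columns are assumptions made for illustration; delimiter is one of the keyword arguments that csv.reader accepts.

```python
import csv
import io

def skip_columns(fileobj, n=2, **reader_kwargs):
    """Return every row of a CSV file object without its first n columns."""
    reader = csv.reader(fileobj, **reader_kwargs)
    return [row[n:] for row in reader]

# Hypothetical sample data: no header row, semicolon-separated
sample = io.StringIO("a;b;1;2\nc;d;3;4\n")
rows = skip_columns(sample, n=2, delimiter=";")
print(rows)
```

With a real file, the same helper could be called inside a with open(...) block in place of the StringIO object.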
If the lines list already holds the data you want, you can use slicing on each row (each entry of lines) to get rid of the columns you don't want.
Getting rid of the first two:
row[2:]
Getting rid of the last two:
row[:-2]
with open("data2.csv", "r") as file:
    lines = [line.split()[2:] for line in file]
for i, x in enumerate(lines):
    print("line {0} = {1}".format(i, x))
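To make the slicing behaviour concrete, here is a small sketch on made-up values (the row contents are purely hypothetical). It also shows why the slice has to be applied to each row: slicing the outer list would drop whole rows rather than columns.

```python
row = ["id", "name", "10", "20", "30"]  # hypothetical row

without_first_two = row[2:]   # drop the first two items
without_last_two = row[:-2]   # drop the last two items

# Slicing a list of rows would drop whole rows, so slice each row instead
rows = [["a", "b", "1"], ["c", "d", "2"]]
trimmed = [r[2:] for r in rows]
```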
I am really new to Python and I need to change new article IDs back to the old ones. The IDs are mapped inside a dict. The file I need to edit is a plain text file in which the columns are separated by tabs. The problem is not replacing the values as such, but replacing only the occurrences in the desired column, which is given by pos.
I would really appreciate some help.
def replaceArtCol(filename, pos):
    with open(filename) as input_file, open('test.txt','w') as output_file:
        for each_line in input_file:
            val = each_line.split("\t")[pos]
            for row in artikel_ID:
                if each_line[pos] == pos:
                    line = each_line.replace(val, artikel_ID[val])
                    output_file.write(line)
This code just replaces every occurrence of the string in the text file.
Supposing your ID mapping dict looks like ID_mapping = {'old_id': 'new_id'}, I think your code is not far from working correctly. A modified version could look like this:
with open(filename) as input_file, open('test.txt', 'w') as output_file:
    for each_line in input_file:
        # Strip the newline first so a value in the last column can match too
        line = each_line.rstrip("\n").split("\t")
        if line[pos] in ID_mapping:
            line[pos] = ID_mapping[line[pos]]
        output_file.write('\t'.join(line) + "\n")
If you're not working in pandas anyway, this can save a lot of overhead.
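The same column-only replacement can also be sketched with the standard csv module by setting the delimiter to a tab. The helper name replace_in_column, the sample rows, and the mapping below are assumptions for illustration, and the sketch assumes the fields contain no quote characters.

```python
import csv
import io

def replace_in_column(infile, outfile, pos, mapping):
    """Copy tab-separated rows, replacing values in column pos via mapping."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for row in reader:
        if len(row) > pos:
            # Values without an entry in the mapping are left unchanged
            row[pos] = mapping.get(row[pos], row[pos])
        writer.writerow(row)

# Hypothetical data: "A2" appears in columns 0 and 2, but only column 0 changes
source = io.StringIO("A2\tbolt\tA2\nB7\tnut\tA2\n")
result = io.StringIO()
replace_in_column(source, result, pos=0, mapping={"A2": "A1"})
print(result.getvalue())
```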
If your data is tab-separated, consider loading it into a DataFrame. That gives you a row-and-column structure, whereas what you are doing right now will be hard to get right without complex, bug-prone logic. You could try these steps:
import pandas as pd
df = pd.read_csv("dummy.txt", sep="\t", encoding="latin-1")
df['desired_column_name'] = df['desired_column_name'].replace({"value_to_be_changed": "newvalue"})
print(df.head())
I have a CSV file with the following data:
Date,Profit/Losses
Jan-10,867884
Feb-10,984655
Mar-10,322013
Apr-10,-69417
May-10,310503
Jun-10,522857
Jul-10,1033096
Aug-10,604885
Sep-10,-216386
Oct-10,477532
Nov-10,893810
Dec-10,-80353
I have imported the file in Python like so:
with open(csvpath, 'r', errors='ignore') as fileHandle:
    lines = fileHandle.read()
I need to loop through these lines and extract just the months, i.e. "Jan", "Feb", etc., and put them in a separate list. I also have to skip the first line, i.e. Date,Profit/Losses, which is the header.
Here's the code I have written so far:
months = []
for line in lines:
    months.append(line.split("-"))
When I try to print the months list, though, it splits every single character in the file!
Where am I going wrong here?
You can almost always minimize the pain by using specialized tools, such as the csv module and list comprehension:
import csv
with open("yourfile.csv") as infile:
    reader = csv.reader(infile) # Create a new reader
    next(reader) # Skip the first row
    months = [row[0].split("-")[0] for row in reader]
One answer to your question is to use fileHandle.readlines().
lines = fileHandle.readlines()
# print(lines)
# ['Date,Profit/Losses\n', 'Jan-10,867884\n', 'Feb-10,984655\n', 'Mar-10,322013\n',
# 'Apr-10,-69417\n', 'May-10,310503\n', 'Jun-10,522857\n', 'Jul-10,1033096\n', 'Aug-10,604885\n',
# 'Sep-10,-216386\n', 'Oct-10,477532\n', 'Nov-10,893810\n', 'Dec-10,-80353\n']
for line in lines[1:]:
    # Starting from 2nd item in the list since you just want months
    months.append(line.split("-")[0])
Try this if you really want to do it the hard way:
months = []
for line in lines[1:]:
    months.append(line.split("-")[0])
lines[1:] will skip the first row and line.split("-")[0] will only pull out the month and append to your list months.
However, as suggested by AChampion, you should really look into the csv or pandas packages.
This should deliver the desired results (assuming the file is named data.csv and is in the same directory):
result = []
with open('data.csv', 'r', encoding='UTF-8') as data:
    next(data)
    for record in data:
        result.append(record.split('-')[0])
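The root cause in the question is that read() returns the whole file as one string, and looping over a string visits one character at a time. The sketch below reproduces that on a shortened copy of the data held in a variable (the name text is just for illustration) and then shows the line-based version.

```python
# A shortened copy of what fileHandle.read() would return
text = "Date,Profit/Losses\nJan-10,867884\nFeb-10,984655\n"

# Looping over a string yields single characters
first_items = [ch for ch in text][:4]  # ['D', 'a', 't', 'e']

# Splitting into lines first gives one item per row
lines = text.splitlines()
months = [line.split("-")[0] for line in lines[1:]]  # lines[1:] skips the header
print(months)
```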
I have a .json file where each line is an object. For example, the first two lines are:
{"review_id":"x7mDIiDB3jEiPGPHOmDzyw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}
{"review_id":"dDl8zu1vWPdKGihJrwQbpw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}
I have tried processing it using the ijson library as follows:
with open(filename, 'r') as f:
    objects = ijson.items(f, 'columns.items')
    columns = list(objects)
However, I get this error:
JSONError: Additional data
It seems I am getting this error because the file contains multiple objects.
What is the recommended way to analyze such a JSON file in Jupyter?
Thank you in advance.
If this is the complete file, it is not a single valid JSON document. For that, the objects must be separated by commas and the whole sequence must start and end with square brackets, like so: [{...},{...}]. For your data it would look like:
[{"review_id":"x7mDIiDB3jEiPGPHOmDzyw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...},
{"review_id":"dDl8zu1vWPdKGihJrwQbpw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}]
Here is some code to clean your file:
with open("yourfile.json", "r") as f:
    # Keep only the non-empty lines, without their trailing newlines
    lines = [line.strip() for line in f if line.strip()]
with open("cleanfile.json", "w") as g:
    # Join the objects with commas and wrap them in square brackets
    g.write("[" + ",\n".join(lines) + "]")
To read a JSON file properly, you could also consider using the pandas library (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html).
import pandas as pd
# get a pandas DataFrame from the cleaned JSON file
# (for the original one-object-per-line file, pass lines=True instead)
df = pd.read_json("path/to/your/filename.json")
If you are not familiar with pandas, here is a quick head start on working with a DataFrame:
df.head() # gives you the first rows of the DataFrame
df["review_id"] # gives you the column review_id as a Series
df.iloc[1,:] # gives you the complete row with index 1
df.iloc[1,2] # gives you the item in row with index 1 and column with index 2
While each line on its own is valid JSON, your file as a whole is not. As such, you can't parse it in one go; you will have to iterate over the lines and parse each one into an object.
You can collect these objects in one list, and from there do whatever you like with your data:
import json
with open(filename, 'r') as f:
    object_list = []
    for line in f:
        object_list.append(json.loads(line))
# object_list will contain all of your file's data
You could write it as a list comprehension to make it a little more Pythonic:
with open(filename, 'r') as f:
    object_list = [json.loads(line) for line in f]
# object_list will contain all of your file's data
Your file contains multiple JSON objects, one per line, which is why it throws errors. Parse each line separately:
import json
with open(filename, 'r') as f:
    lines = f.readlines()
first = json.loads(lines[0])
second = json.loads(lines[1])
That should load both lines properly.
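For larger files, a generator that parses one line at a time avoids holding the raw text and the parsed objects in memory at once. The following is a minimal sketch under that assumption; the helper name iter_json_lines and the sample records (including the stars field) are made up for illustration, and blank lines are skipped so they do not cause a parse error.

```python
import io
import json

def iter_json_lines(fileobj):
    """Yield one parsed object per non-empty line."""
    for line in fileobj:
        line = line.strip()
        if line:
            yield json.loads(line)

# Hypothetical two-record sample with a blank line in between
sample = io.StringIO(
    '{"review_id": "r1", "stars": 5}\n'
    '\n'
    '{"review_id": "r2", "stars": 3}\n'
)
reviews = list(iter_json_lines(sample))
review_ids = [r["review_id"] for r in reviews]
print(review_ids)
```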
I am extremely new to Python (and to coding, for that matter).
Could I please get some help with how to achieve this? I have gone through numerous threads but nothing helped.
My input file looks like this:
I want my output file to look like this:
Just a replication of the first column, twice, in the second Excel sheet, with an empty line after every 5 rows.
A .csv file can be opened with a normal text editor. Do this and you'll see that the entries for each column are comma-separated (csv = comma-separated values). In your case it is most likely semicolons (;), though.
Since you're new to coding, I recommend trying it manually with a text editor first until you have the desired output, and then try to replicate it with python.
Also, you should post code examples here and ask specific questions about why it doesn't work like you expected it to work.
Below is the solution. Don't forget to configure input/output files and the delimiter:
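input_file = r'c:\Temp\input.csv'    # raw strings keep the backslashes literal
output_file = r'c:\Temp\output.csv'
delimiter = ';'
i = 0
output_data = ''
with open(input_file) as f:
    for line in f:
        i += 1
        # Write the line twice, separated by the delimiter
        output_data += line.strip() + delimiter + line.strip() + '\n'
        if i == 5:
            # Insert an empty line after every 5 rows
            output_data += '\n'
            i = 0
with open(output_file, 'w') as file_:
    file_.write(output_data)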
Python has a csv module for doing this. It is able to automatically read each row into a list of columns. It is then possible to simply take the first element and replicate it into the second column in an output file.
import csv
# newline='' lets the csv module handle line endings itself (Python 3)
with open('input.csv', newline='') as f_input:
    csv_input = csv.reader(f_input)
    input_rows = list(csv_input)
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    for line, row in enumerate(input_rows, start=1):
        csv_output.writerow([row[0], row[0]])
        if line % 5 == 0:
            csv_output.writerow([])
Note that it is not advisable to write the updated data directly over the input file: if there were a problem, you would lose your original file.
If your input file has multiple columns, this script will drop the others and simply duplicate the first column.
By default, the csv format separates columns with a comma. This can be changed by specifying the desired delimiter as follows:
csv_output = csv.writer(f_output, delimiter=';')
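The delimiter option can be tried out without touching any files. The sketch below runs the same duplicate-the-first-column logic on in-memory data; the helper name duplicate_first_column, the sample rows, and the group size of 2 (chosen only to keep the sample short) are assumptions for illustration.

```python
import csv
import io

def duplicate_first_column(rows, group_size=5):
    """Yield [first, first] for each row, plus an empty row after each group."""
    for count, row in enumerate(rows, start=1):
        yield [row[0], row[0]]
        if count % group_size == 0:
            yield []

# Hypothetical semicolon-separated input with two columns
source = io.StringIO("a;x\nb;y\nc;z\n")
out = io.StringIO()
reader = csv.reader(source, delimiter=";")
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerows(duplicate_first_column(reader, group_size=2))
print(out.getvalue())
```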
I have two CSV files. I want Python to open file1.csv, read line 7 from that file, and look for that same binary code in the WHOLE of file2.csv.
This is what I have so far but it does not work:
import csv
a = open('file1.csv','r').readline[7]
with open('file2.csv') as infile:
    for row in csv.reader(infile):
        if row[1:] == a: # This part is fine because i want to skip the first row
            print row[0], ','.join(row[1:])
Looks like you need to read up on how the python csv library works :) You might also want to read up on how list slicing works. I'll try to help you based on what I understood about your problem.
I have the same question that #oliver-w had but I'll just assume your 'csv' files have only one column.
import csv
with open('file1.csv', 'r') as file1:
    # this is the value you will be searching for in file2.csv
    # you might need to change this to [6] if there is no header row in file1.csv
    # the trailing [0] takes the single column out of the row, which is a list
    val = list(csv.reader(file1))[7][0]
with open('file2.csv', 'r') as file2:
    reader = csv.reader(file2)
    next(reader) # this skips the first row of the file
    # this iteration will start from the second row of file2.csv
    for row in reader:
        if row[0] == val:
            # your question doesn't clarify what your actual purpose is,
            # so printing the matching row is only a placeholder
            print(row)
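As a self-contained sketch of the lookup, the following runs the same idea on in-memory data. The helper name find_rows, the sample contents, and the choice of the third line as the target are assumptions for illustration; with real files you would pass the opened file objects instead.

```python
import csv
import io

def find_rows(target, fileobj, column=0, skip_header=True):
    """Return all rows of a CSV file object whose given column equals target."""
    reader = csv.reader(fileobj)
    if skip_header:
        next(reader, None)  # ignore the header row, if there is one
    return [row for row in reader if len(row) > column and row[column] == target]

# Hypothetical single-column file1 and two-column file2
file1 = io.StringIO("code\n0001\n0010\n0011\n")
file2 = io.StringIO("code,label\n0010,x\n0111,y\n0010,z\n")

rows1 = list(csv.reader(file1))
target = rows1[2][0]  # third line of file1; indices start at 0
matches = find_rows(target, file2)
print(matches)
```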