Python create dict from CSV and use the file name as key

I have a simple CSV text file called "allMaps.txt". It contains the following:
cat, dog, fish
How can I take the file name "allMaps" and use it as a key, with the file's contents as the values?
I wish to achieve this format:
{"allMaps": "cat", "dog", "fish"}
I have a range of txt files all containing values separated by commas so a more dynamic method that does it for all would be beneficial!
The other txt files are:
allMaps.txt
fiveMaps.txt
tenMaps.txt
sevenMaps.txt
They all contain comma-separated values. Is there a way to look into the folder and convert each of the text files into a key-value entry in a dict?

Assuming you have the file names in a list.
files = ["allMaps.txt", "fiveMaps.txt", "tenMaps.txt", "sevenMaps.txt"]
You can do the following:
my_dict = {}
for file in files:
    with open(file) as f:
        items = [i.strip() for i in f.read().split(",")]
        my_dict[file.replace(".txt", "")] = items
If the files are all in the same folder, you could do the following instead of maintaining a list of files:
import os
files = os.listdir("<folder>")
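Putting both pieces together, a minimal sketch (the folder name here is just an illustration) that filters to .txt files and builds the dict in one pass:
import os

folder = "maps"  # hypothetical folder holding the .txt files
my_dict = {}
for name in os.listdir(folder):
    if not name.endswith(".txt"):
        continue  # ignore anything that is not one of the comma-separated text files
    # os.listdir returns bare names, so join them with the folder before opening
    with open(os.path.join(folder, name)) as f:
        my_dict[name.replace(".txt", "")] = [i.strip() for i in f.read().split(",")]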

Given the file names, you can build the dictionary in one comprehension, keyed by filename, with each value holding a list of the file's data:
files = ['allMaps.txt', 'fiveMaps.txt', 'tenMaps.txt', 'sevenMaps.txt']
final_results = {i:[b.strip('\n').split(', ') for b in open(i)] for i in files}
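Note that this keeps the ".txt" in the keys and, because it iterates over the lines of each file, every value is a list of lists (one inner list per line). For the one-line allMaps.txt above, the entry would be:
{'allMaps.txt': [['cat', 'dog', 'fish']]}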

Related

Making a dictionary from files in which keys are filenames and values are strings with specific character

So my problem is: I have proteomes in FASTA format, which look like this:
Name of the example file:
GCA_003547095.1_protein.faa
Contents:
>CAG77607.1
ABCDEF
>CAG72141.1
CSSDAS
And I also have files that contain just names of the proteins, i.e.:
Filename:
PF00001
Contents:
CAG77607.1
CAG72141.1
My task is to iterate through the proteomes using the list of proteins to find out how many of those proteins are in each proteome. PE told me it should be a dictionary with the proteome filenames as keys and the sequence names after ">" as values.
My approach was as follows:
import pandas as pd
file_names = open("proteomes_list").readlines()
d = {x: pd.read_csv("/proteomes/" + "GCA_003547095.1_protein.faa").columns.tolist() for x in file_names}
print (d)
As you can see, I've put the proteome filenames into a list (using a simple bash "ls"; these are ONLY the names of proteomes) and then created a dictionary with the sequence names as values. Unfortunately each proteome (including the tested proteome) has only one value.
I will be grateful if You could shed some light on my case.
My goal was to make dictionary where key would be i.e. GCA_003547095.1_protein.faa and value i.e. CAG77607.1, CAG72141.1.
Is this the output you expect? The function below iterates over a file and grabs either the FASTA headers or, failing that, the plain protein names listed in the file. You can then create the dictionary you mentioned by iterating over the file names and updating the parent dictionary:
import os

def extract_proteomes(folder: str, filename: str) -> list[str]:
    with open(os.path.join(folder, filename), mode='r') as file:
        content: list[str] = file.read().split('\n')
    # FASTA headers start with '>'; strip the marker to keep only the sequence name
    protein_names = [i[1:] for i in content if i.startswith('>')]
    if not protein_names:
        # no headers found, so treat every non-empty line as a protein name (e.g. PF00001)
        protein_names = [i for i in content if i]
    return protein_names

folder = "/Users/user/Downloads/"
files = ["GCA_003547095.1_protein.faa", "PF00001"]
d = {}
for i in files:
    d.update({i: extract_proteomes(folder=folder, filename=i)})
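To get from that dictionary to the count the question actually asks for (how many of the PF00001 proteins occur in each proteome), a hedged sketch reusing the function above:
wanted = set(extract_proteomes(folder=folder, filename="PF00001"))
proteomes = ["GCA_003547095.1_protein.faa"]  # extend with the other proteome filenames

counts = {}
for name in proteomes:
    headers = extract_proteomes(folder=folder, filename=name)
    counts[name] = sum(1 for h in headers if h in wanted)  # how many wanted proteins this proteome has

print(counts)  # the example files above give {'GCA_003547095.1_protein.faa': 2}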

Creating runtime variable in python to fetch data from dictionary object

I have created a dictionary object by parsing a JSON file in Python. Let's assume the data is as follows:
plants = {}
# Add three key-value tuples to the dictionary.
plants["radish"] = {"color":"red", "length":4}
plants["apple"] = {"smell":"sweet", "season":"winter"}
plants["carrot"] = {"use":"medicine", "juice":"sour"}
This could be a very long dictionary object
But at runtime, I need only a few values to be stored in a comma-delimited CSV file. The list of desired properties is in a file, e.g.:
radish.color
carrot.juice
So, how would I create a Python app where I can build dynamic lookups such as the ones below to get data out of the JSON object and create a CSV file? At runtime I need variables like:
plants[radish][color]
plants[carrot][juice]
Thank you to all who help
Sanjay
Consider parsing the text file line by line to retrieve its contents. As you read, split each line on the period, which separates the dictionary keys. From there, use that list of keys to retrieve the dictionary values, then iteratively output the values to CSV, conditioned on the number of keys:
Txt file
radish.color
carrot.juice
Python code
import csv
plants = {}
plants["radish"] = {"color":"red", "length":4}
plants["apple"] = {"smell":"sweet", "season":"winter"}
plants["carrot"] = {"use":"medicine", "juice":"sour"}
data = []
with open("Input.txt", "r") as f:
for line in f:
data.append(line.replace("\n", "").strip().split("."))
with open("Output.csv", "w") as w:
writer = csv.writer(w, lineterminator = '\n')
for item in data:
if len(item) == 2: # ONE-NEST DEEP
writer.writerow([item[0], item[1], plants[item[0]][item[1]]])
if len(item) == 3: # SECOND NEST DEEP
writer.writerow([item[0], item[1], item[2], plants[item[0]][item[1]][item[2]]])
Output csv
radish,color,red
carrot,juice,sour
(Note: the deeper the nesting, the more columns are written, so key and value columns no longer line up across rows. You may want to output differently structured CSV files, e.g. one for one-level paths and another for two-level paths.)
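If you would rather not branch on the depth at all, an alternative sketch (not part of the answer above) that resolves a dotted path of any length with functools.reduce:
import csv
from functools import reduce

plants = {"radish": {"color": "red", "length": 4},
          "apple": {"smell": "sweet", "season": "winter"},
          "carrot": {"use": "medicine", "juice": "sour"}}

with open("Input.txt") as f, open("Output.csv", "w") as w:
    writer = csv.writer(w, lineterminator='\n')
    for line in f:
        keys = line.strip().split(".")                    # e.g. ["radish", "color"]
        value = reduce(lambda d, k: d[k], keys, plants)   # walk the nested dicts key by key
        writer.writerow(keys + [value])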

Loop through files in a subdirectory, append filenames as dictionary keys

I have a directory of text files in a subdirectory, /directory_name/. I would like to loop through all files within this subdirectory, and append the filename as a string as the dictionary key. Then I would like to take the first word in each text file, and use this as the value for each key. I'm a bit stuck on this part:
import os
os.path.expanduser(~/directory_name/) # go to subdirectory
file_dict = {} # create dictionary
for i in file_directory:
    file_dict[str(filename)] = {} # creates keys based on each filename
    # do something here to get the dictionary values
Is it possible to do this in two separate steps? That is, create dictionary keys first, then do an operation on all text files to extract the values?
To change directories, use os.chdir(). Assuming the first word in each file is followed by a space,
import os
file_dict = {} # create a dictionary
os.chdir(os.path.join(os.path.expanduser('~'), 'directory_name'))  # the subdirectory under your home folder
for key in [file for file in os.listdir(os.getcwd()) if os.path.isfile(file)]:
    value = open(key).readlines()[0].split()[0]  # first word of the first line
    file_dict[key] = value
works for me. And if you really want to do it in two steps,
import os
os.chdir(os.path.join(os.path.expanduser('~'), 'directory_name'))
keys = [file for file in os.listdir(os.getcwd()) if os.path.isfile(file)] # step 1
# Do something else in here...
values = [open(key).readlines()[0].split()[0] for key in keys] # step 2
file_dict = dict(zip(keys, values)) # Map values onto keys to create the dictionary
gives the same output.
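If you prefer not to change the working directory at all, a short alternative sketch with pathlib (same assumption about the first word):
from pathlib import Path

directory = Path.home() / "directory_name"
file_dict = {}
for path in directory.iterdir():
    if path.is_file():
        # key is the filename, value is the first word of the first line
        file_dict[path.name] = path.read_text().splitlines()[0].split()[0]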

Extract number from file name in python

I have a directory where I have many data files, but the data file names have arbitrary numbers. For example
data_T_1e-05.d
data_T_7.2434.d
data_T_0.001.d
and so on. Because of the decimals in the file names they are not sorted according to the value of the numbers. What I want to do is the following:
I want to open every file, extract the number from the file name, put it in a array and do some manipulations using the data. Example:
a = np.loadtxt("data_T_1e-05.d",unpack=True)
res[i][0] = 1e-05
res[i][1] = np.sum(a)
I want to do this for every file by running a loop. I think it could be done by creating an array containing all the file names (using import os) and then doing something with it.
How can it be done?
If your files all start with the same prefix and end with the same suffix, simply slice and pass to float():
number = float(filename[7:-2])
This removes the first 7 characters (i.e. data_T_) and the last 2 (.d).
This works fine for your example filenames:
>>> for example in ('data_T_1e-05.d', 'data_T_7.2434.d', 'data_T_0.001.d'):
... print float(example[7:-2])
...
1e-05
7.2434
0.001
import os
# create the list containing all files from the current dir
filelistall = os.listdir(os.getcwd())
# create the list containing only data files.
# I assume that data file names end with ".d"
filelist = filter(lambda x: x.endswith('.d'), filelistall)
for filename in filelist:
    f = open(filename, "r")
    number = float(filename[7:-2])
    # ... any other code dealing with the file
    f.close()
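Putting the two answers together with the loop described in the question, a minimal sketch (assuming every data file loads cleanly with np.loadtxt):
import os
import numpy as np

# collect the data files and sort them by the numeric value embedded in the name
filelist = [f for f in os.listdir(os.getcwd())
            if f.startswith("data_T_") and f.endswith(".d")]
filelist.sort(key=lambda name: float(name[7:-2]))

res = np.zeros((len(filelist), 2))
for i, filename in enumerate(filelist):
    a = np.loadtxt(filename, unpack=True)
    res[i][0] = float(filename[7:-2])  # the number from the file name
    res[i][1] = np.sum(a)              # the manipulation from the question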

Parsing CSV file (Python)

I have a CSV file with the following format:
"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode","OpSystemCode"
Basically, what I'm looking to do in Python 2.x is read the file and, whenever the FileName column holds a file with one of the extensions from a specified list, parse the data from the MD5 hash column out into a text document.
So my pseudo code is looking like this:
list = [".doc", ".xls", ".ppt"]
with open("new.csv", "w") as new_f:
    with open("x.csv") as old_f:
        x = old_f.readlines()
        if list in x:
            # *copy out the value from the MD5 value column to new.csv*
I just don't know how to extract the MD5 hash.
Any suggestions?
Create one list for the MD5 hashes and one for the filenames; if an extension from your list appears in an item of the filename list, save that index and use it to look up the MD5 list (since the rows form a table, the index is the same in both lists).
Solution identified:
import csv
filetypes = ['jpg','bmp','jpeg','mov','mp4','avi','wmv','wav','tif','gif','png']
reader = csv.reader(open(r'c:\users\me\Desktop\x.csv'))  # raw string so the backslashes survive
for line in reader:              # iterate over the lines in the csv (the reader can only be consumed once)
    for extension in filetypes:  # check each wanted extension against the FileName column
        if extension in line[3]:
            print line[1] + "\t" + line[3]
            break
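If the goal from the question (writing the MD5 values to a text document rather than printing them) still stands, a hedged sketch using csv.DictReader and the header names shown above (the file names and extension list are illustrative):
import csv

wanted = ('.doc', '.xls', '.ppt')  # the extensions from the question

with open(r'c:\users\me\Desktop\x.csv') as src, open('md5_list.txt', 'w') as out:
    for row in csv.DictReader(src):               # the header row gives us named columns
        if row['FileName'].lower().endswith(wanted):
            out.write(row['MD5'] + '\n')          # copy only the MD5 column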
