I am dealing with a tab-separated file that contains multiple columns. Each column contains more than 3000 records.
Column1 Column2 Column3 Column4
1000041 11657 GenNorm albumin
1000043 24249 GenNorm CaBP
1000043 29177 GenNorm calcium-binding protein
1000045 2006 GenNorm tropoelastin
Problem: Using Python, how can I read the tab-separated file and store each column (with its records) in a single variable, and then use print to print out specific column(s)?
Preliminary code: This is the code I have used so far to read the TSV file:
import csv
Dictionary1 = {}
with open("sample.txt", 'r') as samplefile:
    reader = csv.reader(samplefile, delimiter="\t")
I think you're just asking how to "transpose" a CSV file from a sequence of rows to a sequence of columns.
In Python, you can always transpose any iterable of iterables by using the zip function:
import csv
with open("sample.txt") as samplefile:
    reader = csv.reader(samplefile, delimiter="\t")
    # zip(*rows) turns a sequence of rows into a sequence of columns
    columns = zip(*reader)
Now, if you want to print each column in order:
for column in columns:
    print(column)
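Here, columns is an iterator of tuples (in Python 3), so it can only be looped over once; wrap it in list() if you need to go through it again. If you want some other format, like a dict mapping the column names to a list of values, you can transform it easily. For example: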
columns = {column[0]: list(column[1:]) for column in columns}
Or, if you want to put them in four separate variables, you can just use normal tuple unpacking:
col1, col2, col3, col4 = columns
But there doesn't seem to be a very good reason to do that.
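To make the dict approach concrete, here is a minimal sketch that reads the sample data from the question and prints specific columns by name. The in-memory SAMPLE string and the read_columns helper are assumptions for illustration only; with a real file you would pass the object returned by open("sample.txt") instead.

```python
import csv
import io

# Hypothetical in-memory stand-in for sample.txt (tab-separated, with a header row).
SAMPLE = (
    "Column1\tColumn2\tColumn3\tColumn4\n"
    "1000041\t11657\tGenNorm\talbumin\n"
    "1000043\t24249\tGenNorm\tCaBP\n"
    "1000043\t29177\tGenNorm\tcalcium-binding protein\n"
    "1000045\t2006\tGenNorm\ttropoelastin\n"
)


def read_columns(fileobj):
    """Return a dict mapping each header name to the list of values in that column."""
    rows = list(csv.reader(fileobj, delimiter="\t"))
    header, records = rows[0], rows[1:]
    # zip(*records) transposes the rows into columns; pair each with its header.
    return {name: list(values) for name, values in zip(header, zip(*records))}


columns = read_columns(io.StringIO(SAMPLE))
print(columns["Column4"])                      # one specific column
print(columns["Column1"], columns["Column2"])  # several columns
```

All values stay strings here; convert them with int() yourself if you need numbers.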
I'm not sure of the Python code, but you can use a loop like the one below. Once you have stored everything in the dictionary, use the loop to split it into one array per column, then call the function to look up the index and print the matching record. You can modify the function to suit whatever you want the key to be; for example, you can pass in a word to search for.
int mainCounter = 0;
int counter1 = 0;
string arrColumn1[3000];
int counter2 = 0;
string arrColumn2[3000];
int counter3 = 0;
string arrColumn3[3000];
int counter4 = 0;
string arrColumn4[3000];
// mainCounter cycles 0..3 so that consecutive values go to consecutive columns
for(int i = 0; i<dictionary.length; ++i){
    switch ( mainCounter )
    {
        case 0:
            arrColumn1[counter1] = dictionary[i];
            ++counter1;
            ++mainCounter;
            break;
        case 1:
            arrColumn2[counter2] = dictionary[i];
            ++counter2;
            ++mainCounter;
            break;
        case 2:
            arrColumn3[counter3] = dictionary[i];
            ++counter3;
            ++mainCounter;
            break;
        case 3:
            arrColumn4[counter4] = dictionary[i];
            ++counter4;
            mainCounter = 0;
            break;
    }
}
void printRecordFunction(int colToSearch, string findThis, string arr1[], string arr2[], string arr3[], string arr4[]){
    // index of the first matching record (stays 0 if nothing matches)
    int foundIndex=0;
    if(colToSearch == 1){
        for(int i = 0; i<arr1.length; ++i){
            if(strcmp(arr1[i], findThis)==0){
                foundIndex = i;
                break;
            }
        }
    }else if(colToSearch == 2){
        for(int i = 0; i<arr2.length; ++i){
            if(strcmp(arr2[i], findThis)==0){
                foundIndex = i;
                break;
            }
        }
    }else if(colToSearch == 3){
        for(int i = 0; i<arr3.length; ++i){
            if(strcmp(arr3[i], findThis)==0){
                foundIndex = i;
                break;
            }
        }
    }else if(colToSearch == 4){
        for(int i = 0; i<arr4.length; ++i){
            if(strcmp(arr4[i], findThis)==0){
                foundIndex = i;
                break;
            }
        }
    }
    cout<<"Record: " << arr1[foundIndex] << " " << arr2[foundIndex] << " " << arr3[foundIndex] << " " << arr4[foundIndex] << endl;
}
Sorry, this is all pretty hard-coded, but I hope it gives you some idea and you can adjust it.
The problem itself is simple. I have to count the number of occurrences of s2 in s1.
The length of s2 is always 2. I tried to implement it in C, but it did not work even though I know the logic is correct. So I tried the same logic in Python and it works perfectly. Can someone explain why? Or did I do something wrong in C? I have given both versions below.
C
#include<stdio.h>
#include<string.h>
int main()
{
    char s1[100],s2[2];
    int count = 0;
    gets(s1);
    gets(s2);
    for(int i=0;i<strlen(s1);i++)
    {
        if(s1[i] == s2[0] && s1[i+1] == s2[1])
        {
            count++;
        }
    }
    printf("%d",count);
    return 0;
}
Python
s1 = input()
s2 = input()
count = 0
for i in range(0,len(s1)):
    if(s1[i] == s2[0] and s1[i+1] == s2[1]):
        count = count+1
print(count)
Your Python code is actually incorrect: it raises an IndexError if the last character of s1 matches the first character of s2.
You have to stop iterating at the second-to-last character of s1.
Here is a generic solution working for any length of s2:
s1 = 'abaccabaabaccca'
s2 = 'aba'
count = 0
for i in range(len(s1)-len(s2)+1):
    if s2 == s1[i:i+len(s2)]:
        count += 1
print(count)
output: 3
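As a possible alternative to slicing at every index, the same overlapping count can be sketched with str.find, restarting the search one character after each hit. The function name count_overlapping is hypothetical, and treating an empty needle as zero matches is an assumption, not something the question specifies.

```python
def count_overlapping(haystack, needle):
    """Count occurrences of needle in haystack, allowing overlaps."""
    if not needle:
        return 0  # assumption: an empty needle counts as zero matches
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        # Resume one character later so overlapping matches are found too.
        start = haystack.find(needle, start + 1)
    return count


print(count_overlapping('abaccabaabaccca', 'aba'))
# str.count only counts non-overlapping matches, so the two can differ:
print('aaaa'.count('aa'), count_overlapping('aaaa', 'aa'))
```

The built-in str.count is enough when overlaps do not matter; the loop above is only needed when they do.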
First, as others have pointed out, you do not want to use gets(); use fgets() instead. Otherwise, your logic is correct, but when you read in the input with fgets(), the newline character will be included in the string.
If you were to input test and es, your strings will contain test\n and es\n (each followed by the terminating null byte \0). This leads to you searching the string test\n for the substring es\n, which it will not find. So you must first remove the newline character from, at least, the substring you want to search for, which you can do with strcspn() to give you es. The buffer for s2 must also be large enough to hold two characters, the newline, and the terminating null byte, so char s2[2] is too small.
Once the trailing newline (\n) has been replaced with a terminating null byte, you can search the string for occurrences.
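#include <stdio.h>
#include <string.h>
int main() {
    char s1[100], s2[4];
    int count = 0;
    /* fgets() never writes more than the given size, including the '\0' */
    fgets(s1, sizeof s1, stdin);
    fgets(s2, sizeof s2, stdin);
    /* replace the trailing newline, if any, with a terminating null byte */
    s1[strcspn(s1, "\n")] = '\0';
    s2[strcspn(s2, "\n")] = '\0';
    size_t len = strlen(s1);
    /* i + 1 < len avoids unsigned underflow when s1 is empty */
    for (size_t i = 0; i + 1 < len; i++) {
        if (s1[i] == s2[0] && s1[i+1] == s2[1]) {
            count++;
        }
    }
    printf("%d\n", count);
    return 0;
}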
I want to replace some values in order.
For example, below is a sample XPath:
/MCCI_IN200100UV01[#ITSVersion='XML_1.0'][#xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']
/PORR_IN049016UV[r]/controlActProcess[#classCode='CACT']
[#moodCode='EVN']/subject[#typeCode='SUBJ'][1]/investigationEvent[#classCode='INVSTG']
[#moodCode='EVN']/outboundRelationship[#typeCode='SPRT'][relatedInvestigation/code[#code='2']
[#codeSystem='2.16.840.1.113883.3.989.2.1.1.22']][r]/relatedInvestigation[#classCode='INVSTG']
[#moodCode='EVN']/subjectOf2[#typeCode='SUBJ']/controlActEvent[#classCode='CACT']
[#moodCode='EVN']/author[#typeCode='AUT']/assignedEntity[#classCode='ASSIGNED']/assignedPerson[#classCode='PSN']
[#determinerCode='INSTANCE']/name/prefix[1]/#nullFlavor",
I would like to find each [r] in order and replace them with [0] through [n], depending on the number of elements.
How can I replace [r]?
const txt = `/MCCI_IN200100UV01[#ITSVersion='XML_1.0'][#xsi:schemaLocation='urn:hl7-org:v3 MCCI_IN200100UV01.xsd']
/PORR_IN049016UV[r]/controlActProcess[#classCode='CACT']
[#moodCode='EVN']/subject[#typeCode='SUBJ'][1]/investigationEvent[#classCode='INVSTG']
[#moodCode='EVN']/outboundRelationship[#typeCode='SPRT'][relatedInvestigation/code[#code='2']
[#codeSystem='2.16.840.1.113883.3.989.2.1.1.22']][r]/relatedInvestigation[#classCode='INVSTG']
[#moodCode='EVN']/subjectOf2[#typeCode='SUBJ']/controlActEvent[#classCode='CACT']
[#moodCode='EVN']/author[#typeCode='AUT']/assignedEntity[#classCode='ASSIGNED']/assignedPerson[#classCode='PSN']
[#determinerCode='INSTANCE']/name/prefix[1]/#nullFlavor",`;
const count = (txt.match(/\[r\]/g) || []).length; // count occurrences using RegExp
let replacements = []; // set replacement values in-order
switch (count) {
  case 0:
    break;
  case 1:
    replacements = ["a"];
    break;
  case 2:
    replacements = ["___REPLACEMENT_1___", "___REPLACEMENT_2___"];
    break;
  case 3:
    replacements = ["d", "e", "f"];
    break;
}
let out = txt; // output variable
for (let i = 0; i < count; i++) {
  // with a string pattern, replace() only changes the first occurrence
  out = out.replace("[r]", replacements[i]);
}
console.log(out);
With str.replace(). For example:
>>> 'test[r]test'.replace('[r]', '[0]')
'test[0]test'
Here's the docs on it.
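Note that str.replace() on its own puts the same text in every position. To number the placeholders in order, one possible sketch is re.sub with a function as the replacement, drawing the next index from a counter on each match. The helper name number_placeholders and the shortened sample path are assumptions for illustration.

```python
import itertools
import re


def number_placeholders(xpath, placeholder="[r]"):
    """Replace each occurrence of placeholder with [0], [1], ... in order."""
    counter = itertools.count()
    # re.escape makes the brackets literal; the lambda runs once per match.
    return re.sub(re.escape(placeholder), lambda _match: f"[{next(counter)}]", xpath)


# Shortened, hypothetical path; existing indexes such as [1] are left alone.
sample = "/PORR_IN049016UV[r]/controlActProcess/subject[1]/outboundRelationship[r]/name"
print(number_placeholders(sample))
```

Because the counter is created inside the function, every call starts numbering from 0 again.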
In Java, it's necessary to handle \r\n explicitly; see, e.g., the question: split( "\r\n") is not splitting my string in java
But is \r\n necessary in Python? Is the following true?
str.strip() == str.strip('\r\n ')
From the docs:
Return a copy of the string with the leading and trailing characters
removed. The chars argument is a string specifying the set of
characters to be removed. If omitted or None, the chars argument
defaults to removing whitespace. The chars argument is not a prefix or
suffix; rather, all combinations of its values are stripped
From this CPython test, str.strip() seems to be stripping:
\t\n\r\f\v
Can anyone point me to the code in CPython that does the string stripping?
Are you looking for these lines?
https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Objects/unicodeobject.c#L12222-L12247
#define LEFTSTRIP 0
#define RIGHTSTRIP 1
#define BOTHSTRIP 2
/* Arrays indexed by above */
static const char *stripfuncnames[] = {"lstrip", "rstrip", "strip"};
#define STRIPNAME(i) (stripfuncnames[i])
/* externally visible for str.strip(unicode) */
PyObject *
_PyUnicode_XStrip(PyObject *self, int striptype, PyObject *sepobj)
{
    void *data;
    int kind;
    Py_ssize_t i, j, len;
    BLOOM_MASK sepmask;
    Py_ssize_t seplen;
    if (PyUnicode_READY(self) == -1 || PyUnicode_READY(sepobj) == -1)
        return NULL;
    kind = PyUnicode_KIND(self);
    data = PyUnicode_DATA(self);
    len = PyUnicode_GET_LENGTH(self);
    seplen = PyUnicode_GET_LENGTH(sepobj);
    sepmask = make_bloom_mask(PyUnicode_KIND(sepobj),
                              PyUnicode_DATA(sepobj),
                              seplen);
    i = 0;
    if (striptype != RIGHTSTRIP) {
        while (i < len) {
            Py_UCS4 ch = PyUnicode_READ(kind, data, i);
            if (!BLOOM(sepmask, ch))
                break;
            if (PyUnicode_FindChar(sepobj, ch, 0, seplen, 1) < 0)
                break;
            i++;
        }
    }
    j = len;
    if (striptype != LEFTSTRIP) {
        j--;
        while (j >= i) {
            Py_UCS4 ch = PyUnicode_READ(kind, data, j);
            if (!BLOOM(sepmask, ch))
                break;
            if (PyUnicode_FindChar(sepobj, ch, 0, seplen, 1) < 0)
                break;
            j--;
        }
        j++;
    }
    return PyUnicode_Substring(self, i, j);
}
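Essentially, for ASCII input:
str.strip() == str.strip(string.whitespace) == str.strip(' \t\n\r\f\v') != str.strip('\r\n')
With no argument, str.strip() also removes non-ASCII Unicode whitespace such as the no-break space, which string.whitespace does not contain.
str.strip() and str.strip('\r\n') are different, so use str.strip('\r\n') only when you explicitly want to remove ONLY newline characters.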
>>> '\nfoo\n'.strip()
'foo'
>>> '\nfoo\n'.strip('\r\n')
'foo'
>>> '\r\n\r\n\r\nfoo\r\n\r\n\r\n'.strip()
'foo'
>>> '\r\n\r\n\r\nfoo\r\n\r\n\r\n'.strip('\r\n')
'foo'
>>> '\n\tfoo\t\n'.strip()
'foo'
>>> '\n\tfoo\t\n'.strip('\r\n')
'\tfoo\t'
This all seems fine, but note that if there is whitespace (or any other character) between a newline and the start or end of a string, .strip('\r\n') won't remove the newline.
>>> '\t\nfoo\n\t'.strip()
'foo'
>>> '\t\nfoo\n\t'.strip('\r\n')
'\t\nfoo\n\t'
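The differences can be checked side by side with a small sketch. The sample strings are made up for illustration; the no-break space and em space cases show where str.strip() with no argument goes beyond string.whitespace.

```python
import string

# Hypothetical sample inputs, one per kind of surrounding whitespace.
samples = {
    "ascii": " \t\r\nfoo\r\n\t ",
    "no-break space": "\u00a0foo\u00a0",
    "em space": "\u2003foo\u2003",
}

for label, text in samples.items():
    print(
        label,
        repr(text.strip()),                   # all Unicode whitespace
        repr(text.strip(string.whitespace)),  # the six ASCII whitespace characters only
        repr(text.strip("\r\n")),             # carriage return and newline only
    )
```

In the "ascii" row, strip("\r\n") removes nothing at all, because the outermost characters are spaces and stripping stops at the first character outside the given set.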
I'm trying to solve a problem where I get a string as input and then delete the groups of duplicate characters that have an even count.
Input: azxxzyyyddddyzzz
Output: azzz
Can you help me with this?
My attempt works fine for removing duplicate characters, but I'm stuck on how to remove only the duplicate characters with an even count.
# Utility function to convert string to list
def toMutable(string):
    temp = []
    for x in string:
        temp.append(x)
    return temp
# Utility function to convert list to string
def toString(List):
    return ''.join(List)
# Function to remove duplicates in a sorted array
def removeDupsSorted(List):
    res_ind = 1
    ip_ind = 1
    # In place removal of duplicate characters
    while ip_ind != len(List):
        if List[ip_ind] != List[ip_ind-1]:
            List[res_ind] = List[ip_ind]
            res_ind += 1
        ip_ind+=1
    # After above step string is efgkorskkorss.
    # Removing extra kkorss after string
    string = toString(List[0:res_ind])
    return string
# Function removes duplicate characters from the string
# This function work in-place and fills null characters
# in the extra space left
def removeDups(string):
    # Convert string to list
    List = toMutable(string)
    # Sort the character list
    List.sort()
    # Remove duplicates from sorted
    return removeDupsSorted(List)
# Driver program to test the above functions
string = "geeksforgeeks"
print(removeDups(string))
Here's an attempt with itertools.groupby. I'm not sure if it can be done with better time complexity.
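from itertools import groupby
def rm_even(s):
    to_join = []
    # keep only the runs of consecutive equal characters with odd length
    for _, g in groupby(s):
        chars = list(g)
        if len(chars) % 2:
            to_join.extend(chars)
    result = ''.join(to_join)
    # stop once a pass removes nothing; otherwise newly adjacent runs may need removing
    if result == s:
        return result
    return rm_even(result)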
Demo:
>>> rm_even('azxxzyyyddddyzzz')
'azzz'
>>> rm_even('xAAAAx')
''
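Count the letters with Counter and remove the ones that have an even count. Note that this counts every occurrence in the whole string rather than consecutive runs, so for the sample input it gives 'azzzzz' rather than the expected 'azzz':
from collections import Counter
word = 'azxxzyyyddddyzzz'
count = Counter(word) # Counter({'z': 5, 'y': 4, 'd': 4, 'x': 2, 'a': 1})
for key, value in count.items():
    if value%2 == 0:
        word = word.replace(key, "")
print(word) # 'azzzzz'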
def remove_even_dup(string):
    # group the string into spans of identical consecutive letters,
    # remembering which indexes each span covers
    spans = []
    for idx, letter in enumerate(string):
        if not len(spans) or spans[-1][0] != letter:
            spans.append((letter, {idx}))
        else:
            spans[-1][1].add(idx)
    # reverse the spans so we can use them as a stack
    spans = list(reversed(spans))
    visited = []
    while len(spans):
        letter, indexes = spans.pop()
        if len(indexes) % 2 != 0:
            visited.append((letter, indexes))
        else:
            # if we have both a previous and a next span we might need to merge
            if len(visited) and len(spans):
                prev_letter, prev_indexes = visited[-1]
                next_letter, next_indexes = spans[-1]
                # if the previous one and the next one have the same letter, merge them
                if prev_letter == next_letter:
                    # remove the old
                    visited.pop()
                    spans.pop()
                    # add the merged span (under its own letter) to spans to be visited
                    spans.append((prev_letter, prev_indexes | next_indexes))
    to_keep = { idx for _, indexes in visited for idx in indexes }
    return ''.join(letter for idx, letter in enumerate(string) if idx in to_keep)
I used a collection (a list) because it makes deleting elements easy; the result still has to be converted back into a string.
import java.util.ArrayList;
import java.util.List;
public class RemoveEvenCount {
    public static void main(String[] args) {
        // String a = "azxxzyyyddddyzzz";
        String a = "xAAAAx";
        List<Character> chars = new ArrayList<>();
        for (int i = 0; i < a.length(); i++) {
            chars.add(a.charAt(i));
        }
        System.out.print(removeData(chars));
    }
    // Removes the first run of even length, then starts over until none is left.
    public static List<Character> removeData(List<Character> chars) {
        int start = 0; // index where the current run begins
        for (int j = 1; j <= chars.size(); j++) {
            // A run ends at the end of the list or when the character changes.
            if (j == chars.size() || !chars.get(j).equals(chars.get(start))) {
                if ((j - start) % 2 == 0) {
                    chars.subList(start, j).clear();
                    return removeData(chars);
                }
                start = j;
            }
        }
        return chars;
    }
}
For some experiments with syntax highlighting, I create the following raw string in Python 3.6 (please note that the string itself contains a snippet of C-code, but that's not important right now):
myCodeSample = r"""#include <stdio.h>
int main()
{
    char arr[5] = {'h', 'e', 'l', 'l', 'o'};
    int i;
    for(i = 0; i < 5; i++) {
        printf(arr[i]);
    }
    return 0;
}"""
I have noticed that each line ends in a Unix-style \n newline character. But I actually want - for the sake of my experiment - to have every line ending in the Windows-style \r\n newline character. Is there a way to do this elegantly?
Just define your string as you are doing now, but call replace on the literal:
myCodeSample = r"""#include <stdio.h>
int main()
{
    char arr[5] = {'h', 'e', 'l', 'l', 'o'};
    int i;
    for(i = 0; i < 5; i++) {
        printf(arr[i]);
    }
    return 0;
}""".replace("\n","\r\n")
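To check the result, a short sketch with a made-up three-line string can list the line endings explicitly. The to_crlf helper is a hypothetical addition: it normalises any existing \r\n first, so applying it twice does not produce \r\r\n.

```python
# Hypothetical three-line sample; a triple-quoted literal uses \n between lines.
snippet = """line one
line two
line three"""

crlf = snippet.replace("\n", "\r\n")
# keepends=True keeps the terminators so they can be inspected.
print(crlf.splitlines(keepends=True))


def to_crlf(text):
    """Convert line endings to \\r\\n, leaving existing \\r\\n pairs as they are."""
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


print(to_crlf(crlf) == crlf)
```

If the string is later written to a file in text mode, opening the file with newline="" should prevent Python from translating the \n inside each \r\n a second time on Windows.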