genShortGenomeSeq

Syntax

genShortGenomeSeq(X, window)

Alias: genSGS

Arguments

X is a STRING scalar or CHAR vector representing a DNA sequence.

window is a positive integer in [2,28] giving the number of characters in each window.

Details

This function slides a fixed-size window, measured in characters, over the input DNA sequence X. It encodes the characters in each window and returns an integral vector of the encoded values, one element per starting position, so the returned vector has the same length as X. An element is null if its window contains N or would extend past the end of X.

Note:

  • The window slides forward one character at a time, starting at the first character of X. The window that starts at position i covers the character at i and the window - 1 characters that follow it.

  • If window exceeds the length of X, no complete window fits, so the result is a vector of null values with the same length as X.
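The sliding-window behavior can be modeled in a few lines of Python. This is a sketch, not DolphinDB's implementation: the bit layout is inferred from the examples on this page, which it reproduces, and it has only been checked against windows of 3 to 6 characters. Assumed layout: each base takes 2 bits (A=0, T=1, C=2, G=3) with the first base in the highest bits, and the window length is stored above the data bits, at the next multiple of 8 bits. Null results are represented as None, and the names encode_window and gen_short_genome_seq are hypothetical.

```python
from typing import List, Optional

# Assumed 2-bit codes, inferred from the documented example outputs.
BASE_CODE = {"A": 0, "T": 1, "C": 2, "G": 3}


def encode_window(bases: str) -> Optional[int]:
    """Encode one window; return None if it contains a non-ACGT character such as N."""
    value = 0
    for ch in bases:
        code = BASE_CODE.get(ch)
        if code is None:
            return None
        value = (value << 2) | code  # the first base ends up in the highest bits
    data_bits = 2 * len(bases)
    shift = (data_bits + 7) // 8 * 8  # assumption: length prefix sits at the next byte boundary
    return (len(bases) << shift) | value


def gen_short_genome_seq(x: str, window: int) -> List[Optional[int]]:
    """Hypothetical model of genShortGenomeSeq: one result per character of x."""
    if not 2 <= window <= 28:
        raise ValueError("window must be in [2, 28]")
    result: List[Optional[int]] = []
    for i in range(len(x)):
        if i + window > len(x):
            result.append(None)  # window would run past the end of x
        else:
            result.append(encode_window(x[i:i + window]))
    return result


print(gen_short_genome_seq("TCGGGGCATNGCCCG", 4))
# → [1135, 1215, 1279, 1278, 1272, 1249, None, None, None, None, 1258, 1195, None, None, None]
```

Keeping the per-window encoder separate from the loop mirrors the two rules in the notes: a window is either complete and made of A/C/G/T, or its slot is null.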

Return Value:

window range   Return type
[2,4]          FAST SHORT VECTOR
[5,12]         FAST INT VECTOR
[13,28]        FAST LONG VECTOR
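The type boundaries in the table can be sanity-checked against the bit layout inferred from the examples on this page (2 bits per base, window length stored at the next multiple of 8 bits above the data bits). That layout is an assumption, not documented behavior. Under it, the largest possible code for every window fits the signed integer type listed in the table, while a 5-character window would already overflow SHORT, a 13-character window would overflow INT, and a 29-character window would overflow LONG. The helper names below are hypothetical.

```python
# Hypothetical helpers; the bit layout is inferred from this page's examples, not documented.
SIGNED_MAX = {"SHORT": 2**15 - 1, "INT": 2**31 - 1, "LONG": 2**63 - 1}


def return_type(window: int) -> str:
    """Element type of the result for a given window, as listed in the table."""
    if 2 <= window <= 4:
        return "SHORT"
    if 5 <= window <= 12:
        return "INT"
    if 13 <= window <= 28:
        return "LONG"
    raise ValueError("window must be in [2, 28]")


def max_code(window: int) -> int:
    """Largest code for a window: length prefix plus all data bits set (assumed layout)."""
    data_bits = 2 * window
    shift = (data_bits + 7) // 8 * 8  # assumed: prefix at the next byte boundary
    return (window << shift) | ((1 << data_bits) - 1)


# Every documented window range fits its type under the assumed layout...
for w in range(2, 29):
    assert max_code(w) <= SIGNED_MAX[return_type(w)], w

# ...and one more character past each range would overflow that type.
assert max_code(5) > SIGNED_MAX["SHORT"]
assert max_code(13) > SIGNED_MAX["INT"]
assert max_code(29) > SIGNED_MAX["LONG"]
print("type table is consistent with the assumed layout")
```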

Examples

$ genShortGenomeSeq("NNNNNNNNTCGGGGCAT",3)
[,,,,,,,,795,815,831,831,830,824,801,,]

$ genShortGenomeSeq("TCGGGGCATNGCCCG",4)
[1135,1215,1279,1278,1272,1249,,,,,1258,1195,,,]

$ genShortGenomeSeq("GCCCGATNNNNN",6)
[396972,395953,,,,,,,,,,]

$ genShortGenomeSeq("TCGATCGTCGATCGTCGATCGTCGATCGG",5)
[328113,328390,328475,327789,328118,328411,328556,328113,328390,328475,327789,328118,328411,328556,328113,328390,328475,327789,328118,328411,328556,328113,328390,328475,327791,,,,]

$ genShortGenomeSeq("ACTT",8)
[,,,]
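Each output value can be turned back into the bases of its window. DolphinDB lists decodeShortGenomeSeq as a related function; the Python sketch below does not use it and is only a hypothetical illustration of the layout inferred from the examples above (2 bits per base with A=0, T=1, C=2, G=3, first base in the highest bits, window length stored at the next multiple of 8 bits above the data bits). Applied to the first example, it recovers the seven 3-character windows of TCGGGGCAT.

```python
from typing import Optional

# Assumed 2-bit codes (A=0, T=1, C=2, G=3), inferred from the examples; index = code.
BASES = "ATCG"


def decode_window(value: Optional[int], window: int) -> Optional[str]:
    """Hypothetical inverse of one genShortGenomeSeq output value; None stays None."""
    if value is None:
        return None
    shift = (2 * window + 7) // 8 * 8  # assumed position of the length prefix
    if value >> shift != window:
        raise ValueError("length prefix does not match window")
    bases = []
    for i in range(window - 1, -1, -1):  # read the highest-order base first
        bases.append(BASES[(value >> (2 * i)) & 0b11])
    return "".join(bases)


print([decode_window(v, 3) for v in [795, 815, 831, 831, 830, 824, 801]])
# → ['TCG', 'CGG', 'GGG', 'GGG', 'GGC', 'GCA', 'CAT']
```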

Related functions: encodeShortGenomeSeq, decodeShortGenomeSeq