Tuesday, June 30, 2009

Regular expressions quick list

One of the most essential things you need to know as a software engineer, or as a software engineering student, is regular expressions. This post will brief you on the basics of regular expressions. Again, remember this is just a layman's view and not a geek's perspective, so read this and move on.....

Metacharacter          Matches
  1. .                 anything except the newline character
  2. \n                newline
  3. *                 zero or more copies of the preceding expression
  4. +                 one or more copies of the preceding expression
  5. ?                 zero or one copy of the preceding expression
  6. ^                 beginning of a line
  7. $                 end of a line
  8. a|b               a or b
  9. (ab)+             one or more copies of ab (grouping)
  10. "a+b"            the literal string "a+b"
  11. []               character class
Examples :-
  1. abc               abc
  2. abc*              ab abc abcc abccc ......
  3. abc+              abc abcc abccc .....
  4. a(bc)+            abc abcbc abcbcbc ....
  5. a(bc)?            a abc
  6. [abc]             one of: a, b, c
  7. [a-z]             any letter from a to z
  8. [a\-z]            one of: a, -, z
  9. [-az]             one of: -, a, z
  10. [a-zA-Z0-9]+     one or more alphanumeric characters
  11. [ \t\n]+         one or more whitespace characters
  12. [^ab]            anything except: a, b
  13. [a^b]            one of: a, ^, b
  14. [a|b]            one of: a, |, b
  15. a|b              one of: a, b
  16. ^[a-z]           a line starting with a letter from a to z
  17. [a-z]$           a line ending with a letter from a to z
  18. ^[a-z]$          a line containing only one letter from a to z
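
If you want to try these out quickly, here is a small sketch using grep with extended regular expressions (-E). The sample strings are invented purely for illustration, and a few of the entries above (such as \n in a character class) are lex-style and won't carry over exactly, but the basics can be checked like this:

$ echo abccc | grep -E '^abc+$'            # matches: one or more c's after ab
$ echo abcbcbc | grep -E '^a(bc)+$'        # matches: one or more copies of bc after a
$ echo hello42 | grep -E '^[a-zA-Z0-9]+$'  # matches: only alphanumeric characters
$ echo cd | grep -E '^[^ab]+$'             # matches: no a or b anywhere in the line
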
Thanking you,
Layman

Monday, June 29, 2009

Linux Commands: Quicklist

    1. who and whoami tell you who is logged in and who you are
    2. date prints the current date and time
    3. use ; to separate multiple commands on one line
    4. man shows the manual page for a command
    5. ls lists files; -F marks directories with a trailing /, -l gives a long line-by-line listing
    6. cat types the contents of a file onto the screen; -n numbers every line, -b numbers only the non-blank lines
    7. wc gives a word count; -l for the line count, -w for the word count, -c for the character count
    8. cp copies a source file to a destination; -i asks before overwriting

      The -r option is for copying directories; multiple directories may also be specified.

      More than one source can be specified; the last argument is taken as the destination.
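
      A few illustrative invocations (the file and directory names are just placeholders):

      $ cp -i notes.txt notes.bak           # copy one file, asking before overwrite
      $ cp file1 file2 file3 backupdir/     # several sources, last argument is the destination
      $ cp -r projectdir /tmp/projectdir    # -r copies a whole directory tree
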

    9. mv moves or renames files; the -i option is available for preventing overwrites.

      mv can also be used with directories (moving them).

    10. rm removes files
    11. rmdir removes an (empty) directory
    12. ls -l shows the file permissions

    The first 3 characters of the permission string give the owner's permissions,

    the next 3 give the group permissions,

    and the last 3 give the permissions for others.

    13. chmod

      The permission expression is either symbolic or octal. In the symbolic form letters are used; in the octal form numbers from 0 to 7 are used to set the permissions.

      In symbolic form the syntax is (who)(action)(permission)

Who letter          Represents
u                   Owner
g                   Group
o                   Other
a                   All

Action symbol       Represents
+                   Add permissions to the file
-                   Remove permissions from the file
=                   Set the file permissions exactly

Permission letter   Represents
r                   Read
w                   Write
x                   Execute

eg. $ chmod guo+rx *

gives read and execute permissions to the group, owner and others (i.e. all). More than one set of permissions can be set by separating them with commas.

    Octal method

Read permission value = 4; write permission value = 2; execute permission value = 1.

Adding the values of the required permissions gives us a number between 0 and 7. We use one such digit each to specify the permissions for the owner, the group, and others, in that order.

$ chmod 0777 *

gives rwx permissions to everyone,

$ chmod 0770 *

gives rwx permissions to the owner and the group members, whereas others are prevented from reading, writing or executing the file.

Here * means the change of permission applies to all the files in the pwd (present working directory).

    14. chown [option] user[:group] files

    chown is used to change the ownership of a file or a group of files.

    The options are not really important here; don't worry about them.

    user must be an existing user; this specifies the new owner of the file(s).

    files can be a single file or a directory path.

    On recent systems there is also a separate chgrp command for changing just the group.

    For portability's sake we had better skip that and stick to chown.

    15. ps

    This command lists the presently running processes. Each process has a unique id called the process id or PID. It is unique during a run and is usually a 5-digit number; the maximum value of a PID is 32767.

    Each process has a PID and a parent PID (PPID) attached to it; every process has a parent process. You will note that most processes have the PID of the shell as their PPID.

    Any command may be run in the foreground or the background. If we run something in the foreground we cannot do anything else until it completes, so to save time we can simply put it in the background. To run a command as a background process, just add an '&' at the end of the command, e.g. $ ps &

    TIP : the stty -a command will give you the list of key combinations for the various control functions.

    16. bg %jobnumber

    To move a foreground process to the background, first use the ^Z key combination so that the process is suspended, and then use the bg command. By default, if no job number is specified, bg resumes the most recently suspended job in the background.

    17. fg %jobnumber

    This command moves a suspended process, or a background process, to the foreground. A job number may be specified; if it is not supplied, the shell takes the most recently suspended job.
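
    A typical sketch of that workflow (the long-running command is only an illustration):

    $ sleep 600        # running in the foreground
    ^Z                 # suspend it
    $ jobs             # the suspended sleep shows up in the job list
    $ bg %1            # resume it in the background
    $ fg %1            # bring it back to the foreground
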

    18. nohup

    This command can be prefixed to any other command. It suppresses the action of the hang-up signal and thus lets a command keep running even after the user logs out. It is usually used together with background execution. The output, if not redirected, goes to a file called nohup.out; otherwise it can be redirected as needed.

    E.g. $ nohup ls & will send the output to nohup.out

    19. wait

    wait puts the shell into a wait state until the background process(es) run to completion. The optional parameter may be a process id, or a job number prefixed by the % sign, or it may be absent. If the parameter is absent, the shell waits until all background processes have completed.

    20. jobs

    This command shows the list of all processes the user has suspended, along with the ones running in the background. In the output we can see - and + signs; the job carrying the + sign is the most recent one.

    21. kill

    The word says it all: it kills a process. The parameter may be a PID, or a job number prefixed by the % sign. The kill command actually sends a TERM signal to the process; the process may ignore this signal or perform an orderly shutdown. If it ignores the signal we can use kill -9 (or kill -KILL) followed by the same parameter, so that the process is killed forcibly.
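
    For instance (the PID 12345 below is just a placeholder):

    $ kill 12345       # send the TERM signal to the process
    $ kill %1          # the same, addressed by job number
    $ kill -9 12345    # forcibly kill it with the KILL signal
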

    22. exec

    This command replaces the current process with the new process specified in the command. The real mechanism is overlaying the new process on the one that was running.

    Variables

    Scalar variables are available in the shell. Variable assignment is fairly simple.

    variablename=value

    The lexical rules for variable names are the same as in C. When we need the value of a variable we prefix the variable name with a $ sign; when $variablename is parsed, its value is textually substituted in its place. Use the $ with the variable name only where the value is needed, and nowhere else. When assigning a string it is better to use quotes.
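
    A minimal sketch (the names are invented for illustration):

    name="layman"
    greeting="hello $name"    # quotes keep the string together
    echo $greeting            # prints: hello layman
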

    So far we have seen how to use scalar variables. Now we shall move on to the more organized arrays. The concept of arrays is the same as in C. Array initialization is fairly simple.

    arrayname[index]=value

    Or

    arrayname=(element1 element2 element3 ... elementn)

    Or

    arrayname=([index1]=element1 [index5]=element5 ...)

    Accessing an array element is a different matter. The syntax is:

    ${arrayname[index]}

    Unlike C, the shell doesn't keep track of all the positions in an array; it only keeps track of the elements that have been assigned values. To access all the elements in an array, 2 options are available.

    ${arrayname[*]}

    Or

    ${arrayname[@]}

    If we assign a value to the arrayname itself, it is equivalent to assigning to the 0th element of the array.
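
    A short bash sketch of these forms (the values are placeholders):

    langs=(c perl bash)       # initialize the whole array
    langs[3]=python           # assign a single index
    echo ${langs[1]}          # prints: perl
    echo ${langs[@]}          # prints all the assigned elements
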

    We may set a variable as read-only with the following command.

    readonly variablename

    Once a variable has been set readonly, we can't overwrite it.

    Just as we declared variables, we can unset them, i.e. remove them from the shell's memory. The command format is shown below and is the same for arrays as well as scalar variables, but it cannot be used to remove variables that have been set readonly.

    unset variablename

    So far we have discussed about variables. Now we shall see the different types of variables.

    • Local variables : the variables we have used until now. They are user-modifiable and can be set by the user. They are available only within the current shell and are not visible to the shell's child processes.
    • Environment variables : variables that are accessible to every child process of the shell. These are also user-modifiable. Usually only the environment variables required for the proper functioning of the child programs are kept.
    • Shell variables : special variables that are set by the shell itself (and by nothing else) for its own proper functioning. They may be of either the local or the environment type.

    Now we will see how to export a variable to the environment.

    export varname=value varname1=value1……….varnamen=valuen

    Substitutions

    There are basically 4 types of substitutions

    • Filename substitution or globbing
    • Value based variable substitution
    • Command substitution
    • Arithmetic substitution

    In globbing we basically use three wildcards to match filenames: * (which matches any number of any characters), ? (which matches any one character), and [character list] (which matches any one character from the list). The rules used inside the character list are the same as those we learnt in Perl (minor differences exist; for example, negation is specified by ! and not by ^).

    In value-based substitution, a value is substituted for a parameter according to certain conditions. There are basically 4 forms, shown below.

    ${parameter:-word} substitutes word for the parameter if the parameter is either unset or null. It is a blind substitution of the word in place of the parameter, as with a macro; it is the word itself that matters, not its value.

    ${parameter:=word} substitutes word as above, and in addition assigns it to the parameter.

    ${parameter:?word} prints word as an error message if the parameter is unset or null.

    ${parameter:+word} substitutes word only if the parameter is set and non-null.
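
    A quick sketch of the first two forms (the variable name is invented):

    unset editor
    echo ${editor:-vi}        # prints vi, but editor itself stays unset
    echo ${editor:=vi}        # prints vi and also assigns it to editor
    echo $editor              # now prints vi
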

    Command substitution is yet another form. It is fairly simple: the command itself gets replaced by the output of the command. The usual context is assigning the output of a command to a variable.

    variable=`command`

    Arithmetic substitution basically deals with an expression as shown below. All the basic operations like +, -, *, / exist, and it is also possible to change the precedence by using parentheses. The arithmetic is integer arithmetic, hence floating point values will be truncated.

    $((expression))
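
    For example:

    x=7
    echo $(( (x + 3) * 2 ))   # prints 20, the parentheses change the precedence
    echo $(( 7 / 2 ))         # prints 3, integer arithmetic only
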

    Flow control

    If statement:

    if list1

    then

    list2

    elif list3

    then

    list4

    else

    list5

    fi

    The if statement finds application in error handling. list1 can be any operation; it may be an encoding operation or a file creation operation. In any case, if the operation fails (i.e. it returns nonzero), list2 is not executed.

    There is yet another representation.

    if list1 ; then list2; elif list3 ;then list4; else list5; fi

    test is a command used in association with the if statement. It returns 0 or 1 based on the expression it receives as its argument. The syntax is shown below.

    test expression

    or

    [ expression ]

    Now we shall see the various kind of tests.

    • File tests
    • String comparisons
    • Numerical comparisons

    File tests are used for testing files. The syntax is shown below.

    [ option filename ]

    Some of the options are:

    -b if the file exists and it is a block special file

    -c if the file exists and it is a character special file

    -d if the file exists and it is a directory

    -e if the file exists

    -f if the file exists and it is a regular file

    -r if the file exists and it is readable

    -w if the file exists and it is writable

    -x if the file exists and it is executable
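
    As a small sketch (the path is only a placeholder):

    if [ -f /tmp/data.txt ]
    then
        echo "data.txt is a regular file"
    else
        echo "no such regular file"
    fi
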

    Now we shall take a look at test operations on strings. There are basically 2 things we would like to find out.

    • Check whether a string is empty
    • Check whether 2 strings are equal

    [ -z string ] checks whether a string is of zero length

    [ -n string ] checks whether a string is of nonzero length

    [ string1 = string2 ] returns 0 (true) if the strings are equal

    [ string1 != string2 ] returns 0 (true) if the strings are unequal

    Test can be done on integers by the following format.

    [ int1 operator int2 ]

    Operator can be:

    -eq equal to

    -le less than or equal to

    -lt less than

    -gt greater than

    -ge greater than or equal to

Here I would like to mention a very important point.

When we try a command at the prompt we get the status of the command immediately; errors are reported at once. In a shell script this won't happen, so we must have some provision to check whether the last command was successful.

The exit status of the last command will be stored in a special variable $?. We can simply check this variable for the value 0 to make sure the last command was successful.
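
A small sketch (the directory name is made up):

mkdir /tmp/scratchdir
if [ $? -eq 0 ]
then
    echo "directory created"
else
    echo "mkdir failed"
fi
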

Continuing with our discussion, we will now see compound expressions. These combine simple test expressions into one compound expression. The general form is shown below.

[ expression1 operator expression2 ]

The operator may be

-a and

-o or

It is also possible to negate the outcome of an expression by using the following format.

[ ! expression ]

Some important points need special mention. The spaces in these formats cannot be omitted. Also, all these test expressions are used in association with the if statement.
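
For example (the file name is a placeholder):

if [ -f notes.txt -a -w notes.txt ]
then
    echo "notes.txt exists and is writable"
fi
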

Case statement:

The case statement serves the same purpose as switch-case in C. The syntax is shown below.

case word in

pattern1)

list1

;;

pattern2)

list2

;;

esac

or

case word in

pattern1) list1 ;;

pattern2) list2 ;;

esac

Here the word is matched against each pattern provided. If it matches, the corresponding list is executed.

Each pattern may include wildcards.
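
A minimal sketch:

read answer
case $answer in
    y|Y) echo "yes" ;;
    n|N) echo "no" ;;
    *) echo "unknown answer" ;;
esac
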

Loops
while loop:

All of us know how a while loop works. The syntax is shown below.

while command

do

    list

done

or

while command ; do list ; done

The command is usually a test expression of the kind we have already dealt with.

The list is any set of commands.

It’s possible to nest loops as in any other language.

There exists yet another loop. The until loop runs the list as long as the command (test expression) is false. The syntax is shown below.

until command

do

list

done
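
As a quick sketch, here is the same counting loop written in both styles:

i=1
while [ $i -le 3 ]
do
    echo "pass $i"
    i=$((i + 1))
done

until [ $i -gt 5 ]
do
    echo "pass $i"
    i=$((i + 1))
done
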

for loop:

The for loop is a slightly different concept from the for loop we know from C. It repeats a set of commands for each item in a list, after assigning the item to a variable. That is not the exact mechanism of the loop, but for simplicity's sake we shall look at it that way. In any case, the loop repeats a set of statements for each value in the specified list. The syntax is shown below.

for name in word1 word2 word3……….wordn

do

list

done

or

for name in word1 word2 word3……wordn ; do list ; done

Here the loop assigns each word to name in turn and performs the list for each word.
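
For instance:

for colour in red green blue
do
    echo "the current colour is $colour"
done
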

Select loop:

This loop finds application in menu driven programs. The syntax is shown below.

select name in word1 word2 word3……….wordn

do

list

done

Here bash prints a numbered list in the order word1 ... wordn and then waits for the user's input. The input is expected to be one of the numbers shown next to the items; if it is, the corresponding word is stored in name.
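
A small menu sketch:

select choice in start stop quit
do
    echo "you picked $choice"
    [ "$choice" = "quit" ] && break
done
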

Loop control

There are basically 2 loop control instructions.

  • break This command breaks the control out of the loop.
  • continue This command skips the rest of the loop body for the current iteration and moves on to the next one.


Input and output

Output to the terminal can be done in 2 ways.

  • echo
  • printf

The usage of printf is the same as that in C.

Output of a command or a list of commands can be redirected to a file using the redirection operator.

Syntax is shown below.

{ command1; command2;…..commandn; } > filename

If no such file exists, a new file is created. If it exists, the file contents are overwritten.

Appending to a file needs another redirection operator. Syntax is shown below.

{ command1; command2; command3; …………commandn; } >> filename

Sometimes it is necessary to write the result of commands onto the screen as well as a file. In that case we can use the following format.

command |tee filename


In input redirection, a text file can be redirected as an input to a command. The syntax is shown below.

command < filename

There is yet another possibility, the here-document.

command << delimiter

input1

input2

.

.

inputn

delimiter

Here the redirection operator reads the input until the specified delimiter is met again. It then feeds that text to the command as its input.
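
A concrete sketch (EOF is just a conventional choice of delimiter):

cat << EOF
first line
second line
EOF
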

Now we will learn how to read an input from the user. The syntax is very simple.

read variable

Pipelining is yet another concept available.

command1 | command2……………

Here the output of command1 is fed to the input of command2 and so on.

File handles

We can associate a number with each filehandle and then use it to do operations on files.

3 file descriptors are always open along with each and every command.

  • STDIN value 0
  • STDOUT value 1
  • STDERR value 2

We can associate a filehandle with a file using the following command.

exec n> filename

or

exec n>> filename

Here n is any integer. If n is 1 then STDOUT is explicitly redirected to the specified file. In the second form the file is opened in append mode.

Now, to redirect the output of a command to a file we can use the following format.

command n> filename (append mode is also available)

Also, to read the contents of a file into the standard input of a command the following syntax can be used.

command n< filename

Here n is the filehandle.

We may redirect STDIN, STDOUT and STDERR to any of the file by simply choosing 0,1,2 as the values for n.

It's possible to redirect STDOUT and STDERR to different files.

command 1> file1 2> file2

The /dev/null file requires special mention. Anything redirected to this file will be discarded.

Sometimes we may want to send both STDOUT and STDERR to the same file. In that case the syntax is shown below.

command > filename 2>&1
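
Putting these pieces together (the file names are placeholders):

ls /etc /nosuchdir > out.txt 2> err.txt    # listing goes to out.txt, the error message to err.txt
ls /etc /nosuchdir > all.txt 2>&1          # both streams into the same file
ls /nosuchdir 2> /dev/null                 # discard the error message
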

Reading input from a file

Now we shall see how to read input from a file line by line and then use it for text filtering.

while read variablename

do

.............

.............

done < filename
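
A concrete sketch (the file name is a placeholder):

while read line
do
    echo "got: $line"
done < input.txt
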


Now let’s move on to the most important section as far as Y! is concerned. Most questions are asked from the text filtering section.

Text Filtering


Head and tail commands

We can extract the first or last n lines of a file by using the head and tail commands respectively.

head [ -n ] filename

tail [ -n ] filename

n is any integer. head returns the first n lines and tail returns the last n lines. If no number is specified, a default of 10 lines is returned.

Grep command

This is yet another powerful text filtering command. Syntax is shown below.

grep [option] pattern filename(s)

The option may be

-i for a case-independent match

-v for the unmatched lines to be returned instead

-n for line numbers to be attached

-l for just the list of files containing the pattern
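
For instance (the file names are placeholders):

grep -in 'error' logfile.txt    # case-independent match, with line numbers
grep -l 'main' *.c              # just the names of the files that contain main
grep -v '^#' config.txt         # all the lines that are not comments
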

transliterate(tr) command :

This is yet another powerful text filtering command. The function performed by tr is to transliterate one set of characters into another. The syntax is shown below.

tr 'set1' 'set2'

Here the characters of set1 that occur in the input are replaced by the corresponding ones in set2. tr reads from standard input, so to run it on a file we redirect the file into it (tr 'set1' 'set2' < filename); if nothing is redirected, it waits for input from the keyboard. We must be careful while using characters like [ and ]. These have a special meaning when used with tr, so we must quote them (using a backslash) suitably. To change the case of the text we can use 'a-z' as set1 and 'A-Z' as set2, or vice versa.

Sometimes you might find that all the characters of set1 are being mapped to a single character of set2. Some versions of tr behave differently when the two sets have unequal lengths, so to be on the safe side it is always better to use the same number of characters in set1 and set2.

There is yet another option available for transliterate. This is the SQUEEZE option. The syntax is shown below.

tr -s 'set1'

Here if tr finds multiple consecutive occurrences of any character in set1, they are replaced by a single occurrence.
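
For example:

tr 'a-z' 'A-Z' < notes.txt        # print notes.txt in upper case
echo aaabbbccc | tr -s 'abc'      # squeeze the runs, prints: abc
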

Sort and uniq commands :

The sort command, as we know, sorts each line of its input. We can use the tr command to put each word of the input text on a new line, and then sort the lines.

The syntax is shown below.

sort filename for an ascending-order sort

sort -r filename can be used to reverse the result.
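
Since the heading also promises uniq, here is the classic word-frequency pipeline as a sketch: uniq -c counts adjacent duplicate lines, which is why the sort comes first (the file name is a placeholder).

tr -s ' ' '\n' < notes.txt | sort | uniq -c | sort -rn    # most frequent words first
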



Sed

It is short for stream editor: all the input we feed in goes through it and finally reaches STDOUT. It doesn't change the input file.

For each line of the input file, sed makes a copy of the line and then matches the copy against the specified pattern. If a match is found, the corresponding action is performed. It then proceeds to the next pattern and repeats the same procedure. Since the matching is done on the copy of the line, and not the real line, the changes don't affect the input file.

The syntax is shown below.

sed [option] '/pattern1/ action1' filenames

Several pattern/action pairs can be given in one invocation, either separated inside one script or each supplied with its own -e option.

The patterns are usually regular expressions, which we are already familiar with. The only major difference is that there is no + metacharacter here. Although we are already familiar with regular expressions, I would like to add a few important points here.

[^chars] matches character set not specified by chars.

^[chars] matches character set that starts with any one specified in chars

[chars]$ matches character set that ends with a character specified in chars.

sed '/pattern/p' file prints the lines that match pattern

sed '/pattern/d' file prints the lines that remain after deleting the lines that match pattern

sed '/pattern/s/pattern1/pattern2/' file

First, a list of all the lines that match pattern is made; then, in those lines, the text that matches pattern1 is replaced by pattern2. All the lines are then printed.

sed 's/pattern1/pattern2/' file

Here, in every line, the text that matches pattern1 is substituted by pattern2.

But there is a problem with the last 2 cases. Only the first match of pattern1 in a line is substituted; the other matches in the same line are not. If we need to replace all the matches of pattern1, the following syntax is to be used.

sed 's/pattern1/pattern2/g' file

The following syntax replaces only the 4th match of pattern1 in a line with the substitute provided.

sed 's/pattern1/pattern2/4' file

Sometimes we may need to replace a pattern by another only in lines that do not contain a particular pattern. In that case we can use the following syntax.

sed '/pattern1/!s/pattern2/pattern3/g'

Here sed replaces all pattern2 by pattern3 in all the lines that don't match pattern1.

Deleting leading white spaces

sed 's/^[ \t]*//g'

Deleting trailing white spaces

sed 's/[ \t]*$//g'


Sometimes it becomes necessary to reuse the matched value in the replacement. Suppose we need to put a $ sign in front of all the (floating point) numbers in a text file. Then we can use the following syntax, where & stands for whatever the pattern matched.

sed 's/[0-9][0-9]*\.[0-9][0-9]*/\$&/'


It is also possible to redirect the output of the sed command to a file with the following pattern.

sed '/pattern/action' filename > destinationfilename

Sometimes there are situations in which we need to do more than one substitution. In that case we can use the following syntax, giving each substitution its own -e option.

sed -e 's/pattern1/pattern2/g' -e 's/pattern3/pattern4/g' ...... filename

Here sed searches for pattern1 and substitutes all its occurrences with pattern2, and it does the same thing with pattern3 and pattern4.

So far we have dealt with the easy things that sed can give you. Now we will consider the pattern space as a buffer. Just imagine that there are 2 buffers: the pattern buffer, which holds the current line being processed, and the hold buffer, an auxiliary buffer. With this idea in mind, read along.

= To print the current line number

sed = filename

x To exchange contents of pattern and hold

sed '/pattern/x' filename

Initially the hold buffer is empty and the pattern buffer has the first line that matches the pattern; x exchanges their contents and then the pattern buffer is printed. For the next matching line, the pattern buffer gets the new line while the hold buffer still has the previous one; they exchange contents and the pattern buffer is printed again.

n or N To read the next line in to the pattern space

sed '/pattern/n' filename

Here whenever a pattern is matched the next line is read into the pattern buffer.

g or G Copy/append hold buffer to pattern buffer.

sed '/pattern/g' filename

h or H Copy/append the pattern buffer to the hold buffer.

sed '/pattern/h' filename

w filename Write the current pattern buffer to file

sed '/pattern/w filename' filename1

r filename Append text read from file

sed '/pattern/r filename' filename1

Double spacing a line that matches pattern

sed '/pattern/G' filename

Double space a file that has already blank lines in it.

sed '/^$/d;G' filename (here the blank lines are first removed and then double spacing is added)

Delete double spacing

sed 'n;d' filename

Insert a blank line above every line that matches a pattern

sed '/pattern/{x;p;x}' filename

Insert a blank line below every line that matches a pattern.

sed '/pattern/G' filename

Insert a blankline above and below a pattern.

sed '/pattern/{x;p;x;G}'

Inserting the line number at the beginning of the every line

sed = filename | sed 'N;s/\n/\t/'

The first part of the command gives us the line numbers attached to the lines, but the line numbers and the lines are on separate lines. Hence we append every line (along with its \n) to the pattern buffer, which initially contains the line number, and then substitute a tab for the newline.

Inserting line numbers along with the lines, but printing the numbers only for non-blank lines

sed '/./=' filename | sed 'N;s/\n/\t/'

Count the number of lines in a file

sed -n '$=' filename


Thanking you,

Layman


SEO 3

I haven't yet seen any response to my previous posts. But I'm a self-driven person and will keep on writing. So I'd like to give you my third, and probably final, post on SEO. In my last post I talked to you about a brute-force approach. Here I shall talk to you about a new approach, which is almost the opposite of what you have seen.

Suppose you have a site with some content and you want to extract the useful data from it (typically names of people, technology names, etc.). We shall call them entities. You put in some data and you get back an array of words that have some semantic importance. Have you come across any such system? My boss was the first one to recommend such a system to me. It is called Calais.

www.opencalais.com

You can go in there and find out about its functionality.

So my recommendation:
Whenever a new post is made, get all the terms returned by Calais and then do a Google sktool search for the keywords related to each term.
Get the list returned by the Google sktool and then add these terms to the link farm mentioned in the post SEO 2.
This will help you build a link farm without much burden on the server.
Thanking you
Layman