GT's Blog

Myths and Facts About Programming

Myths and Facts About Programming - Stuff that I wish I knew in my early career

What's this?

A collection of common myths and facts (opinionated) about computer programming that I wish I knew in my early career.

Programming requires math

Neutral.
Only a small percentage of programmers deal with math problems in their careers.
Analytical skills help to break down the problem. Think of programming as understanding the problem, breaking it down into smaller steps, and solving it. Similar to math, right?
However, people who are bad at math can still be good programmers. It also depends on the type of role and the type of problem they are trying to solve.

A programming job is similar to a typist's job. It's all about typing code.

False
Programming (at entry level) is about:
- reading documentation and requirements
- documenting stuff
- thinking about how to write code
- writing code
- testing
- debugging
- deploying
- discussing with team members/management
The amount of time you spend typing code depends on your role and job description. There will be days when you won't type any code.
The majority of programming jobs involve maintaining an existing system written over the years by several people. You will be required to add features, customize, fix bugs, etc.

You won't require a college degree to be a programmer

Everyone can learn and be a programmer within months

Programming is really hard

Neutral.
It depends on the individual, their learning/intellectual capability, and the type of programming role they learn/get into.
There will be certain things you can learn easily. But a college degree will help to broaden your perspective and learn things quickly.

Programming is monotonous. It's like working on the assembly line at a factory

False
On certain days, or after working in the same role for a long time, your job may feel monotonous.
But it's not like working on an assembly line. It requires lots of thinking and analysis.

Programming is not for girls

False

You need to keep reading new stuff throughout your career

Neutral
You don't "need to". But learning new stuff helps advance your career.
Also, it depends on the tools and technologies you are using. Some tools/technologies (e.g. JS frameworks) get deprecated every few years.
Learning new paradigms, best practices, and architecture concepts is always useful.

Machine Learning and AI seem easy to learn.

I don't have any knowledge of statistics/probability/modeling. However, the ML/AI tutorial I found online is just 10 lines of code and it seems easy.

False
It may seem easy to use ML/AI tools created by somebody else or to follow a cookbook. But you will need to understand many concepts to use those tools when solving real problems. Don't be fooled by simple tutorials. Start with the basics and dig into the tools.

Using long variable names makes a program slow. So I should program like this:

int a = read()
int b = 1000
if(a > 18 && b > 50)
    println("Entry allowed")

False
With compiled languages, no. With interpreted languages, possibly, but the difference would be negligible.
Always focus on readability. Compare the above code with the following:

int age = readMemberAge()
int balance = 1000
if(age > 18 && balance > 50)
    println("Entry allowed")

I have to learn as many programming languages as possible (e.g. C, Python, Java, Ruby, Kotlin, Scala, Groovy, C#, Go) to be a good programmer.

False
Think of a programming language as a natural language, e.g. Nepali, French, English, Japanese, or Chinese, and of the art of writing a novel or poem as the actual programming. If you have mastered five languages but cannot write a (good) poem in any of them, you are still not an artist.
Think of programming as art. Try to be an artist in at least one language. Think of a hobby project and develop it while paying attention to code quality, performance, UI, features, etc.
Focus on learning programming rather than learning a language.
- Programming is a skill that you can gain with just one language. If you know how to do X in language Y, then you can do it in language Z too with little effort.

HackerRank, LeetCode will guarantee me a job

False
There's no doubt that the questions on those sites help you think critically and solve problems.
It's a widely used screening method to filter out candidates these days.
Pet project(s) and your college projects will also help you land your first job.

Google, Amazon and Facebook are using X tool. It must be good so I should learn.

False
A lot of tools developed by tech giants get deprecated after a couple of years.
Look for a tool/language/framework that has been used by a lot of companies for a long time.

X was developed by Google, Amazon, and Facebook so it must be good. I should learn and use it.

False
There's no guarantee that those tools MUST be good. Don't fall for advertisements.
Review 100 job descriptions on LinkedIn/Indeed etc. and find out for yourself what's popular on the market.

I must learn Angular, React, Vue and XYZ web framework to master my web development skills

False
It's better to start web development without frameworks, so that you understand which problems those frameworks solve for you.
You don't need to learn all of these; one would be enough. If you started learning web development without using frameworks, switching between frameworks would be easier.

I know X1 framework/library/tool. But the job vacancy mentions X2 (an alternative to X1). I should not apply for this job.

False
If you know X1 framework/library/tool, test yourself to see how long it takes you to learn X2.
As long as you know the abstract concepts and have worked on at least one pet/professional project on your own, there's a high chance that you can learn another framework/library/tool quickly. They are all trying to solve a similar problem in slightly different ways.
Also look for 'preferred' vs 'required' skills in job vacancies.

Everyone on social media hates language/framework X. X must be bad.

False
Don't fall for people's 'opinions'. People treat languages/frameworks/tools like religions and hate each other's choices.
The best way to find out what to learn is to look at job vacancies. At least a hundred of them.

Language X does that in one line. So, it is the best language.

Neutral

 DB.allRecords().read().toCsv("file.csv");

It's nice that they provided that functionality in one line out of the box. But there is a great deal of code hidden behind the scenes.
All languages support creating library modules to extend the feature. Some languages are by nature too abstract/low level and it requires developers to write libraries around it to make things simpler.
So, that doesn't mean language X is best.
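To make this concrete, here is a minimal Java sketch of how a "one line" call can be offered by a small library helper. The class CsvExport and its toCsv method are hypothetical names made up for this illustration; they are not part of any real library, and a real CSV writer would handle more edge cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: the "one line" a caller sees hides this code.
class CsvExport {

    // Quote a field only when it contains a comma, a double quote or a line break.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Turn rows of fields into CSV text, one row per line.
    static String toCsv(List<List<String>> rows) {
        return rows.stream()
                .map(row -> row.stream().map(CsvExport::escape).collect(Collectors.joining(",")))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("name", "city"),
                Arrays.asList("Doe, Jane", "Kathmandu"));
        // The caller's view: a single line.
        System.out.println(CsvExport.toCsv(rows));
    }
}
```

Once a helper like this exists, any language "does it in one line". The difference is only whether the helper ships out of the box or somebody has to write it.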

Want to add more Q/A or correct something?

Please submit a Pull Request at https://github.com/GT-Corp/myths-and-facts-about-programming/blob/master/README.md

AWS Java SDK - automatically detect the region

When the app is deployed in multiple AWS regions, it's useful to detect the region automatically instead of specifying it ourselves through a property/environment variable.

We can detect the region by using the AWS SDK:

Regions.getCurrentRegion(); //returns a Region object (null when not running on EC2)

Or by using:

EC2MetadataUtils.getEC2InstanceRegion(); //returns region String

Or:

System.getenv("AWS_REGION")
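Since each of these options can come back empty depending on where the app runs, one way to combine them is a simple fallback chain. The sketch below is JDK-only so it can run anywhere: RegionResolver and firstAvailable are hypothetical names, the "us-east-1" default is just an assumption for local runs, and in a real app one of the suppliers would wrap an SDK call such as the ones shown above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper: try several region sources in order.
class RegionResolver {

    // Return the first non-blank value; sources that throw or return null are skipped.
    static Optional<String> firstAvailable(List<Supplier<String>> sources) {
        for (Supplier<String> source : sources) {
            try {
                String region = source.get();
                if (region != null && !region.trim().isEmpty()) {
                    return Optional.of(region.trim());
                }
            } catch (RuntimeException e) {
                // e.g. a metadata lookup failing outside EC2; fall through to the next source
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Supplier<String>> sources = Arrays.asList(
                () -> System.getenv("AWS_REGION"),
                // A supplier wrapping an SDK lookup could go here; omitted to stay JDK-only.
                () -> "us-east-1" // assumed default for local runs
        );
        System.out.println(firstAvailable(sources).orElse("unknown"));
    }
}
```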

AWS DynamoDB - dynamic table prefix using DynamoDBMapper

We can use DynamoDBMapperConfig.TableNameOverride to configure the DynamoDBMapper and provide a custom/dynamic table name prefix using TableNameOverride.withTableNamePrefix(String).

Plain Java Example:

import com.amazonaws.services.dynamodbv2.*;
import com.amazonaws.services.dynamodbv2.datamodeling.*;

import java.util.UUID;

//code:

String prefix = "SOME_DYNAMIC_PREFIX"; //can be pulled from a dynamic logic eg: profile, env variable etc
var mapperConfig = new DynamoDBMapperConfig.Builder()
        .withTableNameOverride(DynamoDBMapperConfig.TableNameOverride.withTableNamePrefix(prefix + "-"))
        .build();

var dynamoDB = AmazonDynamoDBClientBuilder.standard().build();
var dbMapper = new DynamoDBMapper(dynamoDB, mapperConfig);


// use it
dbMapper.load(MyTable.class, UUID.randomUUID());
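The prefix in the example above is a hard-coded placeholder. As a rough sketch of the "dynamic logic" part, the JDK-only snippet below derives the prefix from an environment variable and shows the table name that results when the prefix is prepended to the annotated name. TablePrefix, fromEnv and effectiveTableName are hypothetical helpers, and APP_ENV is an assumed variable name, not something the AWS SDK defines.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for building a per-environment table prefix.
class TablePrefix {

    // Build the prefix from an environment name, falling back to "local".
    // APP_ENV is an assumed variable name chosen for this illustration.
    static String fromEnv(Map<String, String> env) {
        String stage = env.getOrDefault("APP_ENV", "local");
        return stage.toLowerCase(Locale.ROOT) + "-";
    }

    // A table-name prefix is prepended to the table name declared on the entity.
    static String effectiveTableName(String prefix, String tableName) {
        return prefix + tableName;
    }

    public static void main(String[] args) {
        String prefix = fromEnv(System.getenv());
        // With APP_ENV unset this prints "local-person".
        System.out.println(effectiveTableName(prefix, "person"));
    }
}
```

The string returned by fromEnv already ends with "-", so it could be passed to withTableNamePrefix as is.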

Spring DynamoDB dynamic table prefix example


import com.amazonaws.services.dynamodbv2.*;
import com.amazonaws.services.dynamodbv2.datamodeling.*;
import org.springframework.context.annotation.*;
import java.util.UUID;

@Configuration
class AwsConfig {
    @Bean
    AmazonDynamoDB dynamoDB() {
        return AmazonDynamoDBClientBuilder.standard().build();
    }

    @Bean
    DynamoDBMapperConfig dynamoDBMapperConfig() {
        String prefix = "SOME_DYNAMIC_PREFIX"; //can be pulled from a dynamic logic eg: profile, env variable etc
        return new DynamoDBMapperConfig.Builder()
                .withTableNameOverride(DynamoDBMapperConfig.TableNameOverride.withTableNamePrefix(prefix + "-"))
                .build();
    }

    @Bean
    DynamoDBMapper dynamoDBMapper(AmazonDynamoDB dynamoDB, DynamoDBMapperConfig dynamoDBMapperConfig)
    {
        return new DynamoDBMapper(dynamoDB, dynamoDBMapperConfig);
    }
}


import com.amazonaws.services.dynamodbv2.datamodeling.*;
import java.util.UUID;
@DynamoDBTable(tableName = "person")
public class MyTable {
    @DynamoDBHashKey
    @DynamoDBAutoGeneratedKey
    UUID id;

    String name;
    //getter setter/other fields
}

Spring Boot - How to skip caching of Thymeleaf templates, JS, CSS etc. to avoid restarting the server every time

The default template resolver registered by Spring Boot autoconfiguration for Thymeleaf is classpath based, meaning that it loads the templates and other static resources from the compiled resources, i.e. /target/classes/**.

To load the changes to the resources (HTML, JS, CSS, etc.), we can:

- Restart the application every time, which is of course not a good idea!
- Recompile the resources using CTRL+F9 in IntelliJ (or CTRL+SHIFT+F9 if you are using the Eclipse keymap), or simply right-click and choose Compile.
- Or use a better solution, as described below!

Thymeleaf includes a file-system based resolver, which loads the templates directly from the file system and not through the classpath (compiled resources).

See the snippet from DefaultTemplateResolverConfiguration#defaultTemplateResolver

@Bean
public SpringResourceTemplateResolver defaultTemplateResolver() {
 SpringResourceTemplateResolver resolver = new SpringResourceTemplateResolver();
 resolver.setApplicationContext(this.applicationContext);
 resolver.setPrefix(this.properties.getPrefix());

Where the property prefix defaults to "classpath:/templates/". See the snippet ThymeleafProperties#DEFAULT_PREFIX

public static final String DEFAULT_PREFIX = "classpath:/templates/";

The Solution:

Spring Boot allows us to override the property 'spring.thymeleaf.prefix' to point to the source folder 'src/main/resources/templates/' instead of the default "classpath:/templates/", as follows.

In the application.yml (or application.properties) file:

spring:
    thymeleaf:
        prefix: file:src/main/resources/templates/  #directly serve from src folder instead of target

This tells the runtime not to look in the target/ folder, so you don't need to restart the server every time you update an HTML template in src/main/resources/templates.
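To see why the prefix matters, it helps to remember that a template location is essentially prefix + view name + suffix. The JDK-only sketch below only illustrates that composition and is not Spring's actual resolver code; TemplateLocation and its methods are hypothetical names, and the view name "index" and suffix ".html" are assumptions for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of how a prefix/suffix based resolver composes a location.
class TemplateLocation {

    // Compose the resource location from prefix, view name and suffix.
    static String locate(String prefix, String viewName, String suffix) {
        return prefix + viewName + suffix;
    }

    // True when the location points at the file system rather than the classpath.
    static boolean isFileSystem(String location) {
        return location.startsWith("file:");
    }

    public static void main(String[] args) {
        String dev = locate("file:src/main/resources/templates/", "index", ".html");
        String def = locate("classpath:/templates/", "index", ".html");
        System.out.println(dev + " -> file system: " + isFileSystem(dev));
        System.out.println(def + " -> file system: " + isFileSystem(def));

        // A relative file: location is resolved against the working directory.
        Path onDisk = Paths.get(dev.substring("file:".length())).toAbsolutePath();
        System.out.println("Edited file is read from: " + onDisk);
    }
}
```

Because the file: path is relative, this setup assumes the app is started from the project root (which is what IDEs and mvn spring-boot:run usually do).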

What about the JavaScript/CSS files?

You can go further and update 'spring.resources.static-locations' to point to your static resource folder (where you keep JS/CSS, images, etc.)

spring:
    resources:
        static-locations: file:src/main/resources/static/ #directly serve from src folder instead of target
        cache:
            period: 0

The full code:

It's a good practice to use the above configuration during development only. To keep the default configuration for the production system, you can use profiles and define separate behaviour for each environment.

Here are the full code snippets based on what we just described!

pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <artifactId>my-sample-app</artifactId>
    <packaging>jar</packaging>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.1.3.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <properties>
        <java.version>11</java.version>
    </properties>

    <dependencies>
        <!-- the basic dependencies as described on the blog -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
    </dependencies>

    <build>
        <finalName>${build.profile}-${project.version}-app</finalName>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

    <profiles>

        <!-- Two profiles -->

        <profile>
            <id>dev</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <properties>
                <spring.profiles.active>dev</spring.profiles.active>
                <build.profile>dev</build.profile>
            </properties>
        </profile>

        <profile>
            <id>prod</id>
            <properties>
                <spring.profiles.active>prod</spring.profiles.active>
                <build.profile>prod</build.profile>
            </properties>
        </profile>

    </profiles>

</project>

The property files (yml)

application-dev.yml

spring:
    profiles:
        active: dev
    thymeleaf:
        cache: false
        prefix: file:src/main/resources/templates/  #directly serve from src folder instead of target
    resources:
        static-locations: file:src/main/resources/static/ #directly serve from src folder instead of target
        cache:
            period: 0

application-prod.yml (doesn't override anything)

spring:
    profiles:
        active: prod

Hope this helps!

Web Scraping in Java using JSoup

Example of Web Scraping in Java using JSoup

In this blog I'm going to describe how we can use the JSoup library to scrape content from a website. Websites use a standard markup language called HTML to display documents in a web browser. HTML documents have an XML-like structure composed of elements and attributes.

<rootElement> //element with tag rootElement

<aTag width="10" height="20" color="RED"> //sub element aTag with attributes width, height etc

<content>Hello</content> //another nested sub element

</aTag>

<summary> This is summary.</summary> //another element under root element

</rootElement>

Although an HTML document starts with <HTML> and the content is kept under the <BODY> element, the actual semantics of HTML are mostly irrelevant to web scraping: for our purposes an HTML document is an XML-like tree of elements and attributes. Web scraping libraries parse that tree and read the data out of it.
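To get a feel for this element/attribute tree before bringing in JSoup, here is a small sketch that reads the sample document above with the XML parser that ships with the JDK. ElementTreeDemo and its helper methods are names made up for this example. Note the assumption: this strict parser only accepts well-formed XML, while real-world HTML often is not well-formed, which is one reason to use a lenient HTML parser like JSoup for actual scraping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustration only: walk an element tree with the JDK's built-in DOM parser.
class ElementTreeDemo {

    // Parse a well-formed XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // First element with the given tag name anywhere in the document.
    static Element first(Document doc, String tag) {
        return (Element) doc.getElementsByTagName(tag).item(0);
    }

    public static void main(String[] args) {
        String xml = "<rootElement>"
                + "<aTag width=\"10\" height=\"20\" color=\"RED\"><content>Hello</content></aTag>"
                + "<summary>This is summary.</summary>"
                + "</rootElement>";
        Document doc = parse(xml);
        System.out.println(first(doc, "aTag").getAttribute("color"));  // attribute value
        System.out.println(first(doc, "content").getTextContent());    // nested element text
        System.out.println(first(doc, "summary").getTextContent());    // sibling element text
    }
}
```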

Let's build a quotes scraping app!

In this example we are going to extract quotes from goodreads.com (https://www.goodreads.com/quotes).

Step 1: Set up a skeleton Java project with the JSoup dependency

We are going to use Maven to add the JSoup dependency and build the project.

Step 1.a Generate Maven Project using maven archetype

mvn archetype:generate -DgroupId=gt -DartifactId=web-scrapper-java -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

It generated the following files. Note that I deleted the AppTest.java under /src/test/java/gt/ because we won't be writing unit tests for this app.

├── pom.xml
├── src
│   └── main
│       └── java
│           └── gt
│               ├── App.java

Step 1.b Add JSoup dependency

I searched for the jsoup dependency at https://mvnrepository.com/artifact/org.jsoup/jsoup, copied the following definition for the current version of jsoup and pasted it inside the <dependencies> section of pom.xml


    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.13.1</version> <!-- use the latest version -->
    </dependency>

I also deleted junit dependency from pom.xml since we won't be writing unit tests.

Step 2: Basic Scraping Examples

Let's play with the JSoup API first. See the examples below. Here we are parsing XML-like content from a string and extracting several pieces of it using CSS queries. Please refer to https://www.w3schools.com/cssref/css_selectors.asp for more examples of CSS queries.


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import static java.lang.System.out;

public class Test {


    public static void main(String[] args) {

        String html = "<rootElement> " +
                "   <aTag width='10' height='20' color='RED' class='C1'>  " +
                "        <content>Hello</content> " +
                "    </aTag>" +
                "   <aTag width='10' height='20' color='GREEN' class='C1'>  " +
                "        <content class = 'small-font'>Hello Again small font</content> " +
                "    </aTag>" +
                "    <summary>" +
                "       <content class = 'small-font'> This is summary in small font </content>" +
                "    </summary> " +
                "</rootElement>";

        Document doc = Jsoup.parse(html);

        //print all content element
        /*
        it prints:
            Hello
            Hello Again small font
            This is summary in small font
         */
        Elements els = doc.select("content");
        for (Element e : els) {
            out.println(e.text());
        }

        //text inside content element under aTag
        /*
        it prints:
            Hello
            Hello Again small font
         */
        for (Element e : doc.select("aTag > content")) {
            out.println(e.text());
        }

        //get all elements that have a color attribute and display the value of the attribute
        /*
        it prints:
            RED
            GREEN
         */
        for (Element e : doc.getElementsByAttribute("color")) {
            out.println(e.attributes().get("color"));
        }

        //get all elements that have the attribute class = C1 and display the value of their color attribute
        /*
        it prints:
            RED
            GREEN
         */
        for (Element e : doc.select(".C1")) {
            out.println(e.attributes().get("color"));
        }

        //read text inside a tag
        /*
        it prints:
            Hello Again small font
            This is summary in small font
         */
        for (Element e : doc.select(".small-font")) {
            out.println(e.text());
        }

    }
}

Step 3: Scraping goodreads.com

Step 3.a Examine the html content

The first step is to examine the structure of the document to see where our data is located. Here we want to read the quote, author and the tags.

After inspecting the structure of the HTML with the browser's inspect tool, we can notice that:

- The <div class='quote'> is repeated for each quote.
- The quote text is inside the 'quoteText' class.
- The author name is inside the 'authorOrTitle' class, under the 'quoteText' class.
- Tags are inside the 'quoteFooter' class.

Here's the HTML content we are interested in. We want to extract the quote text, the author name and the tags.
<div class="quoteText">
      “I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”
<br> ―
<span class="authorOrTitle">
    Marilyn Monroe
</span>
</div>
<div class="quoteFooter">
   <div class="greyText smallText left">
     tags:
       <a href="/quotes/tag/attributed-no-source">attributed-no-source</a>,
       <a href="/quotes/tag/best">best</a>,
       <a href="/quotes/tag/life">life</a>,
       <a href="/quotes/tag/love">love</a>,
       <a href="/quotes/tag/mistakes">mistakes</a>,
       <a href="/quotes/tag/out-of-control">out-of-control</a>,
       <a href="/quotes/tag/truth">truth</a>,
       <a href="/quotes/tag/worst">worst</a>
   </div>
   <div class="right">
     <a class="smallText" title="View this quote" href="/quotes/8630-i-m-selfish-impatient-and-a-little-insecure-i-make-mistakes">151963 likes</a>
   </div>
</div>

Step 3.b Read quotes from goodreads.com

In the above example we used a static String to parse. We can use Jsoup.connect(THE URL).get() to read a webpage and get the Document object as below:

Document doc = Jsoup.connect("https://www.goodreads.com/quotes?page=1").get();

The full code to read quote text, author and tags

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;

public class GoodReadsScrapper {

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("https://www.goodreads.com/quotes?page=1").get();

        Elements quoteElements = doc.select(".quoteText");

        for (Element e : quoteElements) {

            //read quote text and the author from the body of quoteText css
            //e.text() returns all the visible text inside this element which also includes the author... use ownText to not look at child elements
            String qStr = e.ownText();
            String quoteText = qStr.replaceAll("“", "").replaceAll("”", "");

            //author is inside span inside authorOrTitle class within the current element
            String author = e.select(".authorOrTitle").text();

            //Tags: read sibling element of div with class 'quoteText', choose the one with class 'quoteFooter' and read the  a tags
            Elements tagElements = e.nextElementSiblings().select(".quoteFooter").select(".greyText").select("a");
            List<String> tags = tagElements.stream().map(Element::text).collect(Collectors.toList());

            System.out.println(quoteText + " By:" + author + " , Tags:" + tags);
        }
    }

}

Step 4: Thinking Bigger:

What if we want to read quotes from multiple web sites?

What if we want to store the quotes to DB?

What if we want to run the scraping job periodically?

For these 'what-ifs', I updated the above code to include the following:

├── pom.xml
├── src
│   └── main
│       └── java
│           └── gt
│               ├── GoodReadsScrapper.java //implementation for GoodReads
│               ├── Quote.java //wrapper class to hold quote data
│               ├── QuoteScrapper.java //base interface
│               ├── ScrapperService.java //a job
│               ├── Source.java //enum to hold sources
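To give an idea of how these pieces could fit together, here is a rough JDK-only sketch. The type names follow the file names listed above, but every field, method and body below is my own guess for illustration and not the actual code from the repository. The scrapper in main returns canned data instead of calling a website.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Where a quote came from.
enum Source { GOODREADS }

// Simple value holder for one scraped quote.
class Quote {
    final String text;
    final String author;
    final List<String> tags;
    final Source source;

    Quote(String text, String author, List<String> tags, Source source) {
        this.text = text;
        this.author = author;
        this.tags = tags;
        this.source = source;
    }
}

// Base interface: one implementation per website.
interface QuoteScrapper {
    List<Quote> scrap();
}

// The job: runs every scrapper and collects the results.
class ScrapperService {

    // Run each scrapper once; a failing site does not stop the others.
    static List<Quote> runOnce(List<QuoteScrapper> scrappers) {
        List<Quote> all = new ArrayList<>();
        for (QuoteScrapper scrapper : scrappers) {
            try {
                all.addAll(scrapper.scrap());
            } catch (RuntimeException e) {
                System.err.println("Scrapper failed: " + e.getMessage());
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Stand-in scrapper returning canned data instead of calling a website.
        QuoteScrapper fake = () -> Collections.singletonList(
                new Quote("Hello", "Someone", Arrays.asList("greeting"), Source.GOODREADS));
        for (Quote q : runOnce(Collections.singletonList(fake))) {
            System.out.println(q.text + " By:" + q.author + " , Tags:" + q.tags);
        }
    }
}
```

With this shape, supporting another website means adding one more QuoteScrapper implementation, storing to a DB means consuming the list returned by runOnce, and running periodically could be done by calling runOnce from a scheduler such as the JDK's ScheduledExecutorService.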

The source is available at https://github.com/gtiwari333/java-web-scrapping-jsoup

A bigger (web app) application that uses Spring Boot, Angular is available here: https://github.com/gtiwari333/spring-boot-keycloak-angular-quote-app
