Is there a manual for this thing? I've got some questions, like:
What are "ngram keyword" and "ngram global"?
My guess is that ngram keyword is closely related to the keyword while global is page or site related.
The ngram keywords are grouped together under numbers. Do those numbers have any meaning?
In the competitor matrix, when I enter my site/page, some of the columns are bordered with colors. Some red columns have a green border, others a yellowish one. Does that mean anything?
Sorry, not right now, as the GUI and everything around it is still in progress. It would make little sense to write a manual, as it would probably be outdated with the next update.
> What are "ngram keyword" and "ngram global"?
ngram keyword: the keywords/phrases that include your keyword. No filter is used here.
ngram-global: all keywords that are closely related to your keyword, or that Google seems to consider important for the ranking of the sites as they are.
> Do those numbers have any meaning?
Yes, they mean the number of sites that have this on their page. So the bigger the number, the more important it seems.
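To make the two listings and the counts concrete, here is a minimal sketch of the general idea (my own illustration, not Sven's actual implementation; all names are made up): it extracts 1-3 word n-grams from each competitor page, counts on how many pages each n-gram occurs, and separates the n-grams that contain the seed keyword from the rest.

```python
import re
from collections import Counter

def ngrams(text, n):
    """All n-word phrases found in a page's visible text, lowercased."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def ngram_report(pages, seed_keyword, max_n=3):
    """pages: plain-text content of each competitor page.
    Returns two Counters mapping n-gram -> number of pages containing it."""
    site_counts = Counter()
    for text in pages:
        grams = set()
        for n in range(1, max_n + 1):
            grams |= ngrams(text, n)
        site_counts.update(grams)  # +1 for every page that contains the n-gram
    seed = seed_keyword.lower()
    keyword_ngrams = Counter({g: c for g, c in site_counts.items() if seed in g})
    global_ngrams = Counter({g: c for g, c in site_counts.items() if seed not in g})
    return keyword_ngrams, global_ngrams
```

A phrase returned with a count of 11 would correspond to the 000011 shown in the GUI (present on 11 of the analysed pages); `keyword_ngrams.most_common()` roughly mirrors the "ngram keyword" listing, while the unfiltered remainder stands in for "ngram-global" (the real listing is additionally filtered for relevance).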
> In the competitor matrix, when I enter my site/page, some of the columns are bordered with colors. Some red columns have a green border, others a yellowish one. Does that mean anything?
Yes, they have a meaning of course: green means no change needed, red usually means fix or lower, yellow means increase. The value to compare against can be chosen in the options; by default it is compared against the Top 5, I think.
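As a rough illustration of that comparison (the actual thresholds and logic used by the tool are not documented here, so treat the numbers below as placeholders):

```python
def matrix_color(your_value, reference_value, tolerance=0.10):
    """Placeholder logic: compare your on-page value against the chosen
    reference (e.g. the Top 5 average) and map the result to a border color."""
    if reference_value == 0:
        return "green" if your_value == 0 else "red"  # reference sites don't use it at all
    ratio = your_value / reference_value
    if ratio > 1 + tolerance:
        return "red"      # fix / lower
    if ratio < 1 - tolerance:
        return "yellow"   # increase
    return "green"        # no change needed
```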
I'd say it sounds like what Google's quality score calculation should look like.
When you type a keyword into Google, it gives you the results it finds most relevant (and which therefore contain the semantic juice that we need).
With the ngram calculation we will be able to extract what makes the best sites rank on Google, but also all the associated context (long tail / full sentences). It is nothing more nor less than reverse engineering Google... are you aware of the potential this represents?
It is very different from what we know, and this option alone is worth gold! Very sincerely, I keep telling Sven to increase the price of his software, because it is really not the kind of tool I would like to see in the hands of my competitors ^^
nGrams are basically keywords you need to use on your site to rank better. For that purpose, the tool analyses the top results of a query and shows you the keywords/phrases that are used on those results, along with details for each.
Yeah, I know that. I was more interested in seeing what exactly the tool is displaying so I can compare it to the TF/IDF tools I'm currently using without having to install the trial.
1) Is there a reason you implemented ngram rather than TF/IDF (or WDF/IDF as it's called in Germany)?
2) Any plans to enrich the suggested keywords with results from Google's NLP API?
> Very sincerely, I keep telling Sven to increase the price of his software, because it is really not the kind of tool I would like to see in the hands of my competitors ^^
Are you living under a rock?
These tools have existed for many years already, e.g. Page Optimizer Pro, Cora, Surfer SEO. There are even free ones such as https://www.seobility.net/en/tf-idf-keyword-tool/, although that one is limited to content only, while the former compare far more parameters.
I guess your competitors have been using them for a long time already
If you have ideas, this is the right place to explain them
Edit: No, I do not live under a rock ^^ but it is true that I do not use them much. That said, they are generally limited or quite expensive, while here we have all the research results at hand. With the right ideas you can have it all in one piece of software; just explain it to Sven and submit some results so he understands how it works.
Below is a screenshot showing results for "SEO" on google.de.
It shows per column: Top 3/5/10 (average of the results), the individual results, and in the 4th column "your site" that you compare against (in this case gsa-online.de).
Now, ngrams-global shows possible keywords that should appear on your page in order to rank the same as or better than your competitors.
A green X means you have that word/phrase on your page; a red - means it is missing.
That 000011 means it's on 11 sites. 000010 on 10 sites and so on...
Edit: On the other hand, there are only single words; it would be more interesting to recover everything that works best, whatever its length. Besides, whatever wording the sites use, I do not see what could be better.
Edit: Does anyone have a link, or would it be possible to retrieve the semantics of a word or group of words? Maybe with the Google Knowledge Graph API?
@Sven Thanks for the screenshot and explanation. The ngram feature goes in the right direction, but TF/IDF takes this one step further:
1) TF/IDF is known as WDF/IDF in Germany and consists of two parts. WDF stands for "within document frequency" while IDF stands for "inverse document frequency".
The WDF value is used to determine how often a word or combination of words occurs in a document. However, this value does not stand completely on its own (like pure keyword density does), but is set in relation to the relative occurrence of the other terms on the page. A special algorithm prevents pure spamming of a single word from producing a better result in the analysis.
The second value in the analysis is the IDF value. This is about how often certain terms are found across documents: the number of known documents (= the 10 URLs on the 1st SERP) is compared to the number of those documents in which a term occurs. This is primarily intended to find out how relevant a text is with regard to a specific keyword.
Combining both formulas yields the well-known WDF*IDF analysis, which determines the relative term weighting of a document in relation to the other documents that also contain the same main keyword. A small numeric sketch follows below.
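To make this concrete, here is one common textbook formulation of the two parts in a few lines of Python (a sketch only; the exact dampening and log base differ between tools, and this is not necessarily what Keyword Research or any specific WDF/IDF tool computes):

```python
import math

def wdf(term_freq, doc_length):
    """Within-document frequency: log-dampened share of a term in one document."""
    if doc_length < 2:
        return 0.0
    return math.log2(term_freq + 1) / math.log2(doc_length)

def idf(num_docs, docs_with_term):
    """Inverse document frequency over the analysed corpus (e.g. the 10 SERP URLs)."""
    if docs_with_term == 0:
        return 0.0
    return math.log10(1 + num_docs / docs_with_term)

def wdf_idf(term_freq, doc_length, num_docs, docs_with_term):
    """Relative weight of a term in one document compared to the other documents
    that rank for the same main keyword."""
    return wdf(term_freq, doc_length) * idf(num_docs, docs_with_term)

# Example: a term used 12 times in a 900-word page and found on 7 of the 10 SERP pages.
print(round(wdf_idf(12, 900, 10, 7), 3))   # ~0.145
```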
2) I think it would be great to separately show keywords which are considered relevant to the first 3 SERP URLs by the Google NLP API. I would only display "new" keywords (and topics) which are not already listed in the ngram (or tf*idf) list. This would possibly give us an edge over the competition since it will show keywords considered relevant by Google but missing on the top 3 sites. Here's a tutorial with fully working code on how to achieve this: https://sashadagayev.com/systematically-analyze-your-content-vs-competitor-content-and-make-actionable-improvements/
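For 2), a minimal sketch of the entity-extraction part using the official google-cloud-language client (the helper name and the salience cutoff are my own; it assumes GOOGLE_APPLICATION_CREDENTIALS is configured, and the "only show keywords missing from the ngram list" filtering would happen on top of this):

```python
from google.cloud import language_v1  # pip install google-cloud-language

def nlp_entities(text, min_salience=0.01):
    """Return (entity, salience) pairs Google's NLP API considers relevant in a text."""
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_entities(request={"document": document})
    return [(e.name, e.salience) for e in response.entities if e.salience >= min_salience]
```

Running this over the text of the top 3 URLs and subtracting anything already present in the ngram (or TF*IDF) listing would give the "new" keywords described above.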
1) Thanks for the explanation. Though I think it's essentially the same as what's implemented now, just with different statistics behind it. What you see in the listing now is really the data you would expect, maybe in a different order, but I still think it's ranked well.
But yes, I will probably make a new listing for found terms on a new form to show more stats on each term/phrase.
2) I would really prefer not to add Google APIs here.
Below we have the 10 sites. Would it be possible to have check boxes to deselect the sites we want to remove from the analysis?
Edit: I would also like to know whether everything is scraped only once at the start (all the HTML of each page), in order to avoid needing proxies when doing multiple manipulations.
It would also be interesting to know which platform (WordPress, Drupal, ...) the sites' pages use, and to have a classification of the loading time as well.
Another option that could be really nice: if we launch an analysis and one of the results fails due to an IP block, Keyword Research would automatically and transparently go and find a suitable proxy to finish the job. That would be a luxury!
I know there is already a tool to scrape proxies, but here I am talking about full automation. The moment when we sit down with a good coffee, ready to analyze our competitors and optimize our site, should not be interrupted to scrape proxies; even if it's very fast, you lose ideas and productivity. For the moment I do not need it (a proxy), but if you run the analysis page after page it will quickly become necessary. Of course the proxy scraper is a good thing; that said, a compatibility test directly against the target should be very fast. What do you think of that? If finding a proxy takes a little while, the results already loaded can be displayed without delay; just explain what is happening and that the rest will follow shortly after.
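The requested behaviour could look roughly like this (an illustrative sketch only, not how the tool actually handles proxies; the block-detection codes and the proxy list are placeholders):

```python
import requests

BLOCK_CODES = {403, 429, 503}  # status codes that typically indicate an IP block

def fetch_with_fallback(url, proxies_to_try, timeout=15):
    """Try a direct request first; on an apparent block, retry through proxies
    one by one so the analysis can finish without interrupting the user."""
    try:
        resp = requests.get(url, timeout=timeout)
        if resp.status_code not in BLOCK_CODES:
            return resp
    except requests.RequestException:
        pass  # network error: fall through to the proxy attempts
    for proxy in proxies_to_try:
        try:
            resp = requests.get(
                url, timeout=timeout, proxies={"http": proxy, "https": proxy}
            )
            if resp.status_code not in BLOCK_CODES:
                return resp
        except requests.RequestException:
            continue
    return None  # caller can mark this URL as failed and report it
```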
Would it be possible in the competition research module to add and delete websites? Maybe a negative list would come in handy there if someone does mass competitor research.
My main problem is that I have social media sites like Pinterest, YouTube, etc. in the results, which I don't really need. Also, when I compare against the first 3 results and those are all Pinterest, the results aren't worth much.
I also think the ability to add sites and compare them to each other would come in handy.
2. Ngrams look to be very handy, but maybe a negative word list could come in handy there too, to narrow the results if needed. I don't see much use in words like policy, privacy policy, facebook, navigation, etc.
I'm not totally against these words, as they may come in handy at the first stage of keyword research, when you are trying to determine what you need to have on a site in a given niche, but at later stages they're rather useless.
It could crawl the site or simply read the URLs from a sitemap, depending on which method we choose to acquire the data. It could show all the URLs a site has with various technical information about each URL, like (see the sketch after these lists):
Status code (useful to find errors)
Indexability (to see if the URL is blocked by robots.txt or by other means)
Inbound links (number of links a given URL receives from the rest of the site; this should be exportable in some way). This is useful to build silos. It would also be useful to visualize these links in some way, but I'm still thinking about how to do it.
Inbound links that are unique to the examined page only
Outbound links (number of links a URL has to an external domain; this should also be exportable)
Outbound links (unique)
Percentage of
Response time
Redirect type
Redirect URLs
Structured data info like errors, warnings
A bit more content-focused research tab could contain information about:
Page title (showing the whole page title to spot possible errors, plus character length and pixel width with an optional red and green indicator)
Meta description
Meta keywords
H1
H2
Content word count
Size
Last modified
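As a rough sketch of how a few of the fields above could be gathered per URL (sitemap reading, status code, robots.txt check, redirect, title/meta/H1, word count); the helper names are mine, example.com stands in for the audited site, and a real crawler would obviously need more care with encodings, rate limits and relative URLs:

```python
import urllib.robotparser
from xml.etree import ElementTree

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_url):
    """Read page URLs from an XML sitemap."""
    xml = requests.get(sitemap_url, timeout=15).text
    return [loc.text for loc in ElementTree.fromstring(xml).iter(SITEMAP_NS + "loc")]

def audit_url(url, robots):
    """Collect a few of the technical/content fields suggested above for one URL."""
    resp = requests.get(url, timeout=15, allow_redirects=False)
    soup = BeautifulSoup(resp.text, "html.parser")
    description = soup.find("meta", attrs={"name": "description"})
    return {
        "status_code": resp.status_code,
        "indexable": robots.can_fetch("*", url),
        "redirect_url": resp.headers.get("Location"),
        "response_time_s": resp.elapsed.total_seconds(),
        "title": soup.title.get_text(strip=True) if soup.title else "",
        "meta_description": description.get("content", "") if description else "",
        "h1": [h.get_text(strip=True) for h in soup.find_all("h1")],
        "word_count": len(soup.get_text(" ", strip=True).split()),
        "size_bytes": len(resp.content),
        "last_modified": resp.headers.get("Last-Modified"),
    }

robots = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
robots.read()
for page in sitemap_urls("https://example.com/sitemap.xml"):
    print(audit_url(page, robots))
```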
These are all I can think of as useful right now, but others may chime in and share their favorites. You can also sneak a peek at Screaming Frog and see what other technical stuff they show. They probably do that for a reason ;D
OK, I see what you mean... indeed it's not really much work to apply the current COMPARE function to all URLs of a domain and its links. Though let's focus on this COMPARE for now and let it grow into something bigger later.
What do you think of showing who the owner of the domain is (and maybe even the host)? If it shows up in a top 10, we might be dealing several times with the same competitor using the same strategy.
I did not pay attention to whether the age of the domain was integrated in the last updates.
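Domain age and registrar can be pulled from a WHOIS lookup; a minimal sketch, assuming the third-party python-whois package (owner details are often redacted nowadays, so the registrar is frequently all you get):

```python
from datetime import datetime

import whois  # pip install python-whois

def domain_info(domain):
    """Registrar and approximate domain age in days from a WHOIS lookup."""
    record = whois.whois(domain)
    created = record.creation_date
    if isinstance(created, list):  # some registries return several dates
        created = created[0]
    # datetime.now(created.tzinfo) keeps naive/aware datetimes comparable
    age_days = (datetime.now(created.tzinfo) - created).days if created else None
    return {"registrar": record.registrar, "created": created, "age_days": age_days}

print(domain_info("gsa-online.de"))
```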
I have an idea which seems nice to me but which reverses the process a little (code-wise): being able to select, for example, x URLs / sites / pages... and compare them against a query.
Let's say I want to write an article and I want to compare the competitors' articles in order to compose mine more effectively.
It is a bit the opposite of comparing several results to integrate what is missing; here it would rather be for when we start from zero.
Speaking of which, Sven, do you plan to add some sort of text editor? For example, I compose my article and it gives me keyword suggestions, colors the words already used, etc., for example with Google suggestions, or why not with keywords I import from Google Keyword Planner?
It could also suggest ngrams, long-tail phrases and context sentences extracted from my competitors.
In the Competitor Matrix, the names of the sites do not appear when hovering over the red boxes (not present). For Keyword Competition, before the matrix, we do not have the information (age of the domain and so on) for the domain we compare against, since we only enter it by clicking Compare. It may be necessary to add a step for that just before.